Learn Data Loading and Unloading (COF-C02) with Interactive Flashcards
Master key concepts in Data Loading and Unloading through our interactive flashcard system. Click on each card to reveal detailed explanations and enhance your understanding.
COPY INTO command for bulk loading
The COPY INTO command is Snowflake's primary method for bulk loading data from external sources into tables. This powerful command enables efficient data ingestion from staged files located in internal stages, external stages (AWS S3, Azure Blob Storage, Google Cloud Storage), or named stages.
The basic syntax follows: COPY INTO <table_name> FROM <stage_location>. The command supports various file formats including CSV, JSON, Avro, Parquet, ORC, and XML. You specify the format using the FILE_FORMAT parameter, either referencing a named file format object or defining format options inline.
Key parameters include:
1. FILES - Specifies exact file names to load
2. PATTERN - Uses regex patterns to match specific files
3. ON_ERROR - Controls behavior when errors occur (CONTINUE, SKIP_FILE, ABORT_STATEMENT)
4. VALIDATION_MODE - Validates files without loading them, using RETURN_ERRORS, RETURN_ALL_ERRORS, or RETURN_<n>_ROWS (for example, RETURN_10_ROWS)
5. PURGE - Automatically removes files after successful loading when set to TRUE
6. FORCE - Reloads files even if previously loaded
Snowflake maintains load metadata for 64 days, preventing duplicate data loading from the same files. The COPY INTO command processes staged files in parallel across the threads of the virtual warehouse, so throughput scales with both warehouse size and the number of files.
Transformation capabilities allow column reordering, omitting columns, casting data types, and applying functions during the load process. You can select specific columns using SELECT statements within the COPY command.
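As a minimal sketch (the table SALES, stage @csv_stage, and column positions here are illustrative assumptions, not part of any specific schema), a load that combines a pattern match, error handling, and an inline transformation might look like this:
-- Hypothetical names: stage @csv_stage, target table SALES
COPY INTO sales (sale_date, region, amount)
  FROM (
    SELECT TO_DATE($1, 'YYYY-MM-DD'),   -- cast the first file column to a date
           UPPER($2),                   -- normalize the region code
           $3::NUMBER(10,2)             -- cast the amount
    FROM @csv_stage/2024/
  )
  PATTERN = '.*sales_.*[.]csv[.]gz'
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
  ON_ERROR = 'SKIP_FILE'
  PURGE = TRUE;                         -- remove files from the stage after a successful load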
Best practices include:
- Sizing data files to roughly 100-250 MB (compressed) for optimal parallel processing
- Using appropriate file formats for your data structure
- Implementing error handling strategies based on data quality requirements
- Utilizing stages for secure and organized file management
The COPY INTO command tracks which files have been loaded through metadata, ensuring data integrity and preventing accidental duplicate loads. This makes it ideal for batch processing scenarios where reliability and performance are critical requirements.
Snowpipe for continuous loading
Snowpipe is Snowflake's continuous data ingestion service that loads data automatically as soon as files arrive in a staging location. Unlike traditional batch loading methods that require scheduled jobs, Snowpipe provides a serverless, event-driven approach to data ingestion.
Snowpipe detects new files in two ways. The first uses cloud messaging services: S3 event notifications delivered through Amazon SQS (optionally routed via SNS) on AWS, Azure Event Grid on Azure, and Pub/Sub on Google Cloud. When files land in cloud storage, these notifications prompt Snowpipe to load the data. The second method uses REST API calls, where applications programmatically notify Snowpipe about new files ready for ingestion.
Snowpipe runs on Snowflake-supplied compute resources rather than customer virtual warehouses. You pay only for the compute actually used during file loading, plus a small per-file overhead charge, making it cost-effective for continuous small-batch loads.
To implement Snowpipe, you create a pipe object with the CREATE PIPE statement, which wraps a COPY INTO command defining the source stage, target table, and file format. The pipe definition includes AUTO_INGEST = TRUE when using cloud event notifications.
Key benefits of Snowpipe include near real-time data availability, reduced latency compared to batch processing, automatic scaling based on file volume, and simplified pipeline management. Snowpipe tracks which files have been loaded (its load history is retained for 14 days), preventing duplicate data ingestion.
Best practices include keeping individual files between 100-250 MB compressed for optimal performance, organizing files in logical paths, and monitoring pipe status using the SYSTEM$PIPE_STATUS function and the COPY_HISTORY table function. Error handling can be configured to manage problematic files through validation options.
Snowpipe is ideal for streaming scenarios, IoT data collection, log processing, and any use case requiring continuous data freshness with minimal operational overhead.
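A minimal sketch of the setup (the pipe, stage, and table names are assumptions for illustration):
-- Hypothetical names: stage @events_stage, target table RAW_EVENTS
CREATE PIPE raw_events_pipe
  AUTO_INGEST = TRUE
  AS
  COPY INTO raw_events
    FROM @events_stage
    FILE_FORMAT = (TYPE = JSON);

-- Check whether the pipe is running and whether files are pending
SELECT SYSTEM$PIPE_STATUS('raw_events_pipe');

-- Review what was loaded into the target table over the last day
SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'RAW_EVENTS',
  START_TIME => DATEADD('hour', -24, CURRENT_TIMESTAMP())));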
Snowpipe Streaming
Snowpipe Streaming is a powerful feature in Snowflake designed for low-latency data ingestion, enabling real-time data loading into Snowflake tables. Unlike traditional Snowpipe which relies on micro-batching through staged files, Snowpipe Streaming allows applications to send data rows via the Snowflake Ingest SDK using a streaming API approach.
Key characteristics of Snowpipe Streaming include:
1. **Low Latency**: Data becomes available for querying within seconds of being sent, making it ideal for real-time analytics and time-sensitive applications.
2. **No File Staging Required**: Unlike standard Snowpipe, Snowpipe Streaming eliminates the need to stage files in cloud storage before loading. Data flows through the API and lands in tables efficiently.
3. **Cost Efficiency**: Since there is no file staging overhead, you save on storage costs and reduce the compute resources needed for file management operations.
4. **Ingest SDK**: Applications use the Snowflake Ingest SDK (available in Java) to establish channels and push rows of data programmatically. Each channel represents a connection to a specific table.
5. **Exactly-Once Semantics**: Snowpipe Streaming provides offset token management, allowing applications to track which records have been successfully ingested and ensuring data integrity.
6. **Use Cases**: Common scenarios include IoT sensor data, clickstream analytics, log ingestion, and any application requiring near real-time data availability.
7. **Billing Model**: Charges are based on compute resources consumed during ingestion, measured in credits per second of compute time used.
8. **Integration with Kafka**: Snowpipe Streaming works seamlessly with the Kafka connector for Snowflake, providing an alternative to the standard Snowpipe method for Kafka-based data pipelines.
For the SnowPro Core exam, understand that Snowpipe Streaming complements traditional batch loading and standard Snowpipe, offering the lowest latency option when real-time data availability is critical for your analytical workloads.
Data loading best practices
Data loading best practices in Snowflake are essential for optimizing performance and ensuring efficient data ingestion. Here are key recommendations for the SnowPro Core Certification:
**File Sizing and Preparation**
Aim for compressed file sizes between 100-250 MB. This allows Snowflake to parallelize the load effectively across multiple compute resources. Split larger files into smaller chunks before loading to maximize throughput.
**File Format Optimization**
Compress data files using GZIP, BZIP2, or ZSTD to reduce storage costs and improve load times. Snowflake supports various formats including CSV, JSON, Parquet, Avro, and ORC. Choose the format that best matches your source system.
**Staging Data**
Utilize Snowflake stages (internal or external) to organize your data files. Internal stages provide convenience, while external stages (S3, Azure Blob, GCS) offer flexibility for existing cloud storage infrastructure.
**COPY Command Best Practices**
Use the COPY INTO command for bulk loading operations. Leverage the VALIDATION_MODE parameter to test loads before executing them. Enable ON_ERROR options appropriately - use CONTINUE for fault-tolerant loads or ABORT_STATEMENT for strict validation.
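For example, a cautious two-step pattern (stage, table, and file format names are assumed for illustration) validates a batch before committing to the load:
-- Hypothetical names: stage @load_stage, table CUSTOMER_STAGING, file format CSV_STD
COPY INTO customer_staging
  FROM @load_stage/customers/
  FILE_FORMAT = (FORMAT_NAME = 'csv_std')
  VALIDATION_MODE = RETURN_ERRORS;   -- dry run: report problems, load nothing

COPY INTO customer_staging
  FROM @load_stage/customers/
  FILE_FORMAT = (FORMAT_NAME = 'csv_std')
  ON_ERROR = 'CONTINUE';             -- real run: load good rows, skip bad ones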
**Warehouse Sizing**
Select an appropriately sized virtual warehouse based on your data volume and file count. A larger warehouse can load more files in parallel, but it only speeds up loading when there are enough files to keep the additional threads busy. Consider using dedicated warehouses for loading operations to avoid resource contention.
**Metadata and Load History**
Snowflake maintains 64 days of load metadata to prevent duplicate file loading. Use the FORCE option cautiously when reloading previously processed files.
**Snowpipe for Continuous Loading**
Implement Snowpipe for automated, continuous micro-batch loading from cloud storage. This serverless feature automatically ingests data as files arrive, ideal for streaming scenarios.
**Semi-Structured Data Handling**
For JSON, Avro, or Parquet data, load into VARIANT columns and use Snowflake's native semi-structured data functions for querying and transformation.
Following these practices ensures optimal performance, cost efficiency, and reliable data pipelines in your Snowflake environment.
Internal stages (user, table, named)
Internal stages in Snowflake are storage locations within your Snowflake account used for loading and unloading data. There are three types of internal stages: user stages, table stages, and named stages.
**User Stages:**
Every Snowflake user is automatically allocated a personal stage referenced using @~. This stage is private to the user and cannot be accessed by other users. User stages are convenient for files that only one user needs to access. You cannot alter or drop user stages, and they do not support setting file format options at the stage level.
**Table Stages:**
Each table in Snowflake automatically has an associated table stage, referenced using @%table_name. Table stages are useful when multiple users need to access files but those files only need to be loaded into a single table. Like user stages, table stages cannot be altered or dropped, and they do not support transformations during data loading. File format options must be specified in the COPY INTO command rather than at the stage level.
**Named Stages:**
Named stages are database objects created explicitly using the CREATE STAGE command. They offer the most flexibility among internal stage types. Named stages can be configured with specific file format options, making them reusable across multiple COPY operations. They support role-based access control through privileges, allowing fine-grained security management. Named stages can be altered, renamed, and dropped as needed. They are ideal for production environments where consistent data loading patterns are required.
**Key Considerations:**
All internal stages use Snowflake-managed cloud storage. Data files in internal stages are automatically encrypted. The PUT command uploads files to internal stages, while the GET command downloads files from them. The COPY INTO command loads data from staged files into tables. Internal stages are included in your Snowflake storage costs, so managing staged files after loading is important for cost optimization.
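A short sketch contrasting the three internal stage types (the file paths, table ORDERS, and stage LOAD_STAGE are assumptions; PUT and GET run from a client such as SnowSQL, not the web interface):
PUT file:///tmp/orders.csv @~/staged/;            -- user stage, private to the current user
PUT file:///tmp/orders.csv @%orders;              -- table stage of the ORDERS table

CREATE STAGE load_stage                           -- named stage with a reusable default format
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
PUT file:///tmp/orders.csv @load_stage;

LIST @load_stage;                                 -- confirm the upload
COPY INTO orders FROM @load_stage;                -- load the staged file
REMOVE @load_stage PATTERN = '.*orders.*';        -- clean up to control storage costs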
External stages (S3, Azure Blob, GCS)
External stages in Snowflake are named database objects that reference cloud storage locations outside of Snowflake where data files are stored. They serve as pointers to external cloud storage services, enabling seamless data loading and unloading operations. Snowflake supports three major cloud storage platforms: Amazon S3, Microsoft Azure Blob Storage, and Google Cloud Storage (GCS).
For Amazon S3, external stages reference S3 buckets using URLs in the format 's3://bucket-name/path/'. Authentication can be configured using AWS IAM credentials, storage integrations, or IAM roles. Storage integrations are the recommended approach as they provide secure, credential-free access through Snowflake's managed identity.
Azure Blob Storage stages use URLs formatted as 'azure://account.blob.core.windows.net/container/path/'. Authentication options include SAS tokens or storage integrations that leverage Azure service principals for secure access management.
Google Cloud Storage stages reference GCS buckets using 'gcs://bucket-name/path/' URLs. Similar to other platforms, storage integrations using GCP service accounts provide the most secure authentication method.
When creating external stages, you can specify file formats, encryption settings, and credentials. The CREATE STAGE command defines these configurations. For example: CREATE STAGE my_s3_stage URL='s3://mybucket/data/' STORAGE_INTEGRATION = my_integration;
External stages offer several advantages: they allow data to remain in existing cloud storage locations, support both bulk loading via COPY INTO and continuous loading through Snowpipe, and enable data unloading for export purposes. They also facilitate data sharing across organizations and support various file formats including CSV, JSON, Parquet, Avro, and ORC.
Best practices include using storage integrations over embedded credentials, organizing data in logical folder structures, and implementing appropriate access controls on both the Snowflake and cloud storage sides to ensure data security and governance compliance.
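A sketch of the recommended integration-based setup for S3 (the role ARN, bucket, and object names are placeholders; Azure and GCS follow the same pattern with their own URL and identity settings):
-- Placeholder ARN and bucket; replace with real values
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_access'
  STORAGE_ALLOWED_LOCATIONS = ('s3://mybucket/data/');

CREATE STAGE my_s3_stage
  URL = 's3://mybucket/data/'
  STORAGE_INTEGRATION = s3_int
  FILE_FORMAT = (TYPE = PARQUET);

COPY INTO landing_table                           -- assumes a table whose columns match the files
  FROM @my_s3_stage
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;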
Stage file operations
Stage file operations in Snowflake are essential components for managing data movement between external sources and Snowflake tables. A stage is a location where data files are stored temporarily before loading into tables or after unloading from tables.
Snowflake supports three types of stages: Internal stages (managed by Snowflake), External stages (referencing cloud storage like AWS S3, Azure Blob, or GCP buckets), and Table stages (automatically created for each table).
Key stage file operations include:
**PUT Command**: Used to upload files from a local file system to an internal stage. This operation compresses files by default and supports parallel uploads for better performance. Example: PUT file:///data/myfile.csv @mystage;
**GET Command**: Downloads files from an internal stage to a local directory. This is useful for retrieving unloaded data or accessing staged files locally.
**LIST Command**: Displays all files currently stored in a stage. This helps verify successful uploads and check file metadata like size and last modified dates. Example: LIST @mystage;
**REMOVE Command**: Deletes files from a stage after successful data loading, helping manage storage costs and maintain clean staging areas.
**COPY INTO**: While not exclusively a stage operation, this command works with stages to load data into tables or unload data from tables to stages.
Best practices for stage operations include:
- Organizing files in logical directory structures within stages
- Using appropriate file formats (CSV, JSON, Parquet, Avro, ORC)
- Implementing proper access controls using stage grants
- Cleaning up staged files after successful loads to reduce storage costs
- Leveraging file format objects for consistent parsing configurations
Stage operations support pattern matching for selective file processing and can handle compressed files automatically. Understanding these operations is crucial for efficient ETL pipelines and data integration workflows in Snowflake environments.
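A typical end-to-end sequence, assuming a named internal stage @mystage and a table SALES (all names illustrative):
PUT file:///data/sales_2024_06.csv @mystage
  AUTO_COMPRESS = TRUE PARALLEL = 4;              -- compress and upload with 4 threads
LIST @mystage PATTERN = '.*sales_2024.*';         -- verify the upload
COPY INTO sales FROM @mystage
  PATTERN = '.*sales_2024.*';                     -- load only the matching files
REMOVE @mystage PATTERN = '.*sales_2024.*';       -- free stage storage after the load

GET @mystage/exports/ file:///tmp/exports/;       -- later, download unloaded files locally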
Directory tables
Directory tables in Snowflake are a powerful feature designed to manage and catalog files stored in stages, making it easier to work with unstructured and semi-structured data during data loading and unloading operations.
A directory table is essentially a built-in, read-only table that automatically catalogs all files contained within a stage. When you enable a directory table on a stage, Snowflake creates metadata entries for each file, including important attributes such as the file path, size, last modified timestamp, and an MD5 checksum for data integrity verification.
To enable a directory table, include DIRECTORY = (ENABLE = TRUE) when creating or altering a stage. Once enabled, you can query the directory table using the DIRECTORY() table function, which returns comprehensive file metadata. This allows you to programmatically discover and filter files before loading them into Snowflake tables.
Directory tables are particularly useful for several scenarios. First, they help you identify new or modified files in a stage, enabling incremental data loading patterns. Second, they support data lake architectures where files are continuously added to cloud storage locations. Third, they facilitate the management of unstructured data like images, PDFs, and audio files that need to be processed or analyzed.
The metadata in directory tables must be refreshed to reflect changes in the stage. You can manually refresh using the ALTER STAGE command with REFRESH option, or configure automatic refresh for stages pointing to cloud storage locations that support event notifications.
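A brief sketch (the stage name and path are assumptions):
CREATE OR REPLACE STAGE doc_stage
  DIRECTORY = (ENABLE = TRUE);                    -- enable the directory table

ALTER STAGE doc_stage REFRESH;                    -- sync metadata with the files in the stage

SELECT relative_path, size, last_modified
FROM DIRECTORY(@doc_stage)
WHERE relative_path LIKE 'invoices/%'             -- filter to a folder of interest
ORDER BY last_modified DESC;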
Key columns available in directory tables include RELATIVE_PATH for the file location, SIZE for the file size in bytes, LAST_MODIFIED for timestamp information, MD5 for checksum values, and ETAG for cloud provider-specific identifiers.
Understanding directory tables is essential for the SnowPro Core exam as they represent a modern approach to file management and are integral to building robust data pipelines in Snowflake.
File format objects
File format objects in Snowflake are named database objects that define the format specifications for data files used during loading and unloading operations. These objects encapsulate all the configuration settings needed to properly parse incoming data or structure outgoing data, making data operations more efficient and reusable.
Snowflake supports several file format types including CSV, JSON, AVRO, ORC, PARQUET, and XML. Each format type has specific options that control how data is interpreted. For CSV files, you can specify field delimiters, record delimiters, skip header rows, null string representations, and escape characters. For semi-structured formats like JSON, you can configure strip outer array settings and date/time format specifications.
File format objects are created using the CREATE FILE FORMAT command and, like other schema-level objects, live in a specific database and schema. Once created, they can be referenced in COPY INTO statements, stage definitions, and external table configurations. This eliminates the need to repeatedly specify format options in each data operation.
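A small sketch showing a named format reused at the stage level and in a COPY statement (the object names are assumptions):
CREATE FILE FORMAT csv_std
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  NULL_IF = ('', 'NULL');

CREATE STAGE load_stage
  FILE_FORMAT = (FORMAT_NAME = 'csv_std');        -- stage-level default

COPY INTO customers
  FROM @load_stage
  FILE_FORMAT = (FORMAT_NAME = 'csv_std');        -- explicit reference; overrides stage settings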
Key benefits of using file format objects include consistency across multiple load operations, simplified SQL statements, easier maintenance when format specifications change, and the ability to share format definitions across teams. When you modify a file format object, all future operations referencing it will use the updated specifications.
You can associate file format objects with stages, which means any data operation using that stage will automatically apply the format settings. Alternatively, you can specify the file format at the time of the COPY command, which will override any stage-level format settings.
File format objects support various compression options including GZIP, BZ2, BROTLI, ZSTD, and DEFLATE. Snowflake can also auto-detect compression when loading data. Understanding file format objects is essential for the SnowPro Core exam as they form a fundamental component of Snowflake's data loading and unloading architecture.
CSV file handling
CSV (Comma-Separated Values) file handling is a fundamental aspect of data loading and unloading in Snowflake. Understanding how to work with CSV files is essential for the SnowPro Core Certification.
When loading CSV files into Snowflake, you use the COPY INTO command along with a file format specification. Snowflake supports various CSV options including field delimiters, record delimiters, escape characters, and handling of null values. You can define these settings using the CREATE FILE FORMAT command or specify them inline during the COPY operation.
Key CSV file format options include:
1. FIELD_DELIMITER: Specifies the character separating fields (default is comma)
2. RECORD_DELIMITER: Defines row separators (default is newline)
3. SKIP_HEADER: Indicates how many header rows to exclude during loading
4. FIELD_OPTIONALLY_ENCLOSED_BY: Specifies quote characters for text fields
5. NULL_IF: Defines strings that should be converted to NULL values
6. ERROR_ON_COLUMN_COUNT_MISMATCH: Controls behavior when column counts differ
7. ESCAPE: Specifies the escape character for special characters
8. TRIM_SPACE: Removes leading and trailing whitespace from fields
For unloading data to CSV format, you use COPY INTO with a stage location. Snowflake can compress output files automatically and partition large datasets into multiple files for efficient handling.
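For instance, a load with inline CSV options followed by a header-aware unload might look like this (stage, table, and delimiter choices are assumptions):
COPY INTO products
  FROM @csv_stage/products/
  FILE_FORMAT = (TYPE = CSV
                 FIELD_DELIMITER = '|'
                 SKIP_HEADER = 1
                 FIELD_OPTIONALLY_ENCLOSED_BY = '"'
                 NULL_IF = ('\\N', '')
                 TRIM_SPACE = TRUE)
  ON_ERROR = 'SKIP_FILE';

COPY INTO @csv_stage/exports/products_            -- unload with headers, gzip-compressed
  FROM products
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
  HEADER = TRUE;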
Best practices for CSV handling include:
- Creating reusable file format objects for consistency
- Using staging areas (internal or external) for file management
- Validating data with VALIDATION_MODE before actual loading
- Handling encoding properly (UTF-8 is recommended)
- Managing error records using ON_ERROR options like CONTINUE, SKIP_FILE, or ABORT_STATEMENT
Snowflake also supports loading CSV files from cloud storage platforms such as AWS S3, Azure Blob Storage, and Google Cloud Storage through external stages. This integration enables seamless data pipeline construction for enterprise data workflows.
JSON file handling
JSON file handling in Snowflake is a powerful feature that enables seamless data loading and unloading operations for semi-structured data. Snowflake natively supports JSON, storing it in the VARIANT data type, which can hold up to 16 MB (compressed) per value. When loading JSON files, you can use the COPY INTO command with the FILE_FORMAT option specifying TYPE = JSON. Key parameters include STRIP_OUTER_ARRAY, which removes the outer brackets from JSON arrays, and STRIP_NULL_VALUES, which eliminates null value pairs from the data.
For loading, JSON files can be staged in internal stages (user, table, or named stages) or external stages (AWS S3, Azure Blob, or Google Cloud Storage). The PUT command uploads local JSON files to internal stages, while external stages reference cloud storage locations. During the load process, Snowflake automatically detects and handles nested JSON structures. You can query JSON data using dot notation or bracket notation to access nested elements, and the FLATTEN function transforms nested arrays into relational rows.
For unloading JSON data, the COPY INTO <location> command exports table data to JSON format. You can customize the output using FILE_FORMAT options and specify compression types like GZIP or BZIP2 to reduce storage costs and transfer times.
Best practices include validating JSON structure before loading with the VALIDATION_MODE parameter, which checks data for errors before actual ingestion. The ON_ERROR option controls behavior when encountering malformed records - options include CONTINUE, SKIP_FILE, or ABORT_STATEMENT. Snowflake also supports querying staged JSON files before loading using a SELECT statement with the dollar-sign ($1) notation, allowing you to preview and transform data during the loading process. This capability makes data exploration and schema inference straightforward for JSON datasets.
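A compact sketch of the load-then-query pattern (the table and stage names, and the field paths, are assumptions about the data):
CREATE TABLE raw_json (v VARIANT);

COPY INTO raw_json
  FROM @json_stage/events/
  FILE_FORMAT = (TYPE = JSON STRIP_OUTER_ARRAY = TRUE)
  ON_ERROR = 'SKIP_FILE';

SELECT v:event_id::STRING   AS event_id,          -- colon/dot notation into the VARIANT
       v:user.name::STRING  AS user_name,
       f.value:sku::STRING  AS sku
FROM raw_json,
     LATERAL FLATTEN(input => v:items) f;         -- expand the nested items array into rows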
Parquet and ORC file handling
Parquet and ORC are columnar file formats widely supported in Snowflake for efficient data loading and unloading operations. Both formats offer significant advantages for analytical workloads due to their columnar storage architecture.
Parquet is an open-source columnar format developed by Apache. Snowflake provides native support for Parquet files, allowing users to query staged Parquet data using the INFER_SCHEMA function to automatically detect column definitions. When loading Parquet files, Snowflake can leverage the embedded schema metadata to simplify table creation. The COPY INTO command supports Parquet with various options including MATCH_BY_COLUMN_NAME, which maps source columns to target table columns based on names rather than position.
ORC (Optimized Row Columnar) is another Apache format originally designed for Hadoop ecosystems. Snowflake handles ORC files similarly to Parquet, supporting schema detection and efficient data extraction. Both formats benefit from built-in compression and encoding schemes that reduce storage requirements and improve query performance.
For loading these formats, users typically stage files in internal or external stages, then execute COPY INTO statements. The FILE_FORMAT parameter should specify TYPE = PARQUET or TYPE = ORC accordingly. Snowflake can also query these files in place using external tables, enabling schema-on-read patterns.
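A sketch of the schema-inference path for Parquet (stage, format, and table names are assumptions):
CREATE FILE FORMAT parquet_fmt TYPE = PARQUET;

-- Inspect the column definitions embedded in the staged files
SELECT *
FROM TABLE(INFER_SCHEMA(
  LOCATION => '@parquet_stage/trips/',
  FILE_FORMAT => 'parquet_fmt'));

-- Create the table from the inferred schema, then load by column name
CREATE TABLE trips
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(INFER_SCHEMA(
      LOCATION => '@parquet_stage/trips/',
      FILE_FORMAT => 'parquet_fmt')));

COPY INTO trips
  FROM @parquet_stage/trips/
  FILE_FORMAT = (FORMAT_NAME = 'parquet_fmt')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;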
When unloading data, Snowflake supports writing to Parquet format using COPY INTO <location> commands with appropriate file format specifications; ORC is supported for loading only and cannot be used as an unload format. Parquet unloading enables seamless data exchange with other analytics platforms and data lakes.
Key considerations include understanding that Parquet and ORC store data in binary columnar format, which provides excellent compression ratios and allows Snowflake to read only required columns during queries. Semi-structured data handling in these formats integrates with Snowflake's VARIANT data type, enabling flexible schema evolution. Both formats support nested data structures, which Snowflake can flatten or preserve based on loading configurations.
Avro file handling
Avro file handling in Snowflake is a powerful feature for working with semi-structured data. Avro is a row-based binary format developed within Apache's Hadoop project that stores both schema and data together, making it self-describing and compact.
When loading Avro files into Snowflake, the data is automatically parsed and stored in a VARIANT column. Snowflake can infer the schema from Avro files, which simplifies the loading process. You can create a file format specifically for Avro using CREATE FILE FORMAT with TYPE = AVRO.
Key considerations for Avro file handling include:
1. **Schema Detection**: Snowflake reads the embedded schema from Avro files automatically. This eliminates the need for manual schema definition during the loading process.
2. **Compression**: Avro files support various compression codecs including deflate, snappy, and brotli. Snowflake handles these compression types natively when loading data.
3. **COPY INTO Command**: Use COPY INTO to load Avro data from stages. The MATCH_BY_COLUMN_NAME option allows mapping Avro fields to table columns based on matching names.
4. **Querying Staged Files**: You can query Avro files in stages using the $1 notation to access the VARIANT data before loading it into tables.
5. **Data Transformation**: During loading, you can transform Avro data using SELECT statements within COPY INTO, extracting specific fields or applying functions.
6. **Unloading**: Snowflake does not support unloading data to Avro format; COPY INTO <location> can write only delimited (CSV/TSV), JSON, or Parquet files. To exchange data with Avro-based systems, unload to one of those formats or convert downstream.
7. **NULL Handling**: Avro supports nullable types through union schemas, and Snowflake properly interprets these during data loading operations.
8. **Performance**: Avro's binary format and compression capabilities make it efficient for large-scale data transfers compared to text-based formats.
Understanding Avro handling is essential for the SnowPro Core exam, particularly when dealing with data pipelines and semi-structured data scenarios.
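Pulling several of these points together, a brief sketch (the stage, table, and field names are assumptions about the Avro schema):
CREATE FILE FORMAT avro_fmt TYPE = AVRO;

-- Preview the staged Avro records as VARIANT values before loading
SELECT $1
FROM @avro_stage/clicks/ (FILE_FORMAT => 'avro_fmt')
LIMIT 10;

-- Load selected fields, casting them to typed columns
COPY INTO clicks (event_time, user_id, url)
  FROM (
    SELECT $1:event_time::TIMESTAMP_NTZ,
           $1:user_id::NUMBER,
           $1:url::STRING
    FROM @avro_stage/clicks/
  )
  FILE_FORMAT = (FORMAT_NAME = 'avro_fmt');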
COPY INTO location for data unloading
The COPY INTO <location> command in Snowflake is used for unloading data from tables or query results into external storage locations. This powerful feature enables you to export data from Snowflake to various destinations for backup, sharing, or integration purposes.
**Supported Locations:**
Data can be unloaded to internal stages (user, table, or named stages) or external stages pointing to cloud storage services like Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage.
**Basic Syntax:**
COPY INTO @stage_name/path/ FROM table_name FILE_FORMAT = (TYPE = CSV);
**Key Parameters:**
- **FILE_FORMAT**: Specifies the output format including CSV, JSON, and PARQUET. You can define compression, field delimiters, and other formatting options.
- **PARTITION BY**: Allows organizing output files into separate paths based on column values, useful for creating partitioned data structures.
- **MAX_FILE_SIZE**: Controls the maximum size of each output file, helping manage file distribution.
- **SINGLE**: When set to TRUE, produces one consolidated output file rather than multiple files.
- **HEADER**: Includes column headers in the output for CSV files.
- **OVERWRITE**: Determines whether existing files in the location should be replaced.
**Compression Options:**
Supported compression methods include GZIP, BZ2, BROTLI, ZSTD, and others, which reduce storage costs and transfer times.
**Best Practices:**
1. Use appropriate file sizes for downstream processing requirements
2. Leverage partitioning for better data organization
3. Choose suitable compression based on the consuming application
4. Consider using named file formats for consistency across operations
**Query-Based Unloading:**
You can unload query results by replacing the table name with a SELECT statement enclosed in parentheses, enabling data transformation during export.
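A sketch of a query-based, partitioned unload (stage, table, and column names are assumptions):
COPY INTO @export_stage/orders/
  FROM (SELECT order_id, order_date, amount
        FROM orders
        WHERE order_date >= '2024-01-01')
  PARTITION BY ('date=' || TO_VARCHAR(order_date))   -- one output path per day
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
  HEADER = TRUE
  MAX_FILE_SIZE = 64000000
  OVERWRITE = TRUE;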
The command returns metadata about the unloading operation, including the number of rows unloaded and files created, making it easy to verify successful data export operations.
Unloading to external stages
Unloading data to external stages in Snowflake is a powerful feature that allows you to export data from Snowflake tables to cloud storage locations such as Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage. An external stage is a named database object that references a cloud storage location outside of Snowflake where data files can be stored.
To unload data to an external stage, you use the COPY INTO <location> command. This command exports query results or table data into one or more files in the specified external stage. The basic syntax involves specifying the source data (table or query), the target stage location, and optional file format specifications.
Before unloading, you must first create an external stage using the CREATE STAGE command, which requires appropriate cloud storage credentials and permissions. These credentials can be configured using storage integrations for enhanced security, eliminating the need to embed sensitive credentials in stage definitions.
Key considerations when unloading to external stages include file format options (CSV, JSON, Parquet, etc.), compression settings (GZIP, BZIP2, etc.), and partitioning strategies. You can control the output file size using MAX_FILE_SIZE parameter and organize files into logical partitions using the PARTITION BY clause for better data management.
The SINGLE parameter determines whether output goes to one file or multiple files. For large datasets, Snowflake automatically splits output across multiple files for parallel processing efficiency. You can also specify a file prefix and include headers in the output files.
Unloading supports query-based exports, allowing you to transform or filter data during the export process rather than exporting entire tables. This flexibility enables you to export only the required subset of data.
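For example, archiving a year of data to an S3-backed stage as Parquet could look like this (the stage, table, and column names are assumptions):
COPY INTO @s3_export_stage/archive/2023/
  FROM (SELECT * FROM transactions WHERE YEAR(txn_date) = 2023)
  FILE_FORMAT = (TYPE = PARQUET)
  MAX_FILE_SIZE = 256000000;                      -- target larger files for the data lake

LIST @s3_export_stage/archive/2023/;              -- verify the files written to cloud storage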
Common use cases include data archival, sharing data with external systems, creating backups, and integrating with data lakes or other analytics platforms that consume data from cloud storage.
Data export file formats and options
Snowflake provides robust data export capabilities through the COPY INTO <location> command, supporting multiple file formats and configuration options for unloading data efficiently.
**Supported File Formats:**
1. **CSV (Comma-Separated Values)**: The default format for data export. You can customize delimiters, enclosure characters, escape characters, and null string representations. Options include FIELD_DELIMITER, RECORD_DELIMITER, and FIELD_OPTIONALLY_ENCLOSED_BY.
2. **JSON (JavaScript Object Notation)**: Ideal for semi-structured data export. Snowflake preserves VARIANT values and nested structures, writing each row as a separate JSON object. Note that options such as STRIP_OUTER_ARRAY apply to loading only, not unloading.
3. **Parquet**: A columnar storage format offering excellent compression and query performance. Snowflake supports exporting to Parquet with SNAPPY compression by default.
**Key Export Options:**
- **COMPRESSION**: Supports GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, and NONE. This reduces file sizes and transfer costs.
- **SINGLE**: When set to TRUE, produces one output file rather than multiple parallel files. Useful for smaller datasets requiring a single file output.
- **MAX_FILE_SIZE**: Controls the maximum size of each output file; the default is 16 MB, and the upper limit depends on the cloud platform. Larger values create fewer files but may impact parallel processing.
- **OVERWRITE**: When TRUE, replaces existing files in the target location with the same names.
- **HEADER**: For CSV exports, setting this to TRUE includes column headers in the output file.
- **FILE_EXTENSION**: Allows custom file extensions for exported files.
**Export Destinations:**
Data can be exported to internal stages (user, table, or named stages) or external stages connected to cloud storage services like Amazon S3, Azure Blob Storage, or Google Cloud Storage.
**Best Practices:**
Use appropriate compression for cost optimization, leverage parallel file generation for large datasets, and select file formats compatible with downstream systems. Partitioning output by date or other columns using PARTITION BY improves data organization in cloud storage.
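As a final sketch, producing one named, compressed CSV file with headers for a downstream system (the stage, table, and file name are assumptions):
COPY INTO @export_stage/reports/daily_summary.csv.gz
  FROM (SELECT * FROM daily_summary)
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
  SINGLE = TRUE                                   -- one output file with the given name
  HEADER = TRUE
  MAX_FILE_SIZE = 1000000000                      -- raise the 16 MB default for larger results
  OVERWRITE = TRUE;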