Stage File Operations in Snowflake
Why Stage File Operations Matter
Stage file operations are fundamental to data loading and unloading in Snowflake. Understanding how to manage files within stages is essential for data engineers and administrators who need to efficiently move data in and out of Snowflake. These operations form the backbone of ETL processes and data pipeline management.
What Are Stage File Operations?
Stage file operations refer to the commands and processes used to interact with files stored in Snowflake stages. Stages are storage locations that hold data files before loading into tables or after unloading from tables. There are three types of stages:
- Internal Stages: Storage managed by Snowflake (user stages, table stages, named stages)
- External Stages: Cloud storage locations (S3, Azure Blob, GCS) referenced by Snowflake
Key Stage File Operation Commands
PUT Command:
- Uploads files from a local machine to an internal stage
- Only works with internal stages, not external stages
- Can only be executed from SnowSQL, Snowflake connectors, or drivers (not the web UI)
- Syntax: PUT file://path/to/file @stage_name
GET Command:
- Downloads files from an internal stage to a local machine
- Only works with internal stages
- Can only be executed from SnowSQL, Snowflake connectors, or drivers
- Syntax: GET @stage_name file://local/path
LIST Command:
- Displays files in a stage
- Works with both internal and external stages
- Syntax: LIST @stage_name
- Can use pattern matching to filter results
REMOVE Command:
- Deletes files from a stage
- Works with both internal and external stages
- Syntax: REMOVE @stage_name/filename or REMOVE @stage_name PATTERN='pattern'
How Stage File Operations Work
1. File Upload Process: Files are compressed and encrypted during PUT operations. Auto-compression is enabled by default, and files are split into chunks for parallel processing.
2. File Staging: Once in a stage, files can be queried using the $1, $2 notation for column references before loading, allowing data validation.
3. File Metadata: Snowflake tracks metadata about staged files, including file paths, sizes, and modification timestamps accessible through LIST command output.
4. Purge Options: During COPY INTO operations, the PURGE option can automatically remove files after successful loading.
Exam Tips: Answering Questions on Stage File Operations
Tip 1: Remember that PUT and GET commands only work with internal stages. External stage files must be managed through the cloud provider's tools or REMOVE command in Snowflake.
Tip 2: Know that PUT and GET cannot be executed through the Snowflake web interface. They require SnowSQL, JDBC, ODBC, or other connectors.
Tip 3: Understand the difference between table stages (@%tablename), user stages (@~), and named stages (@stagename) for file operation context.
Tip 4: Be aware that the LIST command shows file metadata but does not show the actual content of files. To preview data, you need to query the stage using SELECT statements.
Tip 5: Remember that REMOVE command can use PATTERN parameter for bulk deletion of files matching a specific pattern.
Tip 6: Know that auto-compression during PUT uses gzip by default, and you can control this behavior with the AUTO_COMPRESS and SOURCE_COMPRESSION parameters.
Tip 7: For questions about file formats, remember that stages can have default file format options defined, which apply to all operations on that stage unless overridden.
Tip 8: When questions reference cleaning up after data loading, the PURGE=TRUE option with COPY INTO is the recommended approach rather than separate REMOVE commands.