Data Loading and Unloading
Load data into Snowflake and export data using various methods and file formats (12% of exam).
Data Loading and Unloading are essential operations in Snowflake for moving data between external sources and Snowflake tables. Understanding these concepts is crucial for the SnowPro Core Certification. **Data Loading** refers to importing data into Snowflake tables from various sources, either in bulk with the COPY INTO command or continuously with Snowpipe. **Data Unloading** refers to exporting data from Snowflake tables to files in internal or external stages using COPY INTO <location>. A minimal loading and unloading sketch follows the concept list below.
Concepts covered:

- COPY INTO command for bulk loading
- Snowpipe for continuous loading
- Snowpipe Streaming
- Data loading best practices
- Internal stages (user, table, named)
- External stages (S3, Azure Blob, GCS)
- Stage file operations
- Directory tables
- File format objects
- CSV file handling
- JSON file handling
- Parquet and ORC file handling
- Avro file handling
- COPY INTO <location> for data unloading
- Unloading to external stages
- Data export file formats and options
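For orientation, here is a minimal sketch of both directions: a bulk load from a named internal stage with COPY INTO, and an unload back to a stage with COPY INTO <location>. The object names (csv_fmt, raw_stage, claims_raw) and the file paths are illustrative assumptions, not part of any exam scenario.

```sql
-- Illustrative names only: csv_fmt, raw_stage, and claims_raw are assumptions.
CREATE OR REPLACE FILE FORMAT csv_fmt
  TYPE = 'CSV'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1;

CREATE OR REPLACE STAGE raw_stage
  FILE_FORMAT = csv_fmt;              -- named internal stage with a default file format

-- Bulk load: copy matching staged files into the target table.
COPY INTO claims_raw
  FROM @raw_stage/claims/
  PATTERN = '.*[.]csv'
  ON_ERROR = 'SKIP_FILE';

-- Unload: export query results back to the stage as compressed CSV files.
COPY INTO @raw_stage/exports/claims_
  FROM (SELECT * FROM claims_raw)
  FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
  OVERWRITE = TRUE;
```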
COF-C02 - Data Loading and Unloading Example Questions
Test your knowledge of Data Loading and Unloading
Question 1
A data analytics team at an insurance company is implementing a new claims processing workflow in Snowflake. The workflow involves three distinct phases: (1) raw claims files are uploaded by an automated job running under a service account, (2) data quality analysts from multiple departments need to inspect the staged files before loading, and (3) the validated files are loaded into a CLAIMS_PROCESSING table. The team lead notices that when using '@%CLAIMS_PROCESSING' for staging, the data quality analysts cannot access the files even though they have SELECT privileges on the table. When switching to '@~', the files are only visible to the service account that uploaded them. The team needs a staging solution where the service account can upload files, multiple analysts can inspect them using LIST commands, and the file access can be controlled through standard Snowflake RBAC. Which implementation approach should the team adopt to satisfy all three requirements?
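The scenario contrasts table stages (@%), user stages (@~), and named stages. As a hedged sketch of the named-stage direction the requirements point toward, the statements below create an internal named stage and control file access through role grants; the stage and role names (claims_stg, claims_uploader, claims_analyst) are hypothetical.

```sql
-- Hypothetical objects: claims_stg, claims_uploader, claims_analyst.
CREATE STAGE claims_stg;   -- named internal stage; access is governed by standard RBAC

GRANT READ, WRITE ON STAGE claims_stg TO ROLE claims_uploader;  -- service account can PUT files
GRANT READ        ON STAGE claims_stg TO ROLE claims_analyst;   -- analysts can LIST and GET files

-- Any role holding READ on the stage can inspect the staged files:
LIST @claims_stg;
```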
Question 2
A data engineering team at an e-commerce company has set up Snowpipe to ingest clickstream data from their Google Cloud Storage bucket. The pipeline processes approximately 500 files per hour during business hours. After three weeks of successful operation, the team receives alerts that data freshness has degraded significantly. Investigation reveals that SYSTEM$PIPE_STATUS shows 'executionState' as 'STALLED' and the last successful load was 6 hours ago. The GCS bucket continues to receive new files normally, and the Pub/Sub subscription shows messages are being delivered. The team confirms that no schema changes were made to the source files or target table. What should the team investigate first to resolve this Snowpipe stalled state?
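For context on the commands this scenario references, here is a minimal troubleshooting sketch: check the pipe status, review recent load history, then resume and refresh the pipe if it turns out to be paused or needs its backlog re-queued. The pipe and table names (clickstream_pipe, CLICKSTREAM_EVENTS) and the 24-hour window are assumptions.

```sql
-- Hypothetical names: clickstream_pipe, CLICKSTREAM_EVENTS.
SELECT SYSTEM$PIPE_STATUS('clickstream_pipe');      -- inspect executionState and error fields

-- Review recent load outcomes for the target table (24-hour window chosen for illustration).
SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
       TABLE_NAME => 'CLICKSTREAM_EVENTS',
       START_TIME => DATEADD('hour', -24, CURRENT_TIMESTAMP())));

-- If the pipe was paused, resume it; REFRESH re-queues files staged within the last 7 days.
ALTER PIPE clickstream_pipe SET PIPE_EXECUTION_PAUSED = FALSE;
ALTER PIPE clickstream_pipe REFRESH;
```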
Question 3
A manufacturing company stores quality control reports from 15 production facilities in an external stage on Google Cloud Storage. The data operations team has enabled a directory table on the stage and configured AUTO_REFRESH to TRUE. Their ETL pipeline needs to load files incrementally based on when they were added to the stage. During a performance review, the team notices that some files appear in directory table queries with LAST_MODIFIED timestamps that differ from the actual upload time by several hours. After investigation, they discover that files are being transferred from legacy on-premises systems using a backup tool that preserves original file timestamps. The business requires tracking both the original file timestamp and when files became available in the stage. The team needs to design a solution that captures the Snowflake detection time while retaining access to the preserved LAST_MODIFIED metadata. Which implementation approach allows the team to track when files were first detected by the directory table while maintaining the original modification timestamps?
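One way to capture both timestamps, sketched below under the assumption that a stream on the stage's directory table fits the pipeline: the directory table keeps the preserved LAST_MODIFIED values, while an insert driven by the stream records when the pipeline first observed each file (an approximation of detection time, taken when the scheduled job runs). All object names (qc_reports_stage, qc_dir_stream, qc_file_inventory) are hypothetical.

```sql
-- Hypothetical objects: qc_reports_stage (existing stage with a directory table enabled),
-- qc_dir_stream, qc_file_inventory.
CREATE STREAM qc_dir_stream ON STAGE qc_reports_stage;   -- tracks changes to the directory table

CREATE TABLE IF NOT EXISTS qc_file_inventory (
  relative_path STRING,
  last_modified TIMESTAMP_LTZ,   -- original timestamp preserved by the backup tool
  detected_at   TIMESTAMP_LTZ    -- when the pipeline first observed the file in the stage
);

-- Run on a schedule (for example, from a task): consume newly detected files from the stream.
INSERT INTO qc_file_inventory
SELECT relative_path, last_modified, CURRENT_TIMESTAMP()
FROM qc_dir_stream
WHERE METADATA$ACTION = 'INSERT';
```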