Data Loading and Unloading
Load data into Snowflake and export data using various methods and file formats (12% of exam).
Data Loading and Unloading are essential operations in Snowflake for moving data between external sources and Snowflake tables. Understanding these concepts is crucial for the SnowPro Core Certification. **Data Loading** refers to importing data into Snowflake tables from various sources, either in bulk with the COPY INTO command or continuously with Snowpipe. **Data Unloading** refers to exporting data from Snowflake tables to files in internal or external stages using COPY INTO <location>. A minimal loading and unloading sketch follows the concept list below.
Concepts covered:

- COPY INTO command for bulk loading
- Snowpipe for continuous loading
- Snowpipe Streaming
- Data loading best practices
- Internal stages (user, table, named)
- External stages (S3, Azure Blob, GCS)
- Stage file operations
- Directory tables
- File format objects
- CSV file handling
- JSON file handling
- Parquet and ORC file handling
- Avro file handling
- COPY INTO <location> for data unloading
- Unloading to external stages
- Data export file formats and options
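For orientation, here is a minimal sketch of both directions: a bulk load from a named internal stage with COPY INTO, and an unload back to a stage with COPY INTO <location>. The object names (csv_fmt, raw_stage, claims_raw) and the file paths are illustrative assumptions, not part of any exam scenario.

```sql
-- Illustrative names only: csv_fmt, raw_stage, and claims_raw are assumptions.
CREATE OR REPLACE FILE FORMAT csv_fmt
  TYPE = 'CSV'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1;

CREATE OR REPLACE STAGE raw_stage
  FILE_FORMAT = csv_fmt;              -- named internal stage with a default file format

-- Bulk load: copy matching staged files into the target table.
COPY INTO claims_raw
  FROM @raw_stage/claims/
  PATTERN = '.*[.]csv'
  ON_ERROR = 'SKIP_FILE';

-- Unload: export query results back to the stage as compressed CSV files.
COPY INTO @raw_stage/exports/claims_
  FROM (SELECT * FROM claims_raw)
  FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
  OVERWRITE = TRUE;
```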
COF-C02 - Data Loading and Unloading Example Questions
Test your knowledge of Data Loading and Unloading
Question 1
A data analytics team at an insurance company is implementing a new claims processing workflow in Snowflake. The workflow involves three distinct phases: (1) raw claims files are uploaded by an automated job running under a service account, (2) data quality analysts from multiple departments need to inspect the staged files before loading, and (3) the validated files are loaded into a CLAIMS_PROCESSING table. The team lead notices that when using '@%CLAIMS_PROCESSING' for staging, the data quality analysts cannot access the files even though they have SELECT privileges on the table. When switching to '@~', the files are only visible to the service account that uploaded them. The team needs a staging solution where the service account can upload files, multiple analysts can inspect them using LIST commands, and the file access can be controlled through standard Snowflake RBAC. Which implementation approach should the team adopt to satisfy all three requirements?
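The scenario contrasts table stages (@%), user stages (@~), and named stages. As a hedged sketch of the named-stage direction the requirements point toward, the statements below create an internal named stage and control file access through role grants; the stage and role names (claims_stg, claims_uploader, claims_analyst) are hypothetical.

```sql
-- Hypothetical objects: claims_stg, claims_uploader, claims_analyst.
CREATE STAGE claims_stg;   -- named internal stage; access is governed by standard RBAC

GRANT READ, WRITE ON STAGE claims_stg TO ROLE claims_uploader;  -- service account can PUT files
GRANT READ        ON STAGE claims_stg TO ROLE claims_analyst;   -- analysts can LIST and GET files

-- Any role holding READ on the stage can inspect the staged files:
LIST @claims_stg;
```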
Question 2
A data engineering team at an e-commerce company has set up Snowpipe to ingest clickstream data from their Google Cloud Storage bucket. The pipeline processes approximately 500 files per hour during business hours. After three weeks of successful operation, the team receives alerts that data freshness has degraded significantly. Investigation reveals that SYSTEM$PIPE_STATUS shows 'executionState' as 'STALLED' and the last successful load was 6 hours ago. The GCS bucket continues to receive new files normally, and the Pub/Sub subscription shows messages are being delivered. The team confirms that no schema changes were made to the source files or target table. What should the team investigate first to resolve this Snowpipe stalled state?
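For context on the commands this scenario references, here is a minimal troubleshooting sketch: check the pipe status, review recent load history, then resume and refresh the pipe if it turns out to be paused or needs its backlog re-queued. The pipe and table names (clickstream_pipe, CLICKSTREAM_EVENTS) and the 24-hour window are assumptions.

```sql
-- Hypothetical names: clickstream_pipe, CLICKSTREAM_EVENTS.
SELECT SYSTEM$PIPE_STATUS('clickstream_pipe');      -- inspect executionState and error fields

-- Review recent load outcomes for the target table (24-hour window chosen for illustration).
SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
       TABLE_NAME => 'CLICKSTREAM_EVENTS',
       START_TIME => DATEADD('hour', -24, CURRENT_TIMESTAMP())));

-- If the pipe was paused, resume it; REFRESH re-queues files staged within the last 7 days.
ALTER PIPE clickstream_pipe SET PIPE_EXECUTION_PAUSED = FALSE;
ALTER PIPE clickstream_pipe REFRESH;
```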
Question 3
A manufacturing company stores quality control reports from 15 production facilities in an external stage on Google Cloud Storage. The data operations team has enabled a directory table on the stage and configured AUTO_REFRESH to TRUE. Their ETL pipeline needs to load files incrementally based on when they were added to the stage. During a performance review, the team notices that some files appear in directory table queries with LAST_MODIFIED timestamps that differ from the actual upload time by several hours. After investigation, they discover that files are being transferred from legacy on-premises systems using a backup tool that preserves original file timestamps. The business requires tracking both the original file timestamp and when files became available in the stage. The team needs to design a solution that captures the Snowflake detection time while retaining access to the preserved LAST_MODIFIED metadata. Which implementation approach allows the team to track when files were first detected by the directory table while maintaining the original modification timestamps?
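One way to capture both timestamps, sketched below under the assumption that a stream on the stage's directory table fits the pipeline: the directory table keeps the preserved LAST_MODIFIED values, while an insert driven by the stream records when the pipeline first observed each file (an approximation of detection time, taken when the scheduled job runs). All object names (qc_reports_stage, qc_dir_stream, qc_file_inventory) are hypothetical.

```sql
-- Hypothetical objects: qc_reports_stage (existing stage with a directory table enabled),
-- qc_dir_stream, qc_file_inventory.
CREATE STREAM qc_dir_stream ON STAGE qc_reports_stage;   -- tracks changes to the directory table

CREATE TABLE IF NOT EXISTS qc_file_inventory (
  relative_path STRING,
  last_modified TIMESTAMP_LTZ,   -- original timestamp preserved by the backup tool
  detected_at   TIMESTAMP_LTZ    -- when the pipeline first observed the file in the stage
);

-- Run on a schedule (for example, from a task): consume newly detected files from the stream.
INSERT INTO qc_file_inventory
SELECT relative_path, last_modified, CURRENT_TIMESTAMP()
FROM qc_dir_stream
WHERE METADATA$ACTION = 'INSERT';
```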