Streams in Snowflake are powerful objects designed for Change Data Capture (CDC), enabling you to track and process data modifications in tables over time. A stream acts as a change tracking mechanism that records Data Manipulation Language (DML) changes, including inserts, updates, and deletes, made to a source table.
When you create a stream on a table, Snowflake establishes an offset pointer that marks a specific point in time. As transactions modify the underlying table, the stream captures these changes and makes them available for querying. The stream itself does not store actual data; instead, it maintains metadata about what has changed since the last consumption.
Streams contain special metadata columns: METADATA$ACTION indicates whether a row was inserted or deleted, METADATA$ISUPDATE shows if the change was part of an update operation, and METADATA$ROW_ID provides a unique identifier for each row.
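As a minimal sketch of how these columns surface in practice (the orders table, its columns, and the stream name below are hypothetical), you can create a stream, modify the source table, and then query the stream to see the changed rows alongside the metadata columns:

-- Hypothetical example: create a stream on a source table
CREATE OR REPLACE STREAM orders_stream ON TABLE orders;

-- Modify the source table
INSERT INTO orders (order_id, status) VALUES (101, 'NEW');
UPDATE orders SET status = 'SHIPPED' WHERE order_id = 42;

-- Query the stream: changed rows appear with the metadata columns
SELECT order_id, status, METADATA$ACTION, METADATA$ISUPDATE, METADATA$ROW_ID
FROM orders_stream;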
There are three types of streams in Snowflake. Standard streams track all DML changes and are the most comprehensive option. Append-only streams capture only insert operations, making them ideal for scenarios where you only need to process new records. Insert-only streams work specifically with external tables.
A key concept is stream staleness. If the retention period for the source table's change tracking data expires before the stream is consumed, the stream becomes stale and unusable. You must ensure regular consumption of stream data to prevent this issue.
Streams are commonly used in ETL pipelines and data transformation workflows. When combined with tasks, they enable automated incremental data processing. You can query a stream like a regular table, and once changes are consumed within a DML transaction, the stream offset advances, showing only subsequent changes.
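For example, a minimal consumption pattern (the orders_history target table is illustrative) wraps the read in a DML statement inside a transaction, so the offset advances only when the transaction commits:

-- Consuming the stream inside a DML transaction advances its offset on commit
BEGIN;
INSERT INTO orders_history
  SELECT order_id, status, METADATA$ACTION, METADATA$ISUPDATE
  FROM orders_stream;
COMMIT;
-- After the commit, orders_stream returns only changes made after this point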
Streams provide an efficient, cost-effective method for implementing CDC patterns in Snowflake, eliminating the need for timestamp-based comparisons or full table scans to identify modified records.
Streams for Change Data Capture (CDC) in Snowflake
Why Streams for CDC are Important
Change Data Capture (CDC) is a critical capability for modern data architectures. Snowflake Streams enable organizations to track data changes incrementally, which is essential for:
• Real-time data pipelines - Processing only changed data rather than full table scans
• Data synchronization - Keeping downstream systems updated with source changes
• Audit and compliance - Tracking what data changed and when
• Cost optimization - Reducing compute costs by processing only delta changes
What are Snowflake Streams?
A Stream is a Snowflake object that records data manipulation language (DML) changes made to a table, including inserts, updates, and deletes. Streams provide a change table that shows what has changed since a specific point in time.
Key characteristics:
• Streams are schema-level objects
• They track changes at the row level
• They do not contain actual data, only pointers to changed records
• They have zero additional storage cost (leveraging Time Travel)
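Because streams are schema-level objects, they can be inspected with the usual metadata commands (the database, schema, and stream names below are illustrative):

-- List streams in a schema and inspect one of them
SHOW STREAMS IN SCHEMA my_db.my_schema;
DESCRIBE STREAM orders_stream;  -- shows the stream's column definitions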
How Streams Work
When you create a stream on a table, Snowflake tracks all DML changes using these metadata columns:
• METADATA$ACTION - Shows INSERT or DELETE
• METADATA$ISUPDATE - TRUE if the action is part of an UPDATE (shown as a DELETE + INSERT pair)
• METADATA$ROW_ID - Unique identifier for the row
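To see how an update surfaces in the stream, a query like the following (table and column names are illustrative) returns two rows per updated record, a DELETE with the old values and an INSERT with the new values, both flagged with METADATA$ISUPDATE = TRUE:

-- An UPDATE on the source table appears as a DELETE/INSERT pair in the stream
SELECT order_id, status, METADATA$ACTION, METADATA$ISUPDATE
FROM orders_stream
WHERE METADATA$ISUPDATE = TRUE
ORDER BY order_id, METADATA$ACTION;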
Stream Types:
1. Standard (Delta) Stream - Tracks all DML changes; the default type
2. Append-only Stream - Tracks only INSERT operations; more efficient for append-only workloads
3. Insert-only Stream - Used for external tables; tracks inserts only
Creating a Stream:
CREATE STREAM my_stream ON TABLE my_table;
CREATE STREAM append_stream ON TABLE my_table APPEND_ONLY = TRUE;
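An insert-only stream on an external table follows the same pattern; the external table name below is hypothetical:

CREATE STREAM ext_stream ON EXTERNAL TABLE my_ext_table INSERT_ONLY = TRUE;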
Stream Offset and Consumption
Streams maintain an offset that advances when stream data is consumed in a DML transaction. Once consumed, the changes are no longer visible in the stream. This happens when:
• The stream is queried within a successful DML statement
• The transaction commits successfully
Staleness: A stream becomes stale when its offset falls outside the data retention period (Time Travel). Stale streams cannot be queried and must be recreated.
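To monitor staleness, SHOW STREAMS reports a STALE flag and a STALE_AFTER timestamp for each stream; raising the source table's maximum data extension period (the value below is illustrative) gives consumers more time before the stream goes stale:

-- Check whether a stream is stale and when it will become stale
SHOW STREAMS LIKE 'orders_stream';
-- Inspect the STALE and STALE_AFTER columns in the output

-- Let Snowflake extend the table's change tracking retention for streams
ALTER TABLE orders SET MAX_DATA_EXTENSION_TIME_IN_DAYS = 30;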
Using Streams with Tasks
Streams are commonly paired with Tasks for automated CDC pipelines:
CREATE TASK process_changes
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('my_stream')
AS
  INSERT INTO target_table SELECT * FROM my_stream;
The SYSTEM$STREAM_HAS_DATA() function checks if there are pending changes, preventing unnecessary task runs.
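Note that tasks are created in a suspended state; the sketch below (using the task and stream from the example above) resumes the task and shows that the same function can also be called ad hoc:

-- Tasks are created suspended; resume it so the schedule takes effect
ALTER TASK process_changes RESUME;

-- The same check can be run manually
SELECT SYSTEM$STREAM_HAS_DATA('my_stream');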
Exam Tips: Answering Questions on Streams for CDC
1. Remember the metadata columns - Know METADATA$ACTION, METADATA$ISUPDATE, and METADATA$ROW_ID and their purposes
2. Understand UPDATE representation - Updates appear as a DELETE/INSERT pair with METADATA$ISUPDATE = TRUE
3. Know stream types - Standard streams track all changes; append-only streams track only inserts and are more performant for those scenarios
4. Offset consumption - Stream offset advances only after a successful DML transaction that reads from the stream; SELECT alone does not consume the stream
5. Staleness concept - Streams become stale when their offset is older than the table's data retention period
6. Storage costs - Streams themselves have no additional storage costs; they leverage existing Time Travel data
7. Common exam scenarios:
• Identifying when to use append-only vs standard streams
• Understanding how SYSTEM$STREAM_HAS_DATA() works with tasks
• Knowing that streams track DML changes only (not DDL)
• Recognizing that multiple streams can exist on the same table (see the sketch after this list)
8. Key limitations to remember:
• Streams can be created on tables and standard views, but not on materialized views
• Streams do not track TRUNCATE operations
• Changes from Time Travel queries are not reflected in streams
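As a brief sketch of the multiple-streams scenario referenced in tip 7 (all names are hypothetical), each stream on the same table maintains its own offset, so two consumers can process the same changes independently:

-- Two independent streams on the same source table
CREATE STREAM reporting_stream ON TABLE orders;
CREATE STREAM archive_stream ON TABLE orders;

-- Consuming one stream does not advance the other stream's offset
INSERT INTO reporting_table SELECT * FROM reporting_stream;
-- archive_stream still shows the same changes until it is consumed separately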