Task dependencies and DAGs (Directed Acyclic Graphs) are fundamental concepts in Snowflake for orchestrating complex data transformation workflows.
A Task in Snowflake is a scheduled object that executes a single SQL statement or calls a stored procedure. Tasks can be linked together to create sophisticated data pipelines where the completion of one task triggers the execution of subsequent tasks.
Task Dependencies define the relationships between tasks, establishing which tasks must complete before others can begin. When you create a task with the AFTER clause, you specify its predecessor task. For example: CREATE TASK child_task AFTER parent_task AS SELECT... This creates a parent-child relationship where child_task runs only after parent_task completes successfully.
A DAG (Directed Acyclic Graph) represents the entire network of task dependencies. The term 'directed' means tasks flow in one direction from predecessors to successors. 'Acyclic' means there are no circular dependencies - a task cannot eventually depend on itself. The root task sits at the top of the DAG and has no predecessors, while leaf tasks have no successors.
Key characteristics of Snowflake DAGs include:
1. A single root task that initiates the entire workflow on a defined schedule
2. Up to 1000 tasks per DAG
3. Support for multiple predecessors (up to 100) allowing complex branching and merging patterns
4. All tasks in a DAG share the same owner
5. Only the root task has a schedule; dependent tasks execute based on predecessor completion
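The branching and merging pattern described in this list can be sketched as a small fan-out/fan-in DAG. This is a minimal illustration, not a reference pipeline: the warehouse, table, and task names (etl_wh, raw_orders, stg_orders, load_root, and so on) are hypothetical.

```sql
-- Root task: the only task in the DAG with a SCHEDULE (hypothetical names throughout)
CREATE TASK load_root
  WAREHOUSE = etl_wh
  SCHEDULE = '60 MINUTE'
AS
  INSERT INTO stg_orders SELECT * FROM raw_orders;

-- Fan-out: two children that both depend on the root task
CREATE TASK build_customers
  WAREHOUSE = etl_wh
  AFTER load_root
AS
  INSERT INTO dim_customers SELECT DISTINCT customer_id FROM stg_orders;

CREATE TASK build_products
  WAREHOUSE = etl_wh
  AFTER load_root
AS
  INSERT INTO dim_products SELECT DISTINCT product_id FROM stg_orders;

-- Fan-in: one task with two predecessors, listed comma-separated after AFTER
CREATE TASK build_sales_fact
  WAREHOUSE = etl_wh
  AFTER build_customers, build_products
AS
  INSERT INTO fact_sales
  SELECT order_id, customer_id, product_id, amount FROM stg_orders;
```

Here build_sales_fact is the leaf task and has two predecessors, so it waits for both of them before it runs.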
Tasks are created in a suspended state, so a DAG does not run until its tasks are resumed. Resume each child task and then the root task with ALTER TASK ... RESUME, or call the SYSTEM$TASK_DEPENDENTS_ENABLE function on the root task to resume the root task and all of its dependent tasks in one step.
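The two ways of activating a DAG might look like the following sketch. The fully qualified name my_db.my_schema.root_task is an assumed placeholder; substitute the database and schema that hold your tasks.

```sql
-- Option 1: resume tasks one by one, children first and the root task last
ALTER TASK child_task RESUME;
ALTER TASK root_task RESUME;

-- Option 2: resume the root task and all of its dependents recursively
-- (my_db.my_schema is a hypothetical location)
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('my_db.my_schema.root_task');
```

Option 2 is usually the more convenient choice for larger DAGs, since it avoids resuming every task by hand.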
DAGs enable incremental data processing, error handling through task failure notifications, and efficient resource utilization by executing tasks only when predecessors complete successfully. This orchestration capability is essential for building reliable, maintainable data transformation pipelines in Snowflake.
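One way to check on failed runs is to query the TASK_HISTORY table function. The sketch below is illustrative only: the 24-hour window and the 100-row limit are arbitrary assumptions, and the query must run with a database in context so that INFORMATION_SCHEMA resolves.

```sql
-- List task runs that failed in the last 24 hours (window and limit are assumptions)
SELECT name, state, error_message, scheduled_time, completed_time
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
  SCHEDULED_TIME_RANGE_START => DATEADD('hour', -24, CURRENT_TIMESTAMP()),
  RESULT_LIMIT => 100,
  ERROR_ONLY => TRUE
))
ORDER BY scheduled_time DESC;
```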
Task Dependencies and DAGs in Snowflake
Why Task Dependencies and DAGs Are Important
Task dependencies and Directed Acyclic Graphs (DAGs) are fundamental concepts for orchestrating complex data pipelines in Snowflake. Understanding these concepts is crucial for the SnowPro Core exam because they enable you to automate multi-step data transformations, ensure tasks execute in the correct order, and build reliable ETL/ELT workflows.
What Are Task Dependencies and DAGs?
A Task in Snowflake is a scheduled object that executes a single SQL statement or calls a stored procedure. When you need multiple tasks to run in a specific sequence, you create task dependencies.
A Directed Acyclic Graph (DAG) is a collection of tasks organized in a tree-like structure where:
- Directed: Tasks flow in one direction from parent to child
- Acyclic: No circular dependencies exist (a task cannot eventually trigger itself)
- Graph: Multiple tasks connected through defined relationships
How Task Dependencies Work
1. Root Task: The first task in a DAG that has a defined schedule. Only the root task has a SCHEDULE parameter.
2. Child Tasks: Tasks that depend on other tasks. They use the AFTER clause to specify their predecessor(s).
3. Execution Flow: When the root task runs, it triggers its child tasks upon successful completion. Each child can have multiple predecessors and successors.
Creating Task Dependencies - Example:
-- Create root task with schedule
CREATE TASK root_task
  WAREHOUSE = my_warehouse
  SCHEDULE = 'USING CRON 0 9 * * * UTC'
AS
  INSERT INTO staging_table SELECT * FROM raw_data;

-- Create child task
CREATE TASK child_task
  WAREHOUSE = my_warehouse
  AFTER root_task
AS
  INSERT INTO final_table SELECT * FROM staging_table;
Key Characteristics:
- A DAG can have up to 1000 tasks, including the root task
- A child task can have up to 100 predecessor tasks
- All tasks in a DAG must be in the same database and schema
- Tasks are created in a suspended state by default
- The root task and its child tasks must all be resumed before the DAG runs end to end
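To see which tasks belong to a DAG and whether each one is suspended or started, something like the following can be used. The names my_db.my_schema and root_task are hypothetical placeholders.

```sql
-- List tasks in the schema; the "state" column shows suspended or started
SHOW TASKS IN SCHEMA my_db.my_schema;

-- List the root task and every task that depends on it, directly or indirectly
SELECT *
FROM TABLE(INFORMATION_SCHEMA.TASK_DEPENDENTS(
  TASK_NAME => 'my_db.my_schema.root_task',
  RECURSIVE => TRUE
));
```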
Managing DAGs:
- Use ALTER TASK task_name RESUME to activate tasks
- Use ALTER TASK task_name SUSPEND to pause tasks
- The root task must be suspended before modifying any task in the DAG
- Use SYSTEM$TASK_DEPENDENTS_ENABLE('root_task_name') to resume all tasks in a DAG
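A typical modification sequence, sketched under the assumption that only the child task's warehouse needs to change (larger_warehouse is a hypothetical name):

```sql
-- 1. Suspend the root task so no new DAG runs are scheduled
ALTER TASK root_task SUSPEND;

-- 2. Modify the task (here, point the child task at a different warehouse)
ALTER TASK child_task SET WAREHOUSE = larger_warehouse;

-- 3. Resume the root task to put the DAG back on its schedule
ALTER TASK root_task RESUME;
```

If a task is recreated with CREATE OR REPLACE TASK instead of altered, it comes back in a suspended state and has to be resumed again before step 3.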
Exam Tips: Answering Questions on Task Dependencies and DAGs
1. Remember the Root Task Rule: Only the root task has a SCHEDULE. Child tasks use AFTER to define dependencies.
2. Know the Limits: Maximum 1000 tasks per DAG, maximum 100 predecessors per child task.
3. Understand Task States: Tasks are created suspended. The DAG does not run until the root task is resumed, and each child task must also be resumed to take part in a run.
4. Modification Order: Always suspend the root task before making changes to any task in the DAG.
5. Same Schema Requirement: All tasks in a DAG must reside in the same database and schema.
6. No Cycles: Remember that DAGs cannot have circular dependencies - this is what makes them 'acyclic.'
7. Execution Triggers: Child tasks run only after their predecessor tasks complete successfully.
8. Common Question Patterns: Look for questions about scheduling (only root tasks), dependency syntax (AFTER clause), and DAG limitations.