Tasks in Snowflake are powerful objects designed to schedule and automate SQL statements or stored procedures. They enable users to execute data transformation workflows on a defined schedule, making them essential for ETL/ELT processes and maintaining data pipelines.
A Task can execute a single SQL statement, call a stored procedure, or leverage Snowflake Scripting. Tasks operate using either a cron-based schedule (using standard cron syntax) or a fixed interval specified in minutes. For example, you can schedule a task to run every hour or at specific times like midnight daily.
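As an illustration of the two scheduling styles, here is a minimal sketch; the task, warehouse, table, and procedure names are all hypothetical:

-- Hypothetical task: runs daily at midnight in the America/Los_Angeles time zone
CREATE TASK nightly_refresh
  WAREHOUSE = transform_wh
  SCHEDULE = 'USING CRON 0 0 * * * America/Los_Angeles'
AS
  CALL refresh_daily_sales();

-- Hypothetical task: runs every 60 minutes
CREATE TASK hourly_rollup
  WAREHOUSE = transform_wh
  SCHEDULE = '60 MINUTE'
AS
  INSERT INTO sales_summary SELECT region, SUM(amount) FROM raw_sales GROUP BY region;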
Tasks support a tree-like dependency structure through parent-child relationships. A root task initiates the workflow, and child tasks execute after their predecessors complete successfully, forming Directed Acyclic Graphs (DAGs) that allow complex multi-step transformations. A DAG can contain up to 1,000 tasks in total, and a single task can have up to 100 predecessor tasks and 100 child tasks.
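A small DAG might look like the following sketch (all object names are hypothetical); only the root task carries a schedule, and the child task runs after the root completes:

-- Root task: the only task in the DAG with a SCHEDULE
CREATE TASK load_root
  WAREHOUSE = transform_wh
  SCHEDULE = '30 MINUTE'
AS
  COPY INTO staging_orders FROM @orders_stage;

-- Child task: runs after load_root completes successfully
CREATE TASK transform_child
  WAREHOUSE = transform_wh
  AFTER load_root
AS
  INSERT INTO orders_clean SELECT * FROM staging_orders WHERE order_id IS NOT NULL;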
Key properties of Tasks include (illustrated in the sketch after this list):
- WAREHOUSE: Specifies the compute resource for execution (or use serverless tasks)
- SCHEDULE: Defines when the task runs
- ALLOW_OVERLAPPING_EXECUTION: Controls whether concurrent runs are permitted
- SUSPEND_TASK_AFTER_NUM_FAILURES: Automatically suspends after specified consecutive failures
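A hedged sketch combining these properties in a single definition (object names are hypothetical):

CREATE TASK rollup_task
  WAREHOUSE = transform_wh
  SCHEDULE = '5 MINUTE'
  ALLOW_OVERLAPPING_EXECUTION = FALSE      -- do not start a new run while the previous one is still going
  SUSPEND_TASK_AFTER_NUM_FAILURES = 3      -- auto-suspend after 3 consecutive failures
AS
  INSERT INTO rollup_target SELECT * FROM rollup_source;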
Serverless Tasks represent a convenient option where Snowflake manages the compute resources automatically, eliminating the need to specify or manage a warehouse.
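For example, a serverless task sketch (task and table names are hypothetical); note the absence of a WAREHOUSE clause, with an optional parameter hinting at the initial compute size:

CREATE TASK serverless_cleanup
  SCHEDULE = 'USING CRON 0 3 * * * UTC'                  -- 03:00 UTC daily
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'    -- serverless: Snowflake manages compute
AS
  DELETE FROM app_events WHERE event_ts < DATEADD(day, -90, CURRENT_TIMESTAMP());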
Tasks are created in a suspended state by default and must be explicitly resumed using ALTER TASK ... RESUME. Managing tasks requires appropriate privileges, typically granted to a custom role such as TASKADMIN that holds the EXECUTE TASK privilege. You can monitor task execution through the TASK_HISTORY table function in the Information Schema or through Account Usage views.
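For example (task name hypothetical), activating a task and then reviewing its recent runs:

ALTER TASK nightly_refresh RESUME;   -- tasks are created SUSPENDED; this starts scheduling

-- Recent execution history via the Information Schema table function
SELECT name, state, scheduled_time, completed_time, error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(TASK_NAME => 'NIGHTLY_REFRESH'))
ORDER BY scheduled_time DESC;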
Best practices include setting appropriate error handling, monitoring task runs regularly, and organizing related tasks into logical DAGs. Tasks integrate seamlessly with Streams for change data capture, enabling efficient incremental data processing and transformation workflows in your Snowflake environment.
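As a sketch of the Streams integration (stream, task, and table names are hypothetical), a WHEN clause with SYSTEM$STREAM_HAS_DATA skips scheduled runs when there are no new changes to process:

CREATE STREAM orders_stream ON TABLE raw_orders;   -- change tracking on a hypothetical source table

CREATE TASK process_orders
  WAREHOUSE = transform_wh
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')       -- run only when the stream contains changes
AS
  INSERT INTO orders_clean SELECT * FROM orders_stream WHERE METADATA$ACTION = 'INSERT';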
Task Scheduling in Snowflake - Complete Guide
What are Snowflake Tasks?
Snowflake Tasks are objects that allow you to schedule the execution of SQL statements, including calls to stored procedures. Tasks enable automation of recurring data transformation workloads, data loading operations, and maintenance activities within Snowflake.
Why Tasks are Important
Tasks are essential for:
• Automating ETL/ELT pipelines - Schedule regular data transformations
• Data freshness - Keep dashboards and reports updated
• Operational efficiency - Reduce manual intervention in data workflows
• Cost management - Schedule workloads during off-peak hours
• Building data pipelines - Chain multiple tasks together using task trees
How Tasks Work
1. Task Creation: Tasks are created using the CREATE TASK statement with either a defined schedule or a predecessor task
2. Task Scheduling: The root task runs on a defined schedule, either a CRON expression or a fixed interval in minutes (minimum 1 minute)
3. Task States:
• Tasks are created in a suspended state by default
• Must be explicitly resumed using ALTER TASK ... RESUME
• Can be suspended using ALTER TASK ... SUSPEND
4. Task Trees (DAGs):
• Root task - The top-level task with a schedule
• Child tasks - Tasks that run after their predecessor completes
• Only the root task has a schedule; child tasks use the AFTER clause
5. Compute Resources:
• Tasks can use a specified virtual warehouse
• Alternatively, use serverless compute (managed by Snowflake)
Key Task Properties
• WAREHOUSE - Specifies which warehouse executes the task
• SCHEDULE - Defines when the task runs (root tasks only)
• AFTER - Specifies predecessor task(s) for child tasks
• WHEN - Conditional expression to determine if the task should run
• USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE - Size for serverless tasks
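As a sketch of a child task with more than one predecessor (all names hypothetical), the task runs only after every listed predecessor in the current graph run has completed:

CREATE TASK merge_results
  WAREHOUSE = transform_wh
  AFTER load_customers, load_orders   -- multiple predecessor tasks in the same DAG
AS
  INSERT INTO customer_orders
  SELECT c.customer_id, o.order_id
  FROM customers c JOIN orders o ON c.customer_id = o.customer_id;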
Task Ownership and Privileges
• The EXECUTE TASK privilege is required to run tasks
• ACCOUNTADMIN can grant EXECUTE TASK to roles
• Task owner must have USAGE on the warehouse
• OWNERSHIP privilege needed to modify tasks
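A hedged sketch of a typical privilege setup (the role and warehouse names are hypothetical):

USE ROLE ACCOUNTADMIN;
CREATE ROLE IF NOT EXISTS taskadmin;
GRANT EXECUTE TASK ON ACCOUNT TO ROLE taskadmin;         -- account-level privilege to run tasks
GRANT USAGE ON WAREHOUSE transform_wh TO ROLE taskadmin; -- the task owner must be able to use the warehouse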
Monitoring Tasks
• TASK_HISTORY table function shows execution history
• INFORMATION_SCHEMA.TASK_HISTORY provides task run details
• SHOW TASKS displays task metadata and state
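For example, to check the current state, schedule, and warehouse of the tasks in a schema (database and schema names are hypothetical):

SHOW TASKS IN SCHEMA analytics_db.transform_schema;   -- lists each task with its state, schedule, and warehouse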
Exam Tips: Answering Questions on Task Scheduling
Key Points to Remember:
1. Default State: Tasks are always created in a SUSPENDED state - you must run ALTER TASK ... RESUME to activate them
2. Root vs Child Tasks: Only root tasks have schedules; child tasks use the AFTER clause to define dependencies
3. Resuming Task Trees: When resuming a task tree, resume child tasks FIRST, then the root task. When suspending, suspend the root task FIRST (see the sketch after this list)
4. TASKADMIN Role: Best practice is to create a custom role (like TASKADMIN) with EXECUTE TASK privilege for task management
5. Serverless Tasks: When no warehouse is specified, tasks use Snowflake-managed serverless compute
6. WHEN Clause: The WHEN condition is evaluated before the task runs - commonly used with SYSTEM$STREAM_HAS_DATA() to check if streams have new data
7. Minimum Interval: The minimum scheduled interval for tasks is 1 minute
8. Task Failures: If a task fails, subsequent scheduled runs still occur unless the task is suspended
9. CRON Format: Understand the CRON expression format: minute, hour, day of month, month, day of week, timezone
10. Task Graphs: A single task can have multiple predecessor tasks (dependencies), enabling complex DAG structures
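To illustrate tip 3 concretely (task names are hypothetical, reusing the root/child names from the earlier sketch):

-- Resuming a tree: children first, root last
ALTER TASK transform_child RESUME;
ALTER TASK load_root RESUME;

-- Suspending a tree: root first, then children
ALTER TASK load_root SUSPEND;
ALTER TASK transform_child SUSPEND;

-- Alternatively, resume a root task and all of its dependents in one call
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('load_root');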
Common Exam Scenarios:
• Questions about the order of resuming/suspending task trees
• Understanding the difference between scheduled and predecessor-based execution
• Identifying required privileges for task operations
• Recognizing when to use SYSTEM$STREAM_HAS_DATA() with tasks