Performance and Cost Optimization Concepts

Optimize query performance and manage virtual warehouse costs effectively in Snowflake (16% of exam).


Performance and Cost Optimization in Snowflake focuses on maximizing efficiency while minimizing expenses. Here are the key concepts: **Virtual Warehouse Sizing**: Selecting appropriate warehouse sizes (XS to 6XL) based on workload requirements is crucial. Larger warehouses process queries faster …

Concepts covered

Virtual warehouse sizing and scaling
Multi-cluster warehouses
Warehouse auto-suspend and auto-resume
Warehouse resource monitors
Result cache
Metadata cache
Warehouse cache (local disk cache)
Query result reuse
Query profiling and optimization
EXPLAIN plan analysis
Query history and performance analysis
Clustering keys
Search optimization service
Credit usage and billing
Cost monitoring and optimization strategies
Materialized views for performance


COF-C02 - Performance and Cost Optimization Concepts Example Questions

Test your knowledge of Performance and Cost Optimization Concepts

Question 1

A pharmaceutical research company maintains a clinical_trials table with 1.8 billion rows containing detailed patient outcome data across multiple studies. Their regulatory reporting team generates weekly compliance reports that aggregate adverse_event_counts and efficacy_scores by study_id, treatment_arm, and site_location. These reports currently take 6-7 minutes to generate, causing delays in submission deadlines. The underlying clinical data is updated through daily ETL processes that run overnight. The team implements a materialized view with the required aggregations. Two months later, a junior analyst accidentally runs a DELETE statement on several thousand rows in the base clinical_trials table and then re-inserts corrected data. The senior data engineer is concerned about the materialized view's state after these DML operations. What is the expected behavior of the materialized view following these base table modifications?

The materialized view will enter a stale state and continue serving outdated results until the data engineer explicitly rebuilds the view using ALTER MATERIALIZED VIEW REBUILD command
The materialized view will require a manual REFRESH command because DELETE operations are tracked differently than INSERT operations in the maintenance queue
Snowflake will suspend the materialized view after detecting DELETE operations and require explicit revalidation before resuming automatic maintenance processes
Snowflake will automatically track the changes and refresh the materialized view to reflect the updated base table data during subsequent queries or background maintenance

Correct Answer: Snowflake will automatically track the changes and refresh the materialized view to reflect the updated base table data during subsequent queries or background maintenance

Snowflake materialized views are designed to be automatically maintained by Snowflake's background services. When DML operations (INSERT, UPDATE, DELETE) are performed on the base table, Snowflake automatically tracks these changes and refreshes the materialized view to keep it synchronized with the underlying data.

This automatic maintenance is one of the key benefits of materialized views in Snowflake - users don't need to manually refresh them. The maintenance service runs in the background and ensures the materialized view reflects the current state of the base table data. This happens transparently, either during query execution or through background maintenance processes.

The other options are incorrect for the following reasons:

Materialized views do NOT require manual REFRESH commands after DELETE operations. Snowflake handles all DML operations (INSERT, UPDATE, DELETE) uniformly through its automatic maintenance process. There is no distinction in how different DML operation types are tracked.
Snowflake does NOT suspend materialized views after detecting DELETE operations. The system is designed to handle all types of data modifications automatically without requiring manual intervention or revalidation.
Materialized views do NOT enter a permanent stale state requiring an explicit REBUILD command. While a materialized view may temporarily be slightly behind the base table during high-change periods, Snowflake's automatic maintenance will catch up and synchronize the data. There is no ALTER MATERIALIZED VIEW REBUILD command needed for normal DML operations on base tables.

The automatic maintenance feature is what makes materialized views in Snowflake particularly useful for scenarios like the one described, where base table data changes regularly through ETL processes or corrections.
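The delete-then-reinsert scenario can be illustrated with a toy model in Python. This is only a sketch of the observable behavior (queries see current data without a manual refresh), not Snowflake's actual maintenance mechanism; the class names `BaseTable` and `ToyMaterializedView`, the version counter, and the sample rows are all hypothetical, and the toy recomputes the whole aggregate rather than maintaining it incrementally.

```python
class BaseTable:
    """Hypothetical base table holding (study_id, adverse_event_count) rows."""

    def __init__(self):
        self.rows = []
        self.version = 0  # bumped on every DML operation

    def insert(self, row):
        self.rows.append(row)
        self.version += 1

    def delete_where(self, predicate):
        self.rows = [r for r in self.rows if not predicate(r)]
        self.version += 1


class ToyMaterializedView:
    """Toy aggregate that brings itself up to date before answering a query."""

    def __init__(self, table):
        self.table = table
        self._totals = {}
        self._seen_version = None

    def _maintain(self):
        # Recompute only when the base table changed since the last maintenance.
        if self._seen_version != self.table.version:
            totals = {}
            for study_id, events in self.table.rows:
                totals[study_id] = totals.get(study_id, 0) + events
            self._totals = totals
            self._seen_version = self.table.version

    def query(self, study_id):
        self._maintain()  # no user-issued refresh is involved
        return self._totals.get(study_id, 0)


table = BaseTable()
mv = ToyMaterializedView(table)
table.insert(("S1", 3))
table.insert(("S1", 5))
before = mv.query("S1")

# The accidental DELETE followed by re-inserting corrected data.
table.delete_where(lambda row: row == ("S1", 5))
table.insert(("S1", 4))
after = mv.query("S1")

print(before, after)
```

In the toy, the caller never issues a refresh: the second query already reflects both the DELETE and the corrected INSERT, which mirrors the behavior described in the correct answer.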

Question 2

A data warehouse administrator is analyzing a query that joins a customer_profiles table with a purchases table. The EXPLAIN output reveals the following for the purchases table: partitionsTotal=600, partitionsAssigned=580. The join operation shows 'joinType=BROADCAST' with the customer_profiles table being broadcasted (50,000 rows). The administrator notices that the WHERE clause filters purchases by customer_tier='PLATINUM', but this column is not part of the clustering key. The clustering key is defined as (purchase_date, store_id). When presenting findings to the optimization team, what conclusion should the administrator draw from this EXPLAIN analysis?

The query is scanning 97% of partitions because the filter on customer_tier cannot leverage the existing clustering key, indicating a potential need for clustering key modification or addition of customer_tier
The high partition assignment ratio suggests the clustering on purchase_date and store_id is working effectively, and adding customer_tier to the clustering key would cause unnecessary re-clustering overhead
The EXPLAIN output indicates that changing the join type from BROADCAST to HASH would reduce the partition scanning ratio and improve overall query performance significantly
The query performance is optimal because the broadcast join strategy was selected for the smaller customer_profiles table, and partition scanning of 580 out of 600 partitions indicates efficient data distribution

Correct Answer: The query is scanning 97% of partitions because the filter on customer_tier cannot leverage the existing clustering key, indicating a potential need for clustering key modification or addition of customer_tier

The correct answer identifies the core issue revealed by the EXPLAIN output: scanning 580 out of 600 partitions (approximately 97%) indicates very poor partition pruning efficiency.

In Snowflake, when a clustering key is defined on specific columns (in this case, purchase_date and store_id), the micro-partitions are organized to optimize queries that filter on those columns. However, when filtering on a different column like customer_tier='PLATINUM', Snowflake cannot effectively prune partitions because the data is not physically organized by that column.

The partitionsTotal=600 and partitionsAssigned=580 metrics clearly show that the WHERE clause filter on customer_tier is forcing the query to scan nearly all partitions. This is a classic indicator that either:
1. The clustering key should be modified to include customer_tier if this is a common query pattern
2. A separate clustered table or materialized view could be created for this access pattern
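The 97% figure is simple arithmetic on the two EXPLAIN metrics. A minimal sketch in Python, assuming the values from the question; the helper name `scan_ratio` is hypothetical and is not part of any Snowflake API.

```python
def scan_ratio(partitions_assigned: int, partitions_total: int) -> float:
    """Fraction of micro-partitions the query still has to scan."""
    if partitions_total <= 0:
        raise ValueError("partitions_total must be positive")
    return partitions_assigned / partitions_total


partitions_total = 600      # partitionsTotal from the EXPLAIN output
partitions_assigned = 580   # partitionsAssigned from the EXPLAIN output

ratio = scan_ratio(partitions_assigned, partitions_total)
pruned = partitions_total - partitions_assigned

print(f"scanned {ratio:.1%} of partitions, pruned only {pruned}")
# prints "scanned 96.7% of partitions, pruned only 20"
```

A ratio close to 1.0 means the filter eliminated almost nothing; a well-pruned query on a suitably clustered table would show a much smaller fraction.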

The option claiming the query performance is optimal is incorrect because scanning 97% of partitions is not efficient; it indicates poor partition pruning. The broadcast join choice for the smaller table is reasonable, but it does not address the partition scanning inefficiency.

The option claiming the clustering on purchase_date and store_id is working effectively is incorrect because the high partition assignment ratio (97%) shows the opposite for this query pattern. Scanning nearly every partition means the clustering is not helping this query.

The option proposing a switch from BROADCAST to HASH is incorrect because the join type is a separate consideration from partition pruning. The partition scanning issue is caused by the WHERE clause filter, not the join strategy. Changing join types would not address the fundamental clustering mismatch.

Question 3

A telecommunications company has a call_detail_records table with 4.8 billion rows tracking customer call data. Their analytics team creates a materialized view that aggregates call duration and count by customer_segment and region_code to support billing reconciliation dashboards. The materialized view is defined as: CREATE MATERIALIZED VIEW mv_billing_summary AS SELECT customer_segment, region_code, SUM(call_duration_seconds) as total_duration, COUNT(*) as call_count FROM call_detail_records WHERE call_status = 'COMPLETED' GROUP BY customer_segment, region_code. After deployment, the team notices that queries filtering on customer_segment and region_code with additional predicates like call_date = '2024-01-15' are not being automatically rewritten to use the materialized view. What is the most accurate explanation for this behavior?

The materialized view includes a WHERE clause filter on call_status, which prevents automatic query rewriting when additional date-based predicates are present in user queries
The call_date column must be included in the materialized view's SELECT list as a non-aggregated column for the query optimizer to consider it during rewrite evaluation
Snowflake requires explicit hints in the query to enable automatic rewriting when the materialized view contains aggregation functions like SUM and COUNT together
The materialized view aggregation does not include call_date in its GROUP BY clause, so queries with that additional filter cannot be satisfied by the precomputed results

Correct Answer: The materialized view aggregation does not include call_date in its GROUP BY clause, so queries with that additional filter cannot be satisfied by the precomputed results

The correct answer explains the fundamental principle of materialized view query rewriting in Snowflake. When a materialized view is created with specific GROUP BY columns, the precomputed aggregated results can only satisfy queries that align with those grouping dimensions.

In this scenario, the materialized view aggregates data by customer_segment and region_code only. The precomputed results contain total_duration and call_count at the customer_segment + region_code granularity level. When a query adds a filter like call_date = '2024-01-15', the query is asking for data at a finer granularity than what the materialized view stores.

The materialized view has already rolled up all the call_date information into the aggregate - it cannot 'unroll' or filter the aggregated results by a date that wasn't included in the GROUP BY clause. The SUM of call durations for a customer segment includes ALL dates, so there's no way to extract just the January 15th portion from that precomputed sum.
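The loss of date granularity can be seen in a small Python sketch that mimics the view's aggregation in plain dictionaries. The sample rows and the helper names `build_summary` and `duration_on` are hypothetical and stand in for the real table and queries.

```python
from collections import defaultdict

# Hypothetical rows:
# (customer_segment, region_code, call_date, call_duration_seconds, call_status)
rows = [
    ("RETAIL", "NE", "2024-01-15", 120, "COMPLETED"),
    ("RETAIL", "NE", "2024-01-16", 300, "COMPLETED"),
    ("RETAIL", "NE", "2024-01-15", 45, "DROPPED"),
    ("BUSINESS", "SW", "2024-01-15", 600, "COMPLETED"),
]


def build_summary(rows):
    """Mimic the view: filter COMPLETED calls, group by (segment, region)."""
    summary = defaultdict(lambda: {"total_duration": 0, "call_count": 0})
    for segment, region, _call_date, duration, status in rows:
        if status != "COMPLETED":
            continue
        entry = summary[(segment, region)]
        entry["total_duration"] += duration
        entry["call_count"] += 1
    return dict(summary)


def duration_on(rows, segment, region, call_date):
    """Answer the date-filtered question from the base rows."""
    return sum(
        duration
        for seg, reg, date, duration, status in rows
        if (seg, reg, date, status) == (segment, region, call_date, "COMPLETED")
    )


summary = build_summary(rows)
rolled_up = summary[("RETAIL", "NE")]["total_duration"]
jan_15_only = duration_on(rows, "RETAIL", "NE", "2024-01-15")

print(rolled_up, jan_15_only)
```

The summary keys carry only the segment and region, so the rolled-up total mixes both dates and the January 15 portion can only be computed from the base rows.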

The option blaming the WHERE clause filter on call_status is incorrect because a filter in the view definition doesn't inherently prevent query rewriting. The issue is the dimensionality of the aggregation, not the filtering condition in the view definition.

The option about explicit hints is incorrect because Snowflake does not require hints for materialized view query rewriting. The optimizer automatically considers materialized views when appropriate.

The option requiring call_date in the SELECT list is partially related but misses the key point. In an aggregating view, a non-aggregated column must also appear in the GROUP BY clause, so it is the grouping granularity that determines whether a date filter can be answered. Simply selecting call_date without grouping by it would not solve the problem.
