Interactive vs Batch Query Jobs in Google Cloud Platform
Understanding Interactive vs Batch Query Jobs for the GCP Professional Data Engineer Exam
Why Is This Important?
Understanding the distinction between interactive and batch query jobs is a fundamental concept for any data engineer working with Google BigQuery. This topic is critical for the GCP Professional Data Engineer certification exam because it directly impacts cost optimization, resource management, and workload scheduling — all core competencies that Google expects certified professionals to master. Choosing the wrong query priority can lead to unnecessary costs, exceeded quotas, or degraded performance for time-sensitive workloads. As a data engineer, you must be able to recommend the appropriate query mode based on business requirements, SLAs, and budget constraints.
What Are Interactive and Batch Query Jobs?
In Google BigQuery, every query you run is executed as a job. These jobs can be categorized into two priority modes:
1. Interactive (Default) Query Jobs
Interactive queries are executed as soon as possible. This is the default behavior when you run a query in BigQuery. Key characteristics include:
- Queries are executed immediately when resources are available.
- They count against your project's concurrent query quota (which is typically around 100 concurrent queries for on-demand pricing).
- Results are returned as quickly as the system can process them.
- Ideal for ad-hoc analysis, dashboards, and time-sensitive reporting where users expect near-real-time responses.
- If the concurrent rate limit is reached, additional interactive queries may fail with a quotaExceeded error.
2. Batch Query Jobs
Batch queries are queued and executed when idle resources become available. Key characteristics include:
- BigQuery queues the query and starts it when spare capacity is available, typically within minutes.
- If the query is not started within 24 hours, BigQuery automatically upgrades it to interactive priority.
- Batch queries do not count against the concurrent query quota, making them ideal for heavy workloads that might otherwise exceed rate limits.
- They cost the same as interactive queries under on-demand pricing (you pay per bytes scanned regardless of priority).
- Ideal for non-urgent ETL pipelines, large analytical jobs, and scheduled overnight processing.
How It Works
When you submit a query to BigQuery, you can specify the priority level through several interfaces:
Using the bq command-line tool:
You can set the priority flag:
bq query --priority=BATCH "SELECT * FROM my_dataset.my_table"
or
bq query --priority=INTERACTIVE "SELECT * FROM my_dataset.my_table"
Using the BigQuery REST API:
In the job configuration, you set the priority field within the query configuration object to either "INTERACTIVE" or "BATCH".
Using the BigQuery Console (UI):
In the BigQuery web UI, you can navigate to Query Settings (More → Query Settings) and select the query priority before running your query.
Using Client Libraries:
When using Python, Java, or other client libraries, you can set the QueryJobConfig priority property. For example, in Python:
job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)
Key Technical Details:
- Quota Impact: Interactive queries count against the concurrent rate limit (default ~100 concurrent queries). Batch queries are queued separately and do not impact this limit.
- Cost: Under on-demand pricing, both interactive and batch queries cost the same — you are billed for the bytes processed. There is no discount for choosing batch. However, if you use flat-rate (slot-based) pricing, batch queries can be useful for managing slot utilization more effectively.
- 24-Hour Guarantee: If a batch query hasn't been picked up within 24 hours, it is automatically promoted to interactive priority to ensure it eventually runs.
- Scheduling: BigQuery scheduled queries can be configured to run as either interactive or batch. For regular ETL jobs that don't need immediate results, batch is often the preferred choice.
- Reservations and Slots: When using BigQuery Reservations (flat-rate pricing), you can assign batch queries to specific reservations. Batch jobs with BATCH priority in a reservation will use idle slots from other reservations, maximizing slot utilization.
When to Use Each Mode
Use Interactive When:
- Users need immediate results (dashboards, exploratory analysis).
- The application has strict latency requirements.
- You are within your concurrent query quota limits.
- Real-time or near-real-time data access is required.
Use Batch When:
- The workload is not time-sensitive (overnight ETL, periodic reporting).
- You are approaching or exceeding concurrent query quotas.
- You want to take advantage of idle compute resources.
- You are running large-scale data transformations as part of a pipeline.
- You want to avoid quota-exceeded errors during peak times.
Common Scenarios on the Exam
Scenario 1: A company is running hundreds of queries simultaneously during peak hours and receiving quotaExceeded errors. Non-urgent analytical queries should be converted to batch priority to reduce the concurrent query load.
Scenario 2: A nightly ETL pipeline runs at 2:00 AM and does not need results until 6:00 AM. Batch queries are the appropriate choice since latency is not a concern and they won't compete with daytime interactive workloads.
Scenario 3: A data analyst needs to run an ad-hoc query and get results immediately to present in a meeting. Interactive priority is the correct choice because the user needs immediate results.
Scenario 4: An organization uses flat-rate pricing with BigQuery Reservations. They want to maximize slot utilization. Batch queries can leverage idle slots from other reservations, making batch the optimal choice for non-urgent workloads in this setup.
Exam Tips: Answering Questions on Interactive vs Batch Query Jobs
1. Remember the Default: Interactive is the default priority. If a question does not mention priority settings, assume the query runs as interactive.
2. Cost is the Same for On-Demand: A common trap in exam questions is implying that batch queries are cheaper. Under on-demand pricing, both modes cost the same per byte scanned. Do not select answers that claim batch saves money on per-query costs under on-demand pricing.
3. Quota is the Key Differentiator: The primary operational advantage of batch queries is that they do not count against the concurrent query rate limit. If a question mentions quota issues or concurrent query limits, batch is likely part of the answer.
4. 24-Hour Auto-Promotion: Remember that batch queries are guaranteed to start within 24 hours. If a question asks about reliability or guarantees for batch jobs, this is the key fact.
5. Look for Time-Sensitivity Clues: If the scenario describes urgency, real-time needs, or user-facing dashboards, the answer involves interactive queries. If it mentions scheduled jobs, ETL, non-urgent, or overnight processing, think batch.
6. Flat-Rate Pricing and Reservations: In flat-rate scenarios, batch queries can utilize idle slots from other reservations. This is a more advanced concept but may appear on the exam. Understand that batch priority helps maximize slot utilization in reservation-based pricing models.
7. Don't Confuse with Streaming: Batch query jobs are different from batch data loading or streaming inserts. Make sure you distinguish between query execution priority and data ingestion methods. A question about query execution should be answered in terms of interactive vs batch priority, not in terms of streaming vs batch loading.
8. Scheduled Queries: Know that BigQuery scheduled queries support both priority modes. For scheduled ETL that can tolerate some delay, batch is the recommended priority to avoid impacting other workloads.
9. Elimination Strategy: When facing a question about optimizing BigQuery workloads, eliminate answers that suggest batch queries reduce cost (under on-demand pricing) and focus on answers related to quota management, resource utilization, and workload scheduling.
10. Read the Full Scenario: Exam questions often provide detailed scenarios. Pay attention to details like pricing model (on-demand vs flat-rate), time constraints, user expectations, and error messages (quotaExceeded). These clues will guide you to the correct answer regarding query priority selection.