Federated Queries and Materialized Views in Redshift
**Federated Queries in Amazon Redshift** allow you to query and analyze data across operational databases, data warehouses, and data lakes without the need to physically move or copy the data. With federated queries, Redshift can directly query live data in external sources such as Amazon RDS (Post… **Federated Queries in Amazon Redshift** allow you to query and analyze data across operational databases, data warehouses, and data lakes without the need to physically move or copy the data. With federated queries, Redshift can directly query live data in external sources such as Amazon RDS (PostgreSQL and MySQL) and Amazon Aurora. This is achieved by creating external schemas that reference these remote databases using JDBC connections. Federated queries are particularly useful for real-time analytics on transactional data, joining operational data with warehouse data, and performing ETL-like operations without building complex pipelines. They leverage Redshift's query optimizer to push down predicates to the source databases, minimizing data transfer and improving performance. To set up federated queries, you define an external schema with connection details (host, port, database, credentials stored in AWS Secrets Manager), then query remote tables as if they were local. **Materialized Views in Amazon Redshift** are precomputed result sets stored physically in the database, derived from one or more base tables. They significantly improve query performance for complex, repetitive queries involving aggregations, joins, and filters by caching the results rather than recomputing them each time. Redshift supports automatic and manual refresh of materialized views to keep them up to date as underlying data changes. The AUTO REFRESH option enables Redshift to incrementally refresh views in the background. Materialized views can also be used with federated queries and external tables (Spectrum), making them versatile for hybrid architectures. Redshift's query optimizer can automatically rewrite incoming queries to use materialized views even if the user doesn't explicitly reference them, further boosting performance transparently. Together, federated queries and materialized views form a powerful combination: federated queries eliminate data silos by querying remote sources in place, while materialized views cache frequently accessed results for faster retrieval. This reduces data movement, lowers latency, and simplifies architecture for data engineers building modern analytics solutions on AWS.
Federated Queries and Materialized Views in Amazon Redshift
Why This Topic Is Important
Federated Queries and Materialized Views are two powerful features in Amazon Redshift that frequently appear on the AWS Data Engineer Associate exam. They represent key capabilities for optimizing data access patterns, reducing data movement, and improving query performance — all of which are central concerns for data engineers working at scale. Understanding these features is critical because they enable you to query data across multiple data sources without physically moving it (federated queries) and to pre-compute and cache expensive query results for faster retrieval (materialized views). Together, they form a significant part of Redshift's data store management strategy.
What Are Federated Queries?
Federated queries in Amazon Redshift allow you to query and analyze data across operational databases, data warehouses, and data lakes — all without needing to perform ETL to move the data into Redshift first. Redshift supports federated queries to:
• Amazon RDS for PostgreSQL
• Amazon Aurora PostgreSQL
• Amazon RDS for MySQL
• Amazon Aurora MySQL
With federated queries, you create an external schema in Redshift that references a remote database through a connection defined using AWS Secrets Manager for credential management. Once the external schema is created, you can join data from the remote operational database with data already residing in your Redshift cluster or with data in Amazon S3 via Redshift Spectrum.
How Federated Queries Work
1. Create a Secret in AWS Secrets Manager: Store the credentials (username, password) for the remote RDS or Aurora database in Secrets Manager.
2. Create an External Schema: Use the CREATE EXTERNAL SCHEMA command in Redshift, specifying the connection details including the secret ARN, the database engine (postgres or mysql), the database name, the host URI, and the port.
3. Grant Permissions: Ensure the Redshift cluster's IAM role has permission to access the secret in Secrets Manager and to make network calls to the remote database. The remote database's security group must allow inbound traffic from the Redshift cluster.
4. Query the Data: Once the external schema is configured, you can write standard SQL that references tables in the external schema as if they were local Redshift tables. You can join federated tables with native Redshift tables, Spectrum external tables, or other federated tables.
5. Query Pushdown: Redshift optimizes federated queries by pushing down predicates, aggregations, and other operations to the remote database when possible, minimizing the amount of data transferred over the network.
Key Considerations for Federated Queries:
• Federated queries are best suited for small to moderate result sets from operational databases. They are not intended for large-scale data migration or heavy analytical workloads against the remote source.
• Network latency and the performance of the remote database directly impact federated query performance.
• Federated queries require VPC networking — the Redshift cluster must be able to reach the RDS/Aurora instance, typically through the same VPC or VPC peering.
• The Redshift cluster must have an enhanced VPC routing configuration if needed, and proper security group rules must be in place.
• You can use federated queries to perform live lookups, enrich warehouse data with operational data, or perform ad-hoc analysis without ETL.
What Are Materialized Views?
Materialized views in Amazon Redshift are pre-computed result sets that are stored physically in the database. Unlike regular views (which re-execute the underlying query every time they are accessed), materialized views store the query results and can be refreshed on demand or automatically. This dramatically speeds up queries that involve complex joins, aggregations, or transformations on large datasets.
How Materialized Views Work
1. Creation: You create a materialized view using the CREATE MATERIALIZED VIEW statement with a SQL query that defines the data to be pre-computed. For example:
CREATE MATERIALIZED VIEW mv_sales_summary AS SELECT region, SUM(amount) AS total_sales FROM sales GROUP BY region;
2. Storage: The results of the query are computed and stored physically in Redshift, just like a regular table. Redshift uses its columnar storage and compression to store the materialized view efficiently.
3. Querying: When users query the materialized view, Redshift returns results directly from the stored data rather than re-executing the underlying complex query. This provides significantly faster response times.
4. Automatic Query Rewriting: Redshift can automatically rewrite queries to use a materialized view even when the user queries the base tables directly. If the optimizer detects that a materialized view can satisfy the query, it will route the query to the materialized view instead. This is a key feature for transparent performance optimization.
5. Refreshing: Materialized views need to be refreshed to reflect changes in the underlying base tables. Redshift supports:
• Manual Refresh: Using REFRESH MATERIALIZED VIEW mv_name;
• Auto Refresh: You can enable auto-refresh when creating the materialized view (AUTO REFRESH YES). Redshift will automatically refresh the materialized view in the background when it detects changes to the base tables.
• Incremental Refresh: Redshift supports incremental refresh for materialized views, meaning it only processes the changes (deltas) in the base tables rather than recomputing the entire result set. This is significantly more efficient for large datasets.
Key Considerations for Materialized Views:
• Materialized views consume additional storage in your Redshift cluster because the pre-computed results are physically stored.
• They are most beneficial for frequently executed, expensive queries — especially those involving aggregations, joins across large tables, or complex transformations.
• Materialized views can be built on top of local Redshift tables, external tables (Redshift Spectrum/S3), and even federated query results (from RDS/Aurora). This is a powerful combination — you can materialize the results of a federated query to avoid repeatedly querying the remote database.
• Stale data: Between refreshes, the materialized view may contain stale data. The auto-refresh feature mitigates this but does not guarantee real-time freshness.
• Materialized views support cascading: You can create a materialized view on top of another materialized view.
• There are some SQL limitations on what can be used inside a materialized view definition (e.g., certain window functions or subqueries may have restrictions).
Federated Queries + Materialized Views Together
A powerful architectural pattern is to combine federated queries with materialized views. For example:
• You set up a federated query to access transactional data in Aurora PostgreSQL.
• You create a materialized view in Redshift that pre-computes an aggregation of that federated data.
• Dashboard users query the materialized view for fast, interactive analytics — without ever hitting the Aurora database directly after the initial materialization.
• The materialized view is refreshed periodically (manually or auto-refresh) to pick up new transactional data.
This pattern reduces load on operational databases, minimizes network traffic, and provides fast analytical query performance.
Exam Tips: Answering Questions on Federated Queries and Materialized Views in Redshift
1. Know when to use federated queries vs. ETL: If the exam describes a scenario where you need to occasionally join a small amount of operational data (from RDS/Aurora) with Redshift warehouse data, federated queries are the answer. If the scenario involves large-scale, regular data movement, traditional ETL (using Glue, Data Pipeline, or COPY commands) is more appropriate.
2. Remember the supported sources: Federated queries work with Amazon RDS PostgreSQL, Aurora PostgreSQL, Amazon RDS MySQL, and Aurora MySQL. They do not work with DynamoDB, Oracle, SQL Server, or other databases directly. If you see those databases in a question, federated queries are not the right answer.
3. Secrets Manager is key: Federated queries use AWS Secrets Manager to store and retrieve database credentials. If an exam question mentions storing credentials for a federated query, Secrets Manager is the correct answer — not IAM roles directly, not parameter store (though Parameter Store can store secrets, the federated query feature specifically integrates with Secrets Manager).
4. Understand automatic query rewriting: If a question asks how to improve dashboard performance without changing existing queries, materialized views with automatic query rewriting is the answer. Redshift can transparently redirect queries to use materialized views.
5. Auto refresh vs. manual refresh: If the question emphasizes near-real-time data freshness, remember that auto-refresh provides background refreshes but is not truly real-time. For the freshest data, a manual REFRESH MATERIALIZED VIEW can be triggered before critical queries or reports.
6. Incremental refresh for efficiency: If a question asks about efficiently updating materialized views on large datasets, incremental refresh is the key concept. Redshift processes only the changed rows rather than recomputing the entire view.
7. Materialized views on external tables: Remember that materialized views can be created on Redshift Spectrum external tables (data in S3). This is a common pattern to accelerate queries against data lake data without loading it into Redshift.
8. Storage trade-off: Materialized views consume storage. If a question presents a scenario with limited cluster storage, be mindful that materialized views add to storage requirements.
9. Combining features: Watch for scenarios that combine federated queries, Redshift Spectrum, and materialized views. A common exam pattern is to describe a data engineering solution that queries across S3, RDS, and local Redshift tables — and then asks you to identify the best approach to optimize performance. The answer often involves materializing federated or Spectrum query results.
10. VPC and networking: If a question involves connectivity issues with federated queries, think about VPC configuration, security groups, enhanced VPC routing, and whether the Redshift cluster can reach the RDS/Aurora endpoint.
11. Distinguish from Redshift Spectrum: Federated queries access operational relational databases (RDS/Aurora), while Redshift Spectrum queries data in S3. Both use external schemas, but they serve different purposes. The exam may try to confuse these two concepts.
12. Performance optimization hierarchy: For exam questions about Redshift query performance improvement, the typical hierarchy is: (a) distribution keys and sort keys, (b) materialized views for repeated complex queries, (c) Redshift Spectrum for offloading cold data to S3, and (d) federated queries for live access to operational data. Know where each optimization fits.
By mastering these concepts and their practical applications, you will be well-prepared to handle any exam question related to Federated Queries and Materialized Views in Amazon Redshift.
Unlock Premium Access
AWS Certified Data Engineer - Associate + ALL Certifications
- Access to ALL Certifications: Study for any certification on our platform with one subscription
- 2970 Superior-grade AWS Certified Data Engineer - Associate practice questions
- Unlimited practice tests across all certifications
- Detailed explanations for every question
- AWS DEA-C01: 5 full exams plus all other certification exams
- 100% Satisfaction Guaranteed: Full refund if unsatisfied
- Risk-Free: 7-day free trial with all premium features!