Multi-site active-active DR

Multi-Site Active-Active DR: Complete Guide for AWS SysOps Administrator Associate

What is Multi-Site Active-Active DR?

Multi-site active-active disaster recovery (DR) is the most comprehensive DR strategy available on AWS, designed for mission-critical applications that need near-zero downtime and minimal data loss. The application runs simultaneously in two or more AWS Regions (or a combination of on-premises and AWS), and every site serves production traffic at all times. Unlike DR strategies where the secondary site sits idle or runs at reduced scale, all sites in an active-active design are fully operational and handle real user requests.

Why is Multi-Site Active-Active DR Important?

This strategy is critical for organizations that require:
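• Near-zero RTO (Recovery Time Objective) - Failover takes seconds to minutes because all sites are already running; the delay comes mainly from health-check detection and DNS caching
• Near-zero RPO (Recovery Point Objective) - Data is replicated continuously across all sites; cross-region replication is typically asynchronous, so a small window of the most recent writes can still be lost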
• Maximum availability - Even during a complete regional failure, users experience minimal disruption
• Geographic load distribution - Users connect to the nearest region, reducing latency
• Regulatory compliance - Some industries mandate this level of resilience

How Multi-Site Active-Active DR Works

Architecture Components:

1. Global Traffic Management
Amazon Route 53 uses health checks and routing policies (latency-based, geolocation, or weighted) to distribute traffic across all active sites. If one region fails health checks, Route 53 automatically routes traffic to healthy regions.
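
As a rough illustration of the routing setup described above, the sketch below builds the change batch for two latency-based A records, each tied to a health check. The dictionary layout follows the ChangeBatch structure accepted by the Route 53 ChangeResourceRecordSets API; the domain name, IP addresses, and health check IDs are placeholder assumptions, and nothing is sent to AWS.

```python
from typing import Any


def latency_record(name: str, region: str, ip: str,
                   health_check_id: str, ttl: int = 60) -> dict[str, Any]:
    """Build one latency-based A record that is only served while healthy."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"app-{region}",  # must be unique among records with this name
            "Region": region,                  # presence of Region selects latency-based routing
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
            "HealthCheckId": health_check_id,
        },
    }


def build_change_batch(name: str,
                       endpoints: dict[str, tuple[str, str]]) -> dict[str, Any]:
    """endpoints maps region -> (ip, health_check_id)."""
    return {
        "Comment": "active-active latency routing (illustrative)",
        "Changes": [
            latency_record(name, region, ip, hc_id)
            for region, (ip, hc_id) in endpoints.items()
        ],
    }


# Placeholder values: documentation-range IPs and made-up health check IDs.
batch = build_change_batch(
    "app.example.com.",
    {
        "us-east-1": ("192.0.2.10", "hc-placeholder-use1"),
        "eu-west-1": ("198.51.100.10", "hc-placeholder-euw1"),
    },
)
```

With boto3, a batch like this would be passed as the ChangeBatch argument of the Route 53 client's change_resource_record_sets call together with a HostedZoneId. A short TTL is used here on the assumption that faster failover matters more than DNS query volume.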

2. Data Replication
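• Amazon DynamoDB Global Tables - Provides multi-region, multi-active replication: every replica accepts writes, and conflicting writes to the same item are reconciled with last-writer-wins
• Amazon Aurora Global Database - Replicates from a single primary (writer) region to read-only secondary regions, one of which can be promoted during failover; writes that originate in other regions must reach the primary, for example through write forwarding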
• Amazon S3 Cross-Region Replication - Keeps objects synchronized across regions
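
When every region accepts writes, two regions can update the same item before replication catches up. DynamoDB Global Tables reconcile such conflicts with last-writer-wins. The sketch below simulates that idea in plain Python; the Replica and Version classes, the tie-break on region name, and the sample timestamps are assumptions made for illustration, not DynamoDB internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp_ms: int  # write time recorded with the item
    region: str        # region that accepted the write


def resolve(a: Version, b: Version) -> Version:
    """Last writer wins: keep the version with the later timestamp."""
    # Tie-break on region name (an assumption of this sketch) so that
    # every replica picks the same winner for identical timestamps.
    return max(a, b, key=lambda v: (v.timestamp_ms, v.region))


class Replica:
    """One regional copy of the table, held in memory."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.items: dict[str, Version] = {}

    def _merge(self, key: str, version: Version) -> None:
        current = self.items.get(key)
        self.items[key] = version if current is None else resolve(current, version)

    def write(self, key: str, value: str, timestamp_ms: int) -> Version:
        """Accept a local write and return the version to replicate."""
        version = Version(value, timestamp_ms, self.region)
        self._merge(key, version)
        return version

    def apply_replicated(self, key: str, version: Version) -> None:
        """Apply a write that arrived from another region."""
        self._merge(key, version)


us = Replica("us-east-1")
eu = Replica("eu-west-1")

# Both regions update the same item before replication catches up.
from_us = us.write("cart#42", "2 items", timestamp_ms=1000)
from_eu = eu.write("cart#42", "3 items", timestamp_ms=1005)

# Replication delivers each write to the other region.
eu.apply_replicated("cart#42", from_us)
us.apply_replicated("cart#42", from_eu)
```

After both replicated writes are applied, the two replicas hold the same value: the later write from eu-west-1. The earlier write is silently discarded, which is why workloads that cannot lose updates need a different conflict strategy or a single writer region.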

3. Compute Resources
• Auto Scaling groups in each region handle local traffic
• EC2 instances, containers (ECS/EKS), or Lambda functions run in all regions
• Each region maintains full processing capacity
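
One way to make "full processing capacity" concrete is to size each region so that the surviving regions can carry the whole peak load after a regional failure. The arithmetic below is a simplified sketch: the function name, the request rates, and the per-instance throughput are hypothetical, and it assumes traffic spreads evenly across surviving regions.

```python
import math


def instances_per_region(peak_rps: float, rps_per_instance: float,
                         regions: int, tolerated_region_failures: int = 1) -> int:
    """Instances each region needs so the survivors can absorb the full peak."""
    if regions <= tolerated_region_failures:
        raise ValueError("need more regions than tolerated failures")
    surviving = regions - tolerated_region_failures
    # Assumes the surviving regions share the peak load evenly.
    rps_per_surviving_region = peak_rps / surviving
    return math.ceil(rps_per_surviving_region / rps_per_instance)


# Hypothetical workload: 12,000 requests/s at peak, 500 requests/s per instance.
two_regions = instances_per_region(12_000, 500, regions=2)    # 24 per region
three_regions = instances_per_region(12_000, 500, regions=3)  # 12 per region
```

With two regions, each one must be able to run the entire workload alone, which is the main reason this strategy costs the most. Auto Scaling can add capacity after a failover, but relying on it reintroduces the scale-up delay that active-active is meant to avoid.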

4. Application Layer
• Stateless application design is essential
• Session data is kept outside the application servers in a shared store such as ElastiCache Global Datastore (whose secondary-region clusters are read-only replicas)
• Application code deployed identically across all regions
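
The stateless pattern listed above can be sketched as a request handler that keeps nothing in process memory between requests. The SessionStore interface, the InMemoryStore stand-in, and the handler name are hypothetical; a real deployment would back the interface with a networked cache or database, and cross-region replication of that store is not modeled here.

```python
from typing import Optional, Protocol


class SessionStore(Protocol):
    def get(self, session_id: str) -> Optional[dict]: ...
    def put(self, session_id: str, data: dict) -> None: ...


class InMemoryStore:
    """Stand-in for an external session store, used only for this sketch."""

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    def get(self, session_id: str) -> Optional[dict]:
        found = self._data.get(session_id)
        return dict(found) if found is not None else None

    def put(self, session_id: str, data: dict) -> None:
        self._data[session_id] = dict(data)


def handle_add_to_cart(store: SessionStore, session_id: str, item: str) -> dict:
    """Stateless handler: session state is read from and written back to the store."""
    session = store.get(session_id) or {"cart": []}
    session["cart"] = session["cart"] + [item]  # build a new list, never mutate in place
    store.put(session_id, session)
    return session


store = InMemoryStore()
# Two calls that could be served by different servers or regions.
handle_add_to_cart(store, "session-1", "book")
result = handle_add_to_cart(store, "session-1", "pen")
```

Because the handler holds no state of its own, any server in any region can process the next request for the same session, provided the store's data has reached that region.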

Traffic Flow During Normal Operations:
Users are routed to the closest or best-performing region based on Route 53 routing policies. All regions process requests and write data that is then replicated to other regions.
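
The routing decision can be approximated as "the lowest-latency region among those passing health checks". The function below is a simplified model of that selection, not Route 53's actual algorithm; the function name and the latency figures are made up for illustration.

```python
def pick_region(latency_ms: dict[str, float], healthy: dict[str, bool]) -> str:
    """Return the healthy region with the lowest measured latency."""
    candidates = {
        region: ms for region, ms in latency_ms.items() if healthy.get(region, False)
    }
    if not candidates:
        # Route 53 would fall back to treating all records as healthy;
        # this sketch simply reports the condition.
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda region: candidates[region])


# Hypothetical latencies as seen from one user's resolver.
latency = {"us-east-1": 20.0, "eu-west-1": 95.0}

normal = pick_region(latency, {"us-east-1": True, "eu-west-1": True})
during_outage = pick_region(latency, {"us-east-1": False, "eu-west-1": True})
```

In normal operation this user lands on us-east-1; when that region is marked unhealthy, the same lookup returns eu-west-1 without any change to the application.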

Traffic Flow During a Regional Failure:
Route 53 health checks detect the failure, and Route 53 stops returning the affected region in DNS responses. Once cached DNS answers expire, users are redirected to the remaining healthy regions with no manual intervention required.
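
How quickly this happens depends on the health check settings and the record TTL. The estimate below is a back-of-the-envelope sketch: Route 53 health checks support request intervals of 30 seconds (standard) or 10 seconds (fast) and a default failure threshold of 3, while the TTL values are assumptions. It ignores clients and resolvers that cache answers longer than the TTL.

```python
def estimated_failover_seconds(request_interval_s: int = 30,
                               failure_threshold: int = 3,
                               dns_ttl_s: int = 60) -> int:
    """Rough worst-case time until clients stop using a failed region."""
    # Time for enough consecutive failed checks to mark the endpoint unhealthy.
    detection_s = request_interval_s * failure_threshold
    # Clients may keep using a cached DNS answer until its TTL expires.
    return detection_s + dns_ttl_s


standard = estimated_failover_seconds()                  # 30 * 3 + 60 = 150
fast = estimated_failover_seconds(request_interval_s=10,
                                  dns_ttl_s=30)          # 10 * 3 + 30 = 60
```

Under these assumptions, failover takes between one and a few minutes, which matches the "seconds to minutes" RTO usually quoted for this strategy.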

Cost Considerations

Multi-site active-active is the most expensive DR strategy because:
• Full infrastructure runs in multiple regions simultaneously
• Cross-region replication incurs data transfer charges
• Operating several live regions requires sophisticated monitoring and management

However, for mission-critical applications, the cost is justified by the business continuity it provides.

Key AWS Services for Multi-Site Active-Active

• Route 53 - Global DNS with health checks and routing policies
• Global Accelerator - Routes traffic over the AWS global network to healthy regional endpoints, improving availability and performance
• DynamoDB Global Tables - Multi-region, multi-active database
• Aurora Global Database - Cross-region relational database replication
• S3 Cross-Region Replication - Object storage synchronization
• ElastiCache Global Datastore - Cross-region Redis replication

Exam Tips: Answering Questions on Multi-Site Active-Active DR

Recognize These Scenario Indicators:
• Questions mentioning near-zero RTO and RPO
• Requirements for no downtime during regional failures
• Scenarios where both sites must serve traffic simultaneously
• Mission-critical applications requiring maximum availability
• Global user bases needing low-latency access

Key Differentiators from Other DR Strategies:
• Pilot Light - Only core infrastructure runs in DR region; requires scaling during failover
• Warm Standby - Scaled-down version runs in DR region; requires scaling during failover
• Multi-Site Active-Active - Full capacity runs in all regions; no scaling needed during failover

Common Exam Traps to Avoid:
• Do not confuse active-active with active-passive designs such as pilot light and warm standby, where the DR region does not serve full production traffic
• Remember that active-active has the highest cost but lowest RTO/RPO
• Route 53 health checks are essential for automatic failover
• Data consistency across regions requires careful consideration of replication lag

When to Choose Multi-Site Active-Active:
• Cost is not the primary concern
• Business cannot tolerate any downtime
• Real-time data synchronization is required
• Users are distributed globally
• Regulatory requirements demand maximum resilience

Remember These Facts for the Exam:
• RTO: Near-zero (seconds to minutes)
• RPO: Near-zero (minimal data loss)
• Cost: Highest among all DR strategies
• Complexity: Most complex to implement and manage
• Route 53 is the primary mechanism for traffic distribution and failover
