The Pilot Light Disaster Recovery (DR) pattern is a cost-effective strategy used in AWS to maintain business continuity during outages. This approach keeps a minimal version of your critical infrastructure running continuously in a secondary AWS region, similar to how a pilot light in a gas furnace stays lit to quickly ignite the main burner when needed.
In this pattern, you replicate your most essential components to the DR region. Typically, this includes database servers with continuous data replication, using services such as Amazon RDS cross-region read replicas or AWS Database Migration Service. Compute resources such as EC2 instances stay stopped, or are not provisioned at all, until a disaster occurs.
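As a rough illustration of the replication piece, the sketch below builds the arguments for the boto3 RDS call `create_db_instance_read_replica`. The region names, account ID, database identifiers, and instance class are all hypothetical placeholders, and the helper function is not part of any AWS library.

```python
def cross_region_replica_params(source_region, account_id, source_db_id,
                                replica_db_id, instance_class="db.t3.medium"):
    """Build keyword arguments for RDS create_db_instance_read_replica."""
    # A cross-region replica references its source by ARN rather than by name.
    source_arn = f"arn:aws:rds:{source_region}:{account_id}:db:{source_db_id}"
    return {
        "DBInstanceIdentifier": replica_db_id,
        "SourceDBInstanceIdentifier": source_arn,
        # boto3 uses SourceRegion to pre-sign the cross-region request.
        "SourceRegion": source_region,
        "DBInstanceClass": instance_class,
    }


# Hypothetical values, for illustration only.
params = cross_region_replica_params(
    "us-east-1", "123456789012", "orders-db", "orders-db-dr"
)

# With credentials configured, the call would be made from the DR region:
#   import boto3
#   rds = boto3.client("rds", region_name="us-west-2")
#   rds.create_db_instance_read_replica(**params)
```

Keeping the parameter construction separate from the API call makes it easy to review or unit-test the configuration without touching a real account.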
Key components of the Pilot Light pattern include:
1. **Data Replication**: Critical databases and data stores are continuously synchronized to the DR region using asynchronous replication methods.
2. **AMI Management**: Amazon Machine Images are regularly updated and stored in the DR region, ready for rapid deployment.
3. **Infrastructure as Code**: AWS CloudFormation or Terraform templates are maintained to provision additional resources quickly during failover.
4. **DNS Configuration**: Route 53 health checks and failover routing policies enable automatic traffic redirection when the primary region becomes unavailable.
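The DNS component can be sketched as a pair of Route 53 failover records. The example below only builds the `ChangeBatch` payload accepted by the boto3 call `change_resource_record_sets`. The domain name, IP addresses, and health check ID are hypothetical, and it assumes the DR endpoint has a pre-allocated static address.

```python
def failover_record(name, ip_address, role, set_identifier,
                    health_check_id=None, ttl=60):
    """Build one Route 53 failover A record (role is PRIMARY or SECONDARY)."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError(f"unsupported failover role: {role}")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_identifier,
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip_address}],
    }
    # The health check on the primary record is what triggers the switch.
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return record


def failover_change_batch(name, primary_ip, dr_ip, primary_health_check_id):
    """Build a ChangeBatch that upserts the primary and DR records together."""
    records = [
        failover_record(name, primary_ip, "PRIMARY", "primary-region",
                        health_check_id=primary_health_check_id),
        failover_record(name, dr_ip, "SECONDARY", "dr-region"),
    ]
    return {
        "Comment": "Pilot light failover routing (illustrative)",
        "Changes": [{"Action": "UPSERT", "ResourceRecordSet": r} for r in records],
    }


# Hypothetical values, for illustration only.
batch = failover_change_batch(
    "app.example.com.", "192.0.2.10", "198.51.100.10", "hc-primary-example"
)

# With credentials configured, the batch would be applied like this:
#   import boto3
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="<hosted zone id>", ChangeBatch=batch)
```

A short TTL is used here so that resolvers pick up the failover quickly; the right value depends on the workload.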
During a disaster, the recovery process involves scaling up the pre-configured resources in the DR region. This includes starting stopped EC2 instances, scaling Auto Scaling groups, and updating DNS records to redirect traffic.
The Recovery Time Objective (RTO) for Pilot Light typically ranges from minutes to hours, while the Recovery Point Objective (RPO) depends on replication lag, because data is replicated asynchronously. This pattern offers a balance between cost efficiency and recovery speed, making it suitable for organizations that can tolerate brief downtime but require faster recovery than backup-and-restore approaches.
Pilot Light is ideal for production workloads where some downtime is acceptable but full warm standby costs are not justified.
Pilot Light DR Pattern - Complete Guide
What is the Pilot Light DR Pattern?
The Pilot Light disaster recovery pattern is a strategy where you maintain a minimal version of your environment running continuously in a secondary AWS region. Just like a pilot light in a gas furnace that stays lit and ready to ignite the full system when needed, this DR approach keeps core critical components active and ready to scale up rapidly during a disaster.
Key Components of Pilot Light:
- Database replication: Your databases are continuously replicated to the DR region (e.g., RDS read replicas, Aurora Global Database)
- AMIs and configurations: Pre-configured Amazon Machine Images are stored and ready to launch
- Minimal infrastructure: Only essential components like databases run continuously
- Dormant compute resources: EC2 instances and Auto Scaling groups are configured but not running
Why is Pilot Light Important?
1. Cost Efficiency: You only pay for minimal running resources (mainly database replication) rather than a full duplicate environment
2. Faster Recovery Than Backup/Restore: Since databases are already synchronized, recovery time is significantly reduced compared to restoring from backups
3. Business Continuity: Provides a reliable way to resume operations after regional failures or major disasters
4. Balanced Approach: Offers a middle ground between expensive warm standby and slower backup/restore methods
How Pilot Light Works:
During Normal Operations:
- Primary region handles all production traffic
- Databases replicate continuously to the DR region
- AMIs and launch configurations are kept updated
- Route 53 health checks monitor primary region
During Failover:
1. Detect the disaster (automated or manual)
2. Start EC2 instances from pre-configured AMIs
3. Scale up Auto Scaling groups
4. Promote database replicas to primary
5. Update DNS records via Route 53 to point to DR region
6. Verify application functionality
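The failover steps after detection can be sketched as a small runbook function. This is a minimal outline, not a production procedure: the `PilotLightPlan` class, `run_failover`, and the `update_dns` and `verify` callbacks are hypothetical names, while `start_instances`, `update_auto_scaling_group`, and `promote_read_replica` are the boto3 EC2, Auto Scaling, and RDS calls of the same names. Clients are passed in so the sequence can be rehearsed with stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PilotLightPlan:
    """Hypothetical description of the dormant DR resources."""
    instance_ids: List[str]
    asg_name: str
    asg_min: int
    asg_max: int
    asg_desired: int
    replica_db_id: str


def run_failover(plan, ec2, autoscaling, rds,
                 update_dns: Callable[[], None],
                 verify: Callable[[], bool]) -> List[str]:
    """Run failover steps 2-6 in order and return the completed step names."""
    done = []

    # Step 2: start the stopped EC2 instances.
    ec2.start_instances(InstanceIds=plan.instance_ids)
    done.append("start_instances")

    # Step 3: scale the Auto Scaling group up from its dormant size.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=plan.asg_name,
        MinSize=plan.asg_min,
        MaxSize=plan.asg_max,
        DesiredCapacity=plan.asg_desired,
    )
    done.append("scale_up")

    # Step 4: promote the read replica to a standalone primary.
    rds.promote_read_replica(DBInstanceIdentifier=plan.replica_db_id)
    done.append("promote_replica")

    # Step 5: point DNS at the DR region (caller supplies the mechanism).
    update_dns()
    done.append("update_dns")

    # Step 6: confirm the application actually works before declaring success.
    if not verify():
        raise RuntimeError("DR environment failed verification")
    done.append("verify")
    return done
```

In practice each step would also wait for completion (for example with boto3 waiters) before moving on. With real clients, `ec2`, `autoscaling`, and `rds` would be `boto3.client(...)` objects created for the DR region.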
Recovery Objectives:
- RTO (Recovery Time Objective): Typically 10 minutes to several hours
- RPO (Recovery Point Objective): Usually minutes, depending on replication lag
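To see how these objectives relate to a concrete plan, the sketch below estimates RTO as the sum of the failover step durations and treats the worst observed replication lag as the RPO. It assumes the steps run one after another; every duration, the lag figure, and the targets are hypothetical numbers chosen for illustration.

```python
def estimate_rto_minutes(step_minutes):
    """Estimate RTO as the sum of step durations, assuming sequential steps."""
    return sum(step_minutes.values())


def meets_objectives(step_minutes, replication_lag_seconds,
                     rto_target_minutes, rpo_target_seconds):
    """Check an estimated RTO and a measured replication lag against targets."""
    rto_ok = estimate_rto_minutes(step_minutes) <= rto_target_minutes
    # With asynchronous replication, data written within the lag window
    # may be lost, so the lag is the effective RPO.
    rpo_ok = replication_lag_seconds <= rpo_target_seconds
    return rto_ok and rpo_ok


# Hypothetical durations for each failover step, in minutes.
steps = {
    "detect": 5,
    "start_instances": 4,
    "scale_up": 8,
    "promote_replica": 10,
    "update_dns": 2,
    "verify": 6,
}

estimated_rto = estimate_rto_minutes(steps)  # 35 minutes for these numbers
within_targets = meets_objectives(
    steps, replication_lag_seconds=90,
    rto_target_minutes=60, rpo_target_seconds=300,
)
```

If some steps can run in parallel, the sum overstates the RTO, so this is best read as a conservative upper bound.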
Comparison with Other DR Strategies:
- Backup and Restore: Lower cost but higher RTO/RPO
- Pilot Light: Moderate cost with faster recovery than backup/restore
- Warm Standby: Higher cost with faster recovery than pilot light
- Multi-Site Active/Active: Highest cost with near-zero downtime
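The cost versus speed ordering above can be expressed as a simple selection rule: pick the cheapest strategy whose typical RTO still meets the target. The RTO figures in this sketch are rough, hypothetical placeholders rather than AWS guarantees, and real decisions also weigh RPO, budget, and operational complexity.

```python
# Strategies ordered from cheapest to most expensive, each paired with an
# illustrative "typical RTO" in minutes. These numbers are assumptions.
STRATEGIES = [
    ("Backup and Restore", 24 * 60),
    ("Pilot Light", 60),
    ("Warm Standby", 10),
    ("Multi-Site Active/Active", 1),
]


def cheapest_strategy(rto_target_minutes):
    """Return the cheapest strategy whose typical RTO meets the target."""
    for name, typical_rto in STRATEGIES:
        if typical_rto <= rto_target_minutes:
            return name
    # Nothing meets the target: fall back to the fastest option available.
    return STRATEGIES[-1][0]


choice_for_two_hours = cheapest_strategy(120)
choice_for_thirty_minutes = cheapest_strategy(30)
```

Under these assumed figures, a two-hour target selects Pilot Light, while a thirty-minute target pushes the choice up to Warm Standby.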
Exam Tips: Answering Questions on Pilot Light DR Pattern
1. Identify the Scenario: Look for keywords like 'cost-effective DR,' 'minimal running resources,' or 'faster than backup restore but cheaper than warm standby'
2. Understand Cost vs. Speed Trade-offs: Pilot light is the answer when the question asks for balance between cost savings and reasonable recovery time
3. Know What Runs Continuously: In pilot light, only databases and data replication run at all times. Compute resources are launched during failover
4. Distinguish from Warm Standby: Warm standby has scaled-down but running compute resources. Pilot light has compute resources configured but stopped
5. RTO Expectations: If the question requires RTO of minutes to hours and mentions cost consciousness, pilot light is likely correct
6. Database Focus: Questions mentioning database replication with minimal other infrastructure point to pilot light
7. Common Services Used: Look for mentions of RDS read replicas, Aurora Global Database, Route 53 failover routing, and pre-configured AMIs
8. Scaling Requirements: If the scenario mentions needing to 'scale up' or 'provision resources' during failover, this indicates pilot light rather than warm standby
Common Exam Traps to Avoid:
- Do not confuse pilot light with warm standby: pilot light has stopped compute resources
- Remember that pilot light requires manual or automated intervention to scale up resources
- Pilot light is NOT suitable for near-zero RTO requirements; consider multi-site for those scenarios