Availability Policy for Compute Engine
Why Availability Policy is Important
Availability policies in Google Compute Engine determine how your virtual machine instances behave during maintenance events and unexpected failures. Understanding these policies is crucial for designing resilient applications that meet your uptime requirements and service level agreements (SLAs). Proper configuration helps keep your workloads accessible and minimizes disruption to your users.
What is Availability Policy?
An availability policy defines two key behaviors for your Compute Engine instances:
1. On Host Maintenance: This setting determines what happens when Google needs to perform maintenance on the physical host running your VM. Options include:
- Migrate (default): Live migrates your instance to another host
- Terminate: Stops the instance instead of migrating it when maintenance occurs
2. Automatic Restart: This setting controls whether your instance automatically restarts if it crashes or is terminated due to a non-user-initiated event. Options include:
- On (default): Instance restarts automatically
- Off: Instance remains stopped
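As a rough illustration, these two settings correspond to the onHostMaintenance and automaticRestart fields in the scheduling block of an instance resource in the Compute Engine API. The sketch below is plain Python; the make_scheduling helper is hypothetical and only builds a dictionary, it does not call any Google Cloud service.

```python
# Hypothetical helper: builds the "scheduling" portion of an instance
# resource as a plain dict. It does not contact Google Cloud.
VALID_MAINTENANCE = {"MIGRATE", "TERMINATE"}


def make_scheduling(on_host_maintenance="MIGRATE", automatic_restart=True):
    """Return a scheduling dict; the argument defaults mirror the defaults above."""
    if on_host_maintenance not in VALID_MAINTENANCE:
        raise ValueError(f"unknown maintenance policy: {on_host_maintenance!r}")
    return {
        "onHostMaintenance": on_host_maintenance,
        "automaticRestart": bool(automatic_restart),
    }


default_policy = make_scheduling()
batch_policy = make_scheduling("TERMINATE", automatic_restart=False)
print(default_policy)
print(batch_policy)
```

Calling the helper with no arguments yields the default policy (migrate, restart on); the second call shows the opposite choice for each setting.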
How Availability Policy Works
When you create a VM instance, Google Cloud assigns it to a physical host in your chosen zone. Periodically, Google must perform maintenance on these hosts for security patches, hardware upgrades, or repairs.
With live migration enabled, Google moves your running instance to a different host without rebooting it. The interruption is minimal, typically a brief dip in performance such as a short increase in network latency. Your applications continue running, and users experience little to no downtime.
If you choose terminate for maintenance behavior, your instance will be stopped when maintenance occurs. If automatic restart is also enabled, the instance will boot back up once maintenance completes; if it is disabled, the instance stays stopped until you start it yourself.
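The combinations described above can be summarized in a small decision function. This is a simplified model for study purposes, not Google's implementation; the function name outcome and the event labels "maintenance" and "crash" are made up for this sketch.

```python
def outcome(event, on_host_maintenance="MIGRATE", automatic_restart=True):
    """Describe what happens to a standard VM for a given event.

    event is "maintenance" (planned host maintenance) or "crash"
    (unexpected, non-user-initiated failure). Both labels are hypothetical.
    """
    if event not in ("maintenance", "crash"):
        raise ValueError(f"unknown event: {event!r}")
    if event == "maintenance" and on_host_maintenance == "MIGRATE":
        return "live migrated, keeps running"
    # Either a crash, or maintenance with the TERMINATE policy:
    # the instance stops, and automatic restart decides what happens next.
    return "stopped, then restarted" if automatic_restart else "stays stopped"


# Print every combination as a quick revision table.
for event in ("maintenance", "crash"):
    for policy in ("MIGRATE", "TERMINATE"):
        for restart in (True, False):
            print(event, policy, restart, "->", outcome(event, policy, restart))
```

Note that in this model the maintenance setting only matters for planned maintenance, while automatic restart matters whenever the instance ends up stopped by something other than the user.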
Preemptible and Spot VMs have more restricted availability policies. Compute Engine can terminate (preempt) these instances at any time with 30 seconds' notice, and they do not support live migration. They are ideal for fault-tolerant batch processing workloads.
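One way to keep these restrictions straight is a small checker over a scheduling dictionary. The sketch assumes the Compute Engine API field names provisioningModel, preemptible, onHostMaintenance, and automaticRestart. It also assumes a rule not stated above: that automatic restart must be off for these VMs. The check_spot_scheduling function is hypothetical and treats a missing field as the standard-VM default, which is a choice made for this sketch only.

```python
def check_spot_scheduling(scheduling):
    """Return a list of problems for a Spot or preemptible scheduling dict."""
    is_spot = scheduling.get("provisioningModel") == "SPOT"
    is_preemptible = scheduling.get("preemptible", False)
    problems = []
    if not (is_spot or is_preemptible):
        return problems  # standard VM: these constraints do not apply
    # Sketch assumption: a missing field counts as the standard default.
    if scheduling.get("onHostMaintenance", "MIGRATE") != "TERMINATE":
        problems.append("onHostMaintenance must be TERMINATE (no live migration)")
    if scheduling.get("automaticRestart", True):
        problems.append("automaticRestart must be false")
    return problems


good = {
    "provisioningModel": "SPOT",
    "onHostMaintenance": "TERMINATE",
    "automaticRestart": False,
}
bad = {"provisioningModel": "SPOT", "onHostMaintenance": "MIGRATE"}
print(check_spot_scheduling(good))
print(check_spot_scheduling(bad))
```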
Sole-tenant nodes provide dedicated physical servers where only your VMs run, offering additional control over maintenance timing through maintenance windows.
Exam Tips: Answering Questions on Availability Policy
1. Default Settings: Remember that the defaults are live migration ON and automatic restart ON. Questions often test whether you know the default behavior.
2. Preemptible and Spot VMs: These CANNOT use live migration and may be terminated at any time. They are significantly cheaper but less reliable.
3. GPU and Local SSD Instances: VMs with attached GPUs cannot live migrate and must be set to terminate on host maintenance. Local SSD data survives a live migration, but know that it is lost when an instance is terminated.
4. Scenario-Based Questions: When a question describes needing maximum uptime for a standard workload, live migration is the answer. When cost savings are prioritized over reliability, consider preemptible or spot VMs.
5. Managed Instance Groups: These provide additional availability through autohealing, which replaces unhealthy instances automatically. This complements VM-level availability policies.
6. Regional vs Zonal: For highest availability, use regional managed instance groups that distribute instances across multiple zones.
7. Key Terminology: Understand the difference between host maintenance events (planned by Google) and instance failures (unexpected crashes).
8. Configuration Location: Availability policies are set per instance at creation time or can be modified later through the console, gcloud, or API.
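For the gcloud route mentioned in tip 8, an existing instance's policy can be changed with gcloud compute instances set-scheduling. The Python sketch below only assembles that command as an argument list and prints it; it does not run gcloud. The helper name set_scheduling_command, the instance name example-vm, and the zone us-central1-a are placeholders chosen for illustration.

```python
import shlex


def set_scheduling_command(instance, zone, maintenance_policy="MIGRATE",
                           restart_on_failure=True):
    """Build (but do not run) a gcloud command that updates availability policy."""
    if maintenance_policy not in ("MIGRATE", "TERMINATE"):
        raise ValueError(f"unknown maintenance policy: {maintenance_policy!r}")
    restart_flag = (
        "--restart-on-failure" if restart_on_failure else "--no-restart-on-failure"
    )
    return [
        "gcloud", "compute", "instances", "set-scheduling", instance,
        f"--zone={zone}",
        f"--maintenance-policy={maintenance_policy}",
        restart_flag,
    ]


# Placeholder instance name and zone, for illustration only.
cmd = set_scheduling_command(
    "example-vm", "us-central1-a", "TERMINATE", restart_on_failure=False
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it could later be passed to subprocess.run without shell quoting issues.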