To effectively troubleshoot Azure Load Balancer issues within the context of the Azure Administrator Associate certification, follow a structured approach focusing on connectivity, configuration, and health monitoring.
First, verify the **Health Probes**. The load balancer relies on these probes to determine which backend instances are healthy. If a probe fails, the load balancer stops sending new flows to that specific instance. Check that the configured protocol (TCP/HTTP/HTTPS) and port match the application listening port on the VM. For HTTP/HTTPS probes, ensure the specific URL path is valid and returns a 200 OK status.
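The probe behavior described above can be approximated locally. This is a minimal sketch (not the Azure probe implementation): a TCP probe counts an instance healthy if the handshake completes, while an HTTP probe additionally requires a 200 response on the configured path. Running these from another VM in the VNet against a backend instance is a quick way to see what the load balancer sees.

```python
import socket
import urllib.error
import urllib.request


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Approximate a TCP health probe: healthy if the TCP handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_probe(host: str, port: int, path: str = "/", timeout: float = 2.0) -> bool:
    """Approximate an HTTP health probe: healthy only on a 200 OK response."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}{path}", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Note that an HTTP probe is strictly stricter than a TCP probe on the same port: a listening socket that serves errors passes `tcp_probe` but fails `http_probe`.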
Second, examine **Data Path Connectivity**. Use Azure Monitor metrics to analyze 'Data Path Availability' and 'Health Probe Status'. If the availability is zero, validate the backend pool configuration. Ensure the VMs or Virtual Machine Scale Sets are in a 'Running' state and their network interfaces are correctly associated.
Third, investigate **Network Security Groups (NSGs)** and **Firewalls**. Traffic must be allowed on the backend subnet NSG for both the service port and the probe port. A critical step is identifying the default source tag `AzureLoadBalancer` in the NSG inbound rules; if this is blocked, probes will fail regardless of application health. Additionally, check the VM's local OS firewall (Windows Firewall or iptables) to ensure it allows ingress traffic on the probe port.
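NSG evaluation is first-match-wins in ascending priority order, which is why a single low-priority Deny can override the default `AllowAzureLoadBalancerInBound` rule (priority 65001). The toy model below illustrates that precedence logic; the rule shapes and example priorities are simplified for illustration and are not the full NSG rule schema (no address prefixes, port ranges, or direction handling).

```python
from typing import List, Tuple

# Simplified inbound rule: (priority, source_tag, dest_port, action).
# '*' matches anything. Real NSG rules also carry prefixes, ranges, etc.
Rule = Tuple[int, str, str, str]


def evaluate_inbound(rules: List[Rule], source: str, dest_port: int) -> str:
    """Return the action of the first matching rule, lowest priority number first."""
    for priority, rule_source, rule_port, action in sorted(rules):
        if rule_source in (source, "*") and rule_port in (str(dest_port), "*"):
            return action
    return "Deny"  # NSGs end in an implicit/explicit DenyAllInBound


rules = [
    (300, "Internet", "443", "Allow"),
    (400, "AzureLoadBalancer", "*", "Deny"),     # misconfigured custom rule
    (65001, "AzureLoadBalancer", "*", "Allow"),  # default rule, never reached
    (65500, "*", "*", "Deny"),                   # default DenyAllInBound
]

evaluate_inbound(rules, "AzureLoadBalancer", 80)  # -> "Deny": probes fail
```

Removing the priority-400 rule lets evaluation fall through to 65001, and the probe traffic is allowed again.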
Fourth, analyze **SKU and VNet Configuration**. Mismatches between Basic and Standard SKUs are common pitfalls; usually, the Load Balancer, Public IP, and backend resources must share the same SKU type. Ensure all backend instances reside in the same Virtual Network.
Finally, utilize **Diagnostics and Logging**. Enable diagnostic settings to send logs to a Log Analytics workspace. Query the `LoadBalancerProbeHealthStatus` log category to pinpoint failing instances. Use Network Watcher tools like **IP Flow Verify** to confirm that rule precedence is not inadvertently dropping traffic.
Troubleshooting Azure Load Balancing (AZ-104 Study Guide)
**Why it is important**

Load Balancers are the gatekeepers of high availability in Azure. When a Load Balancer fails or works incorrectly, applications become effectively unavailable to users, even if the backend Virtual Machines (VMs) are perfectly healthy. For an Azure Administrator, the ability to quickly distinguish between a network configuration error, a backend OS issue, and a health probe failure is critical to maintaining service SLAs and ensuring business continuity.
**What it is**

Troubleshooting Load Balancing involves diagnosing issues that prevent traffic from being correctly distributed to the Backend Pool. In the context of Azure, this rarely means the Load Balancer service itself is broken (as it is a managed service); rather, it usually implies a misconfiguration in the network path, security rules, or health monitoring logic. It requires analyzing the flow of traffic from the Frontend IP, through the Load Balancing Rules, to the Backend Instances.
**How it works**

To effectively troubleshoot, you must follow the data path and isolate the failure point. The process generally proceeds as follows:
1. Diagnostic Metrics & Health Probes: The first step is checking Azure Monitor metrics. The most critical metric is Health Probe Status. The Load Balancer periodically probes backend VMs. If a probe fails, the LB stops sending new connections to that VM. If all probes fail, the service goes down.
2. Checking Network Security Groups (NSGs): Traffic must be allowed at two points: probes originate from the well-known IP 168.63.129.16, so the configured probe port must be open on the VM's NSG, and the actual application traffic port (e.g., port 80 or 443) must also be open.
3. Backend IP Configuration: Ensure the VMs are actually in the Backend Pool and that their NICs are associated with the correct IP configurations. A common issue is a VM being removed from the pool or a mismatch in IP versions (IPv4 vs IPv6).
4. Operating System Firewalls: If traffic is not blocked by an NSG, it reaches the VM, but it must still pass the local firewall. If the Windows Firewall or Linux iptables blocks the probe port or application port, the Load Balancer treats the instance as unhealthy.
5. SNAT Port Exhaustion: For outbound connectivity, if too many concurrent outbound flows are initiated, SNAT ports may run out, causing intermittent connection failures.
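The SNAT exhaustion in step 5 is easier to reason about numerically. The sketch below encodes the tiered per-instance port preallocation that Azure documents for default outbound port assignment; the tier boundaries are my reading of those defaults and should be verified against the current documentation, since explicit outbound rules can override them.

```python
def default_snat_ports_per_vm(pool_size: int) -> int:
    """Default SNAT ports preallocated per backend instance.

    Tiers follow Azure's documented defaults for automatic outbound port
    assignment (assumption: verify against current Azure docs; outbound
    rules let you allocate ports explicitly instead).
    """
    tiers = [(50, 1024), (100, 512), (200, 256), (400, 128), (800, 64), (1000, 32)]
    for max_vms, ports in tiers:
        if pool_size <= max_vms:
            return ports
    raise ValueError("backend pools larger than 1,000 instances need outbound rules")


# Each concurrent outbound flow to a given destination consumes one SNAT
# port, so a chatty app in a large pool exhausts its allocation quickly:
default_snat_ports_per_vm(300)  # -> 128 ports per VM
```

This is why scaling out a backend pool can, counterintuitively, make outbound failures worse: each instance's preallocated share shrinks as the pool grows.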
**How to answer questions regarding "Troubleshoot load balancing" in an exam**

When faced with a scenario-based question on the AZ-104 exam, follow this logical deduction path:
Step 1: Check the Health Probe configuration. Is the protocol (TCP/HTTP) and port correct?
Step 2: Check network security. Look for an NSG applied to the subnet or NIC that might be blocking the source tag AzureLoadBalancer.
Step 3: Check the well-known probe IP address. Remember that the IP 168.63.129.16 is used by Azure to probe health. If this is blocked inside the VM or on the NSG, the LB will mark instances unhealthy.
Step 4: Check the SKU. Standard Load Balancers offer multi-dimensional metrics in Azure Monitor; Basic Load Balancers provide very limited logs.
**Exam Tips: Answering Questions on "Troubleshoot load balancing"**
1. The "All Probes Failed" Scenario: If a question states that traffic is not reaching any VM, look for a shared configuration error, such as a restrictive NSG blocking the probe port on the entire subnet, or a misconfigured Load Balancing Rule.
2. The "One VM Failed" Scenario: If only one VM is not receiving traffic, the issue is likely local to that VM (e.g., the web service is stopped, or the local firewall is on).
3. Protocol Mismatches: Pay attention to HTTP vs. TCP probes. If an HTTP probe is configured for path /health, but the backend application doesn't have that path configured, the probe returns a 404, and the LB marks the instance unhealthy. A TCP probe would succeed as long as the port is open.
4. Asymmetric Routing: Watch for questions where response traffic tries to leave via a different path (e.g., a secondary NIC/Instance Level Public IP) than it entered. Azure Load Balancers require symmetric return traffic.
5. Diagnostics Availability: Remember: Standard Load Balancers support rich Azure Monitor metrics, while Basic Load Balancers rely on Log Analytics/Storage Account logs, which are less real-time.
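The protocol-mismatch tip (#3 above) can be reproduced locally. This sketch stands up a small HTTP server with no `/health` endpoint (every request gets a 404, mimicking an app whose health path is not configured), then runs a TCP-style check and an HTTP-style check against it. The server and checks are illustrative stand-ins for the Azure probes, not their actual implementation.

```python
import socket
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoHealthPath(BaseHTTPRequestHandler):
    """App that listens on the port but has no /health path configured."""

    def do_GET(self):
        self.send_response(404)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), NoHealthPath)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# TCP-style probe: healthy as long as the port accepts a connection.
with socket.create_connection(("127.0.0.1", port), timeout=2):
    tcp_healthy = True

# HTTP-style probe: healthy only on 200 -- /health returns 404 here.
try:
    urllib.request.urlopen(f"http://127.0.0.1:{port}/health", timeout=2)
    http_healthy = True
except urllib.error.HTTPError:
    http_healthy = False

server.shutdown()
print(tcp_healthy, http_healthy)  # -> True False
```

Same port, same instant: the TCP probe would keep the instance in rotation while the HTTP probe would pull it out, which is exactly the distinction the exam scenario tests.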