AWS Elastic Load Balancing - ELB Test

Elastic Load Balancing distributes incoming application traffic across multiple EC2 instances, in multiple Availability Zones.

A load balancer accepts incoming traffic from clients and routes requests to its registered targets (such as EC2 instances) in one or more Availability Zones. The load balancer also monitors the health of its registered targets and ensures that it routes traffic only to healthy targets. When the load balancer detects an unhealthy target, it stops routing traffic to that target, and then resumes routing traffic to that target when it detects that the target is healthy again.

You can add and remove instances from your load balancer as your needs change, without disrupting the overall flow of requests to your application. Elastic Load Balancing scales your load balancer as traffic to your application changes over time, and can scale to the vast majority of workloads automatically.

This way, Elastic Load Balancing increases the fault tolerance of your applications and improves overall application performance. By watching for issues in Elastic Load Balancing, administrators can detect and resolve them promptly, and thus prevent application performance degradation. The AWS Elastic Load Balancing - ELB Test helps administrators do this.

By default, this test reports metrics for each load balancer that is configured. The test promptly captures and reports flaky connections between a load balancer and its backend instances, slow communication between a load balancer and its instances, and any HTTP errors encountered during load balancing. This forewarns administrators of load balancing issues, so that they can take corrective measures before the issues impact application performance.

Optionally, you can configure the test to report metrics per Availability Zone. The zone-level insight will help administrators understand whether instances in a particular zone experience higher latency or more errors than instances in other zones.

Target of the test: Amazon EC2 Cloud

Agent deploying the test: A remote agent

Outputs of the test: One set of results for each load balancer / Availability Zone

First-level descriptor: AWS Region

Second-level descriptor: Load balancer / Availability Zone, depending upon the option chosen from the ELB Filter Name parameter of this test

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

AWS Access Key, AWS Secret Key, Confirm AWS Access Key, Confirm AWS Secret Key

To monitor an Amazon EC2 instance, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this has been detailed in the Obtaining an Access key and Secret key topic. Make sure you reconfirm the access and secret keys you provide here by retyping them in the corresponding Confirm text boxes.

Proxy Host and Proxy Port

In some environments, all communication with the AWS EC2 cloud and its regions could be routed through a proxy server. In such environments, make sure that the eG agent connects to the cloud via the proxy server to collect metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.

Proxy User Name, Proxy Password, and Confirm Password

If the proxy server requires authentication, specify a valid proxy user name and password in the Proxy User Name and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. By default, these parameters are set to none, indicating that the proxy server does not require authentication.
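The way the proxy parameters combine can be sketched as follows. This is only an illustration under assumptions: the proxy_url function is hypothetical, the host, port, and credentials are made up, and the eG agent's actual handling of these parameters is not documented here. The sketch simply treats the default value none as "not configured".

```python
from urllib.parse import quote


def proxy_url(host="none", port="none", user="none", password="none"):
    """Build a proxy URL from the parameter values, or return None if no proxy is set.

    Hypothetical helper: 'none' mirrors the documented default for each parameter.
    """
    if host == "none" or port == "none":
        return None  # agent talks to the cloud directly
    auth = ""
    if user != "none":
        # Percent-encode credentials so characters such as '@' do not break the URL.
        auth = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    return f"http://{auth}{host}:{port}"


print(proxy_url())                                          # no proxy configured
print(proxy_url("192.168.10.5", "3128"))                    # proxy without authentication
print(proxy_url("192.168.10.5", "3128", "egagent", "p@ss")) # proxy with authentication
```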

Proxy Domain and Proxy Workstation

If a Windows NTLM proxy is to be configured for use, then additionally, you will have to configure the Windows domain name and the Windows workstation name required for the same against the Proxy Domain and Proxy Workstation parameters. If the environment does not support a Windows NTLM proxy, set these parameters to none.

Exclude Region

Here, you can provide a comma-separated list of region names or patterns of region names that you do not want to monitor. For instance, to exclude regions with names that contain 'east' or 'west' from monitoring, your specification should be: *east*,*west*
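To see which regions a specification such as *east*,*west* would exclude, the matching can be approximated with shell-style wildcards. This is a minimal sketch, assuming the patterns behave like the wildcards in Python's fnmatch module; the is_excluded function and the region list are illustrative, and the agent's exact matching rules may differ.

```python
from fnmatch import fnmatchcase


def is_excluded(region: str, exclude_spec: str) -> bool:
    """Return True if the region name matches any comma-separated pattern."""
    patterns = [p.strip() for p in exclude_spec.split(",") if p.strip()]
    return any(fnmatchcase(region, pattern) for pattern in patterns)


regions = ["us-east-1", "us-west-2", "eu-central-1", "ap-south-1"]
monitored = [r for r in regions if not is_excluded(r, "*east*,*west*")]
print(monitored)  # ['eu-central-1', 'ap-south-1']
```

Note that under this interpretation a pattern such as *east* also matches a name like ap-southeast-1, because "southeast" contains "east".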

ELB Filter Name

By default, this parameter is set to LoadBalancerName. This means that by default, this test will report metrics for each load balancer.

If required, you can override this default setting by setting the ELB Filter Name parameter to AvailabilityZone. In this case, this test will report metrics for every Availability Zone in which the instances interacting with the load balancer reside.
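As a rough illustration of how the two settings change the descriptors that are reported, the sketch below groups a handful of made-up samples either by load balancer or by Availability Zone. The sample data, the aggregate function, and the choice of summing values are assumptions for illustration only, not the eG agent's implementation.

```python
from collections import defaultdict

# Hypothetical samples: (load balancer, Availability Zone, failed connection count)
samples = [
    ("web-lb", "us-east-1a", 0),
    ("web-lb", "us-east-1b", 3),
    ("api-lb", "us-east-1a", 1),
]


def aggregate(samples, elb_filter_name="LoadBalancerName"):
    """Sum the sample values per descriptor chosen by the filter name."""
    key_index = {"LoadBalancerName": 0, "AvailabilityZone": 1}[elb_filter_name]
    totals = defaultdict(int)
    for sample in samples:
        totals[sample[key_index]] += sample[2]
    return dict(totals)


print(aggregate(samples))                      # one result per load balancer
print(aggregate(samples, "AvailabilityZone"))  # one result per Availability Zone
```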

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Unestablished connections

By default, this measure represents the number of connections that were attempted but failed between this load balancer and a seemingly healthy backend instance.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the number of connections that were attempted but failed between a load balancer and the seemingly healthy backend instances in this Availability Zone.

Number

Ideally, the value of this measure should be 0.

Connection errors between ELB and your servers occur when ELB attempts to connect to a backend, but cannot successfully do so. This type of error is usually due to network issues or backend instances that are not running properly.

Healthy instances

By default, this measure represents the number of healthy instances that are registered with this load balancer.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the number of healthy instances in this Availability Zone that are registered with a load balancer.

Number

A newly registered instance is considered healthy after it passes the first health check. If cross-zone load balancing is enabled, the number of healthy instances for a load balancer is calculated across all Availability Zones. Otherwise, it is calculated per Availability Zone.

Unhealthy instances

By default, this measure represents the number of unhealthy instances that are registered with this load balancer.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the number of unhealthy instances in this Availability Zone.

Number

If an instance exceeds the unhealthy threshold defined for the health checks, ELB flags it and stops sending requests to that instance. The most common cause is the health check exceeding the load balancer’s timeout. Always keep enough healthy backend instances in each Availability Zone to ensure good performance. You should also correlate this metric with Latency and Pending submission requests to make sure you have enough instances to support the volume of incoming requests without substantially slowing down the response time.
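The threshold behavior described above can be sketched as a small state tracker. This is a simplified model under stated assumptions: the final_state function is hypothetical, the threshold values are examples, and an instance is assumed to change state once it reaches the configured number of consecutive failed or passed checks.

```python
def final_state(results, unhealthy_threshold=2, healthy_threshold=3, healthy=True):
    """Return True if the instance ends up healthy after the given check results.

    results: iterable of booleans, True for a passed health check.
    The thresholds count consecutive results (example values, not defaults
    taken from this document).
    """
    fails = passes = 0
    for ok in results:
        if ok:
            passes += 1
            fails = 0
            if not healthy and passes >= healthy_threshold:
                healthy = True   # traffic to the instance resumes
        else:
            fails += 1
            passes = 0
            if healthy and fails >= unhealthy_threshold:
                healthy = False  # load balancer stops routing to the instance
    return healthy


print(final_state([True, False, False]))        # two consecutive failures
print(final_state([False, True, False, True]))  # failures never consecutive
```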

Latency

By default, this measure represents the time that elapsed from when this load balancer sent the request to a registered instance till when the instance started to send the response headers.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the time that elapsed from when a load balancer sent a request to a registered instance in this Availability Zone, till when the instance started to send the response headers.

Secs

This metric measures your application latency due to request processing by your backend instances, not latency from the load balancer itself. Tracking backend latency gives you good insight into your application performance. If it’s high, requests might be dropped due to timeouts, which can lead to frustrated users. High latency can be caused by network issues, overloaded backend hosts, or non-optimized configuration.

HTTP 2xx response codes

By default, this measure represents the number of HTTP 2xx (success) codes currently returned by the registered backend instances to this load balancer.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the number of HTTP 2xx (success) codes currently returned by the registered backend instances in this Availability Zone.

Number

This count does not include any response codes generated by the load balancer.

HTTP 3xx response codes

By default, this measure represents the number of HTTP 3xx (redirection) codes currently returned by the registered backend instances to this load balancer.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the number of HTTP 3xx (redirection) codes currently returned by the registered backend instances in this Availability Zone.

Number

This count does not include any response codes generated by the load balancer.

HTTP 4xx response codes

By default, this measure represents the number of HTTP 4xx (client error) codes currently returned by the registered backend instances to this load balancer.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the number of HTTP 4xx (client error) codes currently returned by the registered backend instances in this Availability Zone.

Number

This count does not include any response codes generated by the load balancer.

HTTP 5xx response codes

By default, this measure represents the number of HTTP 5xx (server error) codes currently returned by the registered backend instances to this load balancer.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the number of HTTP 5xx (server error) codes currently returned by the registered backend instances in this Availability Zone.

Number

This count does not include any response codes generated by the load balancer.

Completed requests

By default, this measure represents the number of requests this load balancer received and sent to the registered EC2 instances.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the number of requests a load balancer received and sent to the registered EC2 instances in this Availability Zone.

Number

This metric measures the amount of traffic your load balancer is handling. Keeping an eye on peaks and drops allows you to alert on drastic changes which might indicate a problem with AWS or upstream issues like DNS. If you are not using Auto Scaling then knowing when your request count changes significantly can also help you know when to adjust the number of instances backing your load balancer.
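One simple way to alert on drastic changes is to compare the request count of consecutive measurement periods. The sketch below is illustrative only: the function names and the 50% threshold are assumptions, and a real alerting policy would normally account for expected daily or weekly traffic patterns.

```python
def relative_change(previous: float, current: float) -> float:
    """Fractional change in request count between two measurement periods."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous


def is_drastic(previous: float, current: float, threshold: float = 0.5) -> bool:
    """Flag a peak or drop whose magnitude reaches the (hypothetical) threshold."""
    return abs(relative_change(previous, current)) >= threshold


print(is_drastic(1000, 400))   # 60% drop
print(is_drastic(1000, 1100))  # 10% rise
```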

Rejected requests

By default, this measure represents the number of requests that have been rejected by this load balancer due to a full surge queue.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the number of requests that have been rejected by the load balancer due to a full surge queue for backend instances in this Availability Zone.

Number

When the value of the Pending submission requests measure reaches the maximum of 1,024 queued requests, new requests are dropped, the user receives a 503 error, and this measure (the spillover count) is incremented. In a healthy system, the value of this measure is always zero.

Pending submission requests

By default, this measure represents the number of inbound requests currently queued by this load balancer waiting to be accepted and processed by a backend instance.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the number of inbound requests currently queued by a load balancer waiting to be accepted and processed by backend instances in this Availability Zone.

Number

When your backend instances are fully loaded and can’t process any more requests, incoming requests are queued, which can increase latency, leading to slow user navigation or timeout errors. That’s why this metric should remain as low as possible, ideally at zero. Backend instances may refuse new requests for many reasons, but it’s often due to too many open connections. In that case, you should consider tuning your backend or adding more backend capacity. The “max” statistic is the most relevant view of this metric so that peaks of queued requests are visible. Crucially, make sure the queue length always remains substantially smaller than the maximum queue capacity, currently capped at 1,024 requests, so you can avoid dropped requests.
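The relationship between pending and rejected requests can be sketched with a little arithmetic. This is a minimal model, assuming only what is stated here: a queue capped at 1,024 requests, with anything beyond the cap rejected. The offer function is hypothetical and ignores requests leaving the queue as backends pick them up.

```python
SURGE_QUEUE_CAPACITY = 1024  # maximum number of queued requests


def offer(pending: int, incoming: int, capacity: int = SURGE_QUEUE_CAPACITY):
    """Return (new_pending, rejected) after `incoming` requests arrive."""
    free = max(capacity - pending, 0)
    accepted = min(incoming, free)
    return pending + accepted, incoming - accepted


pending, rejected = offer(pending=1000, incoming=50)
print(pending, rejected)  # 1024 26
```

In this example only 24 slots are free, so 24 requests are queued and the remaining 26 are rejected, which is why the queue length should stay well below the cap.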

HTTP 4XX client error

By default, this measure represents the number of HTTP 4xx errors (client error) currently returned by this load balancer.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the number of HTTP 4xx errors (client error) currently returned by the load balancer that is routing requests to the backend instances in this Availability Zone.

Number

There is usually not much you can do about 4xx errors, since this metric basically measures the number of erroneous requests sent to ELB (which returns a 4xx code). If you want to investigate, you can check the ELB access logs to determine which code has been returned.

HTTP 5XX server error

By default, this measure represents the number of HTTP 5xx errors (server error) currently returned by this load balancer.

If the ELB Filter Name is set to AvailabilityZone, then this measure represents the number of HTTP 5xx errors (server error) currently returned by the load balancer that is routing requests to the backend instances in this Availability Zone.

Number

The metric is reported if there are no healthy instances registered to the load balancer, or if the request rate exceeds the capacity of the instances (spillover) or the load balancer.

This metric counts the number of requests that could not be properly handled. It can have different root causes:

  • If the error code is 502 (Bad Gateway), the backend instance returned a response, but the load balancer couldn’t parse it because the load balancer was not working properly or the response was malformed.
  • If it’s 503 (Service Unavailable), the error comes from your backend instances or the load balancer, which may not have had enough capacity to handle the request. Make sure your instances are healthy and registered with your load balancer.
  • If a 504 error (Gateway Timeout) is returned, the response time exceeded ELB’s idle timeout. You can confirm it by checking whether the Latency measure is high and 5xx errors are returned by ELB. In that case, consider scaling up your backend, tuning it, or increasing the idle timeout to support slow operations such as file uploads. If your instances are closing connections with ELB, you should enable keep-alive with a timeout higher than the ELB idle timeout.
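The root causes listed above can be condensed into a small lookup, together with a check for the keep-alive recommendation. This is a hedged sketch: the function names and the timeout values in the usage lines are illustrative assumptions, and the lookup only summarizes the three codes discussed here.

```python
LIKELY_CAUSES = {
    502: "malformed backend response or load balancer fault",
    503: "insufficient capacity or no healthy registered instances",
    504: "backend response time exceeded the idle timeout",
}


def likely_cause(status_code: int) -> str:
    """Map a load balancer 5xx code to the likely cause described in the list."""
    return LIKELY_CAUSES.get(status_code, "not covered by this sketch")


def keep_alive_outlasts_idle_timeout(backend_keep_alive_secs: float,
                                     elb_idle_timeout_secs: float) -> bool:
    """True if the backend keep-alive timeout is higher than the ELB idle timeout."""
    return backend_keep_alive_secs > elb_idle_timeout_secs


print(likely_cause(504))
print(keep_alive_outlasts_idle_timeout(75, 60))  # example values in seconds
```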