AWS Network Load Balancers Test
A load balancer serves as the single point of contact for clients. The load balancer distributes incoming traffic across multiple targets, such as Amazon EC2 instances. This increases the availability of your application. You add one or more listeners to your load balancer.
A listener checks for connection requests from clients, using the protocol and port that you configure, and forwards requests to a target group.
A target group routes requests to one or more registered targets, such as EC2 instances, using the protocol and the port number that you specify. Network Load Balancer target groups support the TCP, UDP, TCP_UDP, and TLS protocols. You can register a target with multiple target groups. You can configure health checks on a per target group basis. Health checks are performed on all targets registered to a target group that is specified in a listener rule for your load balancer.
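The relationship between target groups, registered targets, and per-group health checks can be pictured with a minimal sketch. It is an illustration only, not how AWS or eG Enterprise implements health checking. The class names, the `initial` state label, and the `unhealthy_threshold` value are assumptions made for the example. It also simplifies recovery by treating a single passed check as healthy, in line with the registration rule described for the Healthy instances measure.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """A registered target, such as an EC2 instance (the id is hypothetical)."""
    target_id: str


@dataclass
class TargetGroup:
    """Holds registered targets and tracks their health independently of other groups."""
    name: str
    protocol: str                  # one of TCP, UDP, TCP_UDP, TLS
    port: int
    unhealthy_threshold: int = 3   # assumed value, for illustration only
    states: dict = field(default_factory=dict)    # target_id -> state label
    failures: dict = field(default_factory=dict)  # target_id -> consecutive failures

    def register(self, target):
        # A newly registered target is neither healthy nor unhealthy yet.
        self.states[target.target_id] = "initial"
        self.failures[target.target_id] = 0

    def record_health_check(self, target_id, passed):
        """Apply one health check result and return the resulting state."""
        if passed:
            self.failures[target_id] = 0
            self.states[target_id] = "healthy"
        else:
            self.failures[target_id] += 1
            if self.failures[target_id] >= self.unhealthy_threshold:
                self.states[target_id] = "unhealthy"
        return self.states[target_id]


# The same target registered with two target groups; health is tracked per group.
web = Target("i-web-1")
tcp_group = TargetGroup("tcp-80", "TCP", 80, unhealthy_threshold=2)
tls_group = TargetGroup("tls-443", "TLS", 443, unhealthy_threshold=2)
tcp_group.register(web)
tls_group.register(web)
tcp_group.record_health_check("i-web-1", True)
```

Because each group keeps its own counters, a target can be healthy in one target group while still awaiting its first check in another.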
If the load balancer is unavailable or error-prone, applications are denied access to critical resources, which adversely impacts the user experience. Likewise, if connections from clients and targets time out often, or if targets frequently become unhealthy, load balancing irregularities are inevitable. This again can degrade application performance. Furthermore, if you do not size the load balancer, in terms of capacity units or number of listeners, in keeping with the volume of network traffic, its processing power will be severely compromised.
Therefore, to ensure the continuous availability and peak performance of applications, administrators should ensure that the AWS Network Load Balancers are up and running at all times, are configured correctly, and process requests in a swift, uniform, and error-free manner. The AWS Network Load Balancers test helps administrators achieve all of the above.
For each AWS Network Load Balancer, this test reports the current state of that load balancer, so that load balancers in an abnormal state can be isolated. Reset packets are tracked for the clients, the targets, and the load balancer, so that administrators can proactively identify unhealthy targets. Improper timeout configurations are revealed in the process as well. The routing rules and configuration of such a load balancer should then be scrutinized and reset (if required) to improve target responsiveness.
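The link between idle timeouts and reset packets can be sketched as follows. This is a simplified model of the behavior described for the reset packet measures of this test: a flow that stays silent for longer than the idle timeout is closed, and data sent on it afterwards is answered with a TCP RST. The `FlowTracker` class and the 60-second timeout are hypothetical and chosen only for illustration.

```python
class FlowTracker:
    """Tracks the last activity time of each flow and flags data sent on expired flows."""

    def __init__(self, idle_timeout_seconds):
        self.idle_timeout = idle_timeout_seconds
        self.last_seen = {}   # flow id -> time (in seconds) data was last sent
        self.resets = 0       # number of RST responses issued so far

    def send(self, flow_id, now):
        """Return "FORWARDED" or "RST" for data sent on flow_id at time now."""
        last = self.last_seen.get(flow_id)
        if last is not None and now - last > self.idle_timeout:
            # The flow idled past the timeout, so it is no longer valid.
            del self.last_seen[flow_id]
            self.resets += 1
            return "RST"
        # New flow, or an existing flow that is still within the timeout.
        self.last_seen[flow_id] = now
        return "FORWARDED"


tracker = FlowTracker(idle_timeout_seconds=60)
tracker.send("client-1", now=0)     # opens the flow
tracker.send("client-1", now=30)    # still within the idle timeout
tracker.send("client-1", now=91)    # 61 seconds of silence: answered with RST
```

In a model like this, a timeout that is shorter than the pauses an application normally takes between messages shows up as a steadily rising reset count.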
Target of the test: Amazon Cloud
Agent deploying the test: A remote agent
Output of the test: One set of results for each network load balancer
Parameter | Description |
---|---|
Test Period |
How often the test should be executed. |
Host |
The host for which the test is to be configured. |
Access Type |
eG Enterprise monitors the AWS cloud using the AWS API. By default, the eG agent accesses the AWS API using a valid AWS account ID, which is assigned a special role that is specifically created for monitoring purposes. Accordingly, the Access Type parameter is set to Role by default. To enable the eG agent to use this default access approach, you have to configure the eG tests with a valid AWS Account ID to Monitor and the special AWS Role Name you created for monitoring purposes. Some AWS cloud environments, however, may not support the role-based approach. Instead, they may allow cloud API requests only if such requests are signed by a valid Access Key and Secret Key. When monitoring such a cloud environment, change the Access Type to Secret, and then configure the eG tests with a valid AWS Access Key and AWS Secret Key. Note that the Secret option may not be ideal when monitoring high-security cloud environments, because such environments may issue a security mandate that requires administrators to change the Access Key and Secret Key often. Because keys change this frequently under the key-based approach, Amazon recommends the Role-based approach for accessing the AWS API. |
AWS Account ID to Monitor |
This parameter appears only when the Access Type parameter is set to Role. Specify the AWS Account ID that the eG agent should use for connecting and making requests to the AWS API. To determine your AWS Account ID, follow the steps below:
|
AWS Role Name |
This parameter appears when the Access Type parameter is set to Role. Specify the name of the role that you have specifically created on the AWS cloud for monitoring purposes. The eG agent uses this role and the configured Account ID to connect to the AWS Cloud and pull the required metrics. To know how to create such a role, refer to Creating a New Role. |
AWS Access Key, AWS Secret Key, Confirm AWS Access Key, Confirm AWS Secret Key |
These parameters appear only when the Access Type parameter is set to Secret. To monitor an Amazon instance, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this is detailed in the Obtaining an Access key and Secret key topic. Make sure you confirm the access and secret keys you provide here by retyping them in the corresponding Confirm text boxes. |
Proxy Host and Proxy Port |
In some environments, all communication with the AWS cloud and its regions could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy. |
Proxy User Name, Proxy Password, and Confirm Password |
If the proxy server requires authentication, specify a valid proxy user name and password against the Proxy User Name and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. By default, these parameters are set to none, indicating that the proxy server does not require authentication. |
Proxy Domain and Proxy Workstation |
If a Windows NTLM proxy is to be used, then you will additionally have to specify the Windows domain name and the Windows workstation name it requires against the Proxy Domain and Proxy Workstation parameters. If the environment does not support a Windows NTLM proxy, set these parameters to none. |
Exclude Region |
Here, you can provide a comma-separated list of region names or patterns of region names that you do not want to monitor. For instance, to exclude regions with names that contain 'east' and 'west' from |
DD Frequency |
Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if you so desire. To disable the detailed diagnosis capability for this test, specify none against DD Frequency. |
Detailed Diagnosis |
To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:
|
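The dependency between the Access Type parameter and the credential parameters described above can be summarized in a short sketch. The dictionary layout and the `missing_parameters` helper are hypothetical and do not reflect how eG Enterprise stores or validates test configuration. Only the parameter names and the Role default are taken from this topic, and the values shown are placeholders.

```python
# Parameters that must be supplied for each Access Type, as described above.
REQUIRED_BY_ACCESS_TYPE = {
    "Role": ["AWS Account ID to Monitor", "AWS Role Name"],
    "Secret": ["AWS Access Key", "AWS Secret Key"],
}


def missing_parameters(config):
    """Return the required parameters that are absent or empty for the chosen Access Type."""
    access_type = config.get("Access Type", "Role")  # Role is the documented default
    if access_type not in REQUIRED_BY_ACCESS_TYPE:
        raise ValueError(f"Unknown Access Type: {access_type!r}")
    return [name for name in REQUIRED_BY_ACCESS_TYPE[access_type]
            if not config.get(name)]


# Placeholder values only; a Role-based configuration with the role name left out.
role_config = {
    "Access Type": "Role",
    "AWS Account ID to Monitor": "123456789012",
}
print(missing_parameters(role_config))
```

A check of this kind reports that AWS Role Name still has to be supplied before the test can use the role-based approach.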
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Listeners |
Indicates the number of listeners configured for this load balancer. |
Number |
A listener checks for connection requests, using the protocol and port that you configure. The rules that you define for a listener determine how the load balancer routes requests to its registered targets. By default, you can configure a maximum of 50 listeners for a load balancer. You can, however, increase this count based on the current and anticipated connection load on the load balancer. Use the detailed diagnosis to know which listeners are configured on the load balancer, the security policies and actions defined for every listener, and the SSL certificate of each listener. |
Healthy instances |
Indicates the number of targets of this load balancer that are healthy. |
Number |
Each load balancer node periodically sends requests to its registered targets to test their status. These tests are called health checks. These checks are conducted using the health check settings that are configured for the target groups with which a target is registered. After your target is registered, it must pass one health check to be considered healthy. A non-zero value for this measure means that one or more registered targets have passed at least one health check. Ideally, therefore, the value of this measure should be high. Use the detailed diagnosis of this measure to identify the healthy targets and the target groups they belong to. |
Unhealthy instances |
Indicates the number of targets of this load balancer that are unhealthy. |
Number |
Each load balancer node periodically sends requests to its registered targets to test their status. These tests are called health checks. These checks are conducted using the health check settings that are configured for the target groups with which a target is registered. After your target is registered, it must pass one health check to be considered healthy. Likewise, if a target fails a configured number of consecutive health checks, that target is considered 'Unhealthy'. A non-zero value for this measure hence means that one or more targets registered with a load balancer are consistently failing health checks. This is a cause for concern. In such a scenario, use the detailed diagnosis of this measure to know which targets are unhealthy, which target groups they belong to, and why their health checks failed. This should help you troubleshoot the failures quickly and effectively. Here are some possible reasons for health check failures and how to fix them:
|
Active TCP connections |
Indicates the total number of concurrent TCP flows (or connections) from clients to targets. This metric includes connections in the SYN_SENT and ESTABLISHED states. |
Number |
This is a good indicator of the connection load on the load balancer. In the event of an overload condition, you can use the detailed diagnosis of this measure to know which availability zone is generating the maximum load. |
New TCP connections |
Indicates the total number of new TCP connections established from clients to targets during the last measurement period. |
Number |
Use the detailed diagnosis of this measure to know the number of new TCP connections that clients established with targets in each availability zone. In the event of a sudden spike in connections over the load balancer, you can use the detailed metrics to identify which availability zone is generating the maximum load. |
Consumed capacity |
Indicates the number of Network Load Balancer Capacity Units (NLCUs) used by this load balancer. |
Number |
Typically, AWS charges you for each hour or partial hour that a Network Load Balancer is running, and for the number of Network Load Balancer Capacity Units (NLCUs) used per hour. An NLCU measures the dimensions on which the Network Load Balancer processes your traffic (averaged over an hour). The three dimensions measured are new connections or flows, active connections or flows, and processed bytes.
You are charged only for the dimension with the highest usage for the hour. For Transmission Control Protocol (TCP) traffic, an NLCU contains:
For User Datagram Protocol (UDP) traffic, an NLCU contains:
For Transport Layer Security (TLS) traffic, an NLCU contains:
TCP and UDP traffic refers to the traffic destined for any TCP/UDP listener on your Network Load Balancer, while TLS traffic refers to the traffic destined for any TLS listener on your Network Load Balancer. You may want to keep an eye on the value of this measure to understand your Network Load Balancer usage. Sudden, significant spikes in this measure value may require investigation. Under such circumstances, it is best to track changes to the values of the Active TCP connections, New TCP connections, and Processed data measures as well. As these measures influence the NLCU count, knowing how they change may point you to where the major resource spend is. |
Processed data |
Indicates the total number of bytes processed by the load balancer, including TCP/IP headers. This count includes traffic to and from targets, minus health check traffic. |
MB |
The value of this measure is considered for computing the capacity units (NLCUs) consumed by the load balancer. These capacity units are in turn used for calculating your hourly Network Load Balancer usage costs. Use the detailed diagnosis of this measure to know the amount of data processed by the load balancer for each availability zone. |
Reset packets sent from client |
Indicates the total number of reset (RST) packets sent from a client to a target via this load balancer. |
Number |
For each TCP request that a client makes through a Network Load Balancer, the state of that connection is tracked. If no data is sent through the connection by either the client or the target for longer than the idle timeout, the connection is closed. If a client or a target sends data after the idle timeout period elapses, it receives a TCP RST packet to indicate that the connection is no longer valid. A non-zero value of this measure indicates that resets are generated by the client and forwarded by the load balancer. A high value for this measure indicates that clients have frequently lost connection to targets. This is a cause for concern and should be investigated. In this case, use the detailed diagnosis to know how many reset packets were sent by clients to targets in each availability zone. |
Reset packets generated by load balancer |
Indicates the total number of reset (RST) packets generated by this load balancer. |
Number |
If a target becomes unhealthy, the load balancer sends a TCP RST for packets received on the client connections associated with the target, unless the unhealthy target triggers the load balancer to fail open. If you see a spike in the value of this measure just before or just as the value of the Unhealthy instances measure increases, it is likely that the TCP RST packets were sent because the target was starting to fail but had not yet been marked unhealthy. If you see persistent increases in the value of this measure without targets being marked unhealthy, you can check the VPC flow logs for clients sending data on expired flows. You can also use the detailed diagnosis of this measure to know how many reset packets were generated by the load balancer per availability zone. |
Reset packets sent from target |
Indicates the total number of reset (RST) packets sent from a target to a client via this load balancer. |
Number |
For each TCP request that a client makes through a Network Load Balancer, the state of that connection is tracked. If no data is sent through the connection by either the client or the target for longer than the idle timeout, the connection is closed. If a client or a target sends data after the idle timeout period elapses, it receives a TCP RST packet to indicate that the connection is no longer valid. A non-zero value of this measure indicates that resets are generated by the target and forwarded by the load balancer. A high value for this measure indicates frequent connection timeouts between targets and clients. This is a cause for concern and should be investigated. In this case, use the detailed diagnosis to know how many reset packets were sent by targets in each availability zone. |
Total target groups |
Indicates the total number of target groups managed by this load balancer. |
Number |
Use the detailed diagnosis of this measure to know which target groups are managed by this load balancer. |
State |
Indicates the current state of this load balancer. |
Number |
The values that this measure can report and the state they represent are detailed in the table below:
|
Other targets |
Indicates the number of targets of this load balancer that are neither in a healthy nor in an unhealthy state. |
Number |
If this measure reports a non-zero value, it could imply that one or more targets are in one of the following states:
You can use the detailed diagnosis of this measure to determine the exact state of the targets. |
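The billing rule described for the Consumed capacity measure, where only the dimension with the highest usage counts for the hour, can be sketched as a small calculation. The per-NLCU capacities are passed in as parameters because this topic does not list them. The TCP figures used in the example (800 new connections per second, 100,000 active connections, 1 GB processed per hour) are assumptions based on AWS's published pricing and should be verified against the current AWS pricing page. The `NlcuCapacity` class and `consumed_nlcus` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NlcuCapacity:
    """What a single NLCU covers in each dimension (values supplied by the caller)."""
    new_per_second: float   # new connections or flows per second
    active: float           # active connections or flows
    gb_per_hour: float      # processed data, in GB per hour


def consumed_nlcus(new_per_second, active, gb_per_hour, capacity):
    """Return (dimension, NLCUs) for the dimension with the highest usage."""
    usage = {
        "new": new_per_second / capacity.new_per_second,
        "active": active / capacity.active,
        "processed": gb_per_hour / capacity.gb_per_hour,
    }
    dimension = max(usage, key=usage.get)
    return dimension, usage[dimension]


# Assumed TCP capacities per NLCU; verify before relying on them.
TCP_CAPACITY = NlcuCapacity(new_per_second=800, active=100_000, gb_per_hour=1.0)

# Hourly averages for a hypothetical load balancer.
dimension, nlcus = consumed_nlcus(400, 20_000, 3.0, TCP_CAPACITY)
print(dimension, nlcus)
```

In this example the processed data dimension dominates, so a spike in the Consumed capacity measure would be best explained by looking at the Processed data measure first.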