Server Load Balancer - SLB Test

SLB (Server Load Balancer) distributes inbound network traffic across multiple Elastic Compute Service (ECS) instances that act as backend servers based on forwarding rules. You can use SLB to improve the responsiveness and availability of your applications.

After you add ECS instances that are deployed in the same region to an SLB instance, SLB uses virtual IP addresses (VIPs) to virtualize these ECS instances into backend servers in a high-performance server pool that ensures high availability. Client requests are distributed to the ECS instances based on forwarding rules.

SLB checks the health status of the ECS instances and automatically removes unhealthy ones from the server pool to eliminate single points of failure (SPOFs). This enhances the resilience of your applications. You can also use SLB to defend your applications against distributed denial of service (DDoS) attacks.

If an SLB instance is inaccessible to clients, the applications running on the backend ECS instances cannot receive or process requests, which impacts user productivity and damages the user experience with Alibaba Cloud. Similarly, if many backend ECS instances are unhealthy, the healthy servers will be overloaded with requests, which affects application responsiveness. To avoid this, administrators need to track the status of each SLB instance and the health of the ECS instances it manages. Administrators also need to study the incoming and outgoing traffic of every SLB instance to ascertain whether a single SLB instance suffices to meet the availability and load requirements of the application, or whether multiple SLB instances are required. Likewise, the connection load on an instance, its responsiveness to requests, and the quality of its responses should be monitored, so that overload conditions and processing latencies can be captured and addressed rapidly. The Server Load Balancer - SLB test helps administrators do all of this.

For an SLB instance, this test reports the status of that instance and the count of healthy and unhealthy servers managed by that instance. This way, administrators can figure out if too many unhealthy instances exist, diagnose the reason for the 'bad health' of the instances, and restore those instances to normalcy, so that they can be added back to the SLB server pool to ensure the high availability and responsiveness of the applications they support. Additionally, the test tracks the request and response traffic to and from the SLB instance and the time taken by the instance to service the requests; in the process, the test draws administrator attention to potential bottlenecks in request processing. Administrators are also alerted if data, packets, and connections are dropped frequently. These metrics prompt administrators to figure out how the capacity of the SLB instance can be optimized, so that it can handle additional load without dropping data, packets, or connections.

Target of the test : An Alibaba Cloud Account

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each SLB instance

Configurable parameters for the test
Parameters Description

Test period

How often the test should be executed.

Host

The host for which the test is to be configured.

Alibaba Access Key and Alibaba Secret Key

This test makes REST API requests to Alibaba Cloud to pull the metrics. For this purpose, the test needs to be configured with an AccessKey pair. An AccessKey pair is typically used to call an operation of an Alibaba Cloud service. It is also used to initiate an API request or to use a cloud service SDK to manage cloud resources. An AccessKey pair consists of an AccessKey ID and an AccessKey Secret. The AccessKey ID is used to identify a user/cloud account. The AccessKey Secret is used to verify a user/cloud account.
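The roles of the two halves of an AccessKey pair can be illustrated with a minimal sketch. It assumes a generic HMAC-based signing scheme and hypothetical key values; it is not the exact Alibaba Cloud signature algorithm, which applies additional canonicalization rules that are not shown here.

```python
import base64
import hashlib
import hmac


def sign_request(access_key_secret: str, string_to_sign: str) -> str:
    """Return a Base64-encoded HMAC-SHA1 signature (illustrative scheme only)."""
    digest = hmac.new(
        access_key_secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_request(access_key_secret: str, string_to_sign: str, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(access_key_secret, string_to_sign)
    return hmac.compare_digest(expected, signature)


# Hypothetical values, for illustration only.
ACCESS_KEY_ID = "example-access-key-id"
ACCESS_KEY_SECRET = "example-access-key-secret"

# The AccessKey ID travels with the request and identifies the caller;
# the AccessKey Secret is never sent and only proves ownership of the ID.
payload = f"AccessKeyId={ACCESS_KEY_ID}&Action=ExampleAction"
signature = sign_request(ACCESS_KEY_SECRET, payload)
```

In this sketch, a request signed with a different secret fails verification, which is why a leaked or lost AccessKey Secret has to be replaced rather than reused.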

The first step to configuring the eG agent with an AccessKey pair is to create an AccessKey pair for the target cloud account. To achieve this, follow the steps below:

  1. Log on to the RAM console by using an Alibaba Cloud account.
  2. In the left-side navigation pane, click Users under Identities.
  3. On the Users page, click the username of the RAM user for which you want to create an AccessKey pair in the User Logon Name/Display Name column.
  4. On the page that appears, click Create AccessKey in the User AccessKeys section.

    Note:

    You must enter a verification code if you create an AccessKey pair for the first time.

  5. Click Close.

    Note:

    • The AccessKey secret is displayed only when you create an AccessKey pair.
    • If the AccessKey pair is leaked or lost, you must create a new one. You can create a maximum of two AccessKey pairs.

  6. Make note of the AccessKey ID and AccessKey secret, once they are displayed.
  7. Then, configure the Alibaba Access Key parameter of the test with the AccessKey ID, and the Alibaba Secret Key parameter with the AccessKey Secret you made note of.

If you failed to make note of the AccessKey ID and AccessKey Secret at the time of creating the AccessKey pair, then you can obtain the same at a later point in time. Similarly, if an AccessKey pair pre-exists for the target cloud account, then you do not have to create another one. Instead, you can obtain the AccessKey ID and AccessKey Secret of the existing AccessKey pair and configure the eG agent with the same. For this, follow the steps below:

  1. Use an Alibaba Cloud account to log on to the Alibaba Cloud Management console.
  2. Move the pointer over the profile picture in the upper-right corner, and click AccessKey.
  3. In the Security Tips message that appears, click Continue to manage AccessKey. The AccessKey ID and AccessKey Secret are displayed.
  4. Make note of the displayed ID and secret.
  5. Then, configure the Alibaba Access Key parameter of the test with the AccessKey ID, and the Alibaba Secret Key parameter with the AccessKey Secret you made note of.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
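
The two conditions above can be expressed as a single check. This is a minimal sketch with hypothetical parameter names; it reads the second condition as requiring each of the two frequencies to be non-zero, which is an assumption about the intended meaning.

```python
def detailed_diagnosis_configurable(
    license_allows_dd: bool,
    normal_frequency: int,
    abnormal_frequency: int,
) -> bool:
    """Return True when the On/Off option would be offered for the test."""
    # Assumption: neither frequency may be 0.
    frequencies_set = normal_frequency != 0 and abnormal_frequency != 0
    return license_allows_dd and frequencies_set
```

For example, a licensed manager with a normal frequency of 0 would not offer the option under this reading.
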
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Instance status

Indicates the status of this SLB instance.

The values that this measure reports and their corresponding numeric values are listed below:

Measure Value    Numeric Value
Active           1
Inactive         2
Locked           3

Note:

This measure reports the Measure Values listed in the table above to indicate the current status of an SLB instance. In the graph of this measure however, the same is indicated using the numeric equivalents only.
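
When reading the graph of this measure, the numeric equivalents can be translated back to the measure values with a simple lookup. The sketch below uses the mapping from the table above; the function name is hypothetical.

```python
from typing import Optional

# Measure values and their numeric equivalents, as listed in the table above.
STATUS_TO_NUMERIC = {"Active": 1, "Inactive": 2, "Locked": 3}
NUMERIC_TO_STATUS = {number: name for name, number in STATUS_TO_NUMERIC.items()}


def status_label(numeric_value: int) -> Optional[str]:
    """Translate a value read off the graph back to its measure value."""
    # Values outside the table yield None rather than a guessed label.
    return NUMERIC_TO_STATUS.get(numeric_value)
```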

Use the detailed diagnosis of this measure to view more information about the target SLB instance. This includes the IP address of the instance, the address type, when the SLB instance was created, its network type, its listener ports and protocols, its vSwitch and VPC, and its master and slave zones.

Healthy backend ECS instances

Indicates the number of ECS instances managed by this SLB instance that are healthy.

Number

Faulty backend ECS instances

Indicates the number of ECS instances managed by this SLB instance that are unhealthy.

Number

If the value of this measure is higher than the value of the Healthy backend ECS instances measure, it means that the SLB instance is managing more faulty servers than healthy ones. This is a cause for concern, as it implies that the SLB instance is not backed by enough healthy servers to ensure the high availability and responsiveness of applications. In such circumstances, you may want to figure out the reason for the server faults and resolve them, so that there are more healthy servers in the server pool for handling application requests.
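
The comparison described above can be sketched as follows. The function names are hypothetical, and the healthy fraction is an additional derived figure that the test itself does not report.

```python
def pool_is_a_concern(healthy: int, faulty: int) -> bool:
    """True when the SLB instance manages more faulty servers than healthy ones."""
    return faulty > healthy


def healthy_fraction(healthy: int, faulty: int) -> float:
    """Share of backend ECS instances that are healthy (0.0 for an empty pool)."""
    total = healthy + faulty
    if total == 0:
        return 0.0
    return healthy / total
```

A pool with 2 healthy and 3 faulty servers would be flagged, while a pool with 3 of each would not, because the condition is strictly "more faulty than healthy".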

Inbound traffic

Indicates the rate at which data is received by this SLB instance from an external network.

Kbps

The Inbound traffic, Outbound traffic, Incoming packets, and Outgoing packets measures are good indicators of the data/packet load on the instance.

Outbound traffic

Indicates the rate at which data is sent by this SLB instance to an external network.

Kbps

Incoming packets

Indicates the rate at which request packets were received by this SLB instance.

Packets/Sec

Outgoing packets

Indicates the rate at which response packets were sent by this SLB instance.

Packets/Sec

Active connections

Indicates the number of established TCP connections with this SLB instance.

Number

This is a good indicator of the connection load on the SLB instance. If you expect the load to grow in the future, you can add more SLB instances. Also, you can configure the SLB instances as guaranteed-performance instances rather than as shared-performance instances. Then, you can increase the maximum number of connections an SLB instance can support, so that it can handle additional connections without dropping them.

Inactive connections

Indicates the number of TCP connections to the SLB instance that are not in an established state currently - i.e., the count of idle connections.

Number

The value 0 is desired for this measure. If many connections are idle, you may want to identify such connections and terminate them, so that they do not unnecessarily overload the SLB instance, causing new connections to drop.

Maximum concurrent connections

Indicates the maximum number of concurrent connections to this SLB instance.

Number

Dropped inbound traffic

Indicates the rate at which inbound traffic to this SLB instance was dropped.

Kbps

Ideally, the values of the dropped traffic and dropped packet measures should be 0 or very low.

Dropped outbound traffic

Indicates the rate at which outbound traffic from this SLB instance was dropped.

Kbps

Dropped incoming packets

Indicates the rate at which this SLB instance dropped incoming packets.

Packets/Sec

Dropped outgoing packets

Indicates the rate at which this SLB instance dropped outgoing packets.

Packets/Sec

Dropped connections

Indicates the number of connections that this SLB instance dropped per second.

Number

Ideally, the value of this measure should be 0. A non-zero or high value indicates that the SLB instance has dropped connections. In a guaranteed-performance instance, this typically happens if the number of connections to that instance has exceeded the Maximum Connection specification of that instance. To avoid this, you can configure more SLB instances, increase the Maximum Connection limit set for the SLB instance, and/or increase the Queries Per Second an SLB instance can process. You can also identify idle connections and terminate them.
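
To spot an instance that is approaching its limits before connections are dropped, the observed load can be compared with the configured specification. This is a minimal sketch: the function names, the 0.8 warning threshold, and the sample limits and readings are all assumptions made for illustration.

```python
def limit_utilization(observed: float, limit: float) -> float:
    """Return the observed load as a fraction of the configured limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return observed / limit


def near_limit(observed: float, limit: float, threshold: float = 0.8) -> bool:
    """True when the observed load has reached the warning threshold."""
    return limit_utilization(observed, limit) >= threshold


# Hypothetical specification values and readings, for illustration only.
MAX_CONNECTION_SPEC = 50_000
QPS_SPEC = 5_000

connections_near_limit = near_limit(45_000, MAX_CONNECTION_SPEC)  # 0.9 of the limit
qps_near_limit = near_limit(1_000, QPS_SPEC)                      # 0.2 of the limit
```

The same helper serves both the Maximum Connection and the Queries Per Second specifications, since each is a simple ceiling on an observed value.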

Average query rate

Indicates the rate at which this SLB instance processes HTTP/S queries.

Queries/Sec

This measure is available only for SLB instances configured with Layer-7 listeners.

If the value of this measure exceeds the Queries Per Second (QPS) limit set for an SLB instance, then new connections will be dropped.

Response time

Indicates the average response time of this SLB instance.

Seconds

This measure is available only for SLB instances configured with Layer-7 listeners.

If this measure reports a high value, it implies that the SLB instance is slow in responding to requests.

HTTP status code 2xx

Indicates the number of 2xx status codes returned by this SLB instance to the client through a port.

Number

This measure is available only for SLB instances configured with Layer-7 listeners.

A high value for this measure indicates that the SLB instance has returned many successful responses.

HTTP status code 3xx

Indicates the number of 3xx status codes returned by this SLB instance to the client through a port.

Number

This measure is available only for SLB instances configured with Layer-7 listeners.

HTTP responses with 3xx status codes denote redirects.

HTTP status code 4xx

Indicates the number of 4xx status codes returned by this SLB instance to the client through a port.

Number

This measure is available only for SLB instances configured with Layer-7 listeners.

A low value is desired for this measure as responses with 4xx status codes denote client errors.

HTTP status code 5xx

Indicates the number of 5xx status codes returned by this SLB instance to the client through a port.

Number

This measure is available only for SLB instances configured with Layer-7 listeners.

A low value is desired for this measure as responses with 5xx status codes denote server errors.

Other HTTP status codes

Indicates the number of status codes other than 2xx, 3xx, 4xx, and 5xx returned by this SLB instance to the client through a port.

Number

This measure is available only for SLB instances configured with Layer-7 listeners.
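
The grouping behind the five status code measures can be illustrated with a short sketch. It only shows how individual status codes fall into the 2xx, 3xx, 4xx, 5xx, and other classes; the test itself obtains the counts from Alibaba Cloud, and the function names here are hypothetical.

```python
from collections import Counter
from typing import Iterable


def status_class(code: int) -> str:
    """Map an HTTP status code to the class used by the measures."""
    if 200 <= code <= 599:
        return f"{code // 100}xx"
    # Anything outside 2xx-5xx (for example 1xx) is counted as "other".
    return "other"


def count_status_classes(codes: Iterable[int]) -> Counter:
    """Count how many responses fall into each status class."""
    return Counter(status_class(code) for code in codes)
```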

Upstream status code 4xx

Indicates the number of 4xx status codes returned by the backend server to this SLB instance through a port.

Number

This measure is available only for SLB instances configured with Layer-7 listeners.

A non-zero value for this measure indicates that one or more backend servers (i.e., ECS instances) experienced client errors when responding to requests forwarded to them by the SLB instance.

Upstream status code 5xx

Indicates the number of 5xx status codes returned by the backend server to this SLB instance through a port.

Number

This measure is available only for SLB instances configured with Layer-7 listeners.

A non-zero value for this measure indicates that one or more backend servers (i.e., ECS instances) experienced server errors when responding to requests forwarded to them by the SLB instance.

Upstream response time

Indicates the average latency of requests sent by the backend server to the proxy through a port.

Seconds

This measure is available only for SLB instances configured with Layer-7 listeners.

A high value is indicative of poor responsiveness of the backend servers.