Elastic Compute Service - ECS Test
Elastic Compute Service (ECS) is a high-performance, stable, reliable, and scalable IaaS-level service provided by Alibaba Cloud. ECS eliminates the need to invest in IT hardware up front and allows you to quickly scale computing resources on demand. This makes ECS more convenient and efficient than physical servers.
An ECS instance is a virtual machine that contains basic computing components such as the vCPU, memory, operating system, network, and disk. You can fully customize and modify all configurations of an ECS instance. After you log on to the Alibaba Cloud Management Console, you can manage resources and configure the environment of your ECS instances.
The lifecycle of an ECS instance begins when the instance is created and ends when the instance is released. During this lifecycle, an ECS instance goes through many states. Tracking these states helps administrators quickly resolve user complaints about an instance being unavailable or inaccessible, which in turn improves the user experience with that instance.
ECS instances are categorized into instance families based on business scenarios. Each instance family contains multiple instance types with different vCPU and memory specifications, and instance types can also differ in CPU characteristics such as the CPU model and clock speed. As business requirements change, organizations may want to switch to an instance type that better suits their needs. Administrators are therefore responsible for monitoring how an instance uses its vCPU and memory over time, spotting potential resource contention, and recommending an upgrade or downgrade to an appropriate instance type, so that business operations continue smoothly and without interruption.
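The right-sizing decision described above can be expressed as a simple rule over utilization samples. The following is a minimal Python sketch, not eG Enterprise's own logic: the function name and the 85% and 20% thresholds are hypothetical, and the samples are assumed to come from measures such as CPU utilization and Memory usage collected over the same observation window.

```python
from statistics import mean

# Hypothetical thresholds; tune them for your own workloads.
UPGRADE_THRESHOLD = 85.0    # sustained utilization (%) suggesting contention
DOWNGRADE_THRESHOLD = 20.0  # sustained utilization (%) suggesting over-provisioning


def recommend_resize(cpu_samples, memory_samples,
                     high=UPGRADE_THRESHOLD, low=DOWNGRADE_THRESHOLD):
    """Suggest 'upgrade', 'downgrade', or 'keep' for one instance.

    cpu_samples and memory_samples are utilization percentages collected
    over the same observation window.
    """
    if not cpu_samples or not memory_samples:
        raise ValueError("need at least one sample of each metric")
    avg_cpu = mean(cpu_samples)
    avg_mem = mean(memory_samples)
    # Sustained pressure on either resource justifies a larger instance type.
    if avg_cpu >= high or avg_mem >= high:
        return "upgrade"
    # Only suggest a smaller type when both resources are consistently idle.
    if avg_cpu <= low and avg_mem <= low:
        return "downgrade"
    return "keep"
```

For example, `recommend_resize([92, 95, 88], [60, 62, 58])` returns `"upgrade"`, because average CPU utilization is about 91.7%. Requiring both resources to be idle before downgrading is a deliberately conservative choice, since shrinking an instance that still needs one of its resources would introduce the very contention the check is meant to catch.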
An ECS instance must contain a system disk to store the operating system and core configurations. An image is used to initialize a system disk and determines the operating system and initial software configurations of an ECS instance. Typically, the capacity of system disks is small. Therefore, it is good practice for administrators to continuously track the usage of and I/O activity on the system disks of every instance, and identify instances whose storage space is insufficient for their needs. By adding more disks to such instances, administrators can help ensure that the instances boot up without problems and remain available to end users on demand.
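The disk checks described above can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the eG test: the `SystemDiskSample` structure, the function names, and the 15% free-space threshold are hypothetical, and the used-space figure is an assumed input (the measures reported by this test cover disk capacity and I/O rates, not used space).

```python
from dataclasses import dataclass


@dataclass
class SystemDiskSample:
    """One observation of an instance's system disk (hypothetical structure)."""
    instance_id: str
    capacity_gb: float
    used_gb: float
    read_iops: float
    write_iops: float


def low_space_instances(samples, min_free_pct=15.0):
    """Return the IDs of instances whose system disk is less than min_free_pct free."""
    flagged = []
    for s in samples:
        free_pct = 100.0 * (s.capacity_gb - s.used_gb) / s.capacity_gb
        if free_pct < min_free_pct:
            flagged.append(s.instance_id)
    return flagged


def dominant_io(sample):
    """Report whether reads or writes account for most of the disk I/O operations."""
    return "read" if sample.read_iops >= sample.write_iops else "write"
```

A 40 GB system disk with 38 GB used has only 5% free space, so its instance would be flagged as a candidate for additional storage.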
Besides vCPU, memory, and disk usage, administrators should also pay attention to the bandwidth usage of instances, so that bandwidth-hungry instances can be identified.
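To illustrate how bandwidth-hungry instances might be singled out, here is a minimal Python sketch. It assumes that received and sent bandwidth (in Kbps) have already been collected per instance, for example from the Internet traffic received and Internet traffic sent measures of this test; the function name and instance IDs are hypothetical, and this is not how the eG agent computes its measures.

```python
def rank_by_bandwidth(traffic):
    """Rank instances by total bandwidth and name the dominant traffic direction.

    traffic maps an instance ID to a (received_kbps, sent_kbps) tuple.
    Returns (instance_id, total_kbps, direction) tuples, busiest first.
    """
    ranked = []
    for instance_id, (received, sent) in traffic.items():
        total = received + sent
        direction = "incoming" if received >= sent else "outgoing"
        ranked.append((instance_id, total, direction))
    # Highest total bandwidth first.
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked
```

With `{"i-web": (1200.0, 4800.0), "i-db": (900.0, 300.0)}`, the sketch ranks `i-web` first with 6000 Kbps of mostly outgoing traffic.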
With the help of the Elastic Compute Service - ECS test, administrators can achieve all of the above! This test auto-discovers the ECS instances deployed in an Alibaba Cloud account. For each instance, the test reports the state of that instance, and alerts administrators if any instance is in an abnormal state (e.g., expired, expiring, or locked). When instance owners complain that they are unable to access their instances, administrators can instantly figure out whether the inaccessibility can be attributed to the abnormal state of the instances. In addition, the test keeps a close watch on the resource (vCPU, memory, disk, and network) usage of each ECS instance in a monitored Alibaba Cloud account. In the process, administrators can quickly and accurately identify instances that are over-utilizing resources, and initiate measures to right-size such instances, for example by recommending an upgrade to an instance type with a higher vCPU/memory configuration, or by adding more disks to instances that are running out of disk space.
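The triage step described above (checking whether an instance's state explains why its owner cannot reach it) can be sketched in Python. This is a hypothetical illustration: the state labels below are assumptions based on the examples in this section and on common ECS lifecycle states, and the exact values that the Status measure reports may differ.

```python
# Hypothetical state labels; the values reported by the Status measure may differ.
ABNORMAL_STATES = {"Expired", "Expiring", "Locked"}
TRANSITIONAL_STATES = {"Pending", "Starting", "Stopping"}


def explain_inaccessibility(instance_states, instance_id):
    """Return a short reason if the instance's state accounts for an outage.

    instance_states maps an instance ID to its current state label.
    Returns None when the state (e.g. Running) does not explain the problem.
    """
    state = instance_states.get(instance_id)
    if state is None:
        return "instance not discovered in this account"
    if state in ABNORMAL_STATES:
        return f"abnormal state: {state}"
    if state in TRANSITIONAL_STATES:
        return f"in transition: {state}"
    if state == "Stopped":
        return "instance is stopped"
    # A running instance is reachable as far as its state is concerned,
    # so the cause must be looked for elsewhere (network, OS, application).
    return None
```

A `None` result tells the administrator to look beyond the instance state, for example at the network or resource measures of the instance.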
Target of the test: An Alibaba Cloud account
Agent deploying the test: A remote agent
Outputs of the test: One set of results for each instance in the Alibaba Cloud account being monitored
Parameters | Description |
---|---|
Test period |
How often the test should be executed |
Host |
The host for which the test is to be configured. |
Alibaba Access Key and Alibaba Secret Key |
This test makes REST API requests to Alibaba Cloud to pull metrics. For this purpose, the test needs to be configured with an AccessKey pair. An AccessKey pair is typically used to call an operation of an Alibaba Cloud service, to initiate an API request, or to manage cloud resources using a cloud service SDK. An AccessKey pair consists of an AccessKey ID and an AccessKey Secret: the AccessKey ID identifies a user or cloud account, and the AccessKey Secret is used to verify that user or cloud account. The first step in configuring the eG agent with an AccessKey pair is to create an AccessKey pair for the target cloud account. To achieve this, follow the steps below:
If you did not note the AccessKey ID and AccessKey Secret when you created the AccessKey pair, you can obtain them later. Similarly, if an AccessKey pair already exists for the target cloud account, you do not have to create another one. Instead, you can obtain the AccessKey ID and AccessKey Secret of the existing AccessKey pair and configure the eG agent with them. For this, follow the steps below:
|
Detailed Diagnosis |
To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:
|
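To show what configuring the test with an AccessKey pair involves, here is a minimal Python sketch of how an AccessKey Secret can be used to sign an API request. It assumes the RPC-style signature method (HMAC-SHA1, SignatureVersion 1.0) that Alibaba Cloud documents for services such as ECS; it is not the eG agent's code, other Alibaba Cloud APIs may use different signature versions, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    # RFC 3986 style: only A-Z, a-z, 0-9, '-', '_', '.', '~' stay unescaped,
    # so spaces become %20, '*' becomes %2A, and '/' becomes %2F.
    return quote(str(value), safe="-_.~")


def sign_rpc_request(params, access_key_secret, method="GET"):
    """Compute the Signature value for an RPC-style request (assumed V1 scheme).

    params holds every request parameter except Signature itself
    (Action, AccessKeyId, Timestamp, SignatureNonce, and so on).
    """
    # Canonicalize: sort by parameter name, encode names and values.
    canonical = "&".join(
        f"{percent_encode(k)}={percent_encode(v)}" for k, v in sorted(params.items())
    )
    string_to_sign = f"{method}&{percent_encode('/')}&{percent_encode(canonical)}"
    # The HMAC key is the AccessKey Secret followed by '&'.
    key = (access_key_secret + "&").encode("utf-8")
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

The AccessKey ID travels in the request as the `AccessKeyId` parameter, while the AccessKey Secret never leaves the client: only the signature derived from it is sent, which is how the service verifies the caller.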
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Status |
Indicates the current state of this instance. |
|
The values that this measure reports and their corresponding numeric values are listed below:
Some of the Measure Values listed in the table above are described below:
Note: This measure reports the Measure Values listed in the table above to indicate the current state of an ECS instance. In the graph of this measure, however, the state is indicated using the numeric equivalents only. Use the detailed diagnosis of this measure to know more about the instance. The details displayed include the instance type, when it was created, the operating system of the instance, the region and zone to which the instance belongs, the image from which the instance was created, and the network type, IP addresses, VPC, and security group of the instance. |
||||||||||||||||||||||||
Total CPU |
Indicates the total number of CPU cores configured for this instance. |
Number |
|
||||||||||||||||||||||||
Total memory |
Indicates the memory configuration of this instance. |
MB |
|
||||||||||||||||||||||||
Network throughput |
Indicates the total inbound and outbound bandwidth usage of this instance. |
Kbps |
Compare the value of this measure across instances to know which instance is making the most use of the bandwidth resources. |
||||||||||||||||||||||||
Network inbound bandwidth |
Indicates the maximum bandwidth used by traffic flowing into this instance from the public network. |
Kbps |
These metrics will give administrators an idea as to where public bandwidth resources are spent. |
||||||||||||||||||||||||
Network outbound bandwidth |
Indicates the maximum bandwidth used by traffic flowing out of this instance to the public network. |
Kbps |
|||||||||||||||||||||||||
CPU utilization |
Indicates the percentage of allocated CPU units that are currently used by this instance. |
Percent |
A value close to 100% for an instance indicates that such an instance is over-utilizing the CPU resources allocated to it. |
||||||||||||||||||||||||
Intranet traffic received |
Indicates the bandwidth consumed by traffic flowing into this instance from the intranet. |
Kbps |
By comparing the value of this measure across instances, you can accurately identify the instance that is receiving bandwidth-intensive intranet traffic. |
||||||||||||||||||||||||
Intranet traffic sent |
Indicates the bandwidth consumed by traffic flowing out of this instance to the intranet. |
Kbps |
By comparing the value of this measure across instances, you can accurately identify the instance that is sending bandwidth-intensive intranet traffic. |
||||||||||||||||||||||||
Intranet bandwidth |
Indicates the total bandwidth consumed by intranet traffic flowing into and out of this instance. |
Kbps |
Compare the value of this measure across instances to identify the instance handling bandwidth-intensive intranet traffic. You can then compare the values of the Intranet traffic received and Intranet traffic sent measures of that instance to figure out whether incoming or outgoing intranet traffic is hogging the bandwidth resources. |
||||||||||||||||||||||||
Internet bandwidth |
Indicates the total bandwidth consumed by internet traffic flowing into and out of this instance. |
Kbps |
Compare the value of this measure across instances to identify the instance handling bandwidth-intensive internet traffic. You can then compare the values of the Internet traffic received and Internet traffic sent measures of that instance to figure out whether incoming or outgoing internet traffic is hogging the bandwidth resources. |
||||||||||||||||||||||||
Internet traffic received |
Indicates the bandwidth consumed by traffic flowing into this instance from the internet. |
Kbps |
By comparing the value of this measure across instances, you can accurately identify the instance that is receiving bandwidth-intensive internet traffic. |
||||||||||||||||||||||||
Internet traffic sent |
Indicates the bandwidth consumed by traffic flowing out of this instance to the internet. |
Kbps |
By comparing the value of this measure across instances, you can accurately identify the instance that is sending bandwidth-intensive internet traffic. |
||||||||||||||||||||||||
Disk IOPS |
Indicates the rate at which I/O operations were performed on the disks of this instance. |
Operations/Sec |
Compare the value of this measure across instances to know which instance is experiencing unusually high levels of I/O activity. In such a situation, you can compare the values of the Disk read operations and Disk write operations measures for that instance to accurately isolate whether a high rate of read operations or of write operations caused the I/O overload. |
||||||||||||||||||||||||
Disk read operations |
Indicates the rate at which disk read operations were performed by this instance. |
Operations/Sec |
By comparing the value of this measure across instances, you can accurately identify the instance that is experiencing a high level of disk read operations. |
||||||||||||||||||||||||
Disk write operations |
Indicates the rate at which disk write operations were performed by this instance. |
Operations/Sec |
By comparing the value of this measure across instances, you can accurately identify the instance that is experiencing a high level of disk write operations. |
||||||||||||||||||||||||
Disk throughput |
Indicates the bandwidth consumed by disk read/write operations on this instance. |
KB/Sec |
If this measure is very high for an instance, it means that the I/O activity on the disks of that instance is consuming bandwidth excessively. In such a situation, you can compare the values of the Disk read bandwidth and Disk write bandwidth measures of that instance to understand whether read activity or write activity is contributing to the unusual bandwidth consumption. |
||||||||||||||||||||||||
Disk read bandwidth |
Indicates the bandwidth consumed by disk read operations on this instance. |
KB/Sec |
Compare the value of this measure across instances to know which instance is engaged in bandwidth-intensive disk reads. |
||||||||||||||||||||||||
Disk write bandwidth |
Indicates the bandwidth consumed by disk write operations on this instance. |
KB/Sec |
Compare the value of this measure across instances to know which instance is engaged in bandwidth-intensive disk writes. |
||||||||||||||||||||||||
CPU credit usage |
Indicates the number of CPU credits consumed by this instance. |
Number |
This measure is reported only for burstable instances. Burstable instances are an economical instance type intended to cope with burstable performance requirements in entry-level computing scenarios. These instances use CPU credits to ensure computing performance, and are suited for scenarios where CPU usage is typically low but occasionally bursts. You can accumulate CPU credits and use them to increase the computing performance of burstable instances when your workloads require it. The CPU credit mechanism allows you to minimize the consumption of resources during off-peak hours, and scale resources out during peak hours at no extra cost. When you create a burstable instance, 30 CPU credits, known as initial CPU credits, are provisioned for each vCPU of the instance. These credits enable you to complete deployment tasks after you start the instance. When a burstable instance is started, it begins to consume CPU credits to maintain its computing performance. The value of this measure denotes the number of CPU credits so spent. By comparing the value of this measure across burstable instances, you can quickly identify the instance that is consuming too many CPU credits. |
||||||||||||||||||||||||
CPU credit balance |
Indicates the number of CPU credits that this instance has not yet used. |
Number |
As mentioned earlier, once a burstable instance is started, it begins consuming the 30 initial CPU credits provisioned for each of its vCPUs. At the same time, the instance earns CPU credits at a fixed rate that is determined by its instance type. The number of CPU credits that a vCPU earns per hour depends on its baseline performance, i.e., the amount of vCPU capacity that is continuously provisioned to the burstable instance. For example, a 25% baseline performance for instance A means that the CPU credits a vCPU of the instance earns per hour can keep that vCPU running at 25% utilization for an hour or at 100% utilization for 15 minutes (60 × 25%). At this baseline performance, each vCPU therefore earns 15 CPU credits per hour, so if instance A has two vCPUs, it earns 30 CPU credits per hour. If the CPU credits earned exceed the credits consumed, the net credits accrue as the CPU credit balance, which is the value reported by this measure. A high value is desired for this measure, as a high CPU credit balance for a burstable instance means that CPU resources are guaranteed to that instance for a maximum of 24 hours. |
||||||||||||||||||||||||
Total disk |
Indicates the total number of disks currently used by this instance. |
Number |
Use the detailed diagnosis of this measure to know which disks are used by the instance, the type of each disk, when every disk was created, the image that stores a copy of that disk's data, and when the disk was attached to the instance. |
||||||||||||||||||||||||
Disk size |
Indicates the total capacity of disks used by this instance. |
GB |
|
||||||||||||||||||||||||
CPU pending I/O operations |
Indicates the percentage of time the CPU of this instance spent waiting for I/O operations to complete. |
Percent |
A high value indicates frequent I/O operations on an instance. |
||||||||||||||||||||||||
Free memory |
Indicates the percentage of memory allocated to this instance that is still unused. |
Percent |
A high value is desired for this measure. A value close to 0% indicates that the instance is running out of memory. |
||||||||||||||||||||||||
Memory usage |
Indicates the percentage of allocated memory that is used by this instance. |
Percent |
A low value is desired for this measure. A value close to 100% is a cause for concern, as it indicates that the instance is rapidly running out of memory. If the instance appears to be consistently over-utilizing its memory, you may want to consider upgrading to a different instance type to meet its memory demand. |
||||||||||||||||||||||||
Average system load |
Indicates the average load on this instance during the last 5 minutes. |
Percent |
A high value indicates that the instance is busy. |
||||||||||||||||||||||||
Total snapshots |
Indicates the total number of snapshots created for disks used by this instance. |
Number |
The Alibaba Cloud snapshot service allows you to create crash-consistent snapshots for all disk categories. You can use snapshots for the following scenarios:
|
||||||||||||||||||||||||
Total snapshot size |
Indicates the total size of the snapshots created for the disks used by this instance. |
Number |
|
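The CPU credit arithmetic described under the CPU credit usage and CPU credit balance measures can be reproduced with a short Python sketch. It follows the worked example above, where one CPU credit keeps one vCPU at 100% utilization for one minute, so a 25% baseline earns 15 credits per vCPU per hour. The function names are hypothetical, and the balance model is a deliberate simplification: it counts the initial credits in the balance and ignores the 24-hour validity of earned credits and any upper limit on the accrued balance.

```python
MINUTES_PER_HOUR = 60
INITIAL_CREDITS_PER_VCPU = 30  # provisioned when a burstable instance is created


def credits_earned_per_hour(vcpus, baseline_pct):
    """Credits earned per hour; one credit = one vCPU at 100% for one minute."""
    return vcpus * MINUTES_PER_HOUR * baseline_pct / 100


def credit_balance(vcpus, baseline_pct, hourly_usage):
    """Simplified hour-by-hour balance, starting from the initial credits.

    hourly_usage lists the credits consumed in each hour. Expiry of earned
    credits and any cap on the balance are NOT modeled here.
    """
    balance = float(vcpus * INITIAL_CREDITS_PER_VCPU)
    earned = credits_earned_per_hour(vcpus, baseline_pct)
    for used in hourly_usage:
        # Net accrual for the hour; the balance cannot go negative.
        balance = max(0.0, balance + earned - used)
    return balance
```

For instance A from the example (two vCPUs, 25% baseline), `credits_earned_per_hour(2, 25)` gives 30 credits per hour. If the instance consumes only 10 credits in each of three hours, `credit_balance(2, 25, [10, 10, 10])` grows from the initial 60 credits to 120.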