Elastic Block Store - EBS Test

Amazon Elastic Block Store (Amazon EBS) provides persistent block-level storage volumes for use with Amazon EC2 instances in the AWS Cloud. An Amazon EBS volume is a durable, block-level storage device that you can attach to a single instance. You can use EBS volumes as primary storage for data that requires frequent updates, such as the system drive for an instance or storage for a database application. If such a volume suddenly becomes unavailable or impaired, the operations of the instance attached to it will suffer, which in turn degrades the experience of that instance's users. Administrators need to be promptly alerted to such conditions so that they can initiate remedial action quickly and ensure high instance uptime. Besides volume status, administrators also need to track the I/O load on each EBS volume and continuously measure the volume's ability to handle that load. This insight enables administrators to provision volumes with more or less I/O capacity, so as to optimize I/O processing and maximize volume performance. The Elastic Block Store - EBS test helps administrators in this exercise.

The test periodically checks the health and availability status of each volume used by the instances in the monitored region and notifies administrators if any volume is in an abnormal state. The test also tracks the I/O load on every volume and measures how well each volume processes that load, highlighting overloaded volumes and those that are experiencing processing delays.

Target of the test: Amazon Region

Agent deploying the test: A remote agent

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

AWS Access Key, AWS Secret Key, Confirm AWS Access Key, Confirm AWS Secret Key

To monitor an Amazon instance, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this is detailed in the Obtaining an Access key and Secret key topic. Make sure you confirm the access and secret keys you provide here by retyping them in the corresponding Confirm text boxes.

Proxy Host and Proxy Port

In some environments, all communication with the AWS cloud and its regions could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.

Proxy User Name, Proxy Password, and Confirm Password

If the proxy server requires authentication, specify a valid proxy user name and password in the Proxy User Name and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. By default, these parameters are set to none, indicating that the proxy server does not require authentication.

Proxy Domain and Proxy Workstation

If a Windows NTLM proxy is to be configured for use, then you will additionally have to specify the Windows domain name and the Windows workstation name required for it against the Proxy Domain and Proxy Workstation parameters. If the environment does not support a Windows NTLM proxy, set these parameters to none.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if you wish. To disable the detailed diagnosis capability for this test, specify none against DD Frequency.
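As a rough illustration of how a DD Frequency value could be read, the sketch below splits a value such as 1:1 into its two parts. The function name parse_dd_frequency is hypothetical, and treating the two numbers as the "normal" and "abnormal" frequencies is an assumption based on the wording of this page; it is not eG Enterprise code.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse a DD Frequency string such as "1:1".

    Returns (normal, abnormal) as integers, or None when the value is
    "none" (detailed diagnosis disabled). The normal/abnormal naming is
    an assumption made for illustration.
    """
    text = value.strip().lower()
    if text == "none":
        return None
    # Exactly two colon-separated parts are expected; anything else
    # raises ValueError.
    normal, abnormal = text.split(":")
    return int(normal), int(abnormal)


print(parse_dd_frequency("1:1"))
print(parse_dd_frequency("none"))
```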

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Neither the normal nor the abnormal frequency configured for the detailed diagnosis measures should be 0.

Measures reported by the test:

Measurement

Description

Measurement Unit

Interpretation

State

Indicates the current state of this volume.

 

The values that this measure can report and their corresponding numeric values are detailed in the table below:

Measure Value | Description | Numeric Value
Creating | The volume is being created. The volume will be inaccessible during creation. | 0
Available | The volume is available. | 1
In-use | The volume is in use. | 2
Deleting | The volume is being deleted. | 3
Deleted | The volume is deleted. | 4
Error | Some error has occurred in the volume. | 5

The detailed diagnosis of this measure will reveal when the volume was created and in which availability zone it resides.

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current availability state of a volume. In the graph of this measure, however, these states are represented by their numeric equivalents only.

If any EBS volume is found to be in an abnormal state, then you can use the detailed diagnosis of this measure to know the volume type, when that volume was created, and in which availability zone the volume resides.
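The mapping between the State values and their numeric equivalents can be written down as a small lookup. The sketch below copies the numbers from the State table above; the names STATE_TO_NUMERIC and state_to_numeric are hypothetical and exist only for this illustration.

```python
# Numeric equivalents copied from the State table on this page.
STATE_TO_NUMERIC = {
    "creating": 0,
    "available": 1,
    "in-use": 2,
    "deleting": 3,
    "deleted": 4,
    "error": 5,
}


def state_to_numeric(state: str) -> int:
    """Return the numeric value plotted in graphs for a volume state.

    Matching is case-insensitive; an unknown state raises KeyError.
    """
    return STATE_TO_NUMERIC[state.strip().lower()]


print(state_to_numeric("In-use"))
```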

Volume Status

Indicates the current health status of this volume.

 

AWS periodically runs volume status checks to enable you to better understand, track, and manage potential inconsistencies in the data on an Amazon EBS volume.

Volume status checks are automated tests that run every 5 minutes and return a pass or fail status. The value that this measure reports depends on the outcome of these checks. The table below describes the value reported for each outcome and lists the corresponding numeric values.

Measure Value | Description | Numeric Value
OK | If all checks pass, the status of the volume is OK. | 0
Impaired | If a check fails, the status of the volume is impaired. | 1
Insufficient-data | If checks are in progress, then insufficient-data is reported. | 2

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of a volume. In the graph of this measure, however, these values are represented by their numeric equivalents only.
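The rules in the Volume Status table can be sketched as a small function that turns individual check outcomes into an overall status. This is only an illustration: the outcome labels ("passed", "failed", "in-progress"), the function name volume_status, the choice to let a failed check take precedence over checks still in progress, and the handling of an empty list are all assumptions not stated on this page.

```python
from typing import Iterable

# Numeric equivalents copied from the Volume Status table on this page.
STATUS_TO_NUMERIC = {"ok": 0, "impaired": 1, "insufficient-data": 2}


def volume_status(check_results: Iterable[str]) -> str:
    """Derive an overall status from per-check outcomes.

    Assumed outcome labels: "passed", "failed", "in-progress".
    """
    results = list(check_results)
    if "failed" in results:
        # Assumption: any failed check marks the volume as impaired.
        return "impaired"
    if not results or "in-progress" in results:
        # Assumption: no results yet is treated like checks in progress.
        return "insufficient-data"
    return "ok"


status = volume_status(["passed", "failed"])
print(status, STATUS_TO_NUMERIC[status])
```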

Idle time:

Indicates the total number of seconds during which no read or write operations were submitted to this volume.

Secs

 

Queue length:

Indicates the number of read and write operation requests waiting to be completed.

Number

A consistent increase in the value of this measure could indicate an I/O processing bottleneck on the volume.
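One simple way to interpret "a consistent increase" is that every sample is higher than the one before it. The sketch below applies that reading to a series of queue length samples; the function name is_consistently_increasing and the minimum of three samples are assumptions chosen for illustration, not thresholds used by the test.

```python
from typing import Sequence


def is_consistently_increasing(samples: Sequence[float], min_samples: int = 3) -> bool:
    """Return True when each sample is strictly higher than the previous one.

    Series shorter than min_samples are never flagged, since too few
    points cannot show a trend.
    """
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


print(is_consistently_increasing([1, 2, 4, 7]))
print(is_consistently_increasing([1, 3, 2, 5]))
```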

Read operations:

Indicates the rate at which read operations were performed on this volume.

Operations/Sec

Compare the value of this measure across volumes to know which volume is too slow in processing read requests.

Write operations:

Indicates the rate at which write operations were performed on this volume.

Operations/Sec

Compare the value of this measure across volumes to know which volume is too slow in processing write requests.

Reads:

Indicates the rate at which data was read from this volume.

KB/Sec

Compare the value of this measure across volumes to identify the volume that is the slowest in responding to read requests.

Writes:

Indicates the rate at which data was written to this volume.

KB/Sec

Compare the value of this measure across volumes to identify the volume that is the slowest in responding to write requests.

Total read time:

Indicates the total time taken by all completed read operations.

Secs

A very high value for this measure could indicate that the volume took too long to service one or more read requests.

Total write time:

Indicates the total time taken by all completed write operations.

Secs

A very high value for this measure could indicate that the volume took too long to service one or more write requests.
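A total time is easier to judge when it is divided by the number of operations it covers. The sketch below estimates an average time per operation from a total time and an operation rate. It assumes that both values cover the same measurement interval and that the interval length is known; the function name and the interval_secs parameter are hypothetical and are not measures reported by this test.

```python
def average_time_per_operation(total_time_secs: float,
                               ops_per_sec: float,
                               interval_secs: float) -> float:
    """Estimate the average seconds spent per completed operation.

    total_time_secs: e.g. the Total read time or Total write time value.
    ops_per_sec:     e.g. the Read operations or Write operations value.
    interval_secs:   assumed length of the measurement interval.
    """
    completed_ops = ops_per_sec * interval_secs
    if completed_ops <= 0:
        # No operations completed, so there is no latency to report.
        return 0.0
    return total_time_secs / completed_ops


# 6 seconds spent on 20 ops/sec over a 300-second interval (6,000 operations).
print(average_time_per_operation(6.0, 20.0, 300.0))
```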

Provisioned IOPS (SSD) volume throughput:

Indicates the IOPS delivered as a percentage of the total IOPS provisioned for this volume.

Percent

This measure will be reported for Provisioned IOPS volumes only.

Provisioned IOPS (SSD) volumes are designed to meet the needs of I/O-intensive workloads, particularly database workloads, that are sensitive to storage performance and consistency in random access I/O throughput. You specify an IOPS rate when you create the volume, and Amazon EBS delivers within 10 percent of the provisioned IOPS performance 99.9 percent of the time over a given year.

A Provisioned IOPS (SSD) volume can range in size from 4 GiB to 16 TiB and you can provision up to 20,000 IOPS per volume. The ratio of IOPS provisioned to the volume size requested can be a maximum of 30; for example, a volume with 3,000 IOPS must be at least 100 GiB. You can stripe multiple volumes together in a RAID configuration for larger size and greater performance.

For smaller I/O operations, you may even see an IOPS value that is higher than what you have provisioned - i.e., the value of this measure can be greater than 100%. This could be because the client operating system may be coalescing multiple smaller I/O operations into a smaller number of large chunks.

On the other hand, if the value of this measure is consistently lower than the expected IOPS or throughput you have provisioned, then ensure that your bandwidth is not the limiting factor; your instance should be EBS-optimized (or include 10 Gigabit network connectivity) and your instance type's EBS dedicated bandwidth should exceed the I/O throughput you intend to drive. Another possible cause for not experiencing the expected IOPS is that you are not driving enough I/O to the EBS volumes.
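The percentage described for this measure can be sketched as delivered IOPS divided by provisioned IOPS. The function names below are hypothetical; the 90 percent floor in within_design_tolerance follows from the "within 10 percent of the provisioned IOPS" figure quoted above and is used here only as an illustrative check.

```python
def provisioned_iops_throughput_pct(delivered_iops: float,
                                    provisioned_iops: float) -> float:
    """Return delivered IOPS as a percentage of provisioned IOPS.

    The result can exceed 100 when more IOPS are observed than were
    provisioned, as described above for small I/O operations.
    """
    if provisioned_iops <= 0:
        raise ValueError("provisioned_iops must be positive")
    return 100.0 * delivered_iops / provisioned_iops


def within_design_tolerance(pct: float) -> bool:
    """True when delivery is within 10 percent of the provisioned rate."""
    return pct >= 90.0


pct = provisioned_iops_throughput_pct(2700, 3000)
print(pct, within_design_tolerance(pct))
```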

Size:

Indicates the current size of this volume.

GB

For a General Purpose (SSD) Volume, volume size is what dictates the baseline performance level of the volume and how quickly it accumulates I/O credits; larger volumes have higher baseline performance levels and accumulate I/O credits faster.

For a Provisioned IOPS (SSD) Volume, the ratio of IOPS provisioned to volume size can be a maximum of 30; for example, a volume with 3,000 IOPS must be at least 100 GiB.

Magnetic volumes can range in size from 1 GiB to 1 TiB.
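The IOPS-to-size ratio for Provisioned IOPS (SSD) volumes lends itself to a quick calculation. The sketch below uses the maximum ratio of 30 quoted on this page; the function names are hypothetical, and the ratio should be checked against current AWS documentation before relying on it.

```python
MAX_IOPS_PER_GIB = 30  # maximum ratio quoted on this page


def min_size_gib(provisioned_iops: int) -> int:
    """Smallest volume size (GiB) that satisfies the ratio, rounded up."""
    return (provisioned_iops + MAX_IOPS_PER_GIB - 1) // MAX_IOPS_PER_GIB


def ratio_is_valid(provisioned_iops: int, size_gib: int) -> bool:
    """True when provisioned IOPS do not exceed 30 times the size in GiB."""
    return provisioned_iops <= MAX_IOPS_PER_GIB * size_gib


print(min_size_gib(3000))
print(ratio_is_valid(3000, 100), ratio_is_valid(3000, 99))
```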

Total IOPS:

Indicates the total number of I/O operations that were performed on this volume per second.

Operations/Sec

IOPS are input/output operations per second. Amazon EBS counts each I/O operation that is 256 KiB or smaller as one I/O operation. I/O operations that are larger than 256 KiB are counted in 256 KiB capacity units. For example, a single 1,024 KiB I/O operation would count as 4 IOPS; however, 1,024 I/O operations at 1 KiB each would count as 1,024 IOPS.
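The counting rule above can be expressed as a short calculation. In this sketch, the function name iops_units is hypothetical, and rounding a partial 256 KiB unit up to a whole unit is an assumption, since the text only gives examples that divide evenly.

```python
import math

UNIT_KIB = 256  # size of one capacity unit, as stated above


def iops_units(io_size_kib: float) -> int:
    """Number of I/O operations that a single request is counted as.

    Requests of 256 KiB or smaller count as one; larger requests are
    counted in 256 KiB units (partial units rounded up, by assumption).
    """
    return max(1, math.ceil(io_size_kib / UNIT_KIB))


# One 1,024 KiB operation versus 1,024 operations of 1 KiB each.
print(iops_units(1024))
print(1024 * iops_units(1))
```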

When you create a 3,000 IOPS volume, either a 3,000 IOPS Provisioned IOPS (SSD) volume or a 1,000 GiB General Purpose (SSD) volume, and attach it to an EBS-optimized instance that can provide the necessary bandwidth, you can transfer up to 3,000 chunks of data per second (provided that the I/O does not exceed the per volume throughput limit of the volume).

If your I/O chunks are very large, then the value of this measure may be lower than what you provisioned because you are hitting the throughput limit of the volume. For example, a 1,000 GiB General Purpose (SSD) volume has an IOPS limit of 3,000 and a volume throughput limit of 160 MiB/s. If you are using a 256 KiB I/O size, your volume will reach its throughput limit at 640 IOPS (640 x 256 KiB = 160 MiB). For smaller I/O sizes (such as 16 KiB), this same volume can sustain 3,000 IOPS because the throughput is well below 160 MiB/s.
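The interaction between the IOPS limit and the throughput limit in the example above can be worked through in a few lines. The function name achievable_iops is hypothetical; the figures in the usage lines (3,000 IOPS, 160 MiB/s, 256 KiB and 16 KiB I/O sizes) are the ones quoted in the paragraph above.

```python
def achievable_iops(iops_limit: float,
                    throughput_limit_mib_s: float,
                    io_size_kib: float) -> float:
    """Return the IOPS a volume can sustain for a given I/O size.

    The result is the lower of the IOPS limit and the number of I/O
    operations per second that fit within the throughput limit.
    """
    throughput_limit_kib_s = throughput_limit_mib_s * 1024
    throughput_bound_iops = throughput_limit_kib_s / io_size_kib
    return min(iops_limit, throughput_bound_iops)


print(achievable_iops(3000, 160, 256))  # throughput-bound case
print(achievable_iops(3000, 160, 16))   # IOPS-bound case
```

With 256 KiB operations the throughput limit is reached first (640 IOPS), whereas with 16 KiB operations the IOPS limit of 3,000 applies.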

On Provisioned IOPS Volumes, for smaller I/O operations, you may even see that the value of this measure is higher than what you have provisioned. This could be because the client operating system may be coalescing multiple smaller I/O operations into a smaller number of large chunks.

On the other hand, if the value of this measure is consistently lower than the expected IOPS or throughput you have provisioned for a Provisioned IOPS volume, then ensure that your bandwidth is not the limiting factor; your instance should be EBS-optimized (or include 10 Gigabit network connectivity) and your instance type's EBS dedicated bandwidth should exceed the I/O throughput you intend to drive. Another possible cause for not experiencing the expected IOPS is that you are not driving enough I/O to the EBS volumes.

Magnetic volumes deliver approximately 100 IOPS on average, with burst capability of up to hundreds of IOPS.

IOPS limits:

Indicates the IOPS limit of this volume.

Operations/Sec

For Provisioned IOPS volumes, the IOPS limit is specified when creating the volumes.

For General Purpose (SSD) volumes, the volume size dictates the baseline IOPS limit of the volume and how quickly it accumulates I/O credits.

IOPS utilization:

Indicates the percentage of the provisioned IOPS or IOPS limit that is being utilized by this volume.

Percent

This metric can also help you identify over-utilized volumes, which could be impacting application performance. In these cases, you could improve performance by upgrading to a different volume type or provisioning more IOPS.
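To show how this measure might be used to pick out over-utilized volumes, the sketch below computes utilization from a volume's total IOPS and its IOPS limit and lists the volumes above a threshold. The volume IDs, the function name overutilized_volumes, and the 90 percent threshold are all hypothetical values chosen for illustration; the test itself does not define such a threshold on this page.

```python
from typing import Dict, List, Tuple


def overutilized_volumes(volumes: Dict[str, Tuple[float, float]],
                         threshold_pct: float = 90.0) -> List[str]:
    """Return IDs of volumes whose IOPS utilization exceeds the threshold.

    volumes maps a volume ID to (total_iops, iops_limit).
    Volumes with a non-positive limit are skipped.
    """
    flagged = []
    for volume_id, (total_iops, iops_limit) in volumes.items():
        if iops_limit <= 0:
            continue
        utilization_pct = 100.0 * total_iops / iops_limit
        if utilization_pct > threshold_pct:
            flagged.append(volume_id)
    return sorted(flagged)


sample = {
    "vol-a": (2900.0, 3000.0),  # hypothetical volume near its limit
    "vol-b": (500.0, 3000.0),   # hypothetical lightly loaded volume
}
print(overutilized_volumes(sample))
```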

Throughput:

Indicates the rate of reads and writes processed by this volume.

KB/Sec

A consistent drop in this value could indicate an I/O processing bottleneck on the volume.

You may want to closely track the variations in this measure, so that you can proactively identify a volume that may soon reach its throughput limit.

If your I/O chunks are very large, then a volume will reach its throughput limit much before its IOPS limit is reached.

If you are not experiencing the throughput you have provisioned, ensure that your bandwidth is not the limiting factor; your instance should be EBS-optimized (or include 10 Gigabit network connectivity) and your instance type's EBS dedicated bandwidth should exceed the I/O throughput you intend to drive.

Detailed Diagnosis:

The detailed diagnosis of the State measure of a volume will reveal when the volume was created and in which availability zone it resides.

Figure 1 : The detailed diagnosis of the State measure of the AWS Elastic Block Store - EBS Test