Xen Storage Activity Test

XenServer provides support for a broad range of storage hardware. The term Storage Repository (SR) is used to describe a particular storage target on which Virtual Disk Images (VDIs) are stored. A VDI is a disk abstraction that contains the contents of a disk as presented to a virtual machine. XenServer allows these VDIs to be supported on a large number of SR types, including local disks, NFS filers, Fibre Channel disks and shared iSCSI LUNs. The SR abstraction allows advanced storage features such as thin provisioning, VDI snapshots, and fast cloning to be exposed on storage targets that support them.

If a XenServer host cannot read from or write to an SR, or takes too long to do so, the provisioning and maintenance of virtual disk images (creation, deletion, cloning, connecting, resizing, and so on) can be unduly delayed. This, in turn, can significantly slow down VM access. To keep the user experience with VMs consistently good, administrators should continuously monitor the I/O throughput of each storage repository (SR) supported by a XenServer host and quickly isolate the slow SRs. This is where the Xen Storage Activity test helps. By continuously measuring and reporting how well each SR handles read and write requests, this test pinpoints slow SRs, prompting administrators to investigate the reasons for the slowness and fix them.

Note:

The performance metrics reported by this test are enabled by default in the XenServer 6.1.0 Performance and Monitoring Supplemental Pack. In XenServer 6.2.0, however, these metrics, though part of the core product, are disabled by default for performance reasons related to XenCenter. As a result, this test will not report any metrics by default when monitoring XenServer 6.2.0. To make sure that the test reports metrics in such cases, do the following:

  • Log in to the XenServer host as the root user.
  • Enable the metrics by issuing the following command from the CLI:

    xe-enable-all-plugin-metrics true
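
Where the command has to be issued on several hosts, the two steps above can be scripted. The following is a minimal Python sketch, assuming key-based SSH access for the root user and an ssh client on the machine that runs the script. The helper names build_enable_command and enable_plugin_metrics are hypothetical and are not part of eG Enterprise or XenServer.

```python
import subprocess


def build_enable_command(host, user="root"):
    """Return the ssh argument list that runs the enable command on `host`."""
    if not host:
        raise ValueError("host must be a non-empty string")
    # The remote command is the one given in the note above.
    return ["ssh", f"{user}@{host}", "xe-enable-all-plugin-metrics", "true"]


def enable_plugin_metrics(host, user="root", timeout=60):
    """Run the command over SSH; return True if it exited with status 0."""
    # Assumption: non-interactive (key-based) SSH login is already set up.
    result = subprocess.run(
        build_enable_command(host, user),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0
```

Calling enable_plugin_metrics once per monitored XenServer 6.2.0 host would apply the change; the function only reports whether the command succeeded and does not verify that metrics are subsequently reported.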

Target of the test : A XenServer host

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for each PBD (Physical Block Device) connecting the monitored XenServer host to an SR

Configurable parameters for the test

  1. Test period - How often the test should be executed.
  2. Host - The host for which the test is to be configured.
  3. XEN user - To connect to the XenServer API and collect statistics, this test must log in to the XenServer as a root user. Provide the name of the root user in the XEN USER text box. Root user privileges are mandatory when monitoring XenServer 5.5 (or below). If you are monitoring XenServer 5.6 (or above) and prefer not to expose the credentials of the root user, you can instead configure a user with pool-admin privileges as the xen user. If you do not want to expose the credentials of a root or pool-admin user either, you can configure the tests with the credentials of a xen user with Read-only privileges to the XenServer. In that case, however, the Xen Uptime test will not run, and the Xen CPU and Xen Memory tests will not be able to report metrics for the control domain descriptor. To avoid this, do the following before configuring the eG tests with a xen user who has Read-only privileges to the XenServer:

    • Modify the target XenServer’s configuration in the eG Enterprise system. For this, follow the Infrastructure -> Components -> Add/Modify menu sequence, pick Citrix XenServer as the Component type, and click the Modify button corresponding to the target XenServer.
    • In the modify component details page that then appears, make sure that the os is set to Xen and the Mode is set to ssh.
    • Then, in the same page, proceed to provide the User and Password of a user who has the right to connect to the XenServer console via SSH.
    • Then, click the Update button to save the changes.
     Once this is done, you can configure the eG tests with the credentials of a xen user with Read-only privileges.

  4. xen password - Specify the password of the xen user here.
  5. confirm password - Confirm the xen password by retyping it here.
  6. ssl - By default, the XenServer is not SSL-enabled, so the eG agent communicates with the XenServer using HTTP. Accordingly, the ssl flag is set to No by default. If you configure the XenServer to use SSL, make sure that the ssl flag is set to Yes, so that the eG agent communicates with the XenServer using HTTPS. Note that a default SSL certificate comes bundled with every XenServer installation. If you want the eG agent to use this default certificate for communicating with an SSL-enabled XenServer, no additional configuration is required. However, if you do not want to use the default certificate, you can generate a self-signed certificate for use by the XenServer. In that case, follow the broad steps given below to enable the eG agent to communicate with the XenServer via HTTPS:

    • Obtain the server-certificate for the XenServer
    • Import the server-certificate into the local certificate store of the eG agent

    For a detailed discussion on each of these steps, refer to the Troubleshooting section of this document.

  7. webport - By default, in most virtualized environments, the XenServer listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled). This means that, by default, the eG agent connects to port 443 to pull out metrics from an SSL-enabled XenServer, and to port 80 when the XenServer is not SSL-enabled. Accordingly, the webport parameter is set to 80 or 443 depending upon the status of the ssl flag. In some environments, however, the default ports 80 and 443 might not apply. In such a case, specify against the webport parameter the exact port on which the XenServer in your environment listens, so that the eG agent communicates with that port.
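
The way the ssl and webport parameters combine can be illustrated with a short Python sketch. It is not the eG agent's implementation; the function names resolve_webport and build_base_url are hypothetical, and the ssl flag (Yes/No in the test configuration) is represented here as a boolean.

```python
# Default ports described above: 80 without SSL, 443 with SSL.
DEFAULT_PORTS = {False: 80, True: 443}


def resolve_webport(ssl, webport=None):
    """Return the explicit webport if one is given, else the default for `ssl`."""
    if webport is None:
        return DEFAULT_PORTS[bool(ssl)]
    port = int(webport)
    if not 1 <= port <= 65535:
        raise ValueError(f"webport out of range: {port}")
    return port


def build_base_url(host, ssl=False, webport=None):
    """Build the base URL an agent would use to reach the XenServer."""
    scheme = "https" if ssl else "http"
    return f"{scheme}://{host}:{resolve_webport(ssl, webport)}"
```

For example, build_base_url("xs1", ssl=True) yields "https://xs1:443", while build_base_url("xs1", ssl=True, webport=8443) yields "https://xs1:8443"; the host name xs1 and port 8443 are illustrative values only.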

Measurements made by the test

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Total throughput | Indicates the throughput of this SR. | MB/Sec | A high value indicates high throughput and rapid I/O processing by the SR. Compare the value of this measure across SRs to identify the SR with the lowest throughput. |
| Read rate | Indicates the rate at which the host reads data from this SR. | MB/Sec | Ideally, the value of this measure should be high. A consistent drop in the value of this measure indicates a reading bottleneck in the SR. Compare the value of this measure across SRs to identify the SR that is slowest in processing read requests. |
| Write rate | Indicates the rate at which the host writes data to this SR. | MB/Sec | Ideally, the value of this measure should be high. A consistent drop in the value of this measure indicates a writing bottleneck in the SR. Compare the value of this measure across SRs to identify the SR that is slowest in processing write requests. |
| Total IOPS | Indicates the rate at which I/O operations are performed by this SR. | Requests/Sec | This measure is a good indicator of the I/O processing capacity of the SR, so a high value is desired. A consistent drop in this value could indicate a processing bottleneck. In such a situation, compare the values of the Read operations and Write operations measures of the corresponding SR to determine whether the bottleneck lies in reading data from the SR or in writing to it. |
| Read operations | Indicates the rate at which this SR services read requests. | Requests/Sec | Ideally, the value of this measure should be high. A steady drop in this value indicates a slowdown in processing read requests. Compare the value of this measure across SRs to know which SR is the slowest in responding to read requests. |
| Write operations | Indicates the rate at which this SR services write requests. | Requests/Sec | Ideally, the value of this measure should be high. A steady drop in this value indicates a slowdown in processing write requests. Compare the value of this measure across SRs to know which SR is the slowest in responding to write requests. |
| Time spent waiting for I/O | Indicates the percentage of time the host's CPU was waiting for this SR to complete I/O processing. | Percent | A high value for this measure indicates that the SR is taking too long to complete I/O processing. This hints at a probable processing bottleneck with the SR. |
| Average latency | Indicates the average time taken by this SR to process I/O requests. | Milliseconds | A high value for this measure is a cause for concern, as it indicates that the SR is highly latent and takes too long to process I/O. Compare the value of this measure across SRs to identify the most latent SR. |
| Average queue size | Indicates the average number of I/O requests to this SR that are in queue for processing. | Number | If the value of this measure grows consistently, it indicates that the SR is unable to process requests quickly enough to clear the queue. The SR with the maximum number of queued requests could be experiencing a serious I/O processing bottleneck. To identify this SR, compare the value of this measure across SRs. |
| Current requests in flight | Indicates the number of I/O requests to this SR that are currently being processed. | Number | |
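
The cross-SR comparisons suggested in the Interpretation column can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the test itself: the SR names, the dictionary keys, and all readings below are hypothetical sample data, and "grows consistently" is interpreted here as every reading being larger than the one before it.

```python
def slowest_sr(samples, measure="total_throughput"):
    """Return the name of the SR with the lowest value for `measure`."""
    return min(samples, key=lambda sr: samples[sr][measure])


def most_latent_sr(samples):
    """Return the name of the SR with the highest average latency."""
    return max(samples, key=lambda sr: samples[sr]["average_latency"])


def queue_is_growing(queue_sizes):
    """True if each reading is strictly larger than the previous one."""
    return len(queue_sizes) >= 2 and all(
        later > earlier for earlier, later in zip(queue_sizes, queue_sizes[1:])
    )


# Hypothetical readings for illustration only (MB/Sec and Milliseconds).
samples = {
    "nfs-sr": {"total_throughput": 42.0, "average_latency": 3.1},
    "iscsi-sr": {"total_throughput": 7.5, "average_latency": 18.4},
    "local-sr": {"total_throughput": 95.2, "average_latency": 0.9},
}
```

With this sample data, both slowest_sr(samples) and most_latent_sr(samples) return "iscsi-sr", which would make that SR the first candidate for investigation. A stricter or more tolerant definition of a growing queue (for example, a trend over a sliding window) may suit noisy environments better.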