Exadata Cell Disk IO Test

When database performance issues are related to I/O load on the Exadata storage servers, typically there will be increased latencies in the I/O-related wait events, and increased database time in the User I/O or System I/O wait classes.

A cell disk with a processing bottleneck cannot serve data requests quickly, which delays data access for users. Similarly, an overloaded cell disk cannot perform at peak capacity, which degrades the user experience with the storage server. Administrators therefore need to continuously track the load on each cell disk and its processing speed, so that potential overload conditions and processing delays can be detected and addressed proactively. The Exadata Cell Disk IO test helps administrators with this.

This test monitors the level of traffic on each cell disk created on an Oracle Exadata Storage Server, and helps isolate irregularities in load balancing across the cell disks. The test also helps identify which cell disk, if any, is experiencing a processing bottleneck, so that the bottleneck can be cleared before users complain of slowdowns.
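
As a rough illustration of what "irregularities in load balancing" can mean in practice, the sketch below flags cell disks whose throughput deviates from the average across all disks by more than a chosen tolerance. The function name, the 25% tolerance, and the sample disk names and values are assumptions made for this example; they are not part of the test itself.

```python
from statistics import mean


def find_imbalanced_disks(rate_by_disk, tolerance=0.25):
    """Return {disk: relative deviation} for disks far from the mean rate.

    rate_by_disk maps a cell disk name to a throughput value (for example MB/sec).
    tolerance is the allowed relative deviation from the mean (0.25 = 25%).
    """
    if not rate_by_disk:
        return {}
    average = mean(rate_by_disk.values())
    if average == 0:
        # All disks are idle, so there is nothing to compare.
        return {}
    return {
        disk: round((rate - average) / average, 3)
        for disk, rate in rate_by_disk.items()
        if abs(rate - average) / average > tolerance
    }


# Hypothetical readings, not real measurements.
sample = {
    "CD_00_cell01": 100.0,
    "CD_01_cell01": 104.0,
    "CD_02_cell01": 96.0,
    "CD_03_cell01": 180.0,
}
print(find_imbalanced_disks(sample))  # {'CD_03_cell01': 0.5}
```

A relative threshold is used here so that the same check works whether the disks are lightly or heavily loaded; a suitable tolerance depends on the workload.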

Target of the test : Oracle Exadata Storage Server

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each cell disk on the target Oracle Exadata Storage Server being monitored

Configurable parameters for the test
Parameter Description

Test period

How often the test should be executed.

Host

The IP address of the host for which this test is to be configured.

Port

The port number at which the specified host listens. By default, this is NULL.

Username, Password and Confirm Password

By default, this test uses the Cell Control Command-Line Interface (CellCLI) to collect the required metrics. To use the CLI, the test first needs to connect to the target storage server via SSH, and then run commands using the CLI. To run these commands, the test requires the credentials of a cellmonitor user. Specify the login credentials of such a user in the Username and Password text boxes and confirm the Password by retyping it in the Confirm Password text box.

SSH Port

This test uses CellCLI to pull metrics from the target Oracle Exadata Storage Server. To run the CLI commands, this test first needs to establish an SSH connection with the target storage server. To enable the test to establish this connection, specify the SSH Port here.

Timeout

Specify how long this test should wait for a response from the storage server in the Timeout text box. By default, this is 120 seconds.
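
The parameters above describe an SSH connection to the storage server, CellCLI commands run over that connection, and a timeout on the response. The sketch below shows one way such a call could be assembled. It is an assumption-laden illustration, not the agent's actual implementation: the helper names are hypothetical, the host and user are placeholders, and it assumes key-based SSH authentication rather than the password entry described above. The CellCLI query shown should be checked against the CellCLI version in use.

```python
import shlex
import subprocess

# CellCLI query for current cell disk metrics; verify it against your CellCLI version.
CELLDISK_QUERY = "list metriccurrent where objectType = 'CELLDISK'"


def build_cellcli_command(host, username, ssh_port=22, query=CELLDISK_QUERY):
    """Build an ssh argv list that runs one CellCLI command on the storage server."""
    # ssh hands the remote command to a shell, so quote the query as one argument.
    remote_command = "cellcli -e " + shlex.quote(query)
    return ["ssh", "-p", str(ssh_port), f"{username}@{host}", remote_command]


def run_with_timeout(argv, timeout_seconds=120):
    """Run argv and return its stdout, or None if no response arrives in time."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


# Hypothetical host, user, and port, shown only to illustrate the call.
command = build_cellcli_command("192.0.2.10", "cellmonitor", ssh_port=22)
print(command)
```

Passing the command as an argument list avoids local shell interpretation, and returning None on timeout lets the caller distinguish "no response" from an empty result.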

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Large blocks read rate

Indicates the rate at which data was read in large blocks from this cell disk.

MB/sec

These measures are a good indicator of read I/O processing ability of the cell disks.

Compare the value of these measures across the cell disks to identify the cell disk that reads the most data in large blocks or small blocks.

Small blocks read rate

Indicates the rate at which data was read in small blocks from this cell disk.

MB/sec

Large blocks write rate

Indicates the rate at which data was written in large blocks to this cell disk.

MB/sec

These measures are a good indicator of write I/O processing ability of the cell disks.

Compare the value of these measures across the cell disks to identify the cell disk that writes the most data in large blocks or small blocks.

Small blocks write rate

Indicates the rate at which data was written in small blocks to this cell disk.

MB/sec

Scrubbing job read rate

Indicates the rate at which data was read from this cell disk by the scrubbing job.

MB/sec

Scrub I/O occurs when Oracle Exadata System Software automatically inspects and repairs the hard disks. Scrub I/O is performed periodically when the hard disks are idle, and mostly results in large disk reads, which should be throttled automatically if the disk becomes I/O bound.

IO error rate

Indicates the number of I/O errors recorded for this cell disk per minute.

Errors/min

Ideally, the value of this measure should be zero.

Compare the value of this measure across cell disks to identify the cell disk that is prone to errors.

Large block read requests

Indicates the number of read requests to read large blocks from this cell disk per second.

Requests/sec

Compare the value of these measures across the cell disks to identify the cell disk that processes the most read requests for large blocks or small blocks.

Small block read requests

Indicates the number of read requests to read small blocks from this cell disk per second.

Requests/sec

Large block write requests

Indicates the number of write requests to write large blocks to this cell disk per second.

Requests/sec

Compare the value of these measures across the cell disks to identify the cell disk that processes the most write requests for large blocks or small blocks.

Small block write requests

Indicates the number of write requests to write small blocks to this cell disk per second.

Requests/sec

Scrubbing job read requests

Indicates the number of requests per second issued by the scrubbing job to read data from this cell disk.

Requests/sec

Average latency for Large block reads

Indicates the average time taken to read large blocks from this cell disk per request.

Milliseconds/request

A low value is desired for this measure. A sudden or gradual increase in the value of this measure indicates that the I/O processing ability of the cell disk is declining. Administrators need to analyze the reason behind such issues and rectify them at the earliest.

Average latency for Small block reads

Indicates the average time taken to read small blocks from this cell disk per request.

Milliseconds/request

A low value is desired for this measure. A sudden or gradual increase in the value of this measure indicates that the I/O processing ability of the cell disk is declining. Administrators need to analyze the reason behind such issues and rectify them at the earliest.

Average latency for Large block writes

Indicates the average time taken to write large blocks to this cell disk per request.

Milliseconds/request

A low value is desired for this measure. A sudden or gradual increase in the value of this measure indicates that the I/O processing ability of the cell disk is declining. Administrators need to analyze the reason behind such issues and rectify them at the earliest.

Average latency for Small block writes

Indicates the average time taken to write small blocks to this cell disk per request.

Milliseconds/request

A low value is desired for this measure. A sudden or gradual increase in the value of this measure indicates that the I/O processing ability of the cell disk is declining. Administrators need to analyze the reason behind such issues and rectify them at the earliest.

Device utilization

Indicates the percentage of disk resources utilized for this cell disk.

Percent

A value close to 100 indicates that this cell disk is operating at or near its I/O capacity.

Compare the value of this measure across cell disks to figure out the cell disk that is utilizing maximum disk resources.

Device utilization by large request

Indicates the percentage of disk resources utilized by large requests for this cell disk.

Percent

Compare the value of this measure across cell disks to figure out the cell disk that is utilizing the maximum of disk resources for processing large requests.

Device utilization by small request

Indicates the percentage of disk resources utilized by small requests for this cell disk.

Percent

Compare the value of this measure across cell disks to figure out the cell disk that is utilizing the maximum of disk resources for processing small requests.
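
Several interpretations above recommend comparing a measure across cell disks. The sketch below shows one way such a comparison might be done on text resembling CellCLI metric output. The three-column layout (metric name, cell disk name, value, followed by a unit) and the sample lines are assumptions made for illustration, not captured output, and the helper names are hypothetical. The metric names follow the CD_IO_* naming used for Exadata cell disk metrics, but should be confirmed on the target system.

```python
from collections import defaultdict

# Illustrative text only; the real output layout may differ.
SAMPLE_OUTPUT = """\
CD_IO_BY_R_LG_SEC   CD_00_cell01   12.5 MB/sec
CD_IO_BY_R_LG_SEC   CD_01_cell01   48.0 MB/sec
CD_IO_UTIL          CD_00_cell01   35 %
CD_IO_UTIL          CD_01_cell01   91 %
"""


def parse_metrics(text):
    """Parse lines of 'metric disk value [unit]' into {metric: {disk: value}}."""
    metrics = defaultdict(dict)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # Skip blank or malformed lines.
        name, disk, raw_value = parts[0], parts[1], parts[2]
        try:
            value = float(raw_value.replace(",", ""))
        except ValueError:
            continue  # Skip lines whose value is not numeric.
        metrics[name][disk] = value
    return dict(metrics)


def busiest_disk(metrics, name):
    """Return (disk, value) with the highest value for a metric, or None."""
    per_disk = metrics.get(name, {})
    if not per_disk:
        return None
    return max(per_disk.items(), key=lambda item: item[1])


parsed = parse_metrics(SAMPLE_OUTPUT)
print(busiest_disk(parsed, "CD_IO_UTIL"))         # ('CD_01_cell01', 91.0)
print(busiest_disk(parsed, "CD_IO_BY_R_LG_SEC"))  # ('CD_01_cell01', 48.0)
```

Keeping the parsed values grouped by metric makes it straightforward to run the same comparison for each measure in the table.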