Exadata Cell Disk IO Test
When database performance issues are related to I/O load on the Exadata storage servers, typically there will be increased latencies in the I/O-related wait events, and increased database time in the User I/O or System I/O wait classes.
A cell disk with a processing bottleneck cannot service data requests quickly, which prolongs data access for users. Similarly, an overloaded cell disk cannot perform at peak capacity, which degrades the user experience with the storage server. Administrators therefore need to continuously track the load on, and the processing speed of, each cell disk, so that potential overload conditions and processing delays can be detected and addressed before they affect users. The Exadata Cell Disk IO test helps administrators with this.
This test monitors the level of traffic on each cell disk created on an Oracle Exadata Storage Server, and helps isolate irregularities in load balancing across the cell disks. The test also helps identify which cell disk, if any, is experiencing a processing bottleneck, so that the bottleneck can be cleared before users complain of slowdowns.
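
The cross-disk comparison described above can be sketched as follows. This is a minimal illustration, not part of the test itself: the function name `find_imbalanced_disks`, the 1.5x-of-median threshold, and the sample readings are all assumptions chosen for the example.

```python
from statistics import median


def find_imbalanced_disks(load_by_disk, ratio=1.5):
    """Return cell disks whose load exceeds `ratio` times the median load.

    load_by_disk maps a cell disk name to a load figure (for example,
    MB/sec or requests/sec). The ratio is an assumed threshold, not a
    value prescribed by the test.
    """
    if not load_by_disk:
        return []
    mid = median(load_by_disk.values())
    if mid == 0:
        # With a zero median, any disk carrying load stands out.
        return sorted(d for d, v in load_by_disk.items() if v > 0)
    return sorted(d for d, v in load_by_disk.items() if v > ratio * mid)


# Hypothetical readings, in MB/sec, for four cell disks.
sample = {
    "CD_00_cell01": 40.0,
    "CD_01_cell01": 42.0,
    "CD_02_cell01": 41.0,
    "CD_03_cell01": 95.0,
}
print(find_imbalanced_disks(sample))
```

The median is used rather than the mean so that a single heavily loaded disk does not drag the reference value up and mask itself.
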
Target of the test : Oracle Exadata Storage Server
Agent deploying the test : A remote agent
Outputs of the test : One set of results for each cell disk on the target Oracle Exadata Storage Server being monitored

Parameter | Description |
---|---|
Test period | How often should the test be executed. |
Host | The IP address of the host for which this test is to be configured. |
Port | The port number at which the specified host listens. By default, this is NULL. |
Username, Password and Confirm Password | By default, this test uses the Cell Control Command-Line Interface (CellCLI) to pull out the required metrics. To use the CLI, the test first needs to connect to the target storage server via SSH, and then run commands using the CLI. For running the commands, this test requires the credentials of a cellmonitor user. Specify the login credentials of such a user in the Username and Password text boxes, and confirm the Password by retyping it in the Confirm Password text box. |
SSH Port | This test uses CellCLI to pull metrics from the target Oracle Exadata Storage Server. To run the CLI commands, this test first needs to establish an SSH connection with the target storage server. To enable the test to establish this connection, specify the SSH Port here. |
Timeout | Specify the time duration for which this test should wait for a response from the storage system in the Timeout text box. By default, this is 120 seconds. |
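
The collection path described for the Username and SSH Port parameters (connect over SSH, then query CellCLI) might look roughly like the sketch below. It is not the test's actual implementation. `cellcli -e` and `LIST METRICCURRENT` are standard CellCLI usage, but the helper names, the host address, the choice of query, and the sample output lines are assumptions made for illustration, and real output formatting can differ between software versions.

```python
import shlex
import subprocess


def build_cellcli_command(host, user="cellmonitor", ssh_port=22):
    """Build an ssh argument list that runs one CellCLI query remotely."""
    # Assumed query: current metrics for every cell disk on the server.
    query = "list metriccurrent where objectType = 'CELLDISK'"
    remote = "cellcli -e " + shlex.quote(query)
    return ["ssh", "-p", str(ssh_port), f"{user}@{host}", remote]


def run_query(cmd, timeout=120):
    """Run the command, giving up after `timeout` seconds (cf. Timeout)."""
    done = subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout, check=True)
    return done.stdout


def parse_metric_lines(text):
    """Parse 'metric object value unit' lines into {disk: {metric: value}}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        name, disk, raw = parts[0], parts[1], parts[2]
        try:
            value = float(raw.replace(",", ""))  # drop thousands separators
        except ValueError:
            continue
        result.setdefault(disk, {})[name] = value
    return result


# Illustrative lines only; not captured from a real storage server.
SAMPLE = """\
     CD_IO_BY_R_LG_SEC    CD_00_cell01    12.500 MB/sec
     CD_IO_UTIL           CD_00_cell01    37 %
     CD_IO_BY_R_LG_SEC    CD_01_cell01    1,204.000 MB/sec
"""

print(build_cellcli_command("192.0.2.10", ssh_port=2222))
print(parse_metric_lines(SAMPLE))
```

Only `build_cellcli_command` and `parse_metric_lines` are exercised here; `run_query` needs a reachable storage server and key-based or otherwise non-interactive SSH authentication for the cellmonitor user.
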

Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Large blocks read rate | Indicates the rate at which data was read in large blocks from this cell disk. | MB/sec | These measures are a good indicator of the read I/O processing ability of the cell disks. Compare the value of these measures across the cell disks to identify the cell disk that reads the most data in large blocks/small blocks. |
Small blocks read rate | Indicates the rate at which data was read in small blocks from this cell disk. | MB/sec | See the interpretation for Large blocks read rate. |
Large blocks write rate | Indicates the rate at which data was written in large blocks to this cell disk. | MB/sec | These measures are a good indicator of the write I/O processing ability of the cell disks. Compare the value of these measures across the cell disks to identify the cell disk that writes the most data in large blocks/small blocks. |
Small blocks write rate | Indicates the rate at which data was written in small blocks to this cell disk. | MB/sec | See the interpretation for Large blocks write rate. |
Scrubbing job read rate | Indicates the rate at which data was read from this cell disk by the scrubbing job. | MB/sec | Scrub I/O occurs when Oracle Exadata System Software automatically inspects and repairs the hard disks. Scrub I/O is performed periodically when the hard disks are idle, and mostly results in large disk reads, which should be throttled automatically if the disk becomes I/O bound. |
IO error rate | Indicates the number of I/O errors recorded for this cell disk per minute. | Errors/min | Ideally, the value of this measure should be zero. Compare the value of this measure across cell disks to identify the cell disk that is prone to errors. |
Large block read requests | Indicates the number of requests to read large blocks from this cell disk per second. | Requests/sec | Compare the value of these measures across the cell disks to identify the cell disk that processes the most requests to read large blocks/small blocks. |
Small block read requests | Indicates the number of requests to read small blocks from this cell disk per second. | Requests/sec | See the interpretation for Large block read requests. |
Large block write requests | Indicates the number of requests to write large blocks to this cell disk per second. | Requests/sec | Compare the value of these measures across the cell disks to identify the cell disk that processes the most requests to write large blocks/small blocks. |
Small block write requests | Indicates the number of requests to write small blocks to this cell disk per second. | Requests/sec | See the interpretation for Large block write requests. |
Scrubbing job read requests | Indicates the number of requests per second to read data from this cell disk by the scrubbing job. | Requests/sec | |
Average latency for Large block reads | Indicates the average time taken to read large blocks from this cell disk per request. | Milliseconds/request | A low value is desired for this measure. A sudden or gradual increase in the value of this measure indicates that the I/O processing ability of the cell disk is declining. Administrators need to analyze the reason behind such issues and rectify them as early as possible. |
Average latency for Small block reads | Indicates the average time taken to read small blocks from this cell disk per request. | Milliseconds/request | A low value is desired for this measure. A sudden or gradual increase in the value of this measure indicates that the I/O processing ability of the cell disk is declining. Administrators need to analyze the reason behind such issues and rectify them as early as possible. |
Average latency for Large block writes | Indicates the average time taken to write large blocks to this cell disk per request. | Milliseconds/request | A low value is desired for this measure. A sudden or gradual increase in the value of this measure indicates that the I/O processing ability of the cell disk is declining. Administrators need to analyze the reason behind such issues and rectify them as early as possible. |
Average latency for Small block writes | Indicates the average time taken to write small blocks to this cell disk per request. | Milliseconds/request | A low value is desired for this measure. A sudden or gradual increase in the value of this measure indicates that the I/O processing ability of the cell disk is declining. Administrators need to analyze the reason behind such issues and rectify them as early as possible. |
Device utilization | Indicates the percentage of disk resources utilized for this cell disk. | Percent | A high value indicates that this cell disk is using most of its disk resources. Compare the value of this measure across cell disks to identify the cell disk that is utilizing the most disk resources. |
Device utilization by large request | Indicates the percentage of disk resources utilized by large requests for this cell disk. | Percent | Compare the value of this measure across cell disks to identify the cell disk that is utilizing the most disk resources for processing large requests. |
Device utilization by small request | Indicates the percentage of disk resources utilized by small requests for this cell disk. | Percent | Compare the value of this measure across cell disks to identify the cell disk that is utilizing the most disk resources for processing small requests. |
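
The guidance for the average latency measures (watch for a sudden or gradual increase) can be approximated by comparing recent readings against an earlier baseline. The sketch below is one simple way to do that, under stated assumptions: the function name `latency_trend`, the three-sample window, the 1.5x rise factor, and the sample series are hypothetical and are not thresholds defined by the test.

```python
def latency_trend(samples_ms, window=3, rise_factor=1.5):
    """Classify a latency series (milliseconds/request) as rising or steady.

    The mean of the last `window` samples is compared with the mean of
    all earlier samples. Both `window` and `rise_factor` are assumed
    tuning values.
    """
    if len(samples_ms) <= window:
        return "insufficient data"
    baseline = samples_ms[:-window]
    recent = samples_ms[-window:]
    base_avg = sum(baseline) / len(baseline)
    recent_avg = sum(recent) / len(recent)
    if recent_avg > rise_factor * base_avg:
        return "rising"
    return "steady"


# Hypothetical readings for one cell disk, oldest first.
history = [4.0, 4.2, 3.9, 4.1, 7.5, 8.0, 9.1]
print(latency_trend(history))
```

A disk flagged as rising here is a candidate for closer inspection alongside its Device utilization and IO error rate values.
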