Exadata Cell System Test
Storage cells are configured on the network and are managed using the CellCLI utility of Oracle Exadata System Software. Storage servers contain cell-based utilities and processes from Oracle Exadata System Software, including:
- Cell Server (CELLSRV) - This is the primary component of the Oracle Exadata System Software running in the storage server, and it provides the majority of the storage server services. CELLSRV services database requests for disk I/O and provides the advanced SQL offload capabilities.
- Offload Server (CELLOFLSRV) - This is a helper process to the Cell Server that processes offload requests from a specific database version. These processes allow the storage server to respond to requests from multiple database versions residing on the same or multiple database servers.
- Management Server (MS) - The primary interface to administer, manage, and query the status of the storage server. It works in cooperation with the Cell Control Command-Line Interface (CellCLI) and processes most of the commands from CellCLI.
- Restart Server (RS) - Monitors the heartbeat with the MS and the CELLSRV processes, and restarts these servers if they fail to respond within the allowable heartbeat period.
If any of the cell-based utilities is unavailable, offline, or stopped, the storage cell may slow down, resulting in poor I/O processing. A sudden hardware failure or a rise in the temperature of the storage cell may also cause the storage cell to malfunction. To avoid such serious damage and to ensure that the storage cell is functioning at peak efficiency, it is essential to keep a constant vigil on the performance of the storage cell. This is where the Exadata Cell System test helps!
This test monitors the status of the storage cell. It also monitors the cell-based utilities of the storage cell and reports the utilities that are offline or stopped. Failures of hardware components (power supply, fan) are proactively detected and reported. By reporting the physical memory utilization and CPU utilization of the cell server and the management server, the test helps administrators identify the server that is consuming excessive resources.
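As a rough illustration of how service-status checks of this kind could work, the sketch below parses the `name: value` lines that CellCLI prints for `LIST CELL DETAIL` and flags any of the three services that are not reported as running. The attribute names (`cellsrvStatus`, `msStatus`, `rsStatus`), the sample output, and the function names are assumptions for illustration; the exact output can vary by Exadata software version, and this is not how the eG agent itself is implemented.

```python
from __future__ import annotations

# Hypothetical sketch: parse "name: value" lines in the style of CellCLI's
# LIST CELL DETAIL output and report services that are not running.
# The attribute names are assumptions; verify them against your cell's output.
SERVICE_ATTRIBUTES = {
    "cellsrvStatus": "Cell server (CELLSRV)",
    "msStatus": "Management server (MS)",
    "rsStatus": "Restart server (RS)",
}


def parse_cell_detail(text: str) -> dict[str, str]:
    """Turn 'name: value' lines into a dictionary; other lines are skipped."""
    attributes: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as "3:45" stay intact.
        name, sep, value = line.partition(":")
        if sep and name.strip():
            attributes[name.strip()] = value.strip()
    return attributes


def stopped_services(attributes: dict[str, str]) -> list[str]:
    """Return services whose status is missing or anything other than 'running'."""
    return [
        label
        for attr, label in SERVICE_ATTRIBUTES.items()
        if attributes.get(attr, "").lower() != "running"
    ]


if __name__ == "__main__":
    # Illustrative sample only; real output lists many more attributes.
    sample = """
        name:              cell01
        cellsrvStatus:     running
        msStatus:          stopped
        rsStatus:          running
        fanStatus:         normal
    """
    print(stopped_services(parse_cell_detail(sample)))
    # prints ['Management server (MS)']
```

Treating a missing attribute as "not running" is a deliberately conservative choice: if the expected line is absent from the output, the sketch reports the service rather than silently assuming it is healthy.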
Target of the test : Oracle Exadata Storage Server
Agent deploying the test : A remote agent
Outputs of the test : One set of results for the target Oracle Exadata Storage Server that is being monitored
Parameter | Description |
---|---|
Test period | How often should the test be executed. |
Host | The IP address of the host for which this test is to be configured. |
Port | The port number at which the specified host listens. By default, this is NULL. |
Username, Password and Confirm Password | By default, this test uses the Cell Control Command-Line Interface (CellCLI) to pull out the required metrics. To use CellCLI, the test first connects to the target storage server via SSH and then runs CellCLI commands. To run these commands, the test requires the credentials of a cellmonitor user. Specify the login credentials of such a user in the Username and Password text boxes, and confirm the password by retyping it in the Confirm Password text box. |
SSH Port | This test uses CellCLI to pull metrics from the target Oracle Exadata Storage Server. To run the CellCLI commands, the test first needs to establish an SSH connection with the target storage server. To enable the test to establish this connection, specify the SSH port here. |
Timeout | Specify the time duration for which this test should wait for a response from the storage system in the Timeout text box. By default, this is 120 seconds. |
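To reproduce the collection step by hand, the sketch below assembles an SSH invocation from the Host, SSH Port, and Username parameters and runs a single CellCLI command non-interactively with `cellcli -e`, applying a timeout that mirrors the 120-second default. It assumes key-based SSH authentication for the cellmonitor user (it does not handle the password the test itself accepts), and the function names are hypothetical; it is a manual-troubleshooting aid, not the eG agent's implementation.

```python
from __future__ import annotations

import shlex
import subprocess

DEFAULT_TIMEOUT_SECS = 120  # mirrors the test's default Timeout value


def build_cellcli_ssh_command(host: str, ssh_port: int, username: str,
                              cellcli_command: str) -> list[str]:
    """Build an ssh argument list that runs one CellCLI command remotely."""
    # ssh hands the remote shell a single string, so quote the CellCLI
    # command to keep it together as the argument to 'cellcli -e'.
    remote_command = "cellcli -e " + shlex.quote(cellcli_command)
    return [
        "ssh",
        "-p", str(ssh_port),
        "-o", "BatchMode=yes",  # fail rather than prompt for a password
        f"{username}@{host}",
        remote_command,
    ]


def run_cellcli(host: str, ssh_port: int, username: str, cellcli_command: str,
                timeout: float = DEFAULT_TIMEOUT_SECS) -> str:
    """Run the command and return its standard output.

    Raises subprocess.TimeoutExpired if no response arrives within `timeout`
    seconds, and subprocess.CalledProcessError if ssh or CellCLI exits non-zero.
    """
    result = subprocess.run(
        build_cellcli_ssh_command(host, ssh_port, username, cellcli_command),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

For example, `run_cellcli("192.0.2.10", 22, "cellmonitor", "list cell detail")` would return the raw attribute listing for a cell at that placeholder address, or raise an exception if the cell does not answer within 120 seconds.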
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Cell status | Indicates the current status of the storage cell or target storage server. | | The table below indicates the values that this measure can report and their corresponding numeric equivalents: Note: By default, this measure reports the above-mentioned Measure Values while indicating the current status of the storage cell. However, in the graph of this measure, the status of the storage cell will be represented using the corresponding numeric equivalents only - i.e., 0 or 100. |
Fan status | Indicates the current status of the fan operating in the storage cell. | | The table below indicates the values that this measure can report and their corresponding numeric equivalents: Note: By default, this measure reports the above-mentioned Measure Values while indicating the current status of the fan. However, in the graph of this measure, the status of the fan will be represented using the corresponding numeric equivalents mentioned in the table above. |
Temperature status | Indicates the current temperature status of the storage cell. | | The table below indicates the values that this measure can report and their corresponding numeric equivalents: Note: By default, this measure reports the above-mentioned Measure Values while indicating the current temperature status of the storage cell. However, in the graph of this measure, the temperature status will be represented using the corresponding numeric equivalents mentioned in the table above. |
Power status | Indicates the current power status of the storage cell. | | The table below indicates the values that this measure can report and their corresponding numeric equivalents: Note: By default, this measure reports the above-mentioned Measure Values while indicating the current power status of the storage cell. However, in the graph of this measure, the power status will be represented using the corresponding numeric equivalents mentioned in the table above. |
Cell server status | Indicates the current status of the cell server in the storage cell. | | The table below indicates the values that this measure can report and their corresponding numeric equivalents: Note: By default, this measure reports the above-mentioned Measure Values while indicating the current status of the cell server. However, in the graph of this measure, the status of the cell server will be represented using the corresponding numeric equivalents mentioned in the table above. |
Management server status | Indicates the current status of the management server in the storage cell. | | The table below indicates the values that this measure can report and their corresponding numeric equivalents: Note: By default, this measure reports the above-mentioned Measure Values while indicating the current status of the management server. However, in the graph of this measure, the status of the management server will be represented using the corresponding numeric equivalents mentioned in the table above. |
Restart server status | Indicates the current status of the Restart server in the storage cell. | | The table below indicates the values that this measure can report and their corresponding numeric equivalents: Note: By default, this measure reports the above-mentioned Measure Values while indicating the current status of the Restart server. However, in the graph of this measure, the status of the Restart server will be represented using the corresponding numeric equivalents mentioned in the table above. |
Locator LED status | Indicates the current status of the Locator LED. | | The table below indicates the values that this measure can report and their corresponding numeric equivalents: Note: By default, this measure reports the above-mentioned Measure Values while indicating the current status of the Locator LED. However, in the graph of this measure, the status of the Locator LED will be represented using the corresponding numeric equivalents mentioned in the table above. |
Uptime | Indicates the total time that the storage cell has been up since its last reboot. | Mins | Administrators may wish to be alerted if a storage cell has been running without a reboot for a very long period. Setting a threshold for this metric allows administrators to detect such conditions. |
Uptime since last measure | Indicates the time period that the storage cell has been up since the last time this test ran. | Secs | If the storage cell has not been rebooted during the last measurement period and the agent has been running continuously, this value will be equal to the measurement period. If the storage cell was rebooted during the last measurement period, this value will be less than the measurement period of the test. For example, if the measurement period is 300 secs and the storage cell was rebooted 120 secs back, this metric will report a value of 120 seconds. The accuracy of this metric depends on the measurement period: the smaller the measurement period, the greater the accuracy. |
Is restarted? | Indicates whether or not the storage cell was restarted. | | The table below indicates the values that this measure can report and their corresponding numeric equivalents: Note: By default, this measure reports the above-mentioned Measure Values while indicating whether or not the storage cell was restarted. However, in the graph of this measure, the restart status will be represented using the corresponding numeric equivalents only. |
Battery charge on disk controller | Indicates the percentage of battery charge on the disk controller. | Percent | A sudden or gradual decrease in the value of this measure indicates that the battery of the disk controller is depleting at a faster pace and needs to be recharged or replaced. |
Temperature of disk controller | Indicates the current temperature of the disk controller. | Celsius | The temperature of the disk controller should always be maintained within an admissible range. A sudden or gradual increase in the temperature results in overheating of the disk controller and eventually causes the storage server to malfunction. |
Temperature of cell | Indicates the current temperature of the storage cell. | Celsius | Ideally, the value of this measure should be within an admissible range. A sudden or gradual increase in the value of this measure results in overheating of the storage cell and eventually causes the storage cell to malfunction. |
Physical memory utilization | Indicates the overall percentage of physical memory utilized by the storage cell. | Percent | A value close to 100 is a cause for concern and warrants further investigation. |
Physical memory utilized by cell server | Indicates the percentage of physical memory utilized by the cell server. | Percent | A high value for this measure indicates that the cell server is consuming too much physical memory. |
Physical memory utilized by management server | Indicates the percentage of physical memory utilized by the management server. | Percent | A high value for this measure indicates that the management server is consuming too much physical memory. |
Swap memory usage | Indicates the percentage of swap memory utilized by the storage cell. | Percent | |
Virtual memory utilized by cell server | Indicates the amount of virtual memory utilized by the cell server. | GB | |
Total memory utilized by management server | Indicates the total amount of memory utilized by the management server. | GB | |
CPU utilization | Indicates the percentage of CPU utilized by the storage cell. | Percent | A value close to 100 is a cause for concern. |
CPU utilized by cell server | Indicates the percentage of CPU utilized by the cell server. | Percent | A high value for this measure indicates that the cell server is consuming excessive CPU resources. |
CPU utilized by management server | Indicates the percentage of CPU utilized by the management server. | Percent | A high value for this measure indicates that the management server is consuming excessive CPU resources. |
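The arithmetic behind the Uptime since last measure and Is restarted? measures can be sketched as follows, assuming the value is derived by comparing the cell's current uptime with the uptime recorded at the previous run. This is a hypothetical reconstruction that reproduces the 300-second example in the table, not the product's documented algorithm; the function name and the use of seconds as the input unit are assumptions.

```python
from __future__ import annotations

from typing import Optional


def uptime_since_last_measure(previous_uptime_secs: Optional[float],
                              current_uptime_secs: float,
                              period_secs: float) -> tuple[float, bool]:
    """Return (seconds up during the last period, whether a restart was seen).

    Hypothetical sketch: a restart is inferred only when the uptime reported
    now is smaller than the uptime recorded at the previous run.
    """
    if previous_uptime_secs is None:
        # First run: there is no earlier sample, so cap the value at one period.
        return min(current_uptime_secs, period_secs), False
    if current_uptime_secs < previous_uptime_secs:
        # The cell rebooted during the period and has been up only since then.
        return min(current_uptime_secs, period_secs), True
    # No reboot: the cell was up for the whole interval between the two runs.
    return current_uptime_secs - previous_uptime_secs, False


# Example from the table: 300-second period, cell rebooted 120 seconds ago.
print(uptime_since_last_measure(10_000, 120, 300))     # (120, True)
print(uptime_since_last_measure(10_000, 10_300, 300))  # (300, False)
```

In the no-reboot branch, the result equals the measurement period only if the test ran on schedule; if a run was skipped, the interval between the two samples, and therefore the reported value, is longer.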