Storm Supervisors Test
The nodes that follow the instructions given by Nimbus are called Supervisors. A Supervisor manages multiple worker processes and governs them to complete the tasks assigned by Nimbus. Each worker process executes tasks belonging to a specific topology.
Using this test, administrators can identify the number of available CPU cores, the total and used memory, and the available worker slots in each Supervisor node of the target Apache Storm. Abnormal memory usage alerts administrators so that they can take remedial measures before users start complaining. Administrators may also schedule periodic reboots of the Supervisor nodes. If a specific node has been up for an unusually long time, the scheduled reboot task is probably not working on that node. To help detect this, the test also monitors the uptime of each Supervisor node.
Target of the test : Apache Storm
Agent deploying the test : An internal/remote agent
Outputs of the test : One set of results for each Supervisor node in the target Apache Storm.
Parameter | Description |
---|---|
Test period | How often the test should be executed. |
Host | The IP address of the target server that is being monitored. |
Port | The port number through which the target Apache Storm communicates. The default port is 8080. |
SSL | By default, the SSL flag is set to No, indicating that the target Apache Storm is not SSL-enabled. To enable the test to connect to an SSL-enabled Apache Storm, set the SSL flag to Yes. |
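The Host, Port, and SSL parameters together identify the endpoint the test polls. The Python sketch below shows one way such a poll could work and how the free and available figures follow from the totals by subtraction. It is a minimal sketch, not the eG agent's own code: the endpoint path `/api/v1/supervisor/summary`, the field names (`slotsTotal`, `slotsUsed`, `totalCpu`, `usedCpu`, `totalMem`, `usedMem`), and memory being reported in MB are assumptions based on the Storm UI REST API that you should verify against your Storm version. The function and class names are hypothetical.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import List

# ASSUMPTION: Storm UI REST endpoint that lists all Supervisors;
# verify the path for your Storm version.
SUMMARY_PATH = "/api/v1/supervisor/summary"


def base_url(host: str, port: int = 8080, ssl: bool = False) -> str:
    """Combine the Host, Port, and SSL test parameters into a base URL."""
    scheme = "https" if ssl else "http"
    return f"{scheme}://{host}:{port}"


@dataclass
class SupervisorMeasures:
    """One set of results per Supervisor node (hypothetical field names)."""
    host: str
    total_slots: int
    used_slots: int
    free_slots: int
    total_cpu: float
    used_cpu: float
    available_cpu: float
    available_cpu_pct: float
    total_memory_mb: float
    used_memory_mb: float
    available_memory_mb: float
    available_memory_pct: float


def _pct(part: float, whole: float) -> float:
    """Percentage of `whole` represented by `part`; 0.0 if `whole` is zero."""
    return 100.0 * part / whole if whole else 0.0


def parse_summary(payload: str) -> List[SupervisorMeasures]:
    """Build per-Supervisor measures from a supervisor-summary JSON document.

    ASSUMPTION: each entry under "supervisors" carries host, slotsTotal,
    slotsUsed, totalCpu, usedCpu, totalMem, and usedMem.
    """
    results = []
    for sup in json.loads(payload).get("supervisors", []):
        total_slots = int(sup.get("slotsTotal", 0))
        used_slots = int(sup.get("slotsUsed", 0))
        total_cpu = float(sup.get("totalCpu", 0.0))
        used_cpu = float(sup.get("usedCpu", 0.0))
        total_mem = float(sup.get("totalMem", 0.0))
        used_mem = float(sup.get("usedMem", 0.0))
        # Free/available figures are derived by subtraction, clamped at zero.
        avail_cpu = max(total_cpu - used_cpu, 0.0)
        avail_mem = max(total_mem - used_mem, 0.0)
        results.append(SupervisorMeasures(
            host=sup.get("host", "unknown"),
            total_slots=total_slots,
            used_slots=used_slots,
            free_slots=max(total_slots - used_slots, 0),
            total_cpu=total_cpu,
            used_cpu=used_cpu,
            available_cpu=avail_cpu,
            available_cpu_pct=_pct(avail_cpu, total_cpu),
            total_memory_mb=total_mem,
            used_memory_mb=used_mem,
            available_memory_mb=avail_mem,
            available_memory_pct=_pct(avail_mem, total_mem),
        ))
    return results


def fetch_measures(host: str, port: int = 8080, ssl: bool = False,
                   timeout: float = 10.0) -> List[SupervisorMeasures]:
    """Poll the (assumed) endpoint and return one record per Supervisor."""
    url = base_url(host, port, ssl) + SUMMARY_PATH
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_summary(resp.read().decode("utf-8"))
```

For example, `fetch_measures("10.0.0.5")` would poll `http://10.0.0.5:8080/api/v1/supervisor/summary`, and setting `ssl=True` switches the scheme to `https`. Keeping the parsing in `parse_summary` separates the arithmetic from the network call, so it can be checked against a saved response.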
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Total slots | Indicates the total number of worker slots in the Supervisor node. | Number | |
Used slots | Indicates the number of worker slots currently in use in the Supervisor node. | Number | Ideally, the value of this measure should be low. |
Free slots | Indicates the number of free worker slots in the Supervisor node. | Number | Ideally, the value of this measure should be high. |
Total CPU core | Indicates the total number of CPU cores in the Supervisor node. | Number | |
Used CPU core | Indicates the number of CPU cores in use in the Supervisor node. | Number | Ideally, the value of this measure should be low. |
Available CPU core | Indicates the number of available CPU cores in the Supervisor node. | Number | Ideally, the value of this measure should be high. |
Available CPU percentage | Indicates the percentage of CPU available in the Supervisor node. | Percent | A value close to 100% is desirable for this measure. |
Total memory | Indicates the total memory in the Supervisor node. | MB | |
Used memory | Indicates the amount of memory used in the Supervisor node. | MB | Ideally, the value of this measure should be low. |
Available memory | Indicates the amount of memory available in the Supervisor node. | MB | Ideally, the value of this measure should be high. |
Available memory percentage | Indicates the percentage of memory available in the Supervisor node. | Percent | A value close to 100% is desirable for this measure. |
Uptime | Indicates how long the Supervisor node has been up since the last time this test ran. | Hrs/Mins/Secs | If the Supervisor node has not been rebooted during the last measurement period and the agent has been running continuously, this value equals the measurement period. If the Supervisor node was rebooted during the last measurement period, this value is less than the measurement period of the test. For example, if the measurement period is 300 secs and the Supervisor node was rebooted 120 secs ago, this measure reports a value of 120 secs. The accuracy of this measure depends on the measurement period: the smaller the measurement period, the greater the accuracy. |
Rebooted | Indicates whether the Supervisor node has been rebooted during the last measurement period. | | The values reported by this measure and their numeric equivalents are listed in the table below. |

Measure Value | Numeric Value |
---|---|
Not rebooted | 0 |
Rebooted | 1 |

Note: By default, the Rebooted measure reports the Measure Values listed in the table above to indicate whether the Supervisor node has been rebooted during the last measurement period. A numeric value of 1 means that the Supervisor node was rebooted during the last measurement period. By checking the time periods when this measure changes from 0 to 1, an administrator can determine when the Supervisor node was rebooted.
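The Uptime and Rebooted measures both follow from comparing how long the node has been up with the test's measurement period. The Python sketch below illustrates that reasoning using the 300-second example from the Uptime row. It assumes the node's total uptime in seconds (time since it last started) is already known; how the eG agent actually obtains that value is not described here, and the function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UptimeResult:
    uptime_secs: float  # value of the Uptime measure for this period
    rebooted: int       # Rebooted measure: 1 = Rebooted, 0 = Not rebooted


def uptime_for_period(total_uptime_secs: float, period_secs: float) -> UptimeResult:
    """Derive the Uptime and Rebooted measures for one measurement period.

    total_uptime_secs: time since the Supervisor node last started
    (hypothetical input).
    period_secs: the test's measurement period.
    """
    if total_uptime_secs < period_secs:
        # The node started partway through the period, so it was rebooted
        # and has been up only for the time since that restart.
        return UptimeResult(uptime_secs=total_uptime_secs, rebooted=1)
    # Up for the whole period: Uptime equals the measurement period.
    return UptimeResult(uptime_secs=period_secs, rebooted=0)


# The example from the Uptime row: 300 s period, rebooted 120 s ago.
example = uptime_for_period(total_uptime_secs=120, period_secs=300)
print(example)  # UptimeResult(uptime_secs=120, rebooted=1)
```

A node that has been up for a day yields `uptime_secs=300` and `rebooted=0` under the same 300-second period, which matches the "equal to the measurement period" case described for the Uptime measure.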