Storm Supervisors Test

The nodes that carry out the instructions issued by Nimbus are called Supervisors. Each Supervisor runs one or more worker processes and manages them so that the tasks assigned by Nimbus are completed. A worker process executes tasks belonging to a single topology.

Using this test, administrators can determine the number of available CPU cores, the total and used memory, and the number of available worker slots on each Supervisor node of the target Apache Storm. Abnormal memory usage is brought to the administrators' attention, so that they can take remedial measures before users start complaining. The test also monitors the uptime of the Supervisor nodes. This is useful where administrators schedule periodic reboots of Supervisor nodes: if a node has been up for an unusually long time, the scheduled reboot task is probably not working on that node.
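The "unusually long uptime" check reduces to a simple comparison. The following is a minimal Python sketch that assumes the node's uptime and the intended reboot interval are both known in seconds. `reboot_overdue` and its `grace_seconds` parameter are hypothetical names used for illustration only and are not part of the eG agent.

```python
def reboot_overdue(uptime_seconds: float, reboot_interval_seconds: float,
                   grace_seconds: float = 0.0) -> bool:
    """Return True if a node has stayed up longer than its scheduled reboot
    interval plus an optional grace period, which suggests that the scheduled
    reboot did not run. Hypothetical helper, not an eG agent API."""
    return uptime_seconds > reboot_interval_seconds + grace_seconds


# A node scheduled for a weekly reboot that has been up for 9 days.
WEEK = 7 * 24 * 3600
print(reboot_overdue(9 * 24 * 3600, WEEK, grace_seconds=3600))  # True
print(reboot_overdue(2 * 24 * 3600, WEEK))                      # False
```

The grace period absorbs small scheduling delays so that a reboot running a few minutes late is not flagged.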

Target of the test : Apache Storm

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for each Supervisor node in the target Apache Storm.
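To illustrate "one set of results for each Supervisor node", the sketch below indexes a supervisor summary by supervisor ID. It assumes a JSON payload shaped like the response of the Storm UI REST API endpoint `/api/v1/supervisor/summary`. The field names (`id`, `host`, `uptimeSeconds`, `slotsTotal`, `slotsUsed`, `totalMem`, `usedMem`, `totalCpu`, `usedCpu`) are assumptions and should be checked against the Storm version in use. This is not a description of how the eG agent collects its data.

```python
import json

# Sample payload modelled on the Storm UI supervisor summary. The field
# names are assumptions; verify them against your Storm version.
SAMPLE = """
{"supervisors": [
  {"id": "sup-1", "host": "node1", "uptimeSeconds": 86400,
   "slotsTotal": 4, "slotsUsed": 1,
   "totalMem": 4096.0, "usedMem": 1024.0,
   "totalCpu": 400.0, "usedCpu": 100.0},
  {"id": "sup-2", "host": "node2", "uptimeSeconds": 120,
   "slotsTotal": 4, "slotsUsed": 4,
   "totalMem": 4096.0, "usedMem": 3584.0,
   "totalCpu": 400.0, "usedCpu": 350.0}
]}
"""


def per_supervisor(payload: str) -> dict:
    """Index the supervisor list by supervisor ID so that each Supervisor
    node yields its own result set (IDs stay unique even if two
    supervisors share a host)."""
    data = json.loads(payload)
    return {s["id"]: s for s in data.get("supervisors", [])}


results = per_supervisor(SAMPLE)
for sup_id, stats in sorted(results.items()):
    print(sup_id, stats["host"], f"{stats['slotsUsed']}/{stats['slotsTotal']} slots used")
```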

Configurable parameters for the test
Parameter Description

Test period

How often the test should be executed.

Host

The IP address of the target server that is being monitored.

Port

The port number through which the Apache Storm communicates. The default port is 8080.

SSL

By default, the SSL flag is set to No, indicating that the target Apache Storm is not SSL-enabled. To enable the test to connect to an SSL-enabled Apache Storm, set the SSL flag to Yes.
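The Host, Port, and SSL parameters together determine the address that a client connects to. The following is a minimal sketch of that mapping. It assumes the test queries the Storm UI REST API path `/api/v1/supervisor/summary`, which is an assumption and not something this page confirms. `summary_url` and `fetch_summary` are hypothetical helper names.

```python
import json
import urllib.request


def summary_url(host: str, port: int = 8080, ssl: bool = False) -> str:
    """Build the supervisor-summary URL from the Host, Port and SSL
    parameters. The REST path is an assumption based on the Storm UI API."""
    scheme = "https" if ssl else "http"
    return f"{scheme}://{host}:{port}/api/v1/supervisor/summary"


def fetch_summary(url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON summary. This performs a real network call."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


print(summary_url("192.0.2.10"))  # http://192.0.2.10:8080/api/v1/supervisor/summary
print(summary_url("storm.example.com", 8443, ssl=True))  # https://storm.example.com:8443/api/v1/supervisor/summary
```

Setting the SSL flag changes only the URL scheme in this sketch. Certificate validation for an SSL-enabled Storm would be handled by the HTTPS layer.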

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Total slots

Indicates the total number of slots in the Supervisor node.

Number

 

Used slots

Indicates the number of used slots in the Supervisor node.

Number

Ideally, the value of this measure should be low.

Free slots

Indicates the number of free slots in the Supervisor node.

Number

Ideally, the value of this measure should be high.

Total CPU core

Indicates the total number of CPU cores in the Supervisor node.

Number

 

Used CPU core

Indicates the number of used CPU cores in the Supervisor node.

Number

Ideally, the value of this measure should be low.

Available CPU core

Indicates the number of available CPU cores in the Supervisor node.

Number

Ideally, the value of this measure should be high.

Available CPU percentage

Indicates the percentage of available CPU in the Supervisor node.

Percent

Ideally, the value of this measure should be close to 100%.

Total memory

Indicates the total size of memory in the Supervisor node.

MB

 

Used memory

Indicates the size of used memory in the Supervisor node.

MB

Ideally, the value of this measure should be low.

Available memory

Indicates the size of available memory in the Supervisor node.

MB

Ideally, the value of this measure should be high.

Available memory percentage

Indicates the percentage of available memory in the Supervisor node.

Percent

Ideally, the value of this measure should be close to 100%.

Uptime

Indicates the time period that the Supervisor node has been up since the last time this test ran.

Hrs/Mins/Secs

If the Supervisor node has not been rebooted during the last measurement period and the agent has been running continuously, this value will equal the measurement period. If the Supervisor node was rebooted during the last measurement period, this value will be less than the measurement period of the test. For example, if the measurement period is 300 seconds and the Supervisor node was rebooted 120 seconds ago, this metric will report a value of 120 seconds. The accuracy of this metric depends on the measurement period: the smaller the measurement period, the greater the accuracy.

Rebooted

Indicates whether the Supervisor node has been rebooted during the last measurement period or not.

 

The values reported by this measure and their numeric equivalents are listed in the table below:

Measure Value    Numeric Value
Yes              1
No               0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether the Supervisor node has been rebooted during the last measurement period.

If this measure shows 1, the Supervisor node was rebooted during the last measurement period. By checking the time periods when this metric changes from 0 to 1, an administrator can determine when the Supervisor node was rebooted.
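Several of the measures above are derived from one another, for example free slots from total and used slots, and the reboot flag from uptime. The following Python sketch shows one plausible way to compute them. It relies on several assumptions:

- Free resources are taken to be total minus used.
- CPU capacity follows Storm's resource-aware convention that 100 CPU units equal one core (as in the default `supervisor.cpu.capacity: 400.0` for four cores). Whether the eG test applies this conversion is an assumption.
- Rebooted is 1 when the node booted within the last measurement period, consistent with the note above.

All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def slot_measures(slots_total: int, slots_used: int) -> Dict[str, int]:
    """Free slots are assumed to be total slots minus used slots."""
    return {"total_slots": slots_total, "used_slots": slots_used,
            "free_slots": slots_total - slots_used}


def cpu_measures(total_cpu: float, used_cpu: float) -> Dict[str, float]:
    """Convert Storm CPU units (assumed 100.0 == one core) to cores and
    compute the percentage of CPU still available."""
    total_cores = total_cpu / 100.0
    used_cores = used_cpu / 100.0
    available = total_cores - used_cores
    pct = available / total_cores * 100.0 if total_cores else 0.0
    return {"total_cpu_core": total_cores, "used_cpu_core": used_cores,
            "available_cpu_core": available, "available_cpu_pct": pct}


def memory_measures(total_mb: float, used_mb: float) -> Dict[str, float]:
    """Available memory in MB and as a percentage of total memory."""
    available = total_mb - used_mb
    pct = available / total_mb * 100.0 if total_mb else 0.0
    return {"total_memory_mb": total_mb, "used_memory_mb": used_mb,
            "available_memory_mb": available, "available_memory_pct": pct}


def uptime_measures(seconds_since_boot: float, period_seconds: float) -> Dict[str, float]:
    """Uptime credited to the period is capped at the period length. A boot
    inside the period shortens it and sets the Rebooted flag to 1."""
    return {"uptime_s": min(seconds_since_boot, period_seconds),
            "rebooted": 1 if seconds_since_boot < period_seconds else 0}


def reboot_times(samples: List[Tuple[str, int]]) -> List[str]:
    """Timestamps at which the Rebooted value changes from 0 to 1."""
    times: List[str] = []
    previous: Optional[int] = None
    for timestamp, flag in samples:
        if previous == 0 and flag == 1:
            times.append(timestamp)
        previous = flag
    return times


print(slot_measures(4, 1))               # {'total_slots': 4, 'used_slots': 1, 'free_slots': 3}
print(cpu_measures(400.0, 100.0))        # available_cpu_core 3.0, available_cpu_pct 75.0
print(memory_measures(4096.0, 3584.0))   # available_memory_mb 512.0, available_memory_pct 12.5
print(uptime_measures(120, 300))         # {'uptime_s': 120, 'rebooted': 1}
print(reboot_times([("10:00", 0), ("10:05", 1), ("10:10", 0), ("10:15", 1)]))  # ['10:05', '10:15']
```

The last call mirrors the note above: scanning a series of Rebooted values for 0-to-1 changes yields the periods in which the node restarted.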