Status Hardware Traps Test

This test monitors the status of various hardware elements present in the Stratus server using SNMP traps.

Target of the test : An SNMP trap

Agent deploying the test : An internal agent

Outputs of the test : One set of results for every OID value monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port at which the application listens.

SourceAddress

Specify a comma-separated list of IP addresses or address patterns of the hosts sending the traps. For example, 10.0.0.1,192.168.10.*. A leading '*' signifies any number of leading characters, while a trailing '*' signifies any number of trailing characters.
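The wildcard rule described for SourceAddress can be illustrated with a minimal Python sketch. This is not the eG agent's actual implementation: the function names matches_pattern and source_allowed are hypothetical, and the sketch assumes that a pattern without a '*' must match the address exactly.

```python
def matches_pattern(value: str, pattern: str) -> bool:
    """Match value against a pattern whose leading and/or trailing '*' is a wildcard.

    Hypothetical helper: patterns without '*' are assumed to require an exact match.
    """
    if pattern == "*":
        return True
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    core = pattern[1:] if leading else pattern
    core = core[:-1] if trailing else core
    if leading and trailing:
        return core in value
    if leading:
        return value.endswith(core)
    if trailing:
        return value.startswith(core)
    return value == core


def source_allowed(address: str, source_address_param: str) -> bool:
    """Return True if address matches any entry in the comma-separated parameter."""
    patterns = [p.strip() for p in source_address_param.split(",") if p.strip()]
    return any(matches_pattern(address, p) for p in patterns)


if __name__ == "__main__":
    setting = "10.0.0.1,192.168.10.*"
    for sender in ("10.0.0.1", "192.168.10.57", "192.168.11.5"):
        print(sender, source_allowed(sender, setting))
```

With the example setting 10.0.0.1,192.168.10.*, traps from 10.0.0.1 and from any address beginning with 192.168.10. are accepted under these assumptions, while a sender such as 192.168.11.5 is not.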

OIDValue

Provide a comma-separated list of OID and value pairs returned by the traps. Express each pair in the form DisplayName:OID-OIDValue. For example, assume that the following OIDs are to be considered by this test: .1.3.6.1.4.1.9156.1.1.2 and .1.3.6.1.4.1.9156.1.1.3. The values of these OIDs are as follows:

OID                         Value
.1.3.6.1.4.1.9156.1.1.2     Host_system
.1.3.6.1.4.1.9156.1.1.3     NETWORK

In this case, the OIDValue parameter can be configured as Trap1:.1.3.6.1.4.1.9156.1.1.2-Host_system,Trap2:.1.3.6.1.4.1.9156.1.1.3-NETWORK, where Trap1 and Trap2 are the display names that appear as descriptors of this test in the monitor interface.

The test considers a configured OID for monitoring only when the actual value of the OID matches its configured value. For instance, in the example above, if the value of OID .1.3.6.1.4.1.9156.1.1.2 is found to be Host and not Host_system, then the test ignores OID .1.3.6.1.4.1.9156.1.1.2 while monitoring.

An * can be used in the OID/value patterns to denote any number of leading or trailing characters (as the case may be). For example, to monitor all the OIDs that return values which begin with the letter 'F', set this parameter to Failed:*-F*.
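As an illustration of the DisplayName:OID-OIDValue format and the matching rule described above, here is a minimal Python sketch. It is an assumption-laden example, not the agent's real parser: the names OidRule, parse_oid_value, wildcard_match and descriptor_for are hypothetical, matching is assumed to be case-sensitive, and the first matching entry is assumed to win.

```python
from typing import List, NamedTuple, Optional


class OidRule(NamedTuple):
    """One DisplayName:OID-OIDValue entry (hypothetical structure)."""
    display_name: str
    oid_pattern: str
    value_pattern: str


def wildcard_match(text: str, pattern: str) -> bool:
    """Treat a leading and/or trailing '*' in pattern as a wildcard."""
    if pattern == "*":
        return True
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    core = pattern[1:] if leading else pattern
    core = core[:-1] if trailing else core
    if leading and trailing:
        return core in text
    if leading:
        return text.endswith(core)
    if trailing:
        return text.startswith(core)
    return text == core


def parse_oid_value(param: str) -> List[OidRule]:
    """Split the comma-separated parameter into rules.

    Assumes display names contain no ':' and OID patterns contain no '-'.
    """
    rules = []
    for entry in param.split(","):
        entry = entry.strip()
        if not entry:
            continue
        display_name, _, rest = entry.partition(":")
        oid_pattern, _, value_pattern = rest.partition("-")
        rules.append(OidRule(display_name, oid_pattern, value_pattern))
    return rules


def descriptor_for(oid: str, value: str, rules: List[OidRule]) -> Optional[str]:
    """Return the display name of the first rule matching both OID and value, else None."""
    for rule in rules:
        if wildcard_match(oid, rule.oid_pattern) and wildcard_match(value, rule.value_pattern):
            return rule.display_name
    return None


if __name__ == "__main__":
    rules = parse_oid_value("Trap1:.1.3.6.1.4.1.9156.1.1.2-Host_system,Failed:*-F*")
    print(descriptor_for(".1.3.6.1.4.1.9156.1.1.2", "Host_system", rules))
    print(descriptor_for(".1.3.6.1.4.1.9156.1.1.2", "Host", rules))
```

In this sketch, a trap carrying the value Host for OID .1.3.6.1.4.1.9156.1.1.2 matches no rule and is ignored, mirroring the behavior described above, while any OID whose value begins with F is reported under the Failed descriptor.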

ShowOID

Selecting the True option against ShowOID will ensure that the detailed diagnosis of this test shows the OID strings along with their corresponding values. If you select False, then the values alone will appear in the detailed diagnosis page, and not the OIDs.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Empty

Indicates that a slot in the system is in an "empty" state.

Boolean

For a slot, this state indicates that the slot is empty, physically not present, or electrically inaccessible. If the empty device causes the system to go into simplex mode, the device is no longer fault tolerant. In some cases this state represents both a slot and a device. For instance, an instance of an SRA_DIMM in the Empty state means that a slot exists for the DIMM, but that the slot is empty. DIMMs, CPU Boards, IO Boards and Processors are represented by such WMI objects. Sensors go to this state instead of the "Not Present" state when they are not present. Empty devices are generally enumerable.

Not present

Indicates that a device in the system is in a "not present" state.

Boolean

This state indicates that a device is either physically not present or electrically inaccessible. For instance, pulling the power cord on a CPU board makes the DIMMs and Processors on the board go to this state. When a WMI object goes to this state, it is generally not enumerable. Thus, this state only appears in state change events.

Removed

Indicates that a device in the system is in a "removed" state. Usually, this is a final state but it can be a transient state.

Boolean

Usually, this state indicates that a device was intentionally removed from service. When intentionally removed from service, the device remains in this state. Only some devices go to this state when removed from service; other devices go to other offline states. Some devices pass through this state as they are brought online.

Dumping

Indicates that a device is in a "Dumping" state. This is a transient state.

Boolean

This state indicates a device is in the process of writing a dump to a file.

Diagnostics passed

Indicates that a device is in a "Diagnostics Passed" state. This is a transient state and the device should change to "online" state when it is brought online.

Boolean

This state indicates that a device has just completed its diagnostic tests.

Initialising

Indicates that a device is in an "Initialising" state. This is a transient state and the device should change to "online" state when it is brought online.

Boolean

This state indicates that a device is in the process of initializing.

Syncing

Indicates that a device is in a "syncing" state. This is a transient state and the device should change to "online" state when it is brought online.

Boolean

This state indicates that a device is synchronizing itself with its partners. For instance, when a CPU is brought up, it synchronizes its memory and its processor state with that of its partners.

Offline

Indicates that a device is in an "offline" state.

Boolean

This state indicates that a device is offline. Only some devices can go to this state while other devices go into the "Removed From Service" state.

Firmware update complete

Indicates that a device's firmware update procedure has completed.

Boolean

 

Diagnostics

Indicates that a device is running diagnostics.

Boolean

 

Online

Indicates that a device is in an "online" state.

Boolean

This state indicates that the device is online, but not configured for redundancy. For instance, a working NIC that is not part of a team will be in this state. Although the online state does not indicate whether a device is safe-to-pull or not, on a properly configured system such devices can be assumed safe-to-pull.

Simplex

Indicates that a device is in a "Simplex" state.

Boolean

This state indicates that a device is online, configured for redundancy, and is not safe-to-pull. When applied to a port, it indicates that the port is configured for redundancy, and that whatever is connected to the port is not safe-to-pull.

Duplex

Indicates that a device is in a "Duplex" state.

Boolean

This state indicates that a device is online, configured for redundancy, and is safe-to-pull. When applied to a port, it indicates that the port is configured for redundancy, and that whatever is connected to the port is safe-to-pull.

Shot

Indicates that a device is in a "Shot" state. This is a transient state and the device should transition to either the "broken" or the "online" state after diagnostics are done.

Boolean

This state indicates that a device experienced a problem and will soon move to either an online state or the broken state.

Broken

Indicates that a device is in a "Broken" state.

Boolean

This state indicates that a device is broken. A device can be broken for several reasons, but this state usually points to hardware errors; contact your service provider for a service check. In the case of a port, this state may mean that the port itself is inoperative or that whatever is attached to the port is inoperative. Usually, it means that nothing is attached to the port or that whatever should be attached is not responding. For example, a NIC port will be in this state when it cannot detect a link.
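The state descriptions in the table above can be condensed into a small lookup, sketched here in Python. The names TRANSIENT_STATES, SAFE_TO_PULL and describe_state are hypothetical and not part of the product. The sketch records only what the descriptions state explicitly: which states are described as transient, and that Duplex is safe-to-pull while Simplex is not. Every other state is left undetermined.

```python
from typing import Dict, Optional

# States that the measurement descriptions explicitly call transient.
# "Removed" is omitted because it is described as usually final.
TRANSIENT_STATES = frozenset({
    "Dumping",
    "Diagnostics passed",
    "Initialising",
    "Syncing",
    "Shot",
})

# Only Duplex and Simplex carry an explicit safe-to-pull statement.
SAFE_TO_PULL: Dict[str, bool] = {
    "Duplex": True,
    "Simplex": False,
}


def describe_state(state: str) -> Dict[str, Optional[bool]]:
    """Summarize a state name; safe_to_pull is None when the text does not say."""
    return {
        "transient": state in TRANSIENT_STATES,
        "safe_to_pull": SAFE_TO_PULL.get(state),
    }


if __name__ == "__main__":
    for name in ("Duplex", "Simplex", "Shot", "Online"):
        print(name, describe_state(name))
```

Online is deliberately left as undetermined here, because the description says only that such devices can be assumed safe-to-pull on a properly configured system.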