Machine Reliability - OS Test

Frequent crashes, Blue Screen of Death (BSOD) errors, application failures, and gradual slowdowns can lead to significant downtime, data loss, or even a complete system shutdown. To prevent such anomalies, administrators should continuously track the stability of their systems, so that they can quickly identify potential bottlenecks and resolve issues before these cause unexpected system failures or major disruptions to business-critical operations. The Machine Reliability - OS test helps administrators do exactly this.

This test auto-discovers the Windows systems in the target Windows Systems Group and, for each system, reports its machine stability as a percentage. A low value may indicate that the system is experiencing frequent errors or crashes, high CPU/RAM usage, overheating, or hardware issues. This way, administrators are promptly alerted to potential bottlenecks and system instability.

Target of the test : A Windows Systems Group

Agent deploying the test : A remote agent

Outputs of the test : One set of results for every Windows system

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed. By default, this is set to 60 minutes.

Host

The nickname of the Windows Systems Group component for which this test is to be configured.

Port

The port at which the specified Host listens. By default, this is NULL.

Inside View Using

To obtain the 'inside view' of system performance (that is, to measure the internal performance of each system), this test uses the lightweight eG VM Agent software deployed on each system. Accordingly, this parameter is set to eG VM Agent by default.

Report By User

This flag is set to No by default, which means that the Windows systems in the environment are identified by their system names. In other words, by default, this test reports measures for every systemname. If you want this test to report measures for every user on a system instead, set this flag to Yes. In that case, this test will report measures for every username_on_systemname.

Report Powered OS

By default, this flag is set to Yes. This means that the 'inside view' tests will report measures even for those Windows systems that currently have no users logged in; such systems are identified by their system name and not by username_on_systemname. If this flag is set to No, this test will not report measures for systems to which no users are currently logged in.

Is Cloud VMs?

This flag is set to Yes by default and its value cannot be changed. This means that cloud-based Windows systems in the environment are always identified using the login name of the user. In other words, in cloud environments, this test reports measures for every username_on_systemname.
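The way the Report By User, Report Powered OS, and Is Cloud VMs flags combine to decide how a system is identified can be sketched as follows. This is a hypothetical model written only from the flag descriptions above, not the eG agent's actual logic; the function `descriptor_for` and its parameters are illustrative names.

```python
from typing import Optional


def descriptor_for(system_name: str,
                   logged_in_user: Optional[str],
                   report_by_user: bool = False,
                   report_powered_os: bool = True,
                   is_cloud_vm: bool = False) -> Optional[str]:
    """Return the descriptor reported for one system, or None to skip it.

    Hypothetical model of the Report By User, Report Powered OS and
    Is Cloud VMs flags; not the eG agent's real implementation.
    """
    if logged_in_user is None:
        # No user logged in: report by system name only if Report Powered OS is Yes.
        return system_name if report_powered_os else None
    if report_by_user or is_cloud_vm:
        # Per-user descriptors take the form username_on_systemname.
        return f"{logged_in_user}_on_{system_name}"
    # Default: one descriptor per system name.
    return system_name
```

For example, `descriptor_for("WIN01", "alice", report_by_user=True)` yields `alice_on_WIN01`, while `descriptor_for("WIN01", None, report_powered_os=False)` yields `None`, meaning the idle system is not reported.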

Stability Interval Minutes

By default, this is set to 120 minutes, which means that this test checks the reliability monitoring tools for machine stability at 120-minute intervals. You can override this setting if required.
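As a rough illustration of how a stability interval might be applied, the sketch below picks the most recent stability sample that falls within the last Stability Interval Minutes and converts it to a percentage. It assumes, hypothetically, that samples resemble those Windows Reliability Monitor exposes through the `Win32_ReliabilityStabilityMetrics` WMI class (a `SystemStabilityIndex` on a 1-10 scale) and that the percentage is simply the index multiplied by 10. Neither assumption is confirmed by this document, and `stability_percent` is an illustrative name.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (time the sample was generated, stability index on a 1-10 scale)
Sample = Tuple[datetime, float]


def stability_percent(samples: List[Sample], now: datetime,
                      interval_minutes: int = 120) -> Optional[float]:
    """Return the latest sample inside the interval as a percentage.

    Hypothetical: assumes percent = index * 10. Returns None when no
    sample falls within the last `interval_minutes`.
    """
    window_start = now - timedelta(minutes=interval_minutes)
    # Keep only samples generated inside the stability interval.
    recent = [s for s in samples if window_start <= s[0] <= now]
    if not recent:
        return None
    # Use the newest sample in the window.
    _, index = max(recent, key=lambda s: s[0])
    return round(index * 10, 1)
```

Returning `None` rather than a stale value keeps an empty interval distinguishable from a genuinely low stability reading.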

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if required. To disable the detailed diagnosis capability for this test, specify none against DD FREQUENCY.
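A DD FREQUENCY value of the form normal:abnormal (or none) could be parsed and applied as in the sketch below. The parsing follows the format described above; the rule in `should_collect_dd` (collect on every Nth run, with N taken from the normal value, or from the abnormal value when a problem is detected) is a hypothetical reading that agrees with the documented 1:1 behavior but is not confirmed for other values. Both function names are illustrative.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse a DD FREQUENCY value such as '1:1' into (normal, abnormal).

    Returns None when the value is 'none' (detailed diagnosis disabled).
    """
    text = value.strip().lower()
    if text == "none":
        return None
    normal, sep, abnormal = text.partition(":")
    if not sep:
        raise ValueError(f"expected 'x:y' or 'none', got {value!r}")
    return int(normal), int(abnormal)


def should_collect_dd(freq: Optional[Tuple[int, int]], run_number: int,
                      problem_detected: bool) -> bool:
    """Hypothetical rule: collect on every Nth run (run_number starts at 1)."""
    if freq is None:
        return False  # DD FREQUENCY set to none
    normal, abnormal = freq
    n = abnormal if problem_detected else normal
    return n > 0 and run_number % n == 0
```

With the default `1:1`, `should_collect_dd` returns `True` on every run, whether or not a problem is detected, matching the description above.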

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability

  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
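The two conditions above can be expressed as a single check, sketched below under the assumption that "should not be 0" means neither frequency may be 0. The function name and parameters are hypothetical and do not correspond to any eG API.

```python
def dd_toggle_available(license_allows_dd: bool, normal: int, abnormal: int) -> bool:
    """Return True if the On/Off detailed diagnosis option would be offered.

    Hypothetical check: the license must permit detailed diagnosis and
    neither configured frequency may be 0.
    """
    return license_allows_dd and normal != 0 and abnormal != 0
```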

Measurements made by the test

Measurement : Machine stability index

Description : Indicates the current stability percentage of this system.

Measurement Unit : Percent

Interpretation : Ideally, the value of this measure should be high. If the value is low, administrators should investigate the problem conditions, consider updating the system, or repair or replace its faulty software or hardware.

The detailed diagnosis of the Machine stability index measure reveals the time stamp at which each system update ran, the ID, type, and source of the update, a brief description of the update, and messages indicating its installation state.


Figure 1 : The detailed diagnosis reported by the Machine stability index measure
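If the detailed diagnosis rows are exported for further analysis, they might be modeled and filtered as in the sketch below. The `UpdateRecord` fields mirror the columns listed above (time stamp, ID, type, source, description, message), but the field names are hypothetical, and so is the assumption that a failed installation can be spotted by the word "fail" in the message text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class UpdateRecord:
    """One hypothetical row of the Machine stability index detailed diagnosis."""
    time_stamp: datetime
    update_id: str
    update_type: str
    source: str
    description: str
    message: str


def failed_updates(records: List[UpdateRecord]) -> List[UpdateRecord]:
    """Return records whose message mentions a failure, newest first.

    Assumes (hypothetically) that failures contain 'fail' in the message.
    """
    hits = [r for r in records if "fail" in r.message.lower()]
    return sorted(hits, key=lambda r: r.time_stamp, reverse=True)
```

Sorting newest first puts the most recent failed installation, often the most likely cause of a recent drop in stability, at the top of the list.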