Machine Reliability - OS Test
Frequent crashes, Blue Screen of Death (BSOD) errors, application failures, and gradual slowdowns can lead to significant downtime, data loss, or a complete shutdown of a system. To prevent such anomalies, administrators should continuously track the stability of the system, so that they can quickly identify potential bottlenecks and resolve issues before these cause unexpected system failures or major disruptions to business-critical operations. The Machine Reliability - OS test helps administrators in this regard.
This test auto-discovers the Windows systems in the target Windows Systems Group and, for each system, reports the machine stability as a percentage. A low value for this measure may indicate that the system is experiencing frequent errors, crashes, high CPU/RAM usage, overheating, or hardware issues. This way, administrators are promptly alerted to potential bottlenecks and system instability.
Target of the test : A Windows Systems Group
Agent deploying the test : A remote agent
Outputs of the test : One set of results for every Windows system
Parameter | Description |
---|---|
Test Period | How often the test should be executed. By default, this is set to 60 mins. |
Host | The nick name of the Windows Systems Group component for which this test is to be configured. |
Port | The port at which the specified Host listens. By default, this is NULL. |
Inside View Using | To obtain the 'inside view' of performance of the systems - i.e., to measure the internal performance of the systems - this test uses a light-weight eG VM Agent software deployed on each of the systems. Accordingly, this parameter is by default set to eG VM Agent. |
Report By User | This flag is set to No by default. This implies that the Windows systems in the environment will always be identified using the system name. In other words, this test will, by default, report measures for every systemname. On the other hand, if you want this test to report the measures for every user on a system, then set this flag to Yes. In such a case, this test will report the measures for every username_on_systemname. |
Report Powered OS | If this flag is set to Yes (the default), then the 'inside view' tests will report measures even for those Windows systems that do not have any users logged in currently. Such systems will be identified by their name and not by username_on_systemname. On the other hand, if this flag is set to No, then this test will not report measures for those systems to which no users are logged in currently. |
Is Cloud VMs? | This flag is set to Yes by default. The value of this flag cannot be changed. This implies that the cloud-based Windows systems in the environment will always be identified using the login name of the user. In other words, in cloud environments, this test will, by default, report measures for every username_on_systemname. |
Stability Interval Minutes | By default, this is set to 120 minutes, indicating that this test will check the reliability monitoring tools for machine stability at intervals of 120 minutes. However, you can override this setting if required. |
DD Frequency | Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD FREQUENCY. |
Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |
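
The way the Report By User and Report Powered OS flags determine the names under which measures are reported can be sketched as follows. This is only an illustration of the behavior described in the table, not the eG agent's implementation: the function `descriptors` and its arguments are hypothetical, the flag descriptions are read literally, and the Is Cloud VMs? flag is ignored.

```python
from typing import List


def descriptors(system_name: str,
                logged_in_users: List[str],
                report_by_user: bool = False,
                report_powered_os: bool = True) -> List[str]:
    """Return the names under which one system's measures would be reported.

    Hypothetical helper; the defaults mirror the documented flag defaults
    (Report By User = No, Report Powered OS = Yes).
    """
    if not logged_in_users:
        # No user is logged in: the system is reported by its own name
        # only when Report Powered OS is Yes, otherwise it is skipped.
        return [system_name] if report_powered_os else []
    if report_by_user:
        # One descriptor per logged-in user: username_on_systemname.
        return [f"{user}_on_{system_name}" for user in logged_in_users]
    # Default: one descriptor per system, identified by the system name.
    return [system_name]


if __name__ == "__main__":
    print(descriptors("PC01", ["alice", "bob"]))
    print(descriptors("PC01", ["alice", "bob"], report_by_user=True))
    print(descriptors("PC02", [], report_by_user=True, report_powered_os=False))
```

With the default flag values, every system appears once under its system name, whether or not a user is logged in.
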

Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Machine stability index | Indicates the current stability percentage of this system. | Percent | Ideally, the value of this measure should be high. If the value of this measure is low, administrators should investigate the problem conditions, consider updating the system, or repair/replace the software/hardware of the system. |
The detailed diagnosis reported by the Machine stability index measure reveals the time stamp at which updates ran on the system, the ID, type, and source of the updates, a brief description of the updates, and messages stating the installation state of the updates.
Figure 1 : The detailed diagnosis reported by the Machine stability index measure
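
How often this detailed diagnosis is generated is governed by the DD Frequency parameter. The sketch below shows one way a value such as 1:1 or none could be interpreted. It rests on assumptions: the documentation above defines only 1:1 and none, so reading the first number as "every n-th normal run" and the second as "every n-th run in which a problem is detected" is an extrapolation, and the functions `parse_dd_frequency` and `should_collect_dd` are hypothetical names, not part of eG Enterprise.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse a DD Frequency string such as "1:1"; return None for "none"."""
    text = value.strip().lower()
    if text == "none":
        return None  # detailed diagnosis disabled
    normal, problem = text.split(":")  # raises ValueError if malformed
    normal_every, problem_every = int(normal), int(problem)
    if normal_every < 1 or problem_every < 1:
        raise ValueError("frequencies must be positive integers")
    return normal_every, problem_every


def should_collect_dd(value: str, run_count: int, problem_detected: bool) -> bool:
    """Decide whether detailed measures are generated for this run.

    run_count is assumed to be the 1-based count of runs of the same kind
    (normal runs or problem runs) seen so far.
    """
    frequency = parse_dd_frequency(value)
    if frequency is None:
        return False
    every = frequency[1] if problem_detected else frequency[0]
    return run_count % every == 0


if __name__ == "__main__":
    print(should_collect_dd("1:1", 5, problem_detected=False))
    print(should_collect_dd("none", 5, problem_detected=True))
```

Under this reading, the default of 1:1 generates detailed measures on every run, which matches the behavior described for the DD Frequency parameter.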