Failover Cluster Operational Log Test
Windows Failover Clustering allows multiple nodes to work together to provide high availability for critical applications and services. The Operational log of the Failover-Clustering subsystem records detailed runtime events related to cluster configuration changes, node state transitions, quorum status, resource load behavior, and cluster-level problem conditions. Any abnormal behavior such as repeated node transitions, resource failures, loss of connectivity, quorum changes or failover instability can directly affect workload availability, reduce service resilience, and lead to unexpected downtime for users.
Continuous monitoring of the cluster’s operational log helps administrators detect underlying issues early, long before a full failover or outage occurs. This test monitors the Failover Cluster Operational log and reports the number of events logged at each severity level: informational messages, warnings, error messages, critical messages, and verbose messages. By tracking the volume and nature of these message types, this test enables administrators to quickly identify abnormal patterns, troubleshoot instability faster, and proactively investigate events that can impact business workloads hosted on the cluster.
This test is disabled by default. To enable the test, go to the enable / disable tests page using the menu sequence: Agents -> Tests -> Enable/Disable. Pick the desired Component type, set Performance as the Test type, choose the test from the DISABLED TESTS list, and click the << button to move the test to the ENABLED TESTS list. Finally, click the Update button.
Target of the test : Unix/Windows server
Agent deploying the test : An internal agent
Outputs of the test : One set of results for the filter configured
| Parameter | Description |
|---|---|
| Test Period | How often the test should be executed. |
| Host | The IP address of the host for which the test is being configured. |
| Port | The port at which the target host listens. |
| Logtype | The type of event log to be monitored. |
| Policy based filter | Using this page, administrators can configure the event sources, event IDs, and event descriptions to be monitored by this test. To enable administrators to provide this specification easily and accurately, this page provides the following options: For explicit, manual specification of the filter conditions, select the NO option against the POLICY BASED FILTER field. This is the default selection. To choose from the list of pre-configured filter policies, or to create a new filter policy and then associate it with the test, select the YES option against the POLICY BASED FILTER field. |
| Filter | If the POLICY BASED FILTER flag is set to NO, then a FILTER text area will appear, in which you have to specify the event sources, event IDs, and event descriptions to be monitored. This specification should be of the following format: {Displayname}:{event_sources_to_be_included}:{event_sources_to_be_excluded}:{event_IDs_to_be_included}:{event_IDs_to_be_excluded}:{event_descriptions_to_be_included}:{event_descriptions_to_be_excluded}. For example, assume that the FILTER text area takes the value OS_events:all:Browse,Print:all:none:all:none. Read against the format above, OS_events is the display name; the first all includes all event sources; Browse,Print excludes the Browse and Print sources; the second all includes all event IDs; the first none excludes no event IDs; the third all includes all event descriptions; and the last none excludes no event descriptions. By default, the filter parameter contains the value: all:all:none:all:none:all:none. Multiple filters are to be separated by semi-colons (;). Note: The event sources and event IDs specified here should exactly match those that appear in the Event Viewer window. On the other hand, if the POLICY BASED FILTER flag is set to YES, then a FILTER list box will appear, displaying the filter policies that pre-exist in the eG Enterprise system. A filter policy typically comprises a specific set of event sources, event IDs, and event descriptions to be monitored. This specification is built into the policy in the following format: {Policyname}:{event_sources_to_be_included}:{event_sources_to_be_excluded}:{event_IDs_to_be_included}:{event_IDs_to_be_excluded}:{event_descriptions_to_be_included}:{event_descriptions_to_be_excluded}. To monitor a specific combination of event sources, event IDs, and event descriptions, you can choose the corresponding filter policy from the FILTER list box. Multiple filter policies can be selected in this way. Alternatively, you can modify any of the existing policies to suit your needs, or create a new filter policy. To facilitate this, a Click here link appears just above the test configuration section once the YES option is chosen against POLICY BASED FILTER. Clicking on the Click here link leads you to a page where you can modify the existing policies or create a new one. The changed policy or the new policy can then be associated with the test by selecting the policy name from the FILTER list box in this page. |
| Usewmi | The eG agent can either use WMI to extract event log statistics or directly parse the event logs using event log APIs. If the USEWMI flag is YES, then WMI is used. If not, the event log APIs are used. This option is provided because on some Windows systems (especially ones with service pack 3 or lower), accessing event logs through WMI can cause the CPU usage of the WinMgmt process to spike. On such systems, set the USEWMI parameter value to NO. On the other hand, when monitoring systems that are operating on any other flavor of Windows (say, Windows 2012 or above), the USEWMI flag should always be set to YES. |
| DD Frequency | Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if you so desire. To disable the detailed diagnosis capability for this test, specify none against DD frequency. |
| Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |
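
The FILTER format described in the parameter table can be illustrated with a short Python sketch. This is not the eG agent's own parser; it only shows one way the seven colon-separated fields and the semi-colon separator might be interpreted. The names `EventFilter`, `parse_filter_spec`, and `source_is_monitored` are hypothetical, and the sketch assumes that no field (including event descriptions) itself contains a colon or a semi-colon.

```python
from dataclasses import dataclass


@dataclass
class EventFilter:
    """One filter entry; field order follows the documented format."""
    display_name: str
    include_sources: list
    exclude_sources: list
    include_ids: list
    exclude_ids: list
    include_descriptions: list
    exclude_descriptions: list


def _split_list(field):
    # 'all', 'none', or a comma-separated list -> list of trimmed values
    return [item.strip() for item in field.split(",") if item.strip()]


def parse_filter_spec(spec):
    """Parse a FILTER value; multiple filters are separated by ';'."""
    filters = []
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(":")
        if len(parts) != 7:
            raise ValueError(
                f"expected 7 colon-separated fields, got {len(parts)}: {entry!r}"
            )
        filters.append(
            EventFilter(parts[0].strip(), *(_split_list(p) for p in parts[1:]))
        )
    return filters


def _matches(values, item):
    # Assumed semantics: 'all' matches everything, 'none' matches nothing,
    # anything else is an exact-match list (names must match Event Viewer).
    if values == ["all"]:
        return True
    if values == ["none"]:
        return False
    return item in values


def source_is_monitored(flt, source):
    """Assumed rule: a source is monitored if included and not excluded."""
    return _matches(flt.include_sources, source) and not _matches(
        flt.exclude_sources, source
    )


if __name__ == "__main__":
    flt = parse_filter_spec("OS_events:all:Browse,Print:all:none:all:none")[0]
    print(flt.display_name, flt.exclude_sources)
    print(source_is_monitored(flt, "Print"))
```

With the example value from the table, the sketch reports `OS_events` as the display name and treats `Browse` and `Print` as excluded sources, while any other source is monitored.
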
| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Information messages | Indicates the number of information events that were generated during the test's last execution. | Number | A change in the value of this measure may indicate infrequent but successful operations performed by one or more applications. |
| Warnings | Indicates the number of warnings generated during the test's last execution. | Number | A high value of this measure indicates problems that may not have an immediate impact, but may cause future problems. |
| Error messages | Indicates the number of error events generated during the last execution of the test. | Number | A very low value (zero) is desired for this measure, as it indicates good health. An increasing trend or a high value indicates the existence of problems. |
| Critical messages | Indicates the number of critical events that were generated when the test was last executed. | Number | A critical event is one that a system/application cannot automatically recover from. A very low value (zero) indicates that the system/application is in a healthy state and is running smoothly without any potential problems. An increasing trend or high value indicates the existence of fatal/irreparable problems. The detailed diagnosis of this measure describes all the critical events captured by the configured logtype during the last measurement period. |
| Verbose messages | Indicates the number of verbose events that were generated when the test was last executed. | Number | Verbose logging provides more details in the log entry, which will enable you to troubleshoot issues better. The detailed diagnosis of this measure describes all the verbose events that were captured by the configured logtype during the last measurement period. |
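
The per-severity counts reported by this test can be approximated outside the product with a small Python sketch. It is an illustration only, not how the eG agent collects its measures. It assumes the events have already been exported as Windows event XML wrapped in a single root element, and it relies on the standard Windows event levels (1 = Critical, 2 = Error, 3 = Warning, 4 = Information, 5 = Verbose). The function name `count_by_severity` and the sample data are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Standard Windows event levels mapped to the measure names in the table.
LEVEL_TO_MEASURE = {
    1: "Critical messages",
    2: "Error messages",
    3: "Warnings",
    4: "Information messages",
    5: "Verbose messages",
}


def count_by_severity(events_xml):
    """Count events per severity level in an exported event XML document."""
    counts = {name: 0 for name in LEVEL_TO_MEASURE.values()}
    root = ET.fromstring(events_xml)
    for level in root.iterfind(".//e:System/e:Level", NS):
        if level.text is None:
            continue
        measure = LEVEL_TO_MEASURE.get(int(level.text))
        if measure is not None:  # other levels (e.g. 0) are ignored here
            counts[measure] += 1
    return counts


# Made-up sample: two information events, one warning, one error, one critical.
_EVENT = (
    '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">'
    "<System><Level>{}</Level></System></Event>"
)
SAMPLE = "<Events>" + "".join(_EVENT.format(n) for n in (4, 4, 3, 2, 1)) + "</Events>"

if __name__ == "__main__":
    for measure, value in count_by_severity(SAMPLE).items():
        print(f"{measure}: {value}")
```

A sustained rise in the warning, error, or critical counts between runs is the kind of pattern the Interpretation column asks administrators to investigate.
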