Db2 Cluster Status Test
A DB2 active-passive cluster is a high-availability setup where only one node, known as the active node, runs the DB2 database service at any given time, while the other node(s), referred to as passive or standby nodes, remain on standby to take over in case the active node fails. This configuration is typically implemented using Microsoft Failover Clustering, where cluster resources are automatically moved from the failed node to a standby node to minimize downtime. The passive node becomes active through a process called failover, ensuring continuity of service with minimal disruption. However, issues such as unexpected cluster switches, frequent failovers, cluster restarts, or extended downtime can occur due to hardware failures, configuration errors, or resource constraints. These issues can impact database availability, interrupt user connections, cause transaction failures, and reduce the overall reliability of the DB2 environment.
The Db2 Cluster Status Test monitors the cluster’s availability and stability by tracking whether the cluster is running, whether a failover or restart has occurred, and how long the cluster has been up and stable. This enables administrators to detect failover patterns, verify cluster uptime, and proactively address any instability before it affects end users.
Target of the test : A DB2 Active-Passive Cluster
Agent deploying the test : A remote agent
Outputs of the test : One set of results for the target DB2 Cluster being monitored.

| Parameter | Description |
|---|---|
| Test Period | How often the test should be executed. |
| Host | Specify the HOST for which this test is to be configured. |
| Port | Specify the port at which the target host is listening. The default port is 50000. |
| User | To monitor the target cluster, the eG agent should be configured with the credentials of a monitoring user. An admin user with SECADM (Security Administrator) or SYSADM (System Administrator) privileges must execute the following statements to grant the required privileges to the monitoring user: `CONNECT TO <dbname> USER <admin_username> USING <admin_password>;` `GRANT EXECUTE ON FUNCTION SYSPROC.MON_GET_INSTANCE TO USER <monitoring_username>;` `GRANT EXECUTE ON FUNCTION SYSPROC.ENV_GET_REG_VARIABLES TO USER <monitoring_username>;` For example: `CONNECT TO ProdDB USER db2admin USING p@ssword123;` `GRANT EXECUTE ON FUNCTION SYSPROC.MON_GET_INSTANCE TO USER eGuser;` `GRANT EXECUTE ON FUNCTION SYSPROC.ENV_GET_REG_VARIABLES TO USER eGuser;` Here, ProdDB is the name of the target database, db2admin is the admin user with SECADM or SYSADM privileges, p@ssword123 is the password of the admin user, and eGuser is the monitoring user to whom the privileges are granted. Specify the name of the monitoring user (e.g., eGuser) in the USER text box. |
| Password | Specify the password of the monitoring user named in the USER text box. The eG agent uses these credentials to monitor the target cluster. |
| Confirm Password | Confirm the password by retyping it here. |
| Database | In the DATABASE text box, specify the name of a DB2 database on the active node of the DB2 Active-Passive Cluster. The eG agent connects to this database to pull cluster status metrics. |
| DD Frequency | Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if you so desire. To disable the detailed diagnosis capability for this test, specify none against DD frequency. |
| Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Is cluster running? | Indicates whether the cluster is currently up and running. | | The values reported by this measure and its numeric equivalents are mentioned in the table below: Note: By default, this measure reports the value Yes or No to indicate whether or not the cluster is running. However, in the graph, this measure is indicated using the Numeric Values listed in the table above. Use the detailed diagnosis to find out the details of the active node. |
| Has DB2 cluster switched? | Indicates whether or not a failover has occurred since the last measurement period. | | The values reported by this measure and its numeric equivalents are mentioned in the table below: Note: By default, this measure reports the value Yes or No to indicate whether or not a failover occurred. However, in the graph, this measure is indicated using the Numeric Values listed in the table above. Use the detailed diagnosis to find out the details of the failover, such as Previous Node, Previous Node Start Time, Previous Node EndTime, Current Node, and Current Node Start Time. |
| Uptime of the DB2 cluster | Indicates the total time the DB2 cluster has been up since its last reboot. | Minutes | A high value reflects prolonged stability. However, unusually long uptimes may warrant attention if regular maintenance or patching is expected. Thresholds can be configured to flag clusters that have not been restarted for extended periods. |
| DB2 cluster uptime since last measure | Indicates how long the DB2 cluster has been running since the last measurement period. | Seconds | A significantly lower value than expected may suggest a recent interruption or failover. If no reboot occurred during the last interval, the value equals the measurement period. If a reboot occurred, the value is lower and represents the time since the reboot. |
| Has the DB2 cluster been restarted? | Indicates whether or not the cluster was restarted since the last measurement. | | The values reported by this measure and its numeric equivalents are mentioned in the table below: Note: By default, this measure reports the value Yes or No to indicate whether or not the cluster was restarted. However, in the graph, this measure is indicated using the Numeric Values listed in the table above. |
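
The relationship between the three uptime-related measures described above can be illustrated with a short Python sketch. It is not the eG agent's actual implementation. It assumes that an instance start timestamp is available (for instance, one read through `SYSPROC.MON_GET_INSTANCE`, which is an assumption about where the agent gets it) and that the measurement period is known in seconds. The function name `uptime_measures` and the dictionary keys are hypothetical.

```python
from datetime import datetime, timedelta


def uptime_measures(start_time, now, period_seconds):
    """Derive the uptime-related measures from a start timestamp.

    start_time     -- when the cluster last came up (assumed input)
    now            -- time of the current measurement
    period_seconds -- length of the measurement period in seconds
    """
    total_seconds = (now - start_time).total_seconds()

    # A start time that falls inside the last measurement period
    # means a restart happened since the previous measurement.
    restarted = total_seconds < period_seconds

    # Without a restart the cluster ran for the whole period;
    # after a restart only the time since the restart counts.
    since_last = total_seconds if restarted else period_seconds

    return {
        "uptime_minutes": total_seconds / 60,
        "uptime_since_last_measure_seconds": since_last,
        "restarted": restarted,
    }


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    # Cluster up for two hours, measured every 5 minutes: no restart.
    print(uptime_measures(now - timedelta(hours=2), now, 300))
    # Cluster came up 90 seconds ago: restart within the last period.
    print(uptime_measures(now - timedelta(seconds=90), now, 300))
```

In this sketch, an uptime-since-last-measure value below the measurement period and a restart flag of Yes always occur together, which is why a drop in the former is a useful cue to check the latter.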