Db2 Replication Log Gap Test
Recovery Point Objective (RPO) is the maximum amount of data you can afford to lose if the DB2 UDB database server crashes. Recovery Time Objective (RTO) is a metric that indicates how quickly you need to recover your applications, databases, and other services following a disaster (crash) in order to maintain business continuity.
In a high availability setup, the primary and standby databases should always be in sync. If the primary database crashes before its data is synced with the standby databases, a significant amount of data will be lost. To avoid such data loss, administrators need to periodically track how far each standby database is lagging behind, i.e., the amount of data that the standby database still has to receive before it is in sync with the primary database. Similarly, if data and infrastructure are not recovered within the duration set for the Recovery Time Objective following a disaster, businesses could suffer irreparable loss of data and data integrity. To avoid such eventualities and to ensure that business returns to normal in a very short time, administrators should periodically keep track of the RPO and RTO of the target DB2 UDB database server.
For each database created on the target DB2 UDB database server, this test reports the amount of data that was lost when a switchover happened and the time lag noticed in the transport of logs between the primary and standby databases. Using this test, administrators can estimate the time and the amount of data required for the primary and standby databases to be in sync. This helps administrators fine-tune their high availability environment.
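As a rough illustration of the two figures described above, the sketch below derives a byte gap and a transport lag from a pair of log positions and timestamps. The `LogSnapshot` fields and the sample values are hypothetical and chosen only for this example; they are not the names or the method the test itself uses.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogSnapshot:
    """Hypothetical snapshot of replication progress for one database."""
    primary_log_pos: int        # byte offset of the last log record written on the primary
    standby_log_pos: int        # byte offset of the last log record received by the standby
    primary_log_time: datetime  # timestamp of the latest log record on the primary
    standby_log_time: datetime  # timestamp of the latest log record received by the standby


def log_gap_bytes(s: LogSnapshot) -> int:
    """Bytes of log the standby has not yet received (never negative)."""
    return max(0, s.primary_log_pos - s.standby_log_pos)


def transport_lag_seconds(s: LogSnapshot) -> float:
    """Seconds by which the standby trails the primary (never negative)."""
    return max(0.0, (s.primary_log_time - s.standby_log_time).total_seconds())


# Made-up sample values for illustration only.
snapshot = LogSnapshot(
    primary_log_pos=5_242_880,
    standby_log_pos=5_177_344,
    primary_log_time=datetime(2024, 1, 1, 12, 0, 30),
    standby_log_time=datetime(2024, 1, 1, 12, 0, 18),
)
print(log_gap_bytes(snapshot), transport_lag_seconds(snapshot))
```

A gap of zero bytes and a lag of zero seconds would mean the standby has received everything the primary has written so far.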
Target of the test : A DB2 database server
Agent deploying the test : An internal/remote agent
Outputs of the test : One set of results for each database created on the target database server instance being monitored
| Parameter | Description |
|---|---|
| Test period | How often should the test be executed |
| Host | The IP address of the DB2 server |
| Port | |
| User | Specify the name of the user who is authorized to access the target database server and collect the required metrics in this text box. You can create a separate user on the OS hosting the DB2 server for this purpose. The steps for the same are detailed in the Creating a Special User for Monitoring DB2 Server |
| Password | Enter the password of the specified USER in the PASSWORD text box. |
| Confirm Password | Confirm the Password by retyping it in the Confirm Password text box. |
| Database | Specify the name of the database on the monitored DB2 server to be used by this test. |
| Include DB | Specify a comma-separated list of databases that you wish to monitor in the Include DB text box. |
| Exclude DB | Specify a comma-separated list of databases that need to be excluded from monitoring in the Exclude DB text box. |
| SSL | If the target database server is SSL-enabled, then set the SSL flag to Yes. If not, then set the SSL flag to No. |
| Trust Store File Name | This parameter is applicable only if the target DB2 UDB database is SSL-enabled; if not, set this parameter to none. Specify the file name of the client-side SSL truststore that contains the server certificate required for establishing an SSL connection. The truststore is used to verify the identity of the server and enable a secure communication channel. By default, the truststore file should be placed in: <EG_INSTALL_DIR>/jre/lib/security/mytruststore.jks. Here, mytruststore.jks is the Truststore file name. You may change this to any valid file name. By default, none is specified against this text box. |
| Trust Store Password | This parameter is applicable only if the target DB2 UDB database is SSL-enabled; if not, set this parameter to none. If a Truststore File name is provided, then, in this text box, provide the password that is used to obtain the associated certificate details from the Truststore File. By default, this parameter is set to none. |
| Confirm Password | Confirm the Password by retyping it in the Confirm Password text box. |
| DD Frequency | Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency. |
| Detailed Diagnosis | To make diagnosis more efficient and accurate, the eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |

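
To make the Include DB, Exclude DB, and SSL-related parameters more concrete, the following sketch shows one way such values could be interpreted. It is not the eG agent's implementation: the function names are hypothetical, and the choices that names are compared case-insensitively, that `none` or an empty value means "no filter", and that Exclude DB wins over Include DB are assumptions made for this example.

```python
def parse_db_list(value: str) -> set[str]:
    """Split a comma-separated parameter value into upper-cased database names."""
    if value.strip().lower() in ("", "none"):
        return set()  # assumption: 'none' or empty means no filter
    return {name.strip().upper() for name in value.split(",") if name.strip()}


def select_databases(discovered, include_db="none", exclude_db="none"):
    """Return the discovered databases that pass the include/exclude filters."""
    include = parse_db_list(include_db)
    exclude = parse_db_list(exclude_db)
    selected = []
    for db in discovered:
        key = db.upper()
        if include and key not in include:
            continue  # an include list is set and this database is not on it
        if key in exclude:
            continue  # assumption: exclusion takes precedence over inclusion
        selected.append(db)
    return selected


def check_ssl_settings(ssl: str, trust_store_file: str, trust_store_password: str) -> list[str]:
    """Report settings that contradict the parameter descriptions above."""
    ssl_on = ssl.strip().lower() == "yes"
    has_file = trust_store_file.strip().lower() not in ("", "none")
    has_password = trust_store_password.strip().lower() not in ("", "none")
    problems = []
    if has_file and not has_password:
        problems.append("A truststore file name is set but no truststore password is given")
    if not ssl_on and (has_file or has_password):
        problems.append("SSL is No, so the truststore parameters should be none")
    return problems


print(select_databases(["SAMPLE", "SALES", "TEST"], include_db="sample, sales", exclude_db="SALES"))
print(check_ssl_settings("Yes", "mytruststore.jks", "none"))
```

Running such a check before saving the configuration would catch a truststore file left without its password, or truststore values left in place after SSL is switched off.
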
| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Log sequence apply lagging RPO | Indicates the amount of data (in terms of bytes) that was lost on this database when a switch over of database happened. | Bytes | If too many log gaps are detected in the sequence of the log files, then, it implies that the primary and the standby databases are not up-to-date. A consistent increase in the value of this measure affects the availability of data in the database. |
| Log transport lagging durations RTO | Indicates the time lag noticed in the transport of logs to this database with respect to the generation of logs in the primary database. | Seconds | Given enough resources, in particular network bandwidth, a DB2 UDB standby database can maintain pace with very high workloads. In cases where resources are constrained, the standby can begin to fall behind, resulting in a transport or apply lag. A transport lag is the amount of data, measured in time, that the standby has not received from the primary. A low value is desired for this measure. |
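
The interpretation guidance above can be sketched as a simple check over recent measurements. This is only an illustration: the function names, the three-sample window used to decide that a value is "consistently increasing", and the 60-second lag limit are all assumptions, not thresholds defined by the test.

```python
def is_consistently_increasing(samples, min_points=3):
    """True when the last `min_points` samples rise strictly from one to the next."""
    recent = list(samples)[-min_points:]
    if len(recent) < min_points:
        return False  # not enough history to call it a trend
    return all(a < b for a, b in zip(recent, recent[1:]))


def assess(log_gap_bytes, transport_lag_seconds, max_lag_seconds=60.0):
    """Return findings for one database from its recent measurement history."""
    findings = []
    if is_consistently_increasing(log_gap_bytes):
        findings.append("log gap is growing")
    # Only the latest lag value is compared against the (assumed) limit.
    if transport_lag_seconds and transport_lag_seconds[-1] > max_lag_seconds:
        findings.append("transport lag above limit")
    return findings


# Made-up measurement history for illustration only.
print(assess([1024, 4096, 16384], [2.0, 5.0, 90.0]))
```

An empty result means neither condition was met for the samples provided; suitable window sizes and limits depend on the workload and the recovery objectives agreed for the environment.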