PostgreSQL Replication RTO Test

Recovery Time Objective is a metric that helps to calculate how quickly you need to recover your Application, database and other services following a disaster (crash) in order to maintain business continuity. If data and infrastructure are not recovered following a disaster within the time duration set for the Recovery Time Objective, then, businesses could suffer irreparable data loss and integrity. In a high availability setup, if the replica server is not restored within a shorter duration, the data in the master may not be in sync with the replica server. To avoid such unpleasant eventualities and to ensure that their business is back to normal in a very short duration, administrators may have to periodically keep track on the RTO of the target PostgreSQL server. The PostgreSQL Replication RTO test helps administrators in this regard!

In a high availability setup where the target database server is a replica server, this test reports the Recovery Time Objective i.e., replication lag duration (the time lag between the master and the replica server). Using this test, administrators can gain a fair knowledge on how long they can manage their business post a disaster/crash of the replica server.

Note:

This test will report metrics only if the monitored target PostgreSQL database server is a replica server in a high availability setup.

Target of the test : PostgreSQL server

Agent deploying the test: An internal/remote agent

Outputs of the test : One set of results for the target PostgreSQL server

Configurable parameters for the test
Parameter Description

Test period

How often should the test be executed

Host

The IP address of the host for which this test is to be configured.

Port

The port on which the server is listening. The default port is 5432.

Username

In order to monitor a PostgreSQL server, you need to manually create a special database user account in every PostgreSQL database instance that requires monitoring. When doing so, ensure that this user is vested with the superuser privileges. The sample script we recommend for user creation for eG monitoring is:

CREATE ROLE eguser LOGIN

ENCRYPTED PASSWORD {‘eguser password’}

SUPERUSER NOINHERIT NOCREATEDB NOCREATEROLE;

Specify the name of this user in the Username text box.

Password

The password associated with the above Username (can be ‘NULL’). Here, ‘NULL’ means that the user does not have any password.

Confirm Password

Confirm the Password (if any) by retyping it here.

DB Name

The name of the database to connect to. The default is “postgres”.

SSL

If the PostgreSQL server being monitored is an SSL-enabled server, then set the SSL flag to Yes. If not, then set the SSL flag to No.

Detailed Diagnosis

To make diagnosis more efficient and accurate, the eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Replication lag duration

Indicates the Recovery Time Objective i.e., the time lag noticed between the master and the target database server which is the replica server.

Seconds

The detailed diagnosis of this measure lists the Current time and the Last transaction replayed time.