PostgreSQL Cluster Members Test

A PostgreSQL cluster consists of multiple nodes that collectively manage data replication, failover, and load balancing. Ensuring that each node functions as expected is vital for maintaining data consistency, minimizing downtime, and handling large volumes of concurrent requests in enterprise environments.

This test monitors the state and configuration of each node in the cluster and tracks the number of nodes in each role. For each node, it checks whether replication is enabled, whether the node is a master or a standby, whether its role has changed, and whether it is in read-write or read-only mode. These status checks help identify misconfigurations, failover events, and changes in node responsibilities. The node count measures provide a complete view of the cluster composition: the number of master, slave, read-write, read-only, and unavailable nodes. These insights are critical for validating failover mechanisms, confirming role assignments, and ensuring optimal role distribution. By detecting inconsistencies or role mismatches early, this test helps administrators maintain cluster health, scalability, and performance.

Target of the test: A PostgreSQL Cluster

Agent deploying the test: An external agent

Outputs of the test: One set of results for each node on the target PostgreSQL cluster being monitored.

Configurable parameters for the test
Parameter Description

Test period

How often the test should be executed.

Host

The IP address of the host for which this test is to be configured.

Port

The port on which the server is listening. The default port is 5432.

Username

To monitor a PostgreSQL cluster, you must manually create a dedicated database user account on each PostgreSQL instance that you wish to monitor, and specify the name of that user here. To know how to create such a user, based on whether the target PostgreSQL cluster is installed on-premises or hosted in the cloud, refer to the How does eG Enterprise Monitor PostgreSQL Server? topic.

Password

The password associated with the above Username (can be ‘NULL’). Here, ‘NULL’ means that the user does not have any password.

Confirm Password

Confirm the Password (if any) by retyping it here.

DB Name

The name of the target database to connect to. The default is “postgres”.

SSL

Indicates whether or not the eG agent communicates with the PostgreSQL cluster over SSL. By default, this flag is set to No, as the target PostgreSQL database is not SSL-enabled by default. If the target cluster is SSL-enabled, set this flag to Yes.

Verify CA

If the eG agent must authenticate the server's identity by verifying the server CA certificate when it establishes an encrypted connection with the target PostgreSQL cluster, set the Verify CA flag to Yes. By default, this flag is set to No.

CA Cert File

This parameter is applicable only if the target PostgreSQL cluster is SSL-enabled. The certificate file is a public-key certificate that follows the X.509 standard. It contains information about the identity of the server, such as its name, geolocation, and public key. Essentially, it is the certificate that the server presents to connecting clients to prove that it is what it claims to be. Each node of the target cluster can have its own certificate file, or a single certificate can be used to access all the nodes in the cluster. Therefore, in the CA Cert File text box, specify the full path to the server root certificate, or to the certificate file signed by the CA, in .crt format, for each node or for all nodes. For example, the location of this file may be: C:\app\eGurkha\JRE\lib\security\PostGreQL-test-ca.crt. By default, this parameter is set to none.

This parameter specification differs according to the type of cluster and configuration:

If a separate certificate file is available for each node of the PostgreSQL cluster, provide a comma-separated list of the full paths to the certificates in the CA Cert File text box.

For example: C:\app\eGurkha\JRE\lib\security\postgresql-test-ca.crt,C:\app\eGurkha\JRE\lib\security\postgresql-test-ca2.crt,C:\app\eGurkha\JRE\lib\security\postgresql-test-ca3.crt

If a single certificate is used to access all the nodes, specify the full path to the certificate file of the target PostgreSQL cluster.

For example: C:\app\eGurkha\JRE\lib\security\postgresql-test-ca.crt
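The two forms of the CA Cert File specification described above can be told apart by splitting the value on commas. The following is a minimal sketch, not the eG agent's actual logic: the function name parse_ca_cert_file is hypothetical, and treating an empty value the same as none is an assumption.

```python
def parse_ca_cert_file(value: str) -> list:
    """Split a CA Cert File value into a list of certificate paths.

    Returns an empty list for the default value "none". An empty value is
    treated the same way, which is an assumption of this sketch.
    """
    value = value.strip()
    if not value or value.lower() == "none":
        return []
    # A single path yields a one-element list; a comma-separated
    # specification yields one path per listed certificate.
    return [path.strip() for path in value.split(",") if path.strip()]


# One certificate per node (the comma-separated example above).
per_node = parse_ca_cert_file(
    r"C:\app\eGurkha\JRE\lib\security\postgresql-test-ca.crt,"
    r"C:\app\eGurkha\JRE\lib\security\postgresql-test-ca2.crt,"
    r"C:\app\eGurkha\JRE\lib\security\postgresql-test-ca3.crt"
)
# A single certificate shared by all nodes.
shared = parse_ca_cert_file(r"C:\app\eGurkha\JRE\lib\security\postgresql-test-ca.crt")
print(len(per_node), len(shared))
```

A result with more than one path corresponds to the per-node case, while a single path corresponds to the shared-certificate case.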

Client Cert File

This parameter is applicable only if the target PostgreSQL cluster is SSL-enabled. To collect metrics from the target PostgreSQL cluster, the eG agent requires a client certificate in .p12 format. Hence, specify the full path to the client certificate file in .p12 format in the Client Cert File text box. For example, the location of this file may be: C:\app\eGurkha\JRE\lib\security\test-client.p12.

Client Key File

A client key file is a file containing the private key that corresponds to the public key used by the client. Provide the full path to the file containing the client key.

Include Available Nodes

In the Include Available Nodes text box, provide a comma-separated list of all the available nodes to be included for monitoring. This way, the test monitors and collects metrics from all the available nodes in the cluster. By default, this parameter is set to none. Each entry takes the format HOSTNAME:PORT. For example: 172.16.8.136:3306,172.16.8.139:3306
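To illustrate the HOSTNAME:PORT format, the sketch below splits an Include Available Nodes value into host and port pairs and rejects malformed entries. It is an illustration only: the function name parse_nodes is hypothetical, and the validation rules are assumptions rather than the checks the eG agent performs.

```python
def parse_nodes(value: str) -> list:
    """Parse a comma-separated HOSTNAME:PORT list into (host, port) tuples."""
    value = value.strip()
    if not value or value.lower() == "none":
        return []  # default setting: no nodes listed explicitly
    nodes = []
    for entry in value.split(","):
        # Split on the last colon so the port is always the final component.
        host, sep, port = entry.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected HOSTNAME:PORT, got {entry!r}")
        nodes.append((host, int(port)))
    return nodes


nodes = parse_nodes("172.16.8.136:3306,172.16.8.139:3306")
for host, port in nodes:
    print(host, port)
```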

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if required. To disable the detailed diagnosis capability for this test, specify none against DD Frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Is master?

Indicates whether this node is currently serving as the master node.

 

This measure is not reported for the Summary descriptor.

The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value    Numeric Value
No               0
Yes              1

Note:

This measure reports the Measure Values listed in the table above to indicate whether or not the node is a master node. However, in the graph, this measure is indicated using the Numeric Values listed in the table above.

Is hot standby?

Indicates whether or not this node is a hot standby.

 

This measure is not reported for the Summary descriptor.

A hot standby node is ready to take over if the master fails. This measure reports Yes if this node is a standby node.

The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value    Numeric Value
No               0
Yes              1

Note:

This measure reports the Measure Values listed in the table above to indicate whether or not the node is a standby node. However, in the graph, this measure is indicated using the Numeric Values listed in the table above.

Is replication enabled?

Indicates whether or not replication is currently enabled on this node.

 

This measure is not reported for the Summary descriptor.

The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value    Numeric Value
No               0
Yes              1

Note:

This measure reports the Measure Values listed in the table above to indicate whether or not replication is enabled on the node. However, in the graph, this measure is indicated using the Numeric Values listed in the table above.

Has the role of this node changed?

Indicates whether or not this node has switched roles (e.g., master to standby or vice versa).

 

This measure is not reported for the Summary descriptor.

The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value    Numeric Value
No               0
Yes              1

Note:

This measure reports the Measure Values listed in the table above to indicate whether or not the role of the node has switched. However, in the graph, this measure is indicated using the Numeric Values listed in the table above.

Use the detailed diagnosis of this measure to find out the Host, Current state, Previous state, Port, and Access mode.
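A role change can be understood as a difference between the state recorded for a node in the previous test run and the state observed in the current run. The sketch below illustrates that comparison and assembles the detailed diagnosis fields named above. It is not the eG agent's implementation: the NodeSnapshot class, the function names, and the state labels are hypothetical. How the state is obtained is also an assumption; one possibility is the PostgreSQL function pg_is_in_recovery(), which returns true on a standby.

```python
from dataclasses import dataclass
from typing import Optional

# Numeric equivalents of the measure values, as listed in the table above.
MEASURE_VALUES = {"No": 0, "Yes": 1}


@dataclass(frozen=True)
class NodeSnapshot:
    host: str
    port: int
    state: str        # hypothetical labels: "master" or "standby"
    access_mode: str  # hypothetical labels: "read-write" or "read-only"


def role_changed(previous: Optional[NodeSnapshot], current: NodeSnapshot) -> str:
    """Return "Yes" if the node's state differs from the previous run."""
    if previous is None:  # first run: nothing to compare against
        return "No"
    return "Yes" if previous.state != current.state else "No"


def diagnosis_row(previous: NodeSnapshot, current: NodeSnapshot) -> dict:
    """Collect the fields listed for the detailed diagnosis of this measure."""
    return {
        "Host": current.host,
        "Current state": current.state,
        "Previous state": previous.state,
        "Port": current.port,
        "Access mode": current.access_mode,
    }


# Example: a standby that was promoted between two test runs.
before = NodeSnapshot("172.16.8.136", 5432, "standby", "read-only")
after = NodeSnapshot("172.16.8.136", 5432, "master", "read-write")
value = role_changed(before, after)
print(value, MEASURE_VALUES[value])
print(diagnosis_row(before, after))
```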

Is this node in read-write mode?

Indicates whether or not this node is currently operating in read-write mode.

 

This measure is not reported for the Summary descriptor.

The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value    Numeric Value
No               0
Yes              1

Note:

This measure reports the Measure Values listed in the table above to indicate whether or not the node is in read-write mode. However, in the graph, this measure is indicated using the Numeric Values listed in the table above.

Total nodes

Indicates the total number of nodes configured in the cluster.

Number

This measure is reported only for the Summary descriptor.

A sudden drop in the total number of nodes may indicate node failure or connectivity issues.

Master nodes

Indicates the number of nodes currently serving as master in the cluster.

Number

This measure is reported only for the Summary descriptor.

Use the detailed diagnosis of this measure to find out the Node IP, Current state, Previous state, Port, and Access mode.

Slave nodes

Indicates the number of nodes currently serving as slave nodes in the cluster.

Number

This measure is reported only for the Summary descriptor.

Use the detailed diagnosis of this measure to find out the Node IP, Current state, Previous state, Port, and Access mode.

Unavailable nodes

Indicates the number of nodes that are currently unreachable or down in the cluster.

Number

This measure is reported only for the Summary descriptor.

A high value for this measure may indicate hardware, network, or software failures.

Use the detailed diagnosis of this measure to find out the Node IP, Current state, Previous state, Port, and Access mode.

Read-write nodes

Indicates the number of nodes that are accepting both read and write operations in the cluster.

Number

This measure is reported only for the Summary descriptor.

A drop in the value of this measure may cause transactional delays or write contention.

Read-only nodes

Indicates the number of nodes restricted to read-only operations in the cluster.

Number

This measure is reported only for the Summary descriptor.

A higher count helps with query load balancing. A sudden drop can affect reporting performance.
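The node count measures reported for the Summary descriptor can be pictured as simple tallies over the per-node states. The following sketch shows one way such tallies might be derived. The Node type and its fields are hypothetical, and counting unreachable nodes towards the total is an assumption of this sketch, not documented behaviour.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    address: str
    role: Optional[str]         # "master", "slave", or None if unreachable
    read_write: Optional[bool]  # None if unreachable


def summarize(nodes: list) -> dict:
    """Tally the Summary descriptor counts from a list of Node records."""
    reachable = [n for n in nodes if n.role is not None]
    return {
        # Assumption: every configured node counts, reachable or not.
        "Total nodes": len(nodes),
        "Master nodes": sum(n.role == "master" for n in reachable),
        "Slave nodes": sum(n.role == "slave" for n in reachable),
        "Unavailable nodes": len(nodes) - len(reachable),
        "Read-write nodes": sum(n.read_write is True for n in reachable),
        "Read-only nodes": sum(n.read_write is False for n in reachable),
    }


cluster = [
    Node("172.16.8.136:5432", "master", True),
    Node("172.16.8.139:5432", "slave", False),
    Node("172.16.8.140:5432", None, None),  # unreachable node
]
summary = summarize(cluster)
for measure, count in summary.items():
    print(f"{measure}: {count}")
```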

Health state

Indicates the current health state of this cluster.

 

This measure is reported only for the Summary descriptor.

The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value    Numeric Value
Good             2
Degraded         1
Critical         0

Note:

By default, this measure reports the current health of the cluster using the Measure Values listed in the table above. The graph of this measure, however, is represented using the numeric equivalents only, that is, 0 to 2.

A status other than Good could impact cluster performance or availability.
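When the graphed values of this measure are processed outside the console, the numeric equivalents need to be translated back into the measure values listed in the table above. The sketch below shows one way to encode that mapping. The HealthState enumeration and the helper names are hypothetical and only mirror the table.

```python
from enum import IntEnum


class HealthState(IntEnum):
    """Numeric equivalents of the Health state measure values."""
    CRITICAL = 0
    DEGRADED = 1
    GOOD = 2


def measure_value(numeric_value: int) -> str:
    """Translate a graphed numeric value back into its measure value."""
    return HealthState(numeric_value).name.capitalize()


def needs_attention(numeric_value: int) -> bool:
    """Return True for any state other than Good."""
    return HealthState(numeric_value) is not HealthState.GOOD


for raw in (2, 1, 0):
    print(raw, measure_value(raw), needs_attention(raw))
```

A value outside the range 0 to 2 raises a ValueError, which keeps unexpected readings from being silently mislabelled.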