PostgreSQL Background I/O Test

Checkpoints are points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before the checkpoint. At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the log (known as the redo record) from which it should start the REDO operation. Any changes made to data files before that point are guaranteed to be already on disk.

The checkpoint requirement of flushing all dirty data pages to disk can cause a significant I/O load. To minimize this I/O, there is a separate server process called the background writer in PostgreSQL, whose sole function is to issue writes of “dirty” shared buffers. The background writer will continuously trickle out dirty pages to disk, so that only a few pages will need to be forced out when checkpoint time arrives, instead of the storm of dirty-buffer writes that formerly occurred at each checkpoint. However, there is a net overall increase in I/O load, because where a repeatedly-dirtied page might before have been written only once per checkpoint interval, the background writer might write it several times in the same interval.

You hence need to continuously track how often the background writer performs checkpointing and how much I/O load it imposes on the server, so that you can proactively detect potential overload conditions, appropriately fine-tune the checkpointing activity performed by the background writer to minimize the I/O, and thus prevent the performance degradation that may otherwise occur on the server. The PostgreSQL Background I/O test helps achieve all of the above. In the process, the test also reports useful statistics related to shared buffers.

Target of the test : PostgreSQL server

Agent deploying the test: An internal/remote agent

Outputs of the test : One set of results for the target PostgreSQL server

Configurable parameters for the test
Parameter	Description
Test period	How often should the test be executed
Host	The IP address of the host for which this test is to be configured.
Port	The port on which the server is listening. The default port is 5432.
Username	In order to monitor a PostgreSQL server, you need to manually create a special database user account in every PostgreSQL database instance that requires monitoring. To know how to create such a user based on where the target PostgreSQL server is installed (whether on-premises or hosted on Cloud), refer to How does eG Enterprise Monitor PostgreSQL Server?.
Password	The password associated with the above Username (can be ‘NULL’). Here, ‘NULL’ means that the user does not have any password.
Confirm Password	Confirm the Password (if any) by retyping it here.
DB Name	The name of the database to connect to. The default is “postgres”.
SSL	If the PostgreSQL server being monitored is an SSL-enabled server, then set the SSL flag to Yes. If not, then set the SSL flag to No.
Verify CA	If the eG agent is required to establish an encrypted connection with the target PostGreSQL Database server by authenticating the server's identity through verifying the server CA certificate, set Verify CA flag to Yes. By default, this flag is set to No.
CA Cert File	This parameter is applicable only if the target PostGreSQL Database is SSL-enabled.The certificate file is a public-key certificate following the x.509 standard. It contains information about the identity of the server, such as its name, geolocation, and public key. Each nodes of the target cluster can have individual certificate files or a single certificate can be used to access all the nodes in the cluster. Essentially, it’s a certificate that the server serves to the connecting users to prove that they are what they claim to be. Therefore, specify the full path to the server root certificate or certificate file that is signed by the CA in .crt file format for all/each node in the CA Cert File text box. For example, the location of this file may be: C:\app\eGurkha\JRE\lib\security\PostGreQL-test-ca.crt. By default, this parameter is set to none. This parameter specification differs according to the type of cluster and configuration: If the certificate file is available for each node of the PostGreSQL Cluster then, provide a comma-seperated list of full path to the certificates in CA Cert File text box: For example:C:\app\eGurkha\JRE\lib\security\postgresql-test-ca.crt,C:\app\eGurkha\JRE\lib\security\postgresql-test-ca2.crt,C:\app\eGurkha\JRE\lib\security\postgresql-test-ca3.crt Specify the full path to the certificate file of the target PostGreSQL Database if a single certificate is used to access all nodes. For example: C:\app\eGurkha\JRE\lib\security\postgresql-test-ca.crt
Client Cert File	This parameter is applicable only if the target PostGreSQL Database is SSL-enabled. In order to collect metrics from the target MongoDB cluster, the eG agent requires client certificate in .p12 format. Hence, specify the full path to the Client certificate file in .p12 format in the Client Cert File text box. For example, the location of this file may be: C:\app\eGurkha\JRE\lib\security\test-client.p12.
Client Key File	A client key file refers to a file containing the private key that corresponds to the public key used by a client. Provide full path of the file containing client key.

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Checkpoint requests:	Indicates the number of checkpoint requests received by the server during the last measurement period.	Number	A checkpoint request is generated every checkpoint_segments log segments, or every checkpoint_timeout seconds, whichever comes first. While checkpoint_segments denotes the maximum number of log file segments between automatic WAL checkpoints, the checkpoint_timeout indicates the maximum time between WAL checkpoints. The default settings are 3 segments and 300 seconds (5 minutes), respectively. Reducing checkpoint_segments and/or checkpoint_timeout causes checkpoints to occur more often. This allows faster after-crash recovery (since less work will need to be redone). However, one must balance this against the increased cost of flushing dirty data pages more often. If full_page_writes is set (as is the default), there is another factor to consider. To ensure data page consistency, the first modification of a data page after each checkpoint results in logging the entire page content. In that case, a smaller checkpoint interval increases the volume of output to the WAL log, partially negating the goal of using a smaller interval, and in any case causing more disk I/O. Checkpoints are fairly expensive, first because they require writing out all currently dirty buffers, and second because they result in extra subsequent WAL traffic as discussed above. It is therefore wise to set the checkpointing parameters high enough that checkpoints don’t happen too often.
Check point time outs:	Indicates the number of scheduled checkpoints that did not occur even after the checkpoint_timeout setting was violated during the last measurement period.	Number	Ideally, the value of this measure should be low. A consistent increase in this value is a cause of concern, as it indicates that checkpoints are not occurring in the desired frequency. This in turn will significantly slowdown after-crash recovery, as more work will have to be redone.
Buffers freed:	Indicates the total number of buffers that were released for re-use from the buffer cache during the last measurement period, when the checkpoint_segments setting was violated; this typically causes the background writer to automatically write dirty buffers to the disk.	Number	A high value is desired for this measure. A low value could indicate a checkpointing bottleneck, owing to which the background writer is unable to write updated index and heap files to the disk at an optimal rate. In such cases, the buffer cache may not have adequate free buffers to service subsequent write requests. This is a cause for concern in write-intensive database environments.
Buffers cleaned:	Indicates the number of buffer that were written to the disk during the last measurement period in anticipation of being allocated in the future.	Number	The background writer typically stalls some other process for a moment while it writes out dirty data. To keep that from happening as often, the background writer process scans forward looking for blocks that might be allocated in the near future that are dirty and that have a low usage count (alternatively called the Least Recently Used or LRUblocks). When it finds them, it writes some of them out pre-emptively, based on historical allocation rates.
Max written:	Indicates the maximum number of dirty buffers that can be written into the buffer cache during the last measurement period.	Number	If this measure indicates a high value it indicates that adequate buffers are not free in the cache. To optimize the value of this measure, you can increase the value of the bgwriter_lru_maxpages parameter.
Buffers freed by connections:	Indicates the number of buffers that were released from the cache for re-use during the last measurement period, when users wrote data directly to the disk.	Number	A high value is desired for this measure, as it reduces the need for an I/O-intensive operation such as ‘checkpointing’.
Buffers allocated:	Indicates the total number of calls to allocate a new buffer for a page (whether or not it was already cached) during the last measurement period.	Number