PostgreSQL Background I/O Test

Checkpoints are points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before the checkpoint. At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the log (known as the redo record) from which it should start the REDO operation. Any changes made to data files before that point are guaranteed to be already on disk.

The checkpoint requirement of flushing all dirty data pages to disk can cause a significant I/O load. To minimize this I/O, there is a separate server process called the background writer in PostgreSQL, whose sole function is to issue writes of “dirty” shared buffers. The background writer will continuously trickle out dirty pages to disk, so that only a few pages will need to be forced out when checkpoint time arrives, instead of the storm of dirty-buffer writes that formerly occurred at each checkpoint. However, there is a net overall increase in I/O load, because where a repeatedly-dirtied page might before have been written only once per checkpoint interval, the background writer might write it several times in the same interval.

You hence need to continuously track how often the background writer performs checkpointing and how much I/O load it imposes on the server, so that you can proactively detect potential overload conditions, appropriately fine-tune the checkpointing activity performed by the background writer to minimize the I/O, and thus prevent the performance degradation that may otherwise occur on the server. The PostgreSQL Background I/O test helps achieve all of the above. In the process, the test also reports useful statistics related to shared buffers.

Target of the test : PostgreSQL server

Agent deploying the test: An internal/remote agent

Outputs of the test : One set of results for the target PostgreSQL server

Configurable parameters for the test
  1. TEST PERIOD – How often should the test be executed.
  2. Host – The IP address of the server.
  3. Port – The port on which the server is listening. The default port is 5432.
  4. User – In order to monitor a PostgreSQL server, you need to manually create a special database user account in every PostgreSQL database instance that requires monitoring. When doing so, ensure that this user is vested with the superuser privileges. The sample script we recommend for user creation for eG monitoring is:

    CREATE ROLE eguser LOGIN

    ENCRYPTED PASSWORD {‘eguser password’}

    SUPERUSER NOINHERIT NOCREATEDB NOCREATEROLE;

    The name of this user has to be specified in the USERNAME text box.

  5. Password- The password associated with the above user name (can be ‘NULL’). Here, ‘NULL’ means that the user does not have any password.
  6. Confirm password – Confirm the password (if any) by retyping it here.
  7. dbname - The name of the database to connect to. The default is “postgres”.
  8. ssl - The name of this user has to be specified in the USERNAME text box.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Checkpoint requests:

Indicates the number of checkpoint requests received by the server during the last measurement period.

Number

A checkpoint request is generated every checkpoint_segments log segments, or every checkpoint_timeout seconds, whichever comes first. While checkpoint_segments denotes the maximum number of log file segments between automatic WAL checkpoints, the checkpoint_timeout indicates the maximum time between WAL checkpoints. The default settings are 3 segments and 300 seconds (5 minutes), respectively. Reducing checkpoint_segments and/or checkpoint_timeout causes checkpoints to occur more often. This allows faster after-crash recovery (since less work will need to be redone). However, one must balance this against the increased cost of flushing dirty data pages more often. If full_page_writes is set (as is the default), there is another factor to consider.

To ensure data page consistency, the first modification of a data page after each checkpoint results in logging the entire page content. In that case, a smaller checkpoint interval increases the volume of output to the WAL log, partially negating the goal of using a smaller interval, and in any case causing more disk I/O. Checkpoints are fairly expensive, first because they require writing out all currently dirty buffers, and second because they result in extra subsequent WAL traffic as discussed above. It is therefore wise to set the checkpointing parameters high enough that checkpoints don’t happen too often.

Check point time outs:

Indicates the number of scheduled checkpoints that did not occur even after the checkpoint_timeout setting was violated during the last measurement period.

Number

Ideally, the value of this measure should be low. A consistent increase in this value is a cause of concern, as it indicates that checkpoints are not occurring in the desired frequency. This in turn will significantly slowdown after-crash recovery, as more work will have to be redone.

Buffers freed:

Indicates the total number of buffers that were released for re-use from the buffer cache during the last measurement period, when the checkpoint_segments setting was violated; this  typically causes the background writer to automatically write dirty buffers to the disk.

Number

A high value is desired for this measure. A low value could indicate a checkpointing bottleneck, owing to which the background writer is unable to write updated index and heap files to the disk at an optimal rate. In such cases, the buffer cache may not have adequate free buffers to service subsequent write requests. This is a cause for concern in write-intensive database environments. 

Buffers cleaned:

Indicates the number of buffer that were written to the disk during the last measurement period in anticipation of being allocated in the future.

Number

The background writer typically stalls some other process for a moment while it writes out dirty data. To keep that from happening as often, the background writer process scans forward looking for blocks that might be allocated in the near future that are dirty and that have a low usage count (alternatively called the Least Recently Used or LRUblocks). When it finds them, it writes some of them out pre-emptively, based on historical allocation rates.

Max written:

Indicates the maximum number of dirty buffers that can be written into the buffer cache during the last measurement period.

Number

If this measure indicates a high value it indicates that adequate buffers are not free in the cache. To optimize the value of this measure, you can increase the value of the bgwriter_lru_maxpages parameter.

Buffers freed by connections:

Indicates the number of buffers that were released from the cache for re-use during the last measurement period, when users wrote data directly to the disk.

Number

A high value is desired for this measure, as it reduces the need for an I/O-intensive operation such as ‘checkpointing’. 

Buffers allocated:

Indicates the total number of calls to allocate a new buffer for a page (whether or not it was already cached) during the last measurement period.

Number