SQL AlwaysOn Recovery Point Test
Recovery Point Objective (RPO) is defined as the amount of acceptable data loss or the point in time up to which the data can be recovered. Whenever a failover is detected, the administrators may want the secondary database to take over quickly from the primary database. If large quantity of data is not transferred to the secondary database from the primary database, then the users have to wait for a longer period to access the databases during failover. Often there would be a minimal data loss when a failover is in progress. This data loss may be due to the time lag that occurs during synchronization that happens between the primary and secondary databases. If the time taken is too long, it indicates that the synchronization process between the primary and secondary databases is taking too long to complete. This in turn will affect the users who will be compelled to wait for a prolonged time period to access the databases. To avoid such scenarios, it is essential to monitor the recovery point objective of the SQL server. The SQL AlwaysOn Recovery Point test helps administrators in this regard.
This test reports the amount of logs that had not been synchronized with the secondary database and the amount of hardened logs that are yet to be applied to the secondary database. In addition, this test helps administrators to analyze the time duration for which the log records were waiting in the redo queue before being rolled to the secondary database. This way, administrators may be proactively alerted to fine tune the time taken to roll the log to the secondary database so that the synchronization process completes in a quick and hassle free manner.
This test is disabled by default. To enable the test, go to the enable / disable tests page using the menu sequence : Agents -> Tests -> Enable/Disable, pick Microsoft SQL as the desired Component type, set Performance as the Test type, choose the test from the disabled tests list, and click on the < button to move the test to the ENABLED TESTS list. Finally, click the Update button.
Target of the test : A Microsoft SQL server
Agent deploying the test : An internal agent
Outputs of the test : One set of results for each database on the Microsoft SQL server monitored
|
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Log bytes flushed: |
Indicates the rate at which log bytes were flushed to the secondary database to complete synchronization since the last recovery point. |
Flushes/sec |
If the value of this measure is consistently increasing, then it indicates that the potential data loss can increase indefinitely. |
Log send queue size: |
Indicates the amount of log that had not been sent to the secondary database from this database to complete synchronization. |
KB |
Ideally, the value of this measure should be zero. A high value for this measure indicates that this much of data is unavailable in the secondary database during failover which directly implies that the customers would experience this data loss equal to this measure. |
Redo queue size: |
Indicates the total number of kilobytes of hardened log that currently remain to be applied to the secondary database to roll it forward. |
KB |
A low value is desired for this measure. |
Redo rate: |
Indicates the rate at which log records were rolled forward on the secondary database from this database. |
KB/sec |
|
Pending logs recovery time: |
Indicates the time duration for which the log records were waiting in the redo queue until being rolled forward to the secondary database. |
Secs |
Ideally, the value of this measure should be low. |
Pending logs flushed time: |
Indicates the time duration for which the logs were in the send queue until being flushed completely to the secondary database. |
Secs |
|