ESE Database Engine Instances Test

The database engine used in Exchange Server is the Extensible Storage Engine (ESE). The Extensible Storage Engine (ESE) is an advanced indexed and sequential access method (ISAM) storage technology. ESE enables applications to store and retrieve data from tables using indexed or sequential cursor navigation.

The ESE database looks like a single file to Windows. Internally however, the database is a collection of 32KB pages arranged in a balanced B+ tree structure. In this database structure, all data is stored in leaves. At the root-level there are only pointers to internal pages. The internal pages contain pointers to the leaves, and the leaves are linked. ESE databases are organized into groups called instances. 

Transactions to the ESE databases are processed (i.e., created) in server memory – in particular, the ESE cache, the log buffers, and the version store. The ESE cache helps reduce I/O operations. The version store is tied to the ESE cache and keeps track of transactions to the database while they are created. When transactions are created, they are stored in a particular log buffer. The log buffer represents a particular log file that belongs to a specific ESE database. Once the log buffer fills up with transactions, the entire log buffer is written to the log file, the log file is closed, and a new one is created. If a transaction fails for any reason, then the ESE database, once mounted, reads a checkpoint file to identify which log file is the checkpoint log, replays that log file, writes all completed transactions in that log file that have not already been written to the database files, and reverses any incomplete transactiions.

Typically, the performance of an ESE database instance can degrade due to a lot of factors. Ineffective cache usage, poor I/O processing, high transaction load, and improper log buffer sizing are to name a few. Since a user’s experience with his/her Exchange mailbox relies upon the error-free functioning of the ESE database instances in the backend, an Exchange administrator should keep an eye on the performance of every database instance, accurately identify those instances and databases that are performing poorly, and rapidly isolate the reasons for the same, so that the road-blocks can be removed quickly and the desired performance levels can be ensured. This is where the ESE Database Engine Instances test helps. This test auto-discovers the ESE database instances, and reports the following for every instance:

  • How well each database instance is using its cache;
  • How quickly an ESE database instance processes I/O requests;
  • How swiftly an ESE database instance recovers from failures;
  • Whether/not the log buffers are adequately sized;

This way, this test accurately pinpoints that database instance, the performance of which is bottlenecked, leads administrators to where the bottleneck lies, and thus hastens remedial action.   

Target of the test : A Microsoft Exchange 2013/2016 server

 Agent deploying the test : An internal agent

Outputs of the test : One set of results for each ESE database instance on the Exchange 2013/2016 server being monitored

Configurable parameters for the test
  1. Test period - How often should the test be executed
  2. Host - The host for which the test is to be configured.
  3. port – The port at which the host listens.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Database cache hit ratio:

Indicates the percentage of database requests to this database instance that were fulfilled by the database cache without incurring disk input/output activity.

Percent

A high value is indicative of optimal cache usage, which translates into minimal or no direct disk accesses and high database performance.

A low value on the other hand is indicative of poor cache usage. This in turn implies that direct disk accesses are high, thus escalating processing overheads and degrading database performance. You may want to determine the reason for the poor cache usage and fix it to make sure that database performance does not suffer too long. One of the common reasons for this anomaly is the insufficient cache size. In the absence of adequate memory, the Exchange server will not be able to provide adequate memory for caching objects, resulting in a high percentage of cache misses.

Database cache size:

Indicates the amount of system memory, in megabytes (MB), used by the database cache manager associated with this database instance to hold commonly used information from the database files to prevent file operations.

MB

Typically, the Exchange server reserves 25% of the total RAM for caching purposes. If the value of this measure grows closer to this allocated size, it could indicate that the cache would soon run out of memory. If this happens, then the cache will not be able to accommodate more frequently-referenced items. In the long run, this may result in a high rate of cache misses, which in turn can cause direct disk I/O to increase. To avoid this, consider increasing the cache size. 

Database defragmentation task:

Indicates the number of database defragmentation tasks currently pending for this database instance.

Number

If this measure reports a very high value, it could denote that many defragmentation tasks are yet to be executed on the database; this in turn implies that the database is still largely fragmented. This will delay data retrieval from the database. Compare the value of this measure across database instance to know which instance is the most fragmented. 

Average database read latency:

Indicates the amount of time this database instance took to perform a read operation.

Secs

A low value is desired for this measure. A consistent increase in this value is indicative of a bottleneck when reading from the database. Compare the value of this measure across database instances to know which instance is the slowest when reading and why.

Average database write latency:

Indicates the amount of time this database instance took to perform a write operation.

Secs

A low value is desired for this measure. A consistent increase in this value is indicative of a bottleneck when writing to this database instance. Compare the value of this measure across instances to know which instance is the slowest when writing and why.

Average log read latency:

Indicates the average time this database instance takes to read from a log file.

Secs

A high value is a cause for concern as it impacts how quickly transaction recovery is performed in the event of a transaction failure. If the transaction log file is read slowly, then it will delay the replay of the unwritten transactions and the writing of the transactions to the database file. This in turn will significantly slow down transaction recovery.

Average log write latency:

Indicates the average time this database instance takes to write data from a log file.

Secs

A high value is a cause for concern as it impacts how quickly transaction recovery is performed in the event of a transaction failure. If the contents of the transaction log file are not written to the database file quickly, transaction recovery will be delayed.

Database read operations rate:

Indicates the rate at which this database instance completes read operations.

Reads/Sec

A consistent drop in this value can indicate a reduction in the number of read requests. It can also indicate a reading bottleneck. In the case of the latter, compare the value of this measure across database instances to know which instance is the slowest when processing read requests.

Database write operations rate:

Indicates the rate at which this database instance completes write operations.

Writes/Sec

A consistent drop in this value can indicate a reduction in the number of write requests. It can also indicate a writing bottleneck. In the case of the latter, compare the value of this measure across database instances to know which instance is the slowest when processing write requests.

Log read operations rate:

Indicates the rate at which this database instance completed log file read operations.

Reads/Sec

A consistent drop in the value of this measure is indicative of a slowdown when reading from a log file.

Log write operations rate:

Indicates the rate at which this database instance completed log write operations.

Writes/Sec

 

Log record waits:

Indicates the number of log records that cannot be added to the log buffers of this database instance because the log buffers are full.

Records/Sec

This measure should be as close to zero as possible.

If it is not, it might indicate that the size of the log buffer might be a bottleneck. Increasing the memory may solve this problem.

Log thread waits:

Indicates the number of threads waiting in this database instance for their data to be written to the log buffer so that the update of the database can be completed.

Number

This measure should be as low as possible.

A high value for this measure may indicate that the log buffer might be a bottleneck. Increasing the memory may solve this problem.

Transaction log files:

Indicates the amount of work, expressed in terms of the number of log files, that needs to be redone or undone to the database files of this database instance if the process fails.

 

Number

The value of this measure should be below 500 at all times. For a healthy server, this measure should report a value between 20 and 30 for each database instance. If this measure increases continually, this indicates either a long-running transaction, (which will impact the version store), or a bottleneck involving the database disks. In this case, in the event of a failure, transaction recovery will take a considerably long time. Under such circumstances, you may want to consider decreasing the checkpoint depth, so that transactions are written to the database quickly.

Session in use:

Indicates the number of sessions to this database instance that are currently open for use by client threads.

Number

 

Session used:

Indicates the percentage of sessions to this database instance that are currently open for use by client threads.

Percent

A high value is desired for this measure. A very low value could indicate that too many open sessions are idle.

Database tables cache hit ratio:

Indicates the percentage of tables in this database instance that were opened using the cached schema information.

Percent

A significantly low value indicates that the Exchange server is not having enough free memory. Increasing the memory may solve this problem.

Database tables opened:

Indicates the number of tables in this database instance that were opened per second.

Opens/sec

 

Version buckets allocated:

Indicates the total number of version buckets allocated to this database instance.

Number

The "version buckets" are the message queue database transactions that are kept in memory. All changes that are made to the message queue database stay in memory until those changes can be committed to transaction log files.

Factors that can increase the version buckets may be virus issues, integrity of the message queue database, or hard drive performance.

The default maximum version bucket count is 16,384. If version buckets reach 70% of maximum, the server is at risk of running out of the version store.