Oracle RAC Commits Test
A wait class is a grouping of wait events, and every wait event belongs to a wait class. The main wait classes of the Oracle database server are:
- System I/O
- User I/O
The Commit wait class comprises only one wait event: the wait for redo log write confirmation after a commit (that is, the ‘log file sync’ event). A commit is not complete until LGWR (log writer) writes the log buffers, including the commit redo records, to the log files. In a nutshell, after posting LGWR to write, the user or background process waits for LGWR to signal back, with a 1 sec timeout. The user process charges this wait time to the ‘log file sync’ event.
This test reports the number of sessions to each instance that are waiting for a redo log write confirmation after a commit. This way, the test sheds light on the open sessions to an instance and the reason why those sessions remain open.
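As a rough way to check this grouping on your own database, the wait classes and the events assigned to the Commit class can be listed from the v$event_name view. This is a minimal sketch and not part of the test itself; the set of events returned may vary with the Oracle version.

```sql
-- List every wait class known to this instance
SELECT DISTINCT wait_class
FROM v$event_name
ORDER BY wait_class;

-- List the wait events that belong to the Commit wait class
SELECT name, wait_class
FROM v$event_name
WHERE wait_class = 'Commit';
```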
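The query the test actually issues is not documented here. The following is only a sketch of how a comparable per-instance count could be obtained manually from the cluster-wide gv$session view, assuming the connected user has the select_catalog_role privilege.

```sql
-- Count, per RAC instance, the sessions currently waiting on 'log file sync'
SELECT inst_id,
       COUNT(*) AS waiting_sessions
FROM gv$session
WHERE event = 'log file sync'
  AND state = 'WAITING'   -- exclude sessions whose last wait has already ended
GROUP BY inst_id
ORDER BY inst_id;
```

Instances with no waiting sessions do not appear in the output of this sketch.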
This test is disabled by default. To enable the test, go to the enable / disable tests page using the menu sequence: Agents -> Tests -> Enable/Disable, pick Oracle RAC as the desired Component type, set Performance as the Test type, choose the test from the disabled tests list, and click on the >> button to move the test to the ENABLED TESTS list. Finally, click the Update button.
Target of the test : Oracle RAC
Agent deploying the test : An internal agent
Outputs of the test : One set of results for every instance in the monitored Oracle RAC.
Configurable parameters for the test
- TEST PERIOD - How often should the test be executed.
- Host – The host for which the test is to be configured.
- Port - The port on which the server is listening.
- orasid - The variable name of the Oracle instance.
- service name - A ServiceName exists for the entire Oracle RAC system. When clients connect to an Oracle cluster using the ServiceName, the cluster routes the request to any available database instance in the cluster. By default, the service name is set to none. In this case, the test connects to the cluster using the orasid and pulls out the metrics from the database instance that corresponds to that orasid. If a valid service name is specified instead, the test will connect to the cluster using that service name, and will be able to pull out metrics from any available database instance in the cluster.
To know the ServiceName of a cluster, execute the following query on any node in the target cluster:
select name, value from v$parameter where name = 'service_names';
- User – In order to monitor an Oracle database server, a special database user account has to be created in every Oracle database instance that requires monitoring. A Click here hyperlink is available in the test configuration page, using which a new Oracle database user can be created. Alternatively, you can manually create the special database user. When doing so, ensure that this user is vested with the select_catalog_role and create session privileges.
The sample script we recommend for user creation (in Oracle database server versions before 12c) for eG monitoring is:
create user oraeg identified by oraeg;
create role oratest;
grant create session to oratest;
grant select_catalog_role to oratest;
grant oratest to oraeg;
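After running the script, the grants can be checked from the data dictionary. This is only a sketch, assuming the user and role names from the sample script above (oraeg and oratest) and a session with access to the DBA views; the dictionary stores unquoted names in upper case.

```sql
-- Roles granted to the monitoring user and to its role
SELECT grantee, granted_role
FROM dba_role_privs
WHERE grantee IN ('ORAEG', 'ORATEST')
ORDER BY grantee, granted_role;

-- System privileges granted to the role (CREATE SESSION is expected here)
SELECT grantee, privilege
FROM dba_sys_privs
WHERE grantee = 'ORATEST';
```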
The sample script we recommend for user creation (in Oracle database server 12c) for eG monitoring is:
alter session set container=<Oracle_service_name>;
create user <user_name> identified by <user_password> container=current default tablespace <name_of_default_tablespace> temporary tablespace <name_of_temporary_tablespace>;
Grant create session to <user_name>;
Grant select_catalog_role to <user_name>;
The name of this user has to be specified here.
- Password – Password of the specified database user
- Confirm password – Confirm the password by retyping it here.
- ISPASSIVE – If the value chosen is yes, then the Oracle server under consideration is a passive server in an Oracle cluster. No alerts will be generated if the server is not running. Measures will be reported as “Not applicable” by the agent if the server is not up.
To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:
- The eG manager license should allow the detailed diagnosis capability
- Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Number of Sessions:
Indicates the number of sessions to this instance that are waiting for redo log confirmation after commit.
A steady increase in the value of this measure is a cause of concern, as it indicates the following:
- Many sessions are forced to stay open owing to the commit wait events, and this may cause a session overload;
- Many “log file sync” wait events are occurring, causing the performance of the Oracle RAC to deteriorate. The root causes of ‘log file sync’ waits are as follows:
- LGWR is unable to complete writes fast enough. This could be because the disk I/O performance to the log files is not good enough, LGWR is starved of CPU resources, LGWR has been paged out owing to memory starvation, or file system or Unix buffer cache limitations are in play.
- LGWR is unable to post the processes fast enough, due to excessive commits.
- IMU undo/redo threads
- LGWR is suffering from other database contention such as enqueue waits or latch contention
- Various bugs
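To help narrow down which of the above causes applies, one common approach is to compare the average ‘log file sync’ wait with the average ‘log file parallel write’ wait (the LGWR write itself) and to look at the commit volume. The queries below are a hedged sketch against the standard gv$system_event and gv$sysstat views, not something this test runs; the figures are cumulative since instance startup.

```sql
-- Average wait (ms) per instance for the commit wait and for the LGWR write.
-- If both averages are similar, slow redo log I/O is the likely cause;
-- if 'log file sync' is much higher, look at CPU, posting, or contention.
SELECT inst_id,
       event,
       total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
FROM gv$system_event
WHERE event IN ('log file sync', 'log file parallel write')
ORDER BY inst_id, event;

-- Commit and rollback counts per instance, to spot excessive commits
SELECT inst_id, name, value
FROM gv$sysstat
WHERE name IN ('user commits', 'user rollbacks')
ORDER BY inst_id, name;
```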
Use the detailed diagnosis of this measure to view the details of the sessions affected by log file sync waits.
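The columns shown in the detailed diagnosis are not listed here. If you want to inspect the affected sessions manually, a query along the following lines could be used; the choice of columns is an assumption for illustration only.

```sql
-- Sessions currently waiting on 'log file sync', longest waiters first
SELECT inst_id,
       sid,
       serial#,
       username,
       program,
       machine,
       seconds_in_wait
FROM gv$session
WHERE event = 'log file sync'
  AND state = 'WAITING'
ORDER BY seconds_in_wait DESC;
```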