Oracle RAC Cluster Interconnects Test

A cluster database comprises two or more nodes that are linked by an interconnect. The interconnect serves as the communication path between the nodes in the cluster database. Each Oracle instance uses the interconnect for the messaging that synchronizes each instance’s use of shared resources. Oracle also uses the interconnect to transmit data blocks that the multiple instances share.

The non-availability of the interconnect on any cluster node can impair that node’s communication with other nodes in the cluster. As a result, fail-over operations will be hampered and the cluster service will be forced to distribute session/request load across the remaining nodes in the cluster; this in turn may overload those nodes. Consequently, mission-critical business services using the clustered resources may experience prolonged outages or slowdowns, resulting in considerable loss of revenue and reputation.

To avoid this, administrators need to continuously monitor the availability of the cluster interconnect on each node, analyze how session/process load is distributed across the nodes via the interconnect, and proactively detect the following:

  • The sudden unavailability of the interconnect on a node
  • The effect of an unavailable interconnect on the load of the other nodes in the cluster

For this purpose, you can use the Oracle RAC Cluster Interconnects test. This test periodically verifies whether the nodes in the cluster are able to communicate via the cluster interconnect, and promptly reports the non-availability of the interconnect. The test also tracks the session and process load on each node in the cluster, thus revealing the impact of an unavailable cluster interconnect on the load and performance of the other nodes in the cluster.

Target of the test : Oracle RAC

Agent deploying the test : An internal agent

Outputs of the test : One set of results for each clusternodeID_<IP_address_used_for_internode_communication> in the Oracle cluster.

Configurable parameters for the test
  1. TEST PERIOD - How often the test should be executed.
  2. Host - The host for which the test is to be configured.
  3. Port - The port on which the server is listening.
  4. orasid - The variable name of the Oracle instance.
  5. service name - A ServiceName exists for the entire Oracle RAC system. When clients connect to an Oracle cluster using the ServiceName, the cluster routes the request to any available database instance in the cluster. By default, the service name is set to none. In this case, the test connects to the cluster using the orasid and pulls metrics from the database instance that corresponds to that orasid. If a valid service name is specified instead, the test connects to the cluster using that service name and can pull metrics from any available database instance in the cluster.

    To know the ServiceName of a cluster, execute the following query on any node in the target cluster:

    select name, value from v$parameter where name = 'service_names';

  6. User – To monitor an Oracle database server, a special database user account has to be created in every Oracle database instance that requires monitoring. A Click here hyperlink is available in the test configuration page, using which a new Oracle database user can be created. Alternatively, you can manually create the special database user. When doing so, ensure that this user is granted the select_catalog_role role and the create session privilege.

    The sample script we recommend for user creation (in Oracle database server versions before 12c) for eG monitoring is:

    create user oraeg identified by oraeg;
    create role oratest;
    grant create session to oratest;
    grant select_catalog_role to oratest;
    grant oratest to oraeg;

    The sample script we recommend for user creation (in Oracle database server 12c) for eG monitoring is:

    alter session set container=<Oracle_service_name>;
    create user <user_name> identified by <user_password> container=current default tablespace <name_of_default_tablespace> temporary tablespace <name_of_temporary_tablespace>;
    grant create session to <user_name>;
    grant select_catalog_role to <user_name>;

    The name of this user has to be specified here.

  7. Password – The password of the specified database user.
  8. Confirm password – Confirm the password by retyping it here.
  9. hide ip – This test reports a set of metrics for the IP address that each cluster node uses to communicate with other nodes in the cluster. By default, therefore, the descriptors of this test are of the following format: clusternodeID_<IP_used_for_internode_communication>. Accordingly, the hide ip parameter is set to No by default. High-security environments, however, may not want to expose the IP address that cluster nodes use for internal communication. In such environments, you can set the hide ip flag to Yes, so that the descriptors of this test do not include the <IP_used_for_internode_communication>. In such cases, only the clusternodeID will be displayed as the descriptor of this test.
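
If the monitoring user described under the User parameter was created manually, a quick way to confirm its grants is to query the data dictionary. The following is a minimal sketch, not part of the recommended scripts: it assumes the user and role names ORAEG and ORATEST from the pre-12c sample script, and it has to be run by a user who can read the DBA_ views. Substitute your own names, in upper case, as they are stored in the dictionary.

```sql
-- Assumption: user ORAEG and role ORATEST, as in the pre-12c sample script above.

-- Roles granted to the user and to the intermediate role.
-- With the pre-12c script: ORAEG -> ORATEST and ORATEST -> SELECT_CATALOG_ROLE.
-- With the 12c script, SELECT_CATALOG_ROLE is granted to the user directly.
select grantee, granted_role
  from dba_role_privs
 where grantee in ('ORAEG', 'ORATEST')
 order by grantee;

-- System privileges granted to the user or the role.
-- CREATE SESSION should appear against one of them.
select grantee, privilege
  from dba_sys_privs
 where grantee in ('ORAEG', 'ORATEST')
 order by grantee;
```

If CREATE SESSION or SELECT_CATALOG_ROLE is missing from the output, rerun the relevant grant statement from the sample script.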

Measurements made by the test

Measurement | Description | Measurement Unit | Interpretation

Cluster interconnect percentage:

Indicates whether the cluster interconnect is available on this node or not.

Percent

The value 0 for this measure indicates that this node is unable to communicate with other nodes in the cluster via the cluster interconnect. The value 100 indicates that the interconnect is available and is enabling this node to communicate with the other cluster nodes.
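
When this measure reports 0 for a node, it can help to check which interface and IP address each instance is configured to use for the interconnect. The query below is a sketch for manual troubleshooting; this document does not state how the test itself determines availability, so the use of the GV$CLUSTER_INTERCONNECTS view here is an assumption made for illustration only.

```sql
-- Lists the interconnect interface and IP address known to each instance.
-- Run from any surviving instance; an instance that cannot be reached
-- will simply be absent from the GV$ output.
select inst_id,
       name,        -- network interface name
       ip_address,  -- address used for internode communication
       is_public,   -- YES usually signals a misconfiguration (interconnect on a public network)
       source       -- where the setting came from (for example, the cluster registry or a parameter)
  from gv$cluster_interconnects
 order by inst_id;
```

The ip_address column corresponds to the <IP_used_for_internode_communication> portion of the test descriptor.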

Logon rate :

Indicates the rate at which user logons occurred on this node.

Logons/Sec
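
A logon rate can be approximated by hand from Oracle's cumulative logon counter. The sketch below assumes the 'logons cumulative' statistic in GV$SYSSTAT as the data source; whether the test uses the same statistic is not stated in this document.

```sql
-- Take one sample, wait a known number of seconds, then run the query again.
-- Logon rate per instance = (second value - first value) / seconds elapsed.
select inst_id,
       value as logons_cumulative
  from gv$sysstat
 where name = 'logons cumulative'
 order by inst_id;
```

For example, if an instance reports 12000 on the first run and 12300 on a second run 60 seconds later, the logon rate for that interval is (12300 - 12000) / 60 = 5 logons/sec.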

 

Processes utilization :

Indicates the number of processes currently running on this cluster node.

Number

As long as the value of this measure is much lower than the value of the processes setting in the database parameter file, the node will be able to handle the process load.

Processes utilization percentage :

Indicates the percentage of the maximum number of processes this node can handle that is currently active on this cluster node.

Percent

Ideally, the value of this measure should be low. If the value is close to 100%, it could mean that the node is about to exhaust its process limit and may not be able to handle any more processes. If the value is consistently high for a cluster node, check the processes setting in the database parameter file to determine whether the node has been configured with adequate processing capability. If this check reveals that the node has been configured with fewer processes than it can handle, you may want to increase the processes setting to suit the node’s capacity.
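
To cross-check the process count and its percentage against the configured limit, the current usage and the limit can be read side by side. This is a minimal sketch that assumes the GV$RESOURCE_LIMIT view as the source; it is offered for manual verification and is not necessarily the query the test runs.

```sql
-- Current process count, configured limit, and percentage used, per instance.
-- limit_value is a character column and may hold 'UNLIMITED' for some resources,
-- so it is trimmed and guarded before conversion.
select inst_id,
       current_utilization                as processes_in_use,
       trim(limit_value)                  as processes_limit,
       case
         when trim(limit_value) = 'UNLIMITED' then null
         else round(100 * current_utilization / to_number(trim(limit_value)), 2)
       end                                as pct_used
  from gv$resource_limit
 where resource_name = 'processes'
 order by inst_id;
```

Changing the filter to resource_name = 'sessions' gives the equivalent figures for the session measures.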

Session utilization :

Indicates the number of sessions that are currently active on this node.

Number

As long as the value of this measure is much lower than the value of the sessions setting in the database parameter file, the node will be able to handle the session load. If the value of this measure is unusually high for any cluster node, then compare the value of this measure across nodes to determine whether load is uniformly distributed across all cluster nodes. If session load on most of the cluster nodes is high, then the sudden increase in session load could be attributed to an unavailable cluster interconnect. Because of the unavailability, the cluster service may have been unable to contact the affected cluster node and may have been compelled to distribute the load amongst the remaining cluster nodes. This may have caused load on the other nodes to suddenly increase. To confirm this, check the value of the Cluster interconnect percentage measure of all nodes.

On the other hand, if no interconnect is unavailable, and if Session utilization is abnormally high on a particular node only, it could mean that that node is indeed overloaded.
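
The cross-node comparison described above can also be done directly in the database. The following sketch assumes the GV$SESSION view and counts sessions per instance; the counts include background sessions, so they may not match the value this test reports exactly.

```sql
-- Session counts per instance, for comparing load across cluster nodes.
select inst_id,
       count(*)                                            as total_sessions,
       sum(case when status = 'ACTIVE' then 1 else 0 end)  as active_sessions
  from gv$session
 group by inst_id
 order by inst_id;
```

A node missing from the output, together with raised counts on every other node, fits the unavailable-interconnect scenario; a single node with a raised count while all nodes are present points to that node being overloaded.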

Session utilization percentage :

Indicates the percentage of the maximum number of sessions this node can handle that is currently active on this cluster node.

Percent

Ideally, the value of this measure should be low. If the value is close to 100%, it could mean that the node may not be able to handle any more sessions. If the value is consistently high for a cluster node, check the sessions setting in the database parameter file to determine whether the node has been configured with adequate session-handling capability. If this check reveals that the node has been configured with fewer sessions than it can handle, you may want to increase the sessions setting to suit the node’s true capacity.