Oracle RAC Instances Test

In most production environments, it is essential to monitor the uptime of critical servers in the infrastructure. By tracking the uptime of each server, administrators can determine what percentage of time a server has been up. By comparing this value with service level targets, administrators can identify the most trouble-prone areas of the infrastructure.

In some environments, administrators may schedule periodic reboots of their servers. If a specific server has been up for an unusually long time, this may indicate that the scheduled reboot task is not working on that server.

This test monitors the uptime of every node in an Oracle cluster.

Target of the test : Oracle Cluster

Agent deploying the test : An internal agent

Outputs of the test : One set of results for each node in the Oracle cluster being monitored

Configurable parameters for the test
Parameter Description

Test period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port on which the server is listening.

Service Name

A Service Name exists for the entire Oracle RAC system. When clients connect to an Oracle cluster using the Service Name, the cluster routes the request to any available database instance in the cluster. By default, the Service Name is set to none. In this case, the test connects to the cluster using the ORASID and pulls out metrics from the database instance that corresponds to that ORASID. If a valid Service Name is specified instead, the test connects to the cluster using that Service Name and can pull out metrics from any available database instance in the cluster.

To know the Service Name of a cluster, execute the following query on any node in the target cluster:

select name, value from v$parameter where name = 'service_names';
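The choice between Service Name and ORASID described above can be sketched as follows. This is only an illustration in Python: the function name build_connect_descriptor and the host and service values are hypothetical, and the way the eG agent actually builds its connection is not documented here. The output follows the standard Oracle Net connect descriptor syntax.

```python
def build_connect_descriptor(host, port, service_name="none", orasid=None):
    """Return an Oracle Net connect descriptor for the given test parameters.

    Hypothetical helper: a Service Name other than "none" takes precedence,
    otherwise the ORASID is used, mirroring the parameter description.
    """
    address = f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
    if service_name and service_name.strip().lower() != "none":
        # Connect through the cluster-wide service; any available instance may answer.
        connect_data = f"(CONNECT_DATA=(SERVICE_NAME={service_name.strip()}))"
    elif orasid:
        # Fall back to one specific instance identified by its SID.
        connect_data = f"(CONNECT_DATA=(SID={orasid}))"
    else:
        raise ValueError("either a Service Name or an ORASID is required")
    return f"(DESCRIPTION={address}{connect_data})"


if __name__ == "__main__":
    print(build_connect_descriptor("rac-scan.example.com", 1521, "orclsvc"))
    print(build_connect_descriptor("node1.example.com", 1521, "none", "orcl1"))
```

With a Service Name, the descriptor does not name an instance, which is why metrics can then come from any available instance in the cluster.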

ORASID

The variable name of the Oracle instance.

Username

In order to monitor an Oracle database server, a special database user account has to be created in every Oracle database instance that requires monitoring. A Click here hyperlink is available in the test configuration page, using which a new Oracle database user can be created. Alternatively, you can manually create the special database user. When doing so, ensure that this user is granted the select_catalog_role and create session privileges.

The sample script we recommend for user creation (in Oracle database server versions before 12c) for eG monitoring is:

create user oraeg identified by oraeg;

create role oratest;

grant create session to oratest;

grant select_catalog_role to oratest;

grant oratest to oraeg;

The sample script we recommend for user creation (in Oracle database server 12c) for eG monitoring is:

alter session set container=<Oracle_service_name>;

create user <user_name> identified by <user_password> container=current default tablespace <name_of_default_tablespace> temporary tablespace <name_of_temporary_tablespace>;

Grant create session to <user_name>;

Grant select_catalog_role to <user_name>;

The name of this user has to be specified here.

Password

Specify the password of the specified database user.

Confirm Password

Confirm the Password by retyping it here.

Report Manager Time

By default, this flag is set to Yes. In this case, the detailed diagnosis of this test, if enabled, reports the shutdown and reboot times of the device in the manager’s time zone. If this flag is set to No, the shutdown and reboot times are shown in the time zone of the system where the agent is running (i.e., the system being managed in the case of agent-based monitoring, and the system on which the remote agent is running in the case of agentless monitoring).

SSL

By default, this flag is set to No, as the target Oracle cluster is not SSL-enabled by default. If the target cluster is SSL-enabled, then set this flag to Yes.

SSL Cipher

This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. A cipher suite is a set of cryptographic algorithms that a client application and server agree on before they exchange information over an SSL/TLS connection. It consists of sets of instructions on how to secure a network through SSL (Secure Sockets Layer) or TLS (Transport Layer Security). In this text box, provide a comma-separated list of cipher suites that are allowed for SSL/TLS connection to the target cluster. By default, this parameter is set to none.

Truststore File

This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. A truststore stores certificates from Certificate Authorities (CAs) that verify and authenticate the certificate presented by the server in an SSL connection. The eG agent therefore needs access to the truststore where these certificates are stored, so that it can authenticate and connect with the target cluster and collect metrics. To enable this, first import the certificates into the following default location: <eG_INSTALL_DIR>/lib/security/mytruststore.jks. To know how to import the certificate into the truststore, refer to Pre-requisites for monitoring Oracle Cluster. Then, provide the truststore file name in this text box, for example, mytruststore.jks. By default, none is specified against this text box.

Truststore Type

This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. Specify the type of truststore that contains the certificates for server authentication in this text box, for example, JKS. By default, this parameter is set to none.

Truststore Password

This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. If a Truststore File name is provided, then, in this text box, provide the password that is used to obtain the associated certificate details from the Truststore File. By default, this parameter is set to none.
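The way the SSL, SSL Cipher, and Truststore parameters relate to one another can be sketched as below. This is a minimal Python illustration under stated assumptions: the function resolve_ssl_settings and the dictionary layout are hypothetical and are not part of the eG agent; the sketch only shows that the literal value none means "not configured" and that the cipher list is comma-separated.

```python
def resolve_ssl_settings(ssl="No", ssl_cipher="none", truststore_file="none",
                         truststore_type="none", truststore_password="none"):
    """Normalize the SSL-related test parameters into one dictionary (hypothetical)."""

    def is_set(value):
        # The literal string "none" (any case) or an empty value means "not configured".
        cleaned = value.strip()
        return cleaned != "" and cleaned.lower() != "none"

    if ssl.strip().lower() != "yes":
        # Cluster is not SSL-enabled: the remaining parameters are ignored.
        return {"ssl": False, "ciphers": [], "truststore": None}

    ciphers = []
    if is_set(ssl_cipher):
        # Split the comma-separated list and drop surrounding whitespace.
        ciphers = [name.strip() for name in ssl_cipher.split(",") if name.strip()]

    truststore = None
    if is_set(truststore_file):
        truststore = {
            "file": truststore_file.strip(),
            "type": truststore_type.strip() if is_set(truststore_type) else None,
            "password": truststore_password if is_set(truststore_password) else None,
        }
    return {"ssl": True, "ciphers": ciphers, "truststore": truststore}


if __name__ == "__main__":
    print(resolve_ssl_settings("Yes", "TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA",
                               "mytruststore.jks", "JKS", "example-password"))
```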

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Has Oracle server been restarted?:

Indicates whether this node has been rebooted during the last measurement period or not.

Boolean

This measure is not reported for Summary descriptor.

If this measure shows 1, it means that the node was rebooted during the last measurement period. By checking the time periods when this metric changes from 0 to 1, an administrator can determine the times when this node was rebooted. The detailed diagnosis of this measure, if enabled, indicates the date/time at which the node was shut down, the date on which it was restarted, the duration of the shutdown, and whether the node was shut down as part of a maintenance routine.

Uptime since the last measurement:

Indicates the time period that this node has been up since the last time this test ran.

Secs

This measure is not reported for Summary descriptor.

If the node has not been rebooted during the last measurement period and the agent has been running continuously, this value will be equal to the measurement period. If the node was rebooted during the last measurement period, this value will be less than the measurement period of the test. For example, if the measurement period is 300 secs, and if the node was rebooted 120 secs ago, this metric will report a value of 120 seconds. The accuracy of this metric depends on the measurement period: the smaller the measurement period, the greater the accuracy.
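The arithmetic behind the three uptime measures can be sketched as follows. This is an illustrative Python calculation only: the function uptime_metrics and its arguments are hypothetical, it assumes the agent has been running continuously, and it does not describe how the test itself obtains the boot time.

```python
def uptime_metrics(now_epoch, boot_epoch, measurement_period_secs):
    """Derive the uptime measures from a boot timestamp (all times in seconds)."""
    seconds_since_boot = max(0, now_epoch - boot_epoch)
    # A boot that happened inside the last measurement period counts as a restart.
    restarted = 1 if seconds_since_boot < measurement_period_secs else 0
    # Uptime within one period can never exceed the period itself.
    uptime_since_last = min(seconds_since_boot, measurement_period_secs)
    return {
        "restarted": restarted,
        "uptime_since_last_measurement_secs": uptime_since_last,
        "uptime_mins": seconds_since_boot / 60,
    }


if __name__ == "__main__":
    # Measurement period of 300 secs, node rebooted 120 secs ago.
    print(uptime_metrics(now_epoch=1_000, boot_epoch=880, measurement_period_secs=300))
```

For a 300-second period and a reboot 120 seconds ago, the sketch reports a restart and an uptime since the last measurement of 120 seconds, matching the example above.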

Uptime:

Indicates the total time that the node has been up since its last reboot.

Mins

This measure is not reported for Summary descriptor.

Administrators may wish to be alerted if a node has been running without a reboot for a very long period. Setting a threshold for this metric allows administrators to determine such conditions.

Instance open mode

Indicates the mode in which this instance was opened.

 

This measure is not reported for Summary descriptor.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure Value            Numeric Value
Mounted                  0
Read write               1
Read only                2
Read only with apply     3

Note:

This measure reports one of the Measure Values listed in the table above to indicate the open mode of the instance. The graph of this measure, however, represents the open mode using the numeric equivalents only.
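The mapping between Measure Values and their numeric equivalents can be expressed as a simple lookup. The Python sketch below is illustrative only; the names OPEN_MODE_VALUES and open_mode_to_numeric are hypothetical, and the case-insensitive matching is an assumption made for convenience.

```python
# Measure Value -> Numeric Value, as listed in the table above.
OPEN_MODE_VALUES = {
    "Mounted": 0,
    "Read write": 1,
    "Read only": 2,
    "Read only with apply": 3,
}


def open_mode_to_numeric(mode):
    """Return the numeric equivalent of an open mode, ignoring case and extra spaces."""
    normalized = " ".join(mode.split()).lower()
    for label, value in OPEN_MODE_VALUES.items():
        if label.lower() == normalized:
            return value
    raise ValueError(f"unknown open mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("Mounted", "READ WRITE", "Read only with apply"):
        print(mode, "->", open_mode_to_numeric(mode))
```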

Number of mounted nodes

Indicates the number of instances that were opened in mounted mode.

Number

This measure is reported only for Summary descriptor.

An instance mounts a database to associate the database with that instance. Once the database is both mounted and open, any valid user can connect to it and perform typical data access operations. In most cases, standby databases in mounted mode carry out the recovery process.

Number of read write nodes

Indicates the number of instances that were opened in read write mode.

Number

This measure is reported only for Summary descriptor.

If an instance is in read write mode, then users can make changes to the data, generating redo in the online redo log.

By default, the database opens in read/write mode, unless a standby database is accessed.

The value of this measure helps to identify the failure of primary databases in the cluster. If the primary database fails after all the standby databases have failed, availability can be affected and data may be lost.

Number of read only nodes

Indicates the number of instances that were opened in read only mode.

Number

This measure is reported only for Summary descriptor.

A standby database maintains a duplicate copy of your primary database and provides continued availability in the event of a disaster. A standby database can be opened only in read-only mode, for use as a temporary reporting database; it cannot be opened in read/write mode. In a cluster with more than one standby database, the failure of one standby transfers the load to the remaining databases. If all the standby databases fail, however, this can lead to a shutdown of the primary, resulting in data loss and unavailability.

Number of read only with apply nodes

Indicates the number of instances that were opened in read only with apply mode.

Number

This measure is reported only for Summary descriptor.

If the Data Guard feature is enabled in the cluster, then the standby databases that carry out the recovery process will be in Read only with apply mode.

Total number of nodes

Indicates the total number of nodes in this cluster.

Number

This measure is reported only for Summary descriptor.
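The Summary descriptor measures are counts over the per-node open modes, which can be sketched as follows. This Python example is an illustration under assumptions: the function summarize_open_modes and the node names are hypothetical, and the input is assumed to be one open mode per node, as reported by the Instance open mode measure.

```python
from collections import Counter


def summarize_open_modes(node_modes):
    """Count nodes per open mode; node_modes maps node name -> open mode string."""
    counts = Counter(" ".join(mode.split()).lower() for mode in node_modes.values())
    # Counter returns 0 for modes that no node is currently in.
    return {
        "Number of mounted nodes": counts["mounted"],
        "Number of read write nodes": counts["read write"],
        "Number of read only nodes": counts["read only"],
        "Number of read only with apply nodes": counts["read only with apply"],
        "Total number of nodes": len(node_modes),
    }


if __name__ == "__main__":
    # Hypothetical three-node cluster.
    sample = {"node1": "Read write", "node2": "Read write", "node3": "Mounted"}
    for measure, value in summarize_open_modes(sample).items():
        print(f"{measure}: {value}")
```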