Oracle RAC Instances Test
In most production environments, it is essential to monitor the uptime of critical servers in the infrastructure. By tracking the uptime of each server, administrators can determine what percentage of time a server has been up. By comparing this value against service level targets, administrators can identify the most trouble-prone areas of the infrastructure.
In some environments, administrators may schedule periodic reboots of their servers. If a specific server has been up for an unusually long time, an administrator can infer that the scheduled reboot task is not working on that server.
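As an illustration of the uptime-percentage comparison described above, the following sketch computes an availability percentage from per-interval uptime samples. The function name and all figures are invented for the example and are not part of the product.

```python
# Hypothetical sketch: computing an availability percentage from observed
# uptime across a number of fixed-length measurement periods, for comparison
# against a service level target. All values here are invented.

def availability_pct(uptime_secs, period_secs, samples):
    """Percentage of time the server was up across `samples` measurement periods."""
    return 100.0 * uptime_secs / (period_secs * samples)

# e.g. 299,100 seconds of observed uptime over 1,000 five-minute periods
pct = availability_pct(299_100, 300, 1000)
print(f"{pct:.2f}%")  # 99.70%
```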
This test monitors the uptime of every node in an Oracle cluster.
Target of the test : Oracle Cluster
Agent deploying the test : An internal agent
Outputs of the test : One set of results for each node in the Oracle cluster being monitored
Parameter | Description |
---|---|
Test period |
How often should the test be executed |
Host |
The host for which the test is to be configured. |
Port |
The port on which the server is listening. |
SCAN Name |
SCAN (Single Client Access Name) is a feature used in Oracle RAC environments that provides a single name for clients to access any Oracle database running in the cluster. You can specify a SCAN as an alternative to the IP/host name. If a value is provided for this parameter, it will be used for connectivity; otherwise, the IP/host name will be used. |
Service Name |
A ServiceName exists for the entire Oracle RAC system. When clients connect to an Oracle cluster using the ServiceName, the cluster routes the request to any available database instance in the cluster. By default, the Service Name is set to none. In this case, the test connects to the cluster using the ORASID and pulls metrics only from the database instance corresponding to that ORASID. If a valid service name is specified instead, the test will connect to the cluster using that Service Name, and will be able to pull metrics from any available database instance in the cluster. To know the Service Name of a cluster, execute the following query on any node in the target cluster: select name, value from v$parameter where name = 'service_names' |
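To illustrate how the SCAN/host, port, Service Name, and ORASID parameters combine at connect time, the sketch below builds an Oracle Easy Connect string (host:port/service) when a service name is given, and falls back to the older JDBC-thin-style SID form (host:port:sid) otherwise. The host and service names are hypothetical, and this helper is not part of the product.

```python
# Hypothetical sketch: assembling a connect descriptor from the test's
# connection parameters. Names and values below are invented examples.

def easy_connect(host_or_scan, port, service_name=None, sid=None):
    """Build a connect string; prefer the service name when one is given."""
    if service_name:
        # Easy Connect form: routes to any available instance in the cluster
        return f"{host_or_scan}:{port}/{service_name}"
    # SID form: addresses one specific instance (the ORASID)
    return f"{host_or_scan}:{port}:{sid}"

print(easy_connect("rac-scan.example.com", 1521, service_name="orclsvc"))
# rac-scan.example.com:1521/orclsvc
```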
ORASID |
The SID (system identifier) of the Oracle instance to be monitored. |
Username |
In order to monitor an Oracle database server, a special database user account has to be created in every Oracle database instance that requires monitoring. A Click here hyperlink is available in the test configuration page, using which a new Oracle database user can be created. Alternatively, you can manually create the special database user. When doing so, ensure that this user is vested with the select_catalog_role and create session privileges. The sample script we recommend for user creation (in Oracle database server versions before 12c) for eG monitoring is:
create user oraeg identified by oraeg;
create role oratest;
grant create session to oratest;
grant select_catalog_role to oratest;
grant oratest to oraeg;
The sample script we recommend for user creation (in Oracle database server 12c) for eG monitoring is:
alter session set container=<Oracle_service_name>;
create user <user_name> identified by <user_password> container=current default tablespace <name_of_default_tablespace> temporary tablespace <name_of_temporary_tablespace>;
grant create session to <user_name>;
grant select_catalog_role to <user_name>;
The name of this user has to be specified here. |
Password |
Specify the password of the specified database user. |
Confirm Password |
Confirm the Password by retyping it here. |
Report Manager Time |
By default, this flag is set to Yes, indicating that, by default, the detailed diagnosis of this test, if enabled, will report the shutdown and reboot times of the device in the manager’s time zone. If this flag is set to No, then the shutdown and reboot times are shown in the time zone of the system where the agent is running (i.e., the system being managed for agent-based monitoring, and the system on which the remote agent is running - for agentless monitoring). |
SSL |
By default, this flag is set to No, as the target Oracle cluster is not SSL-enabled by default. If the target cluster is SSL-enabled, then set this flag to Yes. |
SSL Cipher |
This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. A cipher suite is a set of cryptographic algorithms used when a client application and server exchange information over an SSL/TLS connection. It consists of sets of instructions on how to secure a network through SSL (Secure Sockets Layer) or TLS (Transport Layer Security). In this text box, provide a comma-separated list of cipher suites that are allowed for SSL/TLS connections to the target cluster. By default, this parameter is set to none. |
Truststore File |
This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. A TrustStore is used to store certificates from Certificate Authorities (CA) that verify and authenticate the certificate presented by the server in an SSL connection. Therefore, the eG agent should have access to the truststore where the certificates are stored in order to authenticate and connect with the target cluster and collect metrics. For this, first import the certificates into the following default location: <eG_INSTALL_DIR>/lib/security/mytruststore.jks. To know how to import the certificate into the truststore, refer to Pre-requisites for monitoring Oracle Cluster. Then, provide the truststore file name in this text box. For example: mytruststore.jks. By default, none is specified in this text box. |
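For reference, certificates are typically imported into a JKS truststore with the JDK's keytool utility. The command below is an illustrative sketch only; the alias is a placeholder of our choosing, and the certificate file path and password must be substituted with your own values (see the Pre-requisites topic for the product's documented procedure):

```
keytool -importcert -alias <certificate_alias> \
        -file <path_to_certificate_file> \
        -keystore <eG_INSTALL_DIR>/lib/security/mytruststore.jks \
        -storepass <truststore_password> -noprompt
```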
Truststore Type |
This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. Specify the type of truststore that contains the certificates for server authentication in this text box. For example, JKS. By default, this parameter is set to the value none. |
Truststore Password |
This parameter is applicable only if the target Oracle Cluster is SSL-enabled, if not, set this parameter to none. If a Truststore File name is provided, then, in this text box, provide the password that is used to obtain the associated certificate details from the Truststore File. By default, this parameter is set to none. |
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Has Oracle server been restarted?: |
Indicates whether this node has been rebooted during the last measurement period or not. |
Boolean |
This measure is not reported for Summary descriptor. If this measure shows 1, it means that the node was rebooted during the last measurement period. By checking the time periods when this metric changes from 0 to 1, an administrator can determine the times at which this node was rebooted. The detailed diagnosis of this measure, if enabled, indicates the date/time at which the node was shut down, the date on which it was restarted, the duration of the shutdown, and whether the node was shut down as part of a maintenance routine. |
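The 0-to-1 transition check described above can be sketched as a small scan over successive samples of this measure. The timestamps and values below are invented, and this helper is not part of the product:

```python
# Hypothetical sketch: locating reboot times from successive samples of the
# "Has Oracle server been restarted?" measure (0 = not rebooted, 1 = rebooted
# during the last measurement period). Sample data is invented.

def reboot_times(samples):
    """Return the timestamps at which the measure transitions from 0 to 1."""
    times = []
    prev = 0
    for ts, value in samples:
        if prev == 0 and value == 1:
            times.append(ts)
        prev = value
    return times

history = [("09:00", 0), ("09:05", 0), ("09:10", 1), ("09:15", 0), ("09:20", 1)]
print(reboot_times(history))  # ['09:10', '09:20']
```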
||||||||||
Uptime since the last measurement: |
Indicates the time period that this node has been up since the last time this test ran. |
Secs |
This measure is not reported for Summary descriptor. If the node has not been rebooted during the last measurement period and the agent has been running continuously, this value will be equal to the measurement period. If the node was rebooted during the last measurement period, this value will be less than the measurement period of the test. For example, if the measurement period is 300 secs and the node was rebooted 120 secs ago, this metric will report a value of 120 secs. The accuracy of this metric depends on the measurement period: the smaller the measurement period, the greater the accuracy. |
||||||||||
Uptime: |
Indicates the total time that the node has been up since its last reboot. |
Mins |
This measure is not reported for Summary descriptor. Administrators may wish to be alerted if a node has been running without a reboot for a very long period. Setting a threshold for this metric allows administrators to determine such conditions. |
||||||||||
Instance open mode |
Indicates the mode in which this instance was opened. |
|
This measure is not reported for Summary descriptor. The values that this measure can report are Mounted, Read write, Read only, and Read only with apply, each of which has a numeric equivalent.
Note: While this measure reports the values listed above to indicate the open mode status of the instance, the graph of this measure represents the same using the numeric equivalents only. |
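For reference, the open mode of an instance's database can also be checked directly with a standard Oracle query against the V$DATABASE view; this query is illustrative and is not what the test necessarily executes internally:

```sql
SELECT open_mode FROM v$database;
```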
||||||||||
Number of mounted nodes |
Indicates the number of instances that were opened in mounted mode. |
Number |
This measure is reported only for Summary descriptor. The instance mounts a database to associate the database with that instance. A mounted database is not yet open for typical user data access. In most cases, standby databases in mounted mode carry out the recovery process. |
||||||||||
Number of read write nodes |
Indicates the number of instances that were opened in read write mode. |
Number |
This measure is reported only for Summary descriptor. If an instance is in read write mode, users can make changes to the data, generating redo in the online redo log. By default, the database opens in read/write mode, unless a standby database is accessed. The value of this measure helps administrators identify the failure of primary databases in the cluster. If the primary database fails after all standby databases have failed, availability is affected and data loss may result. |
||||||||||
Number of read only nodes |
Indicates the number of instances that were opened in read only mode. |
Number |
This measure is reported only for Summary descriptor.
A standby database maintains a duplicate copy of the primary database and provides continued availability in the event of a disaster. A standby database can be opened only in read-only mode, to use it as a temporary reporting database; it cannot be opened in read/write mode. In a cluster with more than one standby database, the failure of one standby transfers its load to the remaining standby databases. If all the standby databases fail, however, the primary may have to be shut down, leading to data loss and unavailability. |
||||||||||
Number of read only with apply nodes |
Indicates the number of instances that were opened in read only with apply mode. |
Number |
This measure is reported only for Summary descriptor.
If the Data Guard feature is enabled in the cluster, then the standby databases that carry out the recovery process will be in Read only with apply mode. |
||||||||||
Total number of nodes |
Indicates the total number of nodes in this cluster. |
Number |
This measure is reported only for Summary descriptor. |