Oracle RAC Dataguard RTO Test

Oracle Data Guard redo transport performance depends directly on the performance of the primary and standby systems, the network that connects them, and the I/O subsystem. As changes occur on the primary database of an Oracle Cluster, redo is generated and sent to the standby database. How often redo is shipped to the standby is determined by whether the remote destination uses synchronous or asynchronous redo transport. If redo apply was started using real-time apply, redo generated by the primary database is applied to the standby database as soon as it is received (i.e., there is no wait for the database to switch logs).

Sometimes, the standby database may start lagging owing to a poor network connection or a sudden surge in I/O operations. If the standby database lags for longer than the permissible duration, the data on the standby database will not be up to date with the primary database. If the primary database fails during such a time, the standby database may be missing a significant amount of data, resulting in data loss, which may in turn interrupt business services. To avoid such data loss, it is essential to keep track of the time lag between the standby database and the primary database. The Oracle RAC Dataguard RTO test helps administrators in this regard.

This test periodically monitors the target Oracle Cluster database server and reports the time lag noticed on the standby database when redo logs are applied and the time lag noticed when the redo is transported from the primary server. Using this test, administrators can estimate the time required for the standby database and the primary database to be in sync and the time required to start the standby database when the primary database fails.

Note:

This test will report metrics only when the database server being monitored is the standby database of a database instance in an Oracle Cluster database server on which the Data Guard feature is enabled.

Target of the test : An Oracle Cluster database server on which the Data Guard feature is enabled

Agent deploying the test : An internal/external agent

Outputs of the test : One set of results for the target Oracle Cluster Database server being monitored.

Configurable parameters for the test
Parameter	Description
Test period	How often should the test be executed
Host	The host for which the test is to be configured.
Port	The port on which the server is listening.
SCAN Name	SCAN stands for Single Client Access Name. It is a feature used in Oracle RAC environments that provides a single name for clients to access any Oracle Database running in the cluster. You can provide the SCAN as an alternative to the IP/Host Name. If a value is provided for this parameter, it will be used for connectivity; otherwise, the IP/Host Name will be used.
Service Name	A Service Name exists for the entire Oracle RAC system. When clients connect to an Oracle cluster using the Service Name, the cluster routes the request to any available database instance in the cluster. By default, the Service Name is set to none. In this case, the test connects to the cluster using the ORASID and pulls out the metrics from the database instance that corresponds to that ORASID. If a valid service name is specified instead, then the test will connect to the cluster using that Service Name, and will be able to pull out metrics from any available database instance in the cluster. To know the Service Name of a cluster, execute the following query on any node in the target cluster: select name, value from v$parameter where name = 'service_names'
ORASID	The SID (system identifier) of the Oracle instance to be monitored.
Username	In order to monitor an Oracle database server, a special database user account has to be created in every Oracle database instance that requires monitoring. A Click here hyperlink is available in the test configuration page, using which a new Oracle database user can be created. Alternatively, you can manually create the special database user. When doing so, ensure that this user is vested with the select_catalog_role and create session privileges. The sample script we recommend for user creation (in Oracle database server versions before 12c) for eG monitoring is: create user oraeg identified by oraeg; create role oratest; grant create session to oratest; grant select_catalog_role to oratest; grant oratest to oraeg; The sample script we recommend for user creation (in Oracle database server 12c) for eG monitoring is: alter session set container=<Oracle_service_name>; create user <user_name> identified by <user_password> container=current default tablespace <name_of_default_tablespace> temporary tablespace <name_of_temporary_tablespace>; grant create session to <user_name>; grant select_catalog_role to <user_name>; The name of this user has to be specified here.
Password	Specify the password of the specified database user.
Confirm Password	Confirm the Password by retyping it here.
SSL	By default, this flag is set to No, as the target Oracle cluster is not SSL-enabled by default. If the target cluster is SSL-enabled, then set this flag to Yes.
SSL Cipher	This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. A cipher suite is a set of cryptographic algorithms that are used before a client application and server exchange information over an SSL/TLS connection. It consists of sets of instructions on how to secure a network through SSL (Secure Sockets Layer) or TLS (Transport Layer Security). In this text box, provide a comma-separated list of cipher suites that are allowed for SSL/TLS connection to the target cluster. By default, this parameter is set to none.
Truststore File	This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. A truststore is used to store certificates from Certificate Authorities (CA) that verify and authenticate the certificate presented by the server in an SSL connection. Therefore, the eG agent should have access to the truststore where the certificates are stored to authenticate and connect with the target cluster and collect metrics. For this, first import the certificates into the following default location: <eG_INSTALL_DIR>/lib/security/mytruststore.jks. To know how to import the certificate into the truststore, refer to Pre-requisites for monitoring Oracle Cluster. Then, provide the truststore file name in this text box. For example: mytruststore.jks. By default, none is specified against this text box.
Truststore Type	This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. Specify the type of truststore that contains the certificates for server authentication in this text box. For example, JKS. By default, this parameter is set to none.
Truststore Password	This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. If a Truststore File name is provided, then, in this text box, provide the password that is used to obtain the associated certificate details from the Truststore File. By default, this parameter is set to none.
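
The way the Host, Port, SCAN Name, Service Name, ORASID and SSL parameters interact can be illustrated with a short Python sketch. This is not the eG agent's actual connection logic, which is not described here; the function name build_connect_descriptor and its arguments are hypothetical, and the output is a standard Oracle Net connect descriptor string. The sketch only mirrors the precedence stated in the table above: the SCAN is used when provided, otherwise the host; the Service Name is used when it is not none, otherwise the ORASID.

```python
def build_connect_descriptor(host, port, scan_name="none",
                             service_name="none", orasid="", ssl="No"):
    """Build an Oracle Net connect descriptor from the test parameters.

    Hypothetical helper for illustration only; parameter names follow
    the configuration table, not any real eG API.
    """
    def is_set(value):
        # The test parameters use the literal string "none" to mean "unset".
        return bool(value) and value.strip().lower() != "none"

    # SCAN Name takes precedence over the IP/Host Name when provided.
    address_host = scan_name.strip() if is_set(scan_name) else host

    # TCPS is the Oracle Net protocol keyword for SSL/TLS connections.
    protocol = "TCPS" if ssl.strip().lower() == "yes" else "TCP"

    # A valid Service Name lets the cluster route to any available
    # instance; otherwise connect to the instance identified by ORASID.
    if is_set(service_name):
        connect_data = "(SERVICE_NAME={})".format(service_name.strip())
    else:
        connect_data = "(SID={})".format(orasid)

    return (
        "(DESCRIPTION=(ADDRESS=(PROTOCOL={})(HOST={})(PORT={}))"
        "(CONNECT_DATA={}))"
    ).format(protocol, address_host, port, connect_data)
```

For example, with the default Service Name of none, calling build_connect_descriptor("10.0.0.5", 1521, orasid="ORCL1") yields a descriptor that targets the single instance ORCL1 over plain TCP.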

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Redo apply lagging duration	Indicates the time elapsed or time lag noticed on the standby database when redo logs are applied from the primary database.	Minutes	An apply lag is the difference, in elapsed time, between when the last applied change became visible on the standby and when that same change was first visible on the primary. Redo apply performance depends mainly on the type of workload that is being recovered and the system resources allocated to recovery. For standby databases on symmetric hardware and configuration, the apply lag should be less than 10 seconds. A high value for this measure indicates that the standby database is lagging far behind the primary database and that, if the primary database fails, there may be a considerable amount of data loss.
Redo transport lagging duration	Indicates the time lag noticed in the transport of redo logs to the standby database with respect to the generation of logs in the primary database.	Minutes	Given enough resources, in particular network bandwidth, an Oracle Data Guard standby can maintain pace with very high workloads. In cases where resources are constrained, the standby can begin to fall behind, resulting in a transport or apply lag. A transport lag is the amount of data, measured in time, that the standby has not received from the primary. A low value is desired for this measure.
Estimated redo apply finish time on standby	Indicates the estimated amount of time required to apply the redo on the standby database so that both the standby and primary databases are in sync.	Minutes	A low value is desired for this measure.
Estimated startup time of standby	Indicates the time duration needed to start the standby database.	Minutes	A low value is desired for this measure. A high value for this measure indicates that during failover, mission-critical business services may be affected for the time duration reported by this measure.
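
The four measures above resemble the 'apply lag', 'transport lag', 'apply finish time' and 'estimated startup time' rows of Oracle's V$DATAGUARD_STATS view on a standby database. Which view the test actually queries is not stated here, so the following Python sketch is only an assumption-based illustration of how such values could be converted to minutes. The query string and the to_minutes helper are hypothetical; the sketch assumes that lag values arrive as day-to-second interval strings such as '+00 00:01:30' and that the startup estimate is a plain number of seconds.

```python
import re

# Assumed source of the metrics; run on the standby database.
DATAGUARD_STATS_QUERY = (
    "select name, value, unit from v$dataguard_stats "
    "where name in ('apply lag', 'transport lag', "
    "'apply finish time', 'estimated startup time')"
)

# Matches interval strings like '+00 00:01:30' or '+00 00:00:05.250'.
_INTERVAL = re.compile(r"^\+?(\d+) (\d+):(\d+):(\d+(?:\.\d+)?)$")


def to_minutes(value, unit):
    """Convert a VALUE/UNIT pair to minutes.

    Returns None when the value is missing, which can happen when a
    statistic has not been computed yet.
    """
    if value is None:
        return None
    value = value.strip()

    # Plain second counts, e.g. value '90' with unit 'second'.
    if unit.strip().lower().startswith("second"):
        return float(value) / 60.0

    match = _INTERVAL.match(value)
    if match is None:
        raise ValueError("unrecognised interval value: {!r}".format(value))

    days, hours, minutes, seconds = match.groups()
    return (int(days) * 1440 + int(hours) * 60
            + int(minutes) + float(seconds) / 60.0)
```

With this helper, an apply lag of '+00 00:01:30' converts to 1.5 minutes, which is well above the 10-second guideline mentioned for symmetric standby configurations.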