Oracle RAC Undo Usage SqlId Test

Every Oracle Database must have a method of maintaining information that is used to roll back, or undo, changes to the database. Such information consists of records of the actions of transactions, primarily before they are committed. These records are collectively referred to as undo. Undo records are used to:

Roll back transactions when a ROLLBACK statement is issued
Recover the database
Provide read consistency
Analyze data as of an earlier point in time by using Oracle Flashback Query
Recover from logical corruptions using Oracle Flashback features

When a ROLLBACK statement is issued, undo records are used to undo changes that were made to the database by the uncommitted transaction. During database recovery, undo records are used to undo any uncommitted changes applied from the redo log to the datafiles. Undo records provide read consistency by maintaining the before image of the data for users who are accessing the data at the same time that another user is changing it.

Oracle provides a fully automated mechanism, referred to as automatic undo management, for managing undo information and space. In this management mode, you create an undo tablespace, and the server automatically manages undo segments and space among the various active sessions.

Each instance in the RAC system can only use one undo tablespace at a time. In other words, instances cannot share undo tablespaces. Each instance in the cluster, being an independent transaction-processing environment, maintains its own UNDO area for undo management. The RAC system allows the creation and use of several undo tablespaces. When the instance is started, it uses the first available undo tablespace. A second instance will use another undo tablespace. Thus, each instance in a RAC system will have exclusive access to a particular undo tablespace at a given time. The undo tablespace cannot be shared among the instances at the same time. Only once an undo tablespace is released by an instance, it can be assigned to another instance. However, all instances can read blocks from any or all undo tablespaces for the purpose of constructing read-consistency images.

If instances take too long a time to read from or write to a undo tablespace, it will unnecessarily delay the recovery/rollback process and sometimes even abnormally terminate it, causing serious data inconsistencies. To avoid such adversities, it is imperative that you monitor how long the undo tablespace takes to execute queries, and promptly detect latencies in query execution. The Oracle RAC Undo Usage SqIID test enables you to achieve the same. The test also points you to time-consuming SQL queries to the undo tablespace, so that you can fine-tune them. In the process, the test also monitors the contents of the undo tablespace used by each instance and their undo retention period.

Target of the test : Oracle RAC

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for the undo tablespace of each instance managed by the monitored Oracle RAC

Configurable parameters for the test
TEST PERIOD - How often should the test be executed. Host – The host for which the test is to be configured. Port - The port on which the server is listening. SCAN Name - SCAN stands for Single Client Access Name, it is a feature used in Oracle RAC environments that provide a single name for clients to access any Oracle Database running in the cluster. You can provide SCAN as an alternative to IP/Host Name. If this parameter value is provided, it will be used for connectivity otherwise IP/Hostname will be used. orasid - The variable name of the oracle instance. service name - A ServiceName exists for the entire Oracle RAC system. When clients connect to an Oracle cluster using the ServiceName, then the cluster routes the request to any available database instance in the cluster. By default, the service name is set to none. In this case, the test connects to the cluster using the orasid and pulls out the metrics from that database instance which corresponds to that orasid. If a valid service name is specified instead, then, the test will connect to the cluster using that service name, and will be able to pull out metrics from any available database instance in the cluster. To know the ServiceName of a cluster, execute the following query on any node in the target cluster: select name, value from v$parameter where name =’service_names’ User – In order to monitor an Oracle database server, a special database user account has to be created in every Oracle database instance that requires monitoring. A Click here hyperlink is available in the test configuration page, using which a new oracle database user can be created. Alternatively, you can manually create the special database user. When doing so, ensure that this user is vested with the select_catalog_role and create session privileges. The sample script we recommend for user creation (in Oracle database server versions before 12c) for eG monitoring is: create user oraeg identified by oraeg ; create role oratest; grant create session to oratest; grant select_catalog_role to oratest; grant oratest to oraeg; The sample script we recommend for user creation (in Oracle database server 12c) for eG monitoring is: alter session set container=<Oracle_service_name>; create user <user_name>identified by <user_password> container=current default tablespace <name_of_default_tablespace> temporary tablespace <name_of_temporary_tablespace>; Grant create session to <user_name>; Grant select_catalog_role to <user_name>; The name of this user has to be specified here. Password – Password of the specified database user Confirm password – Confirm the password by retyping it here. SSL- By default, this flag is set to No, as the target Oracle cluster is not SSL-enabled by default. If the target cluster is SSL-enabled, then set this flag to Yes. SSL Cipher-This parameter is applicable only if the target Oracle Cluster is SSL-enabled, if not, set this parameter to none. A cipher suite is a set of cryptographic algorithms that are used before a client application and server exchange information over an SSL/TLS connection. It consist of sets of instructions on how to secure a network through SSL (Secure Sockets Layer) or TLS (Transport Layer Security). In this text box, provide a comma-seperated list of cipher suites that are allowed for SSL/TLS connection to the target cluster. By default, this parameter is set to none. TRUSTSTORE FILE- This parameter is applicable only if the target Oracle Cluster is SSL-enabled, if not, set this parameter to none. TrustStore is used to store certificates from Certified Authorities (CA) that verify and authenticate the certificate presented by the server in an SSL connection. Therefore, the eG agent should have access to the truststore where the certificates are stored to authenticate and connect with the target cluster and collect metrics. For this, first import the certificates into the following default location <eG_INSTALL_DIR>/lib/security/mytruststore.jks. To know how to import the certificate into the truststore, refer toPre-requisites for monitoring Oracle Cluster. Then, provide the truststore file name in this text box. For example: mytruststore.jks. By default, none is specified against this text box. TRUSTSTORE TYPE-This parameter is applicable only if the target Oracle Cluster is SSL-enabled, if not, set this parameter to none.Specify the type of truststore that contains the certificates for server authentication in this text box. For eg.,JKS. By default, this parameter is set to the value none. TRUSSTORE PASSWORD-This parameter is applicable only if the target Oracle Cluster is SSL-enabled, if not, set this parameter to none. If a Truststore File name is provided, then, in this text box, provide the password that is used to obtain the associated certificate details from the Truststore File. By default, this parameter is set to none. To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: The eG manager license should allow the detailed diagnosis capability Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Maximum query time:	Indicates the time taken for executing a query on this tablespace in this instance.	Secs	Ideally, the value of this measure should be low. An unusually high value for this measure could indicate that one/more queries are taking too long to execute on the undo tablespace. Use the detailed diagnosis of this measure to identify the the SQL IDs of teh top queries to the undo tablespace in terms of duration of execution. Inefficient queries can thus be isolated. Fine-tuning these queries will enable the optimal usage of the undo tablespace. Query processing will also be delayed if the undo tablespace is improperly sized. If the undo tablespace has insufficient space, many transactions to the tablespace may terminate before completion, and many more transactions may even hang; this will result in a long line of long-running queries. Also, since Oracle automatically tunes the undo retention period based on undo tablespace size and system activity, if the undo tablespace runs out of space, it will grossly affect the auto retention capability of Oracle, once again causing query failures. When available space for new transactions becomes short, the database begins to overwrite expired undo. If the undo tablespace has no space for new transactions after all expired undo is overwritten, the database may begin overwriting unexpired undo information. If any of this overwritten undo information is required for consistent read in a current long-running query, the query could fail with the snapshot too old error message.
Active blocks:	Indicates the number of active blocks available in this undo tablespace.	Number	These blocks consist of undo data that supports active transactions and are required in the event of rollback.
Unexpired blocks:	Indicates the number of unexpired blocks available in this undo tablespace.	Number	After a transaction is committed, undo data is no longer needed for rollback or transaction recovery purposes. However, for consistent read purposes, long-running queries may require this old undo information for producing older images of data blocks. Such type of blocks that are needed for supporting the UNDO_RETENTION parameter is called as unexpired blocks.
Expired blocks:	Indicates the number of expired blocks available in this undo tablespace.	Number	The undo information that is no longer needed for rollback is stored in these expired blocks. These type of blocks will be over written by fresh information as and when required.
Tuned undo retention:	Indicates the time taken for undo retention of data blocks in this undo tablespace.	Secs	After a transaction is committed, undo data is no longer needed for rollback or transaction recovery purposes. However, for consistent read purposes, long-running queries may require this old undo information for producing older images of data blocks. Furthermore, the success of several Oracle Flashback features can also depend upon the availability of older undo information. For these reasons, it is desirable to retain the old undo information for as long as possible. When automatic undo management is enabled, there is always a current undo retention period, which is the minimum amount of time that Oracle Database attempts to retain old undo information before overwriting it. Old (committed) undo information that is older than the current undo retention period is said to be expired. Old undo information with an age that is less than the current undo retention period is said to be unexpired. Oracle Database automatically tunes the undo retention period based on undo tablespace size and system activity. You can specify a minimum undo retention period (in seconds) by setting the UNDO_RETENTION initialization parameter. The database makes its best effort to honor the specified minimum undo retention period, provided that the undo tablespace has space available for new transactions. When available space for new transactions becomes short, the database begins to overwrite expired undo. If the undo tablespace has no space for new transactions after all expired undo is overwritten, the database may begin overwriting unexpired undo information. If any of this overwritten undo information is required for consistent read in a current long-running query, the query could fail with the snapshot too old error message. The following points explain the exact impact of the UNDO_RETENTION parameter on undo retention: The UNDO_RETENTION parameter is ignored for a fixed size undo tablespace. The database may overwrite unexpired undo information when tablespace space becomes low. For an undo tablespace with the AUTOEXTEND option enabled, the database attempts to honor the minimum retention period specified by UNDO_RETENTION. When space is low, instead of overwriting unexpired undo information, the tablespace auto-extends. If the MAXSIZE clause is specified for an auto-extending undo tablespace, when the maximum size is reached, the database may begin to overwrite unexpired undo information.