Oracle RAC Dead Kill Processes Test

If one or more sessions or processes on the Oracle server are obstructing the execution of other sessions/processes, it is quite natural for administrators to want to kill the blocking sessions/processes to ensure the smooth execution of critical database transactions. Typically, these ‘dead’ sessions/processes continue to consume resources until the PMON process automatically cleans them up. If cleanup is delayed, the Oracle instance cannot release the objects and resources locked by the dead sessions/processes, and these may stay locked for long periods. In such situations, administrators often resort to killing the dead sessions/processes at the operating system level to hasten the release of valuable resources. Before attempting the OS-level kill, administrators should first figure out which sessions/processes are currently ‘dead’ and how long they have been ‘dead’. This can be ascertained using the Oracle RAC Dead Kill Processes Test.

This test auto-discovers the dead processes/sessions on each node and reports the current cleanup state of each process/session. In addition, the test reveals how long each process/session has remained dead and the count of processes that are being blocked by that dead process/session. This way, administrators can determine whether or not cleanup is occurring on schedule, and if not, how badly the delay in cleanup is affecting other processes. Administrators can also judge whether or not an OS-level process kill is justified.

Note:

This test is applicable only to the CDB (Container Database) configuration of Oracle clusters with multi-tenant support.

Target of the test : Oracle RAC

Agent deploying the test : An internal/external agent

Outputs of the test : One set of results for each deadprocessaddress_deadsessionaddress on each node in the target Oracle cluster being monitored.

Configurable parameters for the test
Parameter	Description
Test period	How often the test should be executed.
Host	The host for which the test is to be configured.
Port	The port on which the server is listening.
SCAN Name	SCAN stands for Single Client Access Name. It is a feature of Oracle RAC environments that provides a single name through which clients can access any Oracle Database running in the cluster. You can provide the SCAN as an alternative to the IP/Host Name. If a value is provided for this parameter, it will be used for connectivity; otherwise, the IP/Hostname will be used.
Connecting Port	Monitoring a RAC server involves two components, namely PDB and CDB, each of which would otherwise use a separate premium license. To overcome this, a single premium license is now used for a single server (with multi-tenant architecture) that involves multiple PDB/CDB components. In such a case, the Connecting Port parameter in the test configuration page lets the test use the port of this single server to collect the required metrics for the multiple PDB/CDB components. Specify the port number of such a server in the Connecting Port text box. Note: If the Connecting Port parameter specified in the test configuration page represents a single server with a hierarchy of multiple components, then the Port parameter will not be applicable.
Service Name	A Service Name exists for the entire Oracle RAC system. When clients connect to an Oracle cluster using the Service Name, the cluster routes the request to any available database instance in the cluster. By default, the Service Name is set to none. In this case, the test connects to the cluster using the ORASID and pulls out metrics from the database instance that corresponds to that ORASID. If a valid service name is specified instead, the test will connect to the cluster using that Service Name, and will be able to pull out metrics from any available database instance in the cluster. To know the Service Name of a cluster, execute the following query on any node in the target cluster: select name, value from v$parameter where name = 'service_names'
ORASID	The variable name of the Oracle instance.
Username	In order to monitor an Oracle database server, a special database user account has to be created in every Oracle database instance that requires monitoring. A Click here hyperlink is available in the test configuration page, using which a new Oracle database user can be created. Alternatively, you can manually create the special database user. When doing so, ensure that this user is vested with the select_catalog_role and create session privileges. The sample script we recommend for user creation (in Oracle database server versions before 12c) for eG monitoring is: create user oraeg identified by oraeg; create role oratest; grant create session to oratest; grant select_catalog_role to oratest; grant oratest to oraeg; The sample script we recommend for user creation (in Oracle database server 12c) for eG monitoring is: alter session set container=<Oracle_service_name>; create user <user_name> identified by <user_password> container=current default tablespace <name_of_default_tablespace> temporary tablespace <name_of_temporary_tablespace>; grant create session to <user_name>; grant select_catalog_role to <user_name>; The name of this user has to be specified here.
Password	Specify the password of the specified database user.
Confirm Password	Confirm the Password by retyping it here.
SSL	By default, this flag is set to No, as the target Oracle cluster is not SSL-enabled by default. If the target cluster is SSL-enabled, then set this flag to Yes.
SSL Cipher	This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. A cipher suite is a set of cryptographic algorithms that are negotiated before a client application and server exchange information over an SSL/TLS connection. It consists of sets of instructions on how to secure a network through SSL (Secure Sockets Layer) or TLS (Transport Layer Security). In this text box, provide a comma-separated list of cipher suites that are allowed for SSL/TLS connection to the target cluster. By default, this parameter is set to none.
Truststore File	This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. The truststore is used to store certificates from Certificate Authorities (CA) that verify and authenticate the certificate presented by the server in an SSL connection. Therefore, the eG agent should have access to the truststore where the certificates are stored, so that it can authenticate and connect with the target cluster and collect metrics. For this, first import the certificates into the following default location: <eG_INSTALL_DIR>/lib/security/mytruststore.jks. To know how to import the certificate into the truststore, refer to Pre-requisites for monitoring Oracle Cluster. Then, provide the truststore file name in this text box. For example: mytruststore.jks. By default, none is specified against this text box.
Truststore Type	This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. Specify the type of truststore that contains the certificates for server authentication in this text box. For example: JKS. By default, this parameter is set to none.
Truststore Password	This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. If a Truststore File name is provided, then, in this text box, provide the password that is used to obtain the associated certificate details from the Truststore File. By default, this parameter is set to none.
Keystore File	This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. The keystore contains the private keys for the certificates that the client can provide to the server upon request. The eG agent requires access to the keystore where the client certificate is stored, so that it can send the certificate to the server, which validates it against the one contained in its truststore. For this purpose, first create the client certificate in the following default location: EG_INSTALL_DIR/jre/lib/security/mykeystore.jks. Then, provide the keystore file name in this text box. For example: mykeystore.jks. By default, none is specified against this text box.
Keystore Password	This parameter is applicable only if the target Oracle Cluster is SSL-enabled; if not, set this parameter to none. If a Keystore File name or file path is provided, then, in this text box, provide the password that is used to obtain the associated certificate details from the Keystore File.
Confirm Password	Confirm the Password for Keystore by retyping it here.
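
The way the Host, Port, SCAN Name, Connecting Port, Service Name, and ORASID parameters interact can be easier to follow as code. The sketch below is illustrative only and is not the eG agent's implementation. It assumes that the connection is expressed as an Oracle JDBC thin-style URL, that an unset parameter carries the value none, and that Connecting Port (like Service Name) defaults to none. The function name resolve_target is hypothetical.

```python
def resolve_target(host, port, scan_name="none", connecting_port="none",
                   service_name="none", orasid=None):
    """Build a JDBC thin-style URL from the test parameters (illustrative)."""

    def is_set(value):
        # Treat None, an empty string, and the literal "none" as "not configured".
        return value is not None and str(value).strip().lower() not in ("", "none")

    # SCAN Name, when provided, is used instead of the IP/Hostname.
    address = scan_name if is_set(scan_name) else host

    # When Connecting Port is provided, the Port parameter is not applicable.
    effective_port = connecting_port if is_set(connecting_port) else port

    # A valid Service Name lets the cluster route to any available instance.
    if is_set(service_name):
        return f"jdbc:oracle:thin:@//{address}:{effective_port}/{service_name}"

    # Otherwise connect to the specific instance identified by the ORASID.
    if not is_set(orasid):
        raise ValueError("either a Service Name or an ORASID is required")
    return f"jdbc:oracle:thin:@{address}:{effective_port}:{orasid}"
```

For instance, resolve_target("10.0.0.5", 1521, scan_name="rac-scan", service_name="orclsvc") yields a service-name URL against rac-scan, while leaving both SCAN Name and Service Name at none falls back to the host and the ORASID. The host, SCAN, and service names used here are placeholders.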

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Process state

Indicates the current cleanup state of this process on this node.

The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value	Description	Numeric Value
unsafe to attempt	Occurs for a killed session that has not been moved, so no cleanup can occur on it yet	1
cleanup pending	Occurs for a dead process / killed session that can be cleaned up, but PMON has not yet made an attempt	2
resources freed	Occurs for a dead process / killed session where all children have been freed, but the process / killed session itself is not yet freed	3
resources freed – pending ack	Occurs for a killed session where all children have been freed, but the session itself cannot be freed until the owner has acknowledged it	4
partial cleanup	Occurs if some of the children have been cleaned up	5

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current cleanup state of a dead process. However, in the graph of this measure, the cleanup states are represented using the Numeric Values listed in the table above.
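
If you post-process this measure outside the eG console (for example, in a custom report), the state-to-number mapping in the table above can be captured as a small lookup. This is a minimal sketch, not part of the product: the names STATE_VALUES and state_to_numeric are hypothetical, and the case and dash normalization is an assumption about how the state text might arrive.

```python
# Numeric equivalents of the cleanup states, as listed in the table above.
STATE_VALUES = {
    "unsafe to attempt": 1,
    "cleanup pending": 2,
    "resources freed": 3,
    "resources freed - pending ack": 4,
    "partial cleanup": 5,
}


def state_to_numeric(state):
    """Map a cleanup state string to its numeric value (illustrative)."""
    # Normalize case, collapse whitespace, and accept an en dash or a hyphen.
    normalized = " ".join(state.replace("\u2013", "-").lower().split())
    if normalized not in STATE_VALUES:
        raise ValueError(f"unknown cleanup state: {state!r}")
    return STATE_VALUES[normalized]
```

Raising an error on an unrecognized state, rather than returning a default, keeps an unexpected value from being silently plotted as a valid state.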

Dead time

Indicates how long it has been since this process was marked dead or this session was marked killed on this node.

Seconds

A consistent increase in the value of this measure is a cause for concern, as it indicates that auto-cleanup has not occurred. This can cause the dead process/session to continue consuming resources and blocking objects, thereby degrading server performance.

Number blocked

Indicates the count of processes that are blocked by this process on this node.

Number

A high value indicates that the dead process is impeding the execution of many other processes, some of which may also be mission-critical.

If the Dead time of such a process is also very high, it is a matter of great concern, and must be looked into immediately.

In such circumstances, you may want to consider killing the process at the OS level. On a Unix system, you can issue the kill -9 <PID> command at the shell prompt to kill the process at that level.
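
The guidance above, which combines a high Number blocked value with a high Dead time, can be expressed as a simple triage rule. The sketch below is only an illustration under stated assumptions: the DeadProcess record, the kill_candidates function, and the thresholds of 300 seconds and 5 blocked processes are all hypothetical, and suitable thresholds depend on your environment. It only lists candidates for review and does not kill anything.

```python
from dataclasses import dataclass


@dataclass
class DeadProcess:
    descriptor: str        # e.g. deadprocessaddress_deadsessionaddress
    state: str             # value of the Process state measure
    dead_time_secs: float  # value of the Dead time measure
    num_blocked: int       # value of the Number blocked measure


def kill_candidates(rows, max_dead_secs=300, max_blocked=5):
    """Return rows whose Dead time and Number blocked both exceed the limits."""
    flagged = [
        row for row in rows
        if row.dead_time_secs > max_dead_secs and row.num_blocked > max_blocked
    ]
    # Review the processes that block the most work first, then the oldest.
    return sorted(
        flagged,
        key=lambda row: (row.num_blocked, row.dead_time_secs),
        reverse=True,
    )
```

Requiring both conditions mirrors the interpretation above: a process that has been dead for a long time but blocks nothing, or one that blocks many processes but was marked dead only moments ago, may still be cleaned up by PMON without intervention.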