HA Events Test

A high availability (HA) deployment of two Citrix ADC appliances can provide uninterrupted operation in any transaction. With one appliance configured as the primary node and the other as the secondary node, the primary node accepts connections and manages servers while the secondary node monitors the primary. If, for any reason, the primary node is unable to accept connections, the secondary node takes over.

The secondary node monitors the primary by sending periodic messages (often called heartbeat messages or health checks) to determine whether the primary node is accepting connections. If a health check fails, the secondary node retries the connection for a specified period, after which it determines that the primary node is not functioning normally. The secondary node then takes over for the primary (a process called failover). When the secondary takes over from the primary, the configuration of both nodes should be the same. If the configurations are out of sync, which can happen for reasons such as network connectivity problems or authentication failures between the nodes, the performance of the devices will be affected. To avoid this, administrators have to frequently monitor the success or failure of the command propagation feature, which helps keep the nodes synchronized. The HA Events test helps administrators in this regard.

By analyzing the syslog file, this test reports the number of times the ADC system in an HA setup has stopped and the number of times command propagation failed or succeeded. In addition, this test reports the number of times the ADC device has switched between the primary and secondary roles in the HA setup. Using this test, administrators can gauge the effectiveness of the high availability setup of the ADC device.
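The counting described above can be illustrated with a minimal Python sketch. It is not the eG agent's implementation. The function name `count_ha_events`, the sample log lines, and the message patterns are all hypothetical. The actual message text depends on the ADC firmware and syslog format, so the patterns would need to be adjusted to match real log entries.

```python
import re

# Hypothetical patterns for illustration only; real ADC syslog messages
# may be worded differently.
EVENT_PATTERNS = {
    "system_stopped": re.compile(r"system stopped", re.IGNORECASE),
    "propagation_failed": re.compile(r"propagation.*fail", re.IGNORECASE),
    "propagation_successful": re.compile(r"propagation.*success", re.IGNORECASE),
    "ha_state_changed": re.compile(r"HA state.*chang", re.IGNORECASE),
}


def count_ha_events(lines, host):
    """Count HA-related events in syslog lines that mention the given host."""
    counts = {name: 0 for name in EVENT_PATTERNS}
    for line in lines:
        if host not in line:
            continue  # entry belongs to some other device
        for name, pattern in EVENT_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Made-up sample entries; 10.0.0.5 stands in for the target appliance.
sample_lines = [
    "Jan 10 10:00:01 10.0.0.5 HA command propagation failed",
    "Jan 10 10:00:02 10.0.0.5 HA command propagation successful",
    "Jan 10 10:00:03 10.0.0.9 HA command propagation failed",
    "Jan 10 10:00:04 10.0.0.5 HA state changed",
]
counts = count_ha_events(sample_lines, "10.0.0.5")
```

In this sketch, the third sample line is skipped because it belongs to a different host.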

For this test to run and report metrics, the ADC appliance should be configured to create a Syslog file on a remote Syslog server, where the details of all interactions with the ADC appliance will be logged. To know how to configure a remote Syslog server for use by the ADC appliance, refer to the Creating a Syslog file in a remote Syslog server topic.

This test is disabled by default. To enable the test, follow the Agents -> Tests -> Enable/Disable menu sequence in the eG administrative interface, pick Citrix ADC VPX/MPX as the Component type, select Performance as the Test type, choose this test from the list of disabled tests, and click the < button.

Target of the test : An ADC VPX/MPX

Agent deploying the test : A remote agent

Outputs of the test : One set of results for the ADC appliance being monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The IP address of the host for which the test is being configured.

Port

The port at which the host listens. By default, this is NULL.

Log File Path

This test reports metrics by parsing a Syslog file. Specify the full path to the Syslog file here.

Search String

By default, the Syslog file may contain information relating to a number of servers that are interlinked with the target ADC appliance. To obtain the metrics of the target ADC appliance alone, specify the hostname or the IP address of the target ADC appliance in the Search String text box. The eG agent uses this search string to pick the relevant entries from the Syslog file and collect metrics from them.

Search String Index

Here, specify the cursor position after which the eG agent should search for the specified Search String (that is, the number of leading characters in each line that the eG agent should ignore while searching for the Search String) in the syslog file. For example, if the specified Search String appears in the syslog file at the 17th position, then you may need to specify the Search String Index as 16.
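One way to picture the index is as a zero-based character offset from which the search begins. The Python sketch below assumes that interpretation. The function name `line_matches`, the hostname `adc-primary`, and the log line are hypothetical and are not taken from the eG agent.

```python
def line_matches(line, search_string, search_index=0):
    """Return True if search_string occurs at or after offset search_index."""
    # str.find's second argument is the zero-based offset where searching starts.
    return line.find(search_string, search_index) != -1


# The 16-character timestamp prefix puts the hostname at the 17th position,
# so an index of 16 skips exactly the timestamp.
log_line = "Jan 10 10:00:01 adc-primary HA state changed"
matched = line_matches(log_line, "adc-primary", 16)
```

With this interpretation, a line in which the hostname appears only within the first 16 characters would not match.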

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD Frequency.
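As an illustration of the value format, the Python sketch below splits a DD Frequency setting into its two parts. It assumes the first number applies to normal runs and the second to runs in which a problem is detected. The function name `parse_dd_frequency` is hypothetical and does not come from eG Enterprise.

```python
def parse_dd_frequency(value):
    """Return (normal, abnormal) frequencies, or None if diagnosis is disabled."""
    text = value.strip().lower()
    if text == "none":
        return None  # detailed diagnosis is switched off for this test
    # Assumed format: "<normal>:<abnormal>", for example "1:1".
    normal, abnormal = (int(part) for part in text.split(":"))
    return normal, abnormal


default_setting = parse_dd_frequency("1:1")
disabled_setting = parse_dd_frequency("none")
```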

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

ADC system stopped events

Indicates the number of times the ADC system in a HA setup was stopped.

Number

 

HA propagation failed

Indicates the number of times the HA Command Propagation failed.

Number

Command propagation is a feature of the ADC appliance that ensures that the commands run on the primary ADC appliance of the high availability setup are automatically run on the secondary ADC appliance.

When you run a command on the primary appliance, this feature ensures that the command runs on the secondary appliance before it runs on the primary appliance.

Ideally, the value of this measure should be zero. An HA command propagation failure may occur due to the following reasons:

  1. Network connectivity issues between the primary and secondary ADC appliances;
  2. Authentication failure between the primary and secondary appliances;
  3. Resources, such as Secure Sockets Layer (SSL) certificates and initialization script customizations, are missing on the secondary appliance.

To keep the value of this measure as low as possible, administrators should do the following:

  1. Check the network connectivity between the primary and secondary ADC appliances.
  2. Verify the Remote Procedure Call (RPC) node settings on both the appliances.
  3. Run the command directly on the secondary appliance and verify the error message. The error might have occurred because a resource required for the command exists on the primary appliance but not on the secondary appliance. Ensure that the required resource exists on the secondary appliance as well.

If a command fails or times out when executing on the secondary, the configurations of the primary and the secondary may go out of sync.

HA propagation successful

Indicates the number of times the HA Command Propagation was successful.

Number

A high value is desired for this measure. A high success rate indicates that the configurations of the primary and secondary are in sync.
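The test reports the success and failure counts as separate measures. A success rate can be derived from the two, as in the Python sketch below. The function name `propagation_success_rate` and the sample counts are hypothetical, and the rate itself is not a measure reported by this test.

```python
def propagation_success_rate(successful, failed):
    """Return the percentage of successful propagations, or None if there were none."""
    total = successful + failed
    if total == 0:
        return None  # no propagation attempts were logged in this period
    return 100.0 * successful / total


# Made-up counts: 98 successful and 2 failed propagations.
rate = propagation_success_rate(98, 2)
```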

HA state changed

Indicates the number of times the HA state has changed for the ADC device, i.e., the number of times the ADC device has changed from primary to secondary or vice versa.

Number

Frequent changes in the high availability state of an ADC device indicate serious load balancing and network issues, which may sometimes lead to non-synchronization between the primary and secondary devices.

Cluster state changed

Indicates the number of times the cluster state has changed.

Number