Citrix HDX User Active Sessions Test

To ensure that the user experience with applications/desktops deployed in a XenApp/XenDesktop environment remains ‘superlative’ at all times, administrators should be able to proactively detect potential slowdowns when accessing applications/desktops, precisely pinpoint the user session affected by the slowdown, accurately isolate the root cause of such slowness, and rapidly initiate measures to eliminate it. The Citrix HDX User Active Sessions test facilitates all of the above, and thus assures users of uninterrupted application/desktop access.

For a user session that is currently active on a XenApp server or a XenDesktop virtual desktop, this test measures session latencies and leads you to the probable cause of session slowness (if any) - is it the network? the server hosting the applications/desktops? or are the applications (in the case of sessions to a XenApp server) taking too long to start up? If a high-latency network is causing the slowness, then the test provides administrators with detailed insights into network performance and enables them to rapidly figure out where the bottleneck lies - on the client-side network? or on the server-side network? This way, the test promptly leads administrators to slow user sessions and also reveals what is causing the slowness, so that administrators can take the right steps to enhance user experience with applications/desktops.

Target of the test : An AppFlow-enabled NetScaler Appliance

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each session for every user to a XenApp server / XenDesktop virtual desktop

First level descriptor: User name

Second level descriptor: Session GUID

An application session is identified by a single session GUID, regardless of the number of applications accessed by a user during that session.

A desktop session is identified by a separate session GUID, one for every desktop that is accessed.

Configurable parameters for the test
Parameter Description

Test period

Specifies how often the test should be executed. It is recommended that you set the test period to 5 minutes, because the eG AppFlow Collector captures and aggregates AppFlow data for the last 5 minutes only.

Host

The host for which the test is to be configured.

Cluster IPs

This parameter applies only if the NetScaler appliance being monitored is part of a NetScaler cluster. In this case, configure this parameter with a comma-separated list of IP addresses of all other nodes in that cluster.

If the monitored NetScaler appliance is down/unreachable, then the eG AppFlow Collector uses the Cluster IPs configuration to figure out which other node in the cluster it should connect to for pulling AppFlow statistics. Typically, the collector attempts to connect to every IP address that is configured against Cluster IPs, in the same sequence in which they are specified. Metrics are pulled from the first cluster node that the collector successfully establishes a connection with.
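The failover order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the collector's actual code: `pick_cluster_node` and its `can_connect` callback are hypothetical names, and `can_connect` stands in for however the collector really tests whether a node is reachable.

```python
from typing import Callable, Optional


def pick_cluster_node(cluster_ips: str, can_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first configured cluster node that accepts a connection, or None."""
    # Split the comma-separated Cluster IPs value, ignoring stray spaces and empty entries.
    candidates = [ip.strip() for ip in cluster_ips.split(",") if ip.strip()]
    # Try the nodes strictly in the order in which they were configured.
    for ip in candidates:
        if can_connect(ip):  # hypothetical reachability check
            return ip  # metrics would be pulled from this node
    return None  # no node in the cluster could be reached


# Example: the first node is down, so the second one is chosen.
reachable = {"10.0.0.12", "10.0.0.13"}
print(pick_cluster_node("10.0.0.11, 10.0.0.12, 10.0.0.13", lambda ip: ip in reachable))
```

The IP addresses are illustrative only. Because the list is walked in order, placing the most reliable nodes first in Cluster IPs shortens the search when the monitored appliance is down.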

Enable Logs

This flag is set to No by default. This means that, by default, the eG agent does not create AppFlow logs. You can set this flag to Yes to enable AppFlow logging. If this is done, then the eG agent automatically writes the raw AppFlow records it reads from the collector into individual CSV files. These CSV files are stored in the <EG_AGENT_INSTALL_DIR>\NetFlow\data\<IP_of_Monitored_NetScaler>\hdxappflow\actual_csv folder on the eG agent host. These CSV files provide administrators with granular insights into the HDX appflows, thereby enabling effective troubleshooting.

Note:

By default, the eG agent creates a maximum of 10 CSV files in the actual_csv folder. Beyond this point, the older CSV files will be automatically deleted by the eG agent to accommodate new files with current data. Likewise, a single CSV file can by default contain a maximum of 99999 records only. If the records to be written exceed this default value, then the eG agent automatically creates another CSV file to write the data.
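The retention and rollover limits in this note can be modelled roughly as below. This is a hypothetical sketch, assuming records fill one file at a time and the oldest file is deleted first once the file count is exceeded; the `CsvRotator` class and the `flows_N.csv` naming are illustrative and are not the agent's real implementation or file names.

```python
from collections import deque
from typing import Deque, List


class CsvRotator:
    """Illustrative model of the CSV file-count and record-count limits (not the agent's code)."""

    def __init__(self, max_files: int = 10, max_records_per_file: int = 99999) -> None:
        self.max_files = max_files
        self.max_records = max_records_per_file
        self.files: Deque[List] = deque()  # [file_name, record_count] pairs, oldest first
        self.deleted: List[str] = []       # files removed to stay within max_files
        self._next_id = 1

    def _open_new_file(self) -> None:
        # Hypothetical naming scheme; the agent's real file names may differ.
        self.files.append([f"flows_{self._next_id}.csv", 0])
        self._next_id += 1
        # Once the retention count is exceeded, delete the oldest files first.
        while len(self.files) > self.max_files:
            self.deleted.append(self.files.popleft()[0])

    def write(self, record_count: int) -> None:
        """Append record_count records, starting a new file whenever the current one is full."""
        remaining = record_count
        while remaining > 0:
            if not self.files or self.files[-1][1] >= self.max_records:
                self._open_new_file()
            free = self.max_records - self.files[-1][1]
            chunk = min(free, remaining)
            self.files[-1][1] += chunk
            remaining -= chunk


# Example with the default limits: 250,000 records fill two files and start a third.
rotator = CsvRotator()
rotator.write(250_000)
print([(name, count) for name, count in rotator.files])
# [('flows_1.csv', 99999), ('flows_2.csv', 99999), ('flows_3.csv', 50002)]
```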

If required, you can override these default settings. For this, do the following:

  1. Log in to the eG agent host.
  2. Edit the Netflow.Properties file in the <EG_AGENT_INSTALL_DIR>\NetFlow\config directory.
  3. In the file, look for the parameter, csv_file_retention_count.
  4. This is the parameter that governs the maximum number of CSV files that can be created in the actual_csv folder. By default, this parameter is set to 10. If you want to retain more CSV files at any given point in time, increase the value of this parameter. If you want to retain fewer CSV files, decrease the value of this parameter.
  5. Next, look for the parameter, csv_max_flow_record_per_file.
  6. This is the parameter that governs the number of flow records that can be written to a single CSV. By default, this parameter is set to 99999. If you want a single file to accommodate more records, so that the creation of new CSVs is delayed, then increase the value of this parameter. On the other hand, if you want to reduce the capacity of a CSV file, so that new CSVs are quickly created, then decrease the value of this parameter.
  7. Finally, save the file.
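If you prefer to script steps 2 to 7, a sketch like the following could update both parameters. It assumes that Netflow.Properties uses simple Java-properties-style `key=value` lines, which is an assumption rather than something this document confirms. `set_properties` is a hypothetical helper; it changes only keys that already exist and leaves the file untouched if any requested key is missing. Back up the file before editing it.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches the key at the start of a "key=value" or "key: value" line; comment lines
# starting with '#' or '!' never match because those characters are excluded from keys.
KEY_PATTERN = re.compile(r"^\s*([^#!=:\s]+)\s*[=:]")


def set_properties(path: Path, updates: dict[str, str]) -> None:
    """Replace the values of existing keys in a properties file (hypothetical helper)."""
    lines = path.read_text().splitlines()
    pending = dict(updates)
    for i, line in enumerate(lines):
        match = KEY_PATTERN.match(line)
        if match and match.group(1) in pending:
            key = match.group(1)
            lines[i] = f"{key}={pending.pop(key)}"
    if pending:
        # Do not guess where unknown keys belong; leave the file untouched instead.
        raise KeyError(f"not found in {path}: {sorted(pending)}")
    path.write_text("\n".join(lines) + "\n")
```

For example, passing the path `<EG_AGENT_INSTALL_DIR>\NetFlow\config\Netflow.Properties` (with the placeholder replaced by the actual install directory) and `{"csv_file_retention_count": "20", "csv_max_flow_record_per_file": "50000"}` would raise the file count to 20 and lower the per-file capacity to 50000 records.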

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD Frequency.
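The sketch below shows one way to read a DD Frequency value. It assumes, without confirmation from this document, that the first number is the normal frequency and the second the abnormal frequency, that a value of N means "every Nth test run" in that state, and that 0 means "never". `parse_dd_frequency` and `should_run_dd` are hypothetical names.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse a DD Frequency value such as '1:1' into (normal, abnormal); 'none' returns None."""
    text = value.strip().lower()
    if text == "none":
        return None  # detailed diagnosis disabled for this test
    normal, abnormal = (int(part) for part in text.split(":"))
    if normal < 0 or abnormal < 0:
        raise ValueError("frequencies must not be negative")
    return normal, abnormal


def should_run_dd(freq: Optional[Tuple[int, int]], run_number: int, problem_detected: bool) -> bool:
    """Assumed semantics: generate DD on every Nth run, N being the normal or abnormal frequency."""
    if freq is None:
        return False
    normal, abnormal = freq
    every = abnormal if problem_detected else normal
    # A frequency of 0 is treated as "never" in that state (an assumption).
    return every > 0 and run_number % every == 0


print(should_run_dd(parse_dd_frequency("1:1"), run_number=7, problem_detected=False))  # True
```

Under these assumptions, the default 1:1 yields detailed diagnosis on every run, matching the description above.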

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Active applications

Indicates the number of applications currently accessed by this user session.

Number

To know which applications are active in this user session, use the detailed diagnosis of this measure.

This measure is reported only for application sessions, and not desktop sessions.

Application launches

Indicates the number of applications that were launched by this user session.

Number

To know which applications were launched by this user session, use the detailed diagnosis of this measure.

This measure is reported only for application sessions, and not desktop sessions.

Application terminates

Indicates the number of applications terminated by this user session.

Number

To know which applications were terminated in this user session, use the detailed diagnosis of this measure.

This measure is reported only for application sessions, and not desktop sessions.

Session status

Indicates the current status of this session.

 

The values that this measure can take and the numeric value that corresponds to each measure value are listed in the table below (SR denotes Citrix Session Reliability, and ACR denotes Automatic Client Reconnect):

Measure Value Numeric Value
Active 0
SR successful 1000
Existing ICA session got terminated 1001
Existing ICA connection got terminated and SR failed 1002
Existing ICA connection terminated and SR failed and client is trying to do ACR and is successful 1003

Note:

Typically, this test reports the Measure Values in the table above to indicate session status. In the graph of this measure, however, the same is indicated using the numeric equivalents only.
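When reading the graph, or any exported graph data, a lookup like the following translates the plotted numbers back into Measure Values. The mapping is copied from the table above; `describe_status` is a hypothetical helper, and the fallback text for unlisted numbers is an illustrative choice.

```python
SESSION_STATUS = {
    0: "Active",
    1000: "SR successful",
    1001: "Existing ICA session got terminated",
    1002: "Existing ICA connection got terminated and SR failed",
    1003: "Existing ICA connection terminated and SR failed and client is trying to do ACR and is successful",
}


def describe_status(numeric_value: int) -> str:
    """Map a numeric value plotted in the Session status graph back to its Measure Value."""
    return SESSION_STATUS.get(numeric_value, f"Unknown status ({numeric_value})")


print(describe_status(1002))  # Existing ICA connection got terminated and SR failed
```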

Average application startup duration

Indicates the average time that elapsed between when an application accessed by this session was launched and when it started running.

Msecs

A high value for this measure indicates that one or more applications are starting up slowly on the server. In this case, use the detailed diagnosis of the Active applications measure to know which application is the slowest to start up.

This measure is reported only for application sessions, and not desktop sessions.

RTT

Indicates the screen lag experienced by this session while interacting with applications/desktops.

Msecs

A high value for this measure is indicative of the poor quality of a user’s experience with applications/desktops.

To know the reason for this below-par UX, compare the value of the WAN latency, DC latency, and Host delay measures of that session.

WAN latency

Indicates the average latency experienced by this user session due to problems with the client side network.

Msecs

A high value for this measure indicates that the client side network is slow.

If the value of the RTT measure is abnormally high for a session, you can compare the value of this measure with those of the DC latency and Host delay measures of that user session to know what is causing the slowness – is it the client side network? the server side network? or the server hosting the applications/desktops?

DC latency

Indicates the average latency experienced by this session due to problems with the server side network.

Msecs

A high value for this measure indicates that the server side network is slow.

If the value of the RTT measure is abnormally high for a session, you can compare the value of this measure with those of the WAN latency and Host delay measures of that session to know what is causing the slowness – is it the client side network? the server side network? or the server hosting the applications/desktops?

Host delay

Indicates the delay that this session experienced when waiting for the host to process the packets. 

Msecs

A high value for this measure indicates a processing bottleneck with the server hosting the applications.

If the value of the RTT measure is abnormally high for a session, you can compare the value of this measure with those of the WAN latency and DC latency measures to know what is causing the slowness – is it the client side network? the server side network? or the server hosting the applications/desktops?
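The comparison of WAN latency, DC latency, and Host delay can be reduced to a simple triage rule: look first at whichever component is largest. This is a heuristic sketch, not how eG Enterprise itself attributes slowness; `likely_bottleneck` is a hypothetical name, and it does not assume that the three values add up to RTT.

```python
def likely_bottleneck(wan_latency_ms: float, dc_latency_ms: float, host_delay_ms: float) -> str:
    """Return the RTT component with the largest value (a triage heuristic, not an eG algorithm)."""
    components = {
        "client side network (WAN latency)": wan_latency_ms,
        "server side network (DC latency)": dc_latency_ms,
        "server hosting the applications/desktops (Host delay)": host_delay_ms,
    }
    # max() returns the first key with the highest value, so ties favour the earlier entries.
    return max(components, key=lambda name: components[name])


print(likely_bottleneck(wan_latency_ms=180, dc_latency_ms=12, host_delay_ms=25))
# client side network (WAN latency)
```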

Bandwidth

Indicates the rate at which data is transferred over this ICA session.

Kbps

Ideally, the value of this measure should be low.

A high value indicates excessive bandwidth usage by the session.

Compare the value of this measure across sessions to know which session is consuming bandwidth excessively.

Bytes

Indicates the total bytes consumed by this session.

Bytes

Compare the value of this measure across sessions to know which session has transferred the most data and which has transferred the least.

Client side retransmits

Indicates the number of packets retransmitted on the client side connection during the last measurement period.

Number

Ideally, the value of these measures should be 0.

 

Server side retransmits

Indicates the number of packets retransmitted on the server side connection during the last measurement period.

Number

Client side 0 window count

Indicates how many times in this session the client advertised a zero TCP window during the last measurement period.

Number

A TCP Zero Window occurs when a machine advertises a TCP window size of zero, that is, when it tells its peer that it has no room left to receive data.

TCP window size is the amount of data that a machine can receive during a TCP session and still be able to process. Think of it as a TCP receive buffer. When a machine initiates a TCP connection to a server, it uses the window size to let the server know how much data it can receive.

On many Windows machines, this value is around 64512 bytes. As the TCP session is initiated and the server begins sending data, the client decrements its window size as this buffer fills. At the same time, the client processes the data in the buffer and empties it, making room for more data. Through TCP ACK frames, the client informs the server how much room is left in this buffer. If the TCP window size drops to 0, the client will not be able to receive any more data until it processes the buffered data and opens the buffer up again.

The machine (client/server) advertising the Zero Window will not receive any more data from its peer until the window opens again. This is why, ideally, the value of these measures should be 0.

A non-zero value warrants an immediate investigation to determine the reason for the Zero Window. It could be that the client/server was running too many processes at that moment and its processor was maxed out. Or it could be that there is an error in the TCP receiver, like a Windows registry misconfiguration. Try to determine what the client/server was doing when the TCP Zero Window happened.

These measures are reported only for application sessions, and not desktop sessions.
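The receive-buffer behaviour described above can be simulated in a few lines. This is a deliberately simplified model (no window scaling, ACK timing, or retransmissions) that uses the 64512-byte buffer mentioned above; `advertised_windows` is a hypothetical name and the traffic figures are made up for illustration.

```python
from typing import List


def advertised_windows(buffer_size: int, arrivals: List[int], reads: List[int]) -> List[int]:
    """Window (free receive-buffer space, in bytes) advertised after each simplified step."""
    buffered = 0
    windows = []
    for arrived, read in zip(arrivals, reads):
        # The sender cannot push more data than the free space it was last told about.
        buffered += min(arrived, buffer_size - buffered)
        # The receiving application then drains part of the buffer.
        buffered -= min(read, buffered)
        windows.append(buffer_size - buffered)
    return windows


# A receiver that stops reading for two steps ends up advertising a zero window.
windows = advertised_windows(64512, arrivals=[30000] * 4, reads=[10000, 0, 0, 20000])
print(windows, "zero-window events:", windows.count(0))
# [44512, 14512, 0, 20000] zero-window events: 1
```

The zero appears only while the application is not draining the buffer, which is why a busy or stalled client/server is the first thing to check.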

Server side 0 window count

Indicates how many times in this session the server advertised a zero TCP window during the last measurement period.

Number

Client RTO

Indicates how many times during the last measurement period the retransmission timeout (RTO) was triggered in this session on the client side connection.

Number

An RTO occurs when the sender waits too long for acknowledgments and its retransmission timer expires, so it stops sending altogether. After the timeout, which is usually at least one second, the sender cautiously starts sending again, testing the waters with just one packet at first, then two packets, and so on. As a result, an RTO causes, at minimum, a one-second delay on your network. A low value is hence desirable for these measures.

These measures are reported only for application sessions, and not desktop sessions.
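The "one packet at first, then two packets, and so on" recovery corresponds to TCP slow start. The sketch below follows a simplified reading of RFC 5681 (after an RTO, ssthresh drops to half the data in flight, the congestion window restarts at one segment, doubles per round trip, then grows by one segment per round trip); capping the doubling at ssthresh is a simplification, and `cwnd_after_rto` is a hypothetical name. Real stacks differ in detail.

```python
from typing import List


def cwnd_after_rto(flight_size: int, round_trips: int) -> List[int]:
    """Congestion window, in segments, for each round trip after a retransmission timeout.

    Simplified RFC 5681 model: ssthresh drops to half the data in flight (minimum 2 segments),
    the window restarts at 1 segment, doubles each round trip during slow start (capped at
    ssthresh here for simplicity), and then grows by 1 segment per round trip.
    """
    ssthresh = max(flight_size // 2, 2)
    cwnd = 1  # the "loss window" after an RTO
    windows = []
    for _ in range(round_trips):
        windows.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start
        else:
            cwnd += 1  # congestion avoidance
    return windows


print(cwnd_after_rto(flight_size=16, round_trips=6))  # [1, 2, 4, 8, 9, 10]
```

Several round trips pass before the sender is back to its earlier rate, which adds to the delay caused by the timeout itself.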

Server RTO

Indicates how many times during the last measurement period the retransmission timeout (RTO) was triggered in this session on the server side connection.

Number

ACR counts

Indicates the total number of times the client automatically reconnected the user to this session.

Number

The Automatic Client Reconnect (ACR) policy setting, when enabled, allows automatic reconnection by the same client after a connection has been interrupted. This lets users resume working where they were interrupted when a connection was broken: ACR detects broken connections and then reconnects the users to their sessions.

Session reconnects

Indicates the number of times this session reconnected.

Number

This measure includes only those times a user reconnected to a disconnected session by mechanisms other than the ACR setting.

Client SRTT

Indicates the RTT (round-trip time or screen lag time) of this session, smoothed over the client side connection.

Msecs

TCP implementations attempt to predict future round-trip times by sampling the behavior of packets sent over a connection and averaging those samples into a ‘smoothed’ round-trip time estimate, SRTT. When a packet is sent over a TCP connection, the sender times how long it takes for it to be acknowledged, producing a sequence, $S$, of round-trip time samples: $s_1, s_2, s_3, \ldots$. With each new sample, $s_i$, the new SRTT is computed from the formula:

$$\mathrm{SRTT}_{i+1} = \alpha \cdot \mathrm{SRTT}_i + (1 - \alpha) \cdot s_i$$

Here, $\mathrm{SRTT}_i$ is the current estimate of the round-trip time, $\mathrm{SRTT}_{i+1}$ is the new computed value, and $\alpha$ is a constant between 0 and 1 that controls how rapidly the SRTT adapts to change. The retransmission time-out ($\mathrm{RTO}_i$), the amount of time the sender will wait for a given packet to be acknowledged, is computed from $\mathrm{SRTT}_i$. The formula is:

$$\mathrm{RTO}_i = \beta \cdot \mathrm{SRTT}_i$$

Here, $\beta$ is a constant, greater than 1, chosen such that there is an acceptably small probability that the round-trip time for the packet will exceed $\mathrm{RTO}_i$.

These measures are reported only for application sessions, and not desktop sessions.
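The SRTT and RTO formulas above translate directly into code. This sketch seeds the estimate with the first sample (an assumption, since the formula needs a starting value) and defaults to $\alpha = 0.9$ and $\beta = 2.0$, which fall within the ranges suggested in RFC 793; RFC 793 also clamps the RTO between lower and upper bounds, and modern stacks use RFC 6298 instead, neither of which is modelled here. `smooth_rtt` is a hypothetical name.

```python
from typing import List, Tuple


def smooth_rtt(samples_ms: List[float], alpha: float = 0.9, beta: float = 2.0) -> List[Tuple[float, float]]:
    """Return (SRTT, RTO) after each sample: SRTT' = alpha*SRTT + (1 - alpha)*s, RTO = beta*SRTT."""
    if not 0 < alpha < 1 or beta <= 1:
        raise ValueError("expected 0 < alpha < 1 and beta > 1")
    estimates = []
    srtt = None
    for sample in samples_ms:
        # Seed the estimate with the first sample (an assumption; the formula needs a start value).
        srtt = sample if srtt is None else alpha * srtt + (1 - alpha) * sample
        estimates.append((srtt, beta * srtt))
    return estimates


print(smooth_rtt([100, 200, 100], alpha=0.5))  # [(100, 200.0), (150.0, 300.0), (125.0, 250.0)]
```

A larger $\alpha$ makes SRTT react more slowly to a single slow sample, which is why smoothed values are steadier than raw RTT readings.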

Server SRTT

Indicates the RTT (round-trip time or screen lag time) of this session, smoothed over the server side connection. 

Msecs

Client side NS delay

Indicates the average latency experienced by this session, which was caused by the NetScaler appliance when ICA traffic flowed from client network to server network.

Msecs

A high value for these measures indicates a processing bottleneck with the NetScaler appliance.

If the value of the WAN latency measure is abnormally high for an application session, you can compare the value of the Client side NS delay measure with the value of the Client jitter measure for that session to determine what could have caused network delays on the client side - a NetScaler appliance that was slow in processing traffic from the client? or traffic congestion on the client side?

If the value of the DC latency measure is abnormally high for an application session, you can compare the value of the Server side NS delay measure with the value of the Server jitter measure for that session to determine what could have caused network delays on the server side - a NetScaler appliance that was slow in processing traffic from the server? or traffic congestion on the server network?
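The NS delay versus jitter comparison can be expressed as a small helper for either side of the connection. This is a heuristic sketch, not an eG Enterprise rule: it simply treats whichever of the two values (both in Msecs) is larger as the more likely cause, and `network_delay_cause` is a hypothetical name.

```python
def network_delay_cause(ns_delay_ms: float, jitter_ms: float, side: str) -> str:
    """Heuristic reading of the NS delay vs. jitter comparison for 'client' or 'server' side."""
    if side not in ("client", "server"):
        raise ValueError("side must be 'client' or 'server'")
    # Both measures are reported in Msecs; the larger one is treated as the likelier cause.
    if ns_delay_ms >= jitter_ms:
        return f"NetScaler appliance slow in processing traffic from the {side}"
    return f"traffic congestion on the {side} network"


# Use only after WAN latency (client side) or DC latency (server side) is found abnormally high.
print(network_delay_cause(ns_delay_ms=40, jitter_ms=5, side="client"))
print(network_delay_cause(ns_delay_ms=3, jitter_ms=30, side="server"))
```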

Server side NS delay

Indicates the average latency experienced by this session, which was caused by the NetScaler appliance when ICA traffic flowed from server network to client network.

Msecs

Client jitter

Indicates the client side jitter.

Msecs

Jitter is defined as a variation in the delay of received packets. At the sending side, packets are sent in a continuous stream with the packets spaced evenly apart. Due to network congestion, improper queuing, or configuration errors, this steady stream can become lumpy, or the delay between each packet can vary instead of remaining constant.

A high value for these measures therefore indicates that the time gap between consecutive ICA packets is varying significantly. To know where this variation is greater – on the client side or on the server side – compare the value of the Client jitter measure with that of the Server jitter measure.

Also, if the value of the RTT measure is abnormally high for a session, then you can compare the values of these measures with those of the WAN latency and DC latency measures to know what is causing the problem – the client side network? or the server side network?

These measures are reported only for application sessions, and not desktop sessions.
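To make the definition of jitter concrete, the sketch below uses the interarrival jitter estimator from RFC 3550 (section 6.4.1), which smooths the change in one-way transit time between consecutive packets. This document does not say how the NetScaler appliance computes its jitter values, so treat this only as an illustration of the concept; `interarrival_jitter` is a hypothetical name and the timestamps are made up.

```python
from typing import List


def interarrival_jitter(send_ms: List[float], recv_ms: List[float]) -> float:
    """Running jitter estimate J in ms, updated as J += (|D| - J) / 16 (RFC 3550, section 6.4.1)."""
    if len(send_ms) != len(recv_ms):
        raise ValueError("send and receive timestamp lists must be the same length")
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D: how much the one-way transit time changed between consecutive packets.
        d = (recv_ms[i] - send_ms[i]) - (recv_ms[i - 1] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


# Evenly spaced packets with a constant 50 ms transit time show no jitter.
print(interarrival_jitter([0, 20, 40, 60], [50, 70, 90, 110]))  # 0.0
```

Note that a constant delay, however long, produces zero jitter; only changes in delay raise the estimate.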

Server jitter

Indicates the server side jitter.

Msecs

Use the detailed diagnosis of the Active applications measure to know which applications are being actively used by a user session. The application startup time, startup duration, application uptime, and module path are displayed for each active application. From this, you can quickly identify applications that took too long to start up and applications that restarted recently, and initiate investigations to find the reasons for the same.

Figure 14 : The detailed diagnosis of the Active applications measure reported by the Citrix HDX User Active Sessions test
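If you copy these detailed diagnosis rows out for analysis, a filter like the one below can surface slow starters and recently (re)started applications. The `ActiveApp` record, its field names, and both thresholds are hypothetical choices made for illustration; the startup time column is omitted, and low uptime is used as the assumed signal of a recent restart.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ActiveApp:
    # Hypothetical record mirroring columns of the Active applications detailed diagnosis.
    module_path: str
    startup_duration_ms: float
    uptime_minutes: float


def flag_apps(apps: List[ActiveApp], slow_startup_ms: float = 5000,
              recent_minutes: float = 10) -> Tuple[List[str], List[str]]:
    """Return (slow starters, slowest first) and (apps whose uptime suggests a recent restart)."""
    slow = sorted((a for a in apps if a.startup_duration_ms > slow_startup_ms),
                  key=lambda a: a.startup_duration_ms, reverse=True)
    recent = [a.module_path for a in apps if a.uptime_minutes < recent_minutes]
    return [a.module_path for a in slow], recent


apps = [ActiveApp("a.exe", 1200, 300), ActiveApp("b.exe", 9000, 4), ActiveApp("c.exe", 6500, 120)]
print(flag_apps(apps))  # (['b.exe', 'c.exe'], ['b.exe'])
```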

Use the detailed diagnosis of the Application launches measure to know which applications were launched during a user session.

Figure 15 : The detailed diagnosis of the Application launches measure reported by the Citrix HDX User Active Sessions test

The detailed diagnosis of the Session status measure provides additional details of a user session. If the status of a session is abnormal, you can use these details to know from which client the user is connecting, the client type and version, which server the user is connecting to, the start time, and the uptime of the session. This will help in troubleshooting the abnormal session status.

Figure 16 : The detailed diagnosis of the Session status measure reported by the Citrix HDX User Active Sessions test