Citrix application and desktop virtualization technologies are widely used by organizations that are embarking on digital transformation initiatives. The success of these initiatives is closely tied to ensuring a great user experience for end users as they access their virtual apps and desktops. Given the multitude of components and services that make up the Citrix delivery architecture, administrators constantly face an uphill challenge in measuring performance and knowing what key performance indicators (KPIs) to monitor.
Authored by George Spiers:
I’m happy to be writing this blog on the eG Innovations website. Thanks for the opportunity. This topic is very current and addresses a pressing need for Citrix administrators. Whether it’s a new Citrix deployment, a migration, or an expansion, a frequent question Citrix administrators ask is, “What performance metrics or KPIs should I track to know whether my Citrix infrastructure is running well?” I have listed below the top 10 metrics that I think are very important for any Citrix administrator to track and monitor periodically.
#1 Logon Times
In terms of metrics that make up the user experience, logon times are one of the most important for an end-user and for the organization. Citrix administrators often struggle with managing logon times effectively, mainly due to the complexity of their environments and the various components that need to be performing optimally.
According to the 2018 Citrix Migration Survey, 59% of 795 Citrix professionals voted that slow logons were the number one problem and the most common complaint they received through to the helpdesk. Slow logon was voted higher than all other common problems, such as printing issues, frozen sessions, slow application launches and so on.
Logon times can be controlled by managing Group Policies effectively, applying only the settings a user, desktop, or application needs to run and to be secure. Logon scripts should be minimised or, preferably, eliminated completely. Drive mapping, printer mapping, application shortcut creation and so on can be controlled with technologies like Citrix Workspace Environment Management, which defers processing of these items until after the logon has completed.
Compute and storage also affect logon time performance, so flash storage and adequately sized application or desktop virtual machines will go a long way towards improving logon times.
With the importance placed on logon times, you must monitor logon performance in real time and make use of reporting capabilities to review logon time statistics over a period of time against initial baselines. Logon time optimization requires ongoing effort throughout the life of a Citrix deployment.
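Comparing recent logon times against a baseline can be sketched in a few lines. This is a minimal illustration, not a feature of any Citrix tool: the function name, the 20% tolerance, and the sample durations are all assumptions, and the logon durations would come from whatever monitoring source you already use.

```python
from statistics import median

def logon_regression(baseline_secs, recent_secs, tolerance=0.2):
    """Return (baseline_median, recent_median, regressed).

    regressed is True when the recent median logon time exceeds the
    baseline median by more than `tolerance` (a fraction, 0.2 = 20%).
    """
    base = median(baseline_secs)
    recent = median(recent_secs)
    return base, recent, recent > base * (1 + tolerance)

# Hypothetical sample data in seconds, not real measurements.
baseline = [28, 30, 31, 29, 32]
this_week = [35, 41, 38, 44, 39]
print(logon_regression(baseline, this_week))  # (30, 39, True)
```

The median is used here rather than the mean so that a single very slow logon does not trigger the flag on its own.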
#2 Failed User Connections/Application Launch Failures
Failed user connections or application launches could indicate a problem with the VDAs. They also highlight that a user is not able to be productive, with the business losing out on that productivity. This, in turn, affects the service quality and results in lost revenue. When a user attempts to connect to their desktop or application and fails to do so, Citrix Director will flag up on the main dashboard the number of times connections have failed over the past hour.
User connections and application launch can fail due to a problem user-side, server-side, or somewhere in the middle. Some of the examples below are common causes of connection failures:
- The end-user device runs an old version of Citrix Workspace app (formerly Citrix Receiver), Workspace app is corrupt, or does not exist on the user device at all. For example, the user received a new device, but the Workspace app was not installed.
- The end-user has clicked to launch an application or desktop, but the launch.ica file has not launched automatically and has asked the user to save the file. The user has saved the file but forgot to actually double-click it. When launch.ica has not opened automatically, this is normally due to a browser configuration issue.
- Firewalls can block the communication between an end-user device and their virtual desktop, and this can prevent a connection from completing.
- A network communication issue is preventing session establishment from completing.
- The connection takes too long to complete and eventually times out. This can be due to poor network performance or other problems in the infrastructure.
You will always get the odd instance where a user connection has failed. Sometimes technology just decides not to work on the first attempt but the immediate second attempt goes through. However, you should be monitoring the failure rate closely; if several failures happen at the same time, it can indicate a more serious problem in your environment.
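One way to separate the odd one-off failure from a cluster of failures is a sliding time window. The sketch below is an assumption about how such a check might look; the class name, window length, and threshold are hypothetical, and it expects failure timestamps (in seconds) to arrive in ascending order.

```python
from collections import deque

class FailureBurstDetector:
    """Flags a burst when `threshold` failures fall inside `window_secs`."""

    def __init__(self, window_secs=300, threshold=5):
        self.window_secs = window_secs
        self.threshold = threshold
        self._events = deque()

    def record_failure(self, timestamp):
        """Record one failure; return True if the window now holds a burst."""
        self._events.append(timestamp)
        # Drop failures that are older than the window.
        while self._events and timestamp - self._events[0] > self.window_secs:
            self._events.popleft()
        return len(self._events) >= self.threshold

# Hypothetical timestamps: one isolated failure, then three close together.
detector = FailureBurstDetector(window_secs=300, threshold=3)
results = [detector.record_failure(t) for t in (0, 1000, 1010, 1020)]
print(results)  # [False, False, False, True]
```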
#3 Failed Server or Desktop VDAs
Failed server or desktop VDAs result in more sessions being handled by the other healthy VDAs that offer the same resource. This could lead to resource constraint on the healthy VDAs and slow their response times, or even more critically, lead to denial of service.
When a desktop or server VDA has failed, or is simply not online, depending on the workload type, Citrix Delivery Controllers may try to power it on after discovering that it is unregistered. If registration fails for a prolonged period of time, the VDA may be placed in Maintenance Mode. Citrix Director will also show the number of VDAs that are unregistered but not in Maintenance Mode.
If virtual desktops have failed, the pool of worker machines providing access to your critical resources has just shrunk in size. This means that there is more demand essentially being placed on the remaining healthy desktops.
Hence, you need to monitor failed server or desktop VDAs so that your users do not face denial of service or a slower in-session experience due to increased demand on resources that are left to deal with the burden of failed workers.
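The extra demand placed on the remaining VDAs is simple arithmetic, sketched below. The function name, the per-VDA session limit, and the numbers in the example are hypothetical and only illustrate the reasoning.

```python
def load_after_failures(total_sessions, total_vdas, failed_vdas, max_per_vda):
    """Return (sessions_per_healthy_vda, overloaded).

    If no healthy VDA remains, the load is None and overloaded is True,
    which corresponds to users being denied service.
    """
    healthy = total_vdas - failed_vdas
    if healthy <= 0:
        return None, True
    per_vda = total_sessions / healthy
    return per_vda, per_vda > max_per_vda

# Hypothetical group: 120 sessions on 6 VDAs sized for 25 sessions each.
print(load_after_failures(120, 6, 0, 25))  # (20.0, False)
print(load_after_failures(120, 6, 2, 25))  # (30.0, True)
```

With two of six VDAs failed, each survivor carries 30 sessions instead of 20, which is over the assumed limit of 25.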
#4 Connections to SQL Server from Delivery Controllers
Citrix Delivery Controllers are the only component in the FlexCast Management Architecture (FMA) that talks to the SQL database. Generally, per best practice, a Citrix environment will be backed by a SQL solution that is highly available using technologies such as AlwaysOn. Citrix Virtual Apps and Desktops also provides redundancy with the Local Host Cache (LHC) feature, which allows users to access their resources when a connection to the SQL database is not available.
If a Delivery Controller loses its connection to SQL, the Citrix Site will enter Local Host Cache mode. While this is OK and by design, you do not want to over-rely on LHC. The primary goal is to re-establish the connection between Delivery Controllers and SQL. If you find that LHC is being initiated too frequently, it indicates a problem with the Delivery Controllers, SQL, or the communication in between.
It is important to monitor Local Host Cache events in case there is a repetitive connection problem that needs to be addressed. When Local Host Cache is in use, the Application log of Event Viewer will contain this information.
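Whether LHC is being initiated "too frequently" can be checked by counting activations over a trailing window. The sketch below assumes you have already collected the timestamps of the relevant Application log events by some other means; the function names, the seven-day window, and the limit of two activations are hypothetical.

```python
from datetime import datetime, timedelta

def lhc_activations_in_window(event_times, now, window=timedelta(days=7)):
    """Count LHC activation timestamps that fall inside the trailing window."""
    cutoff = now - window
    return sum(1 for t in event_times if cutoff <= t <= now)

def lhc_needs_investigation(event_times, now, max_allowed=2,
                            window=timedelta(days=7)):
    """True when LHC was entered more often than `max_allowed` in the window."""
    return lhc_activations_in_window(event_times, now, window) > max_allowed

# Hypothetical activation times, not taken from a real environment.
now = datetime(2024, 1, 15, 12, 0)
events = [
    datetime(2024, 1, 1, 3, 0),
    datetime(2024, 1, 10, 3, 0),
    datetime(2024, 1, 12, 3, 0),
    datetime(2024, 1, 14, 3, 0),
]
print(lhc_activations_in_window(events, now))  # 3
print(lhc_needs_investigation(events, now))    # True
```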
#5 Application Launch Times
If an application launch is slow, you will likely hear about it from your end-users. The problem escalates when users go to a meeting or go on lunch etc. and then return and have to re-launch the application. If a user moves between multiple machines throughout the workday and uses different credentials, this may also mean the application launch has to take place from scratch.
Application launch time can also be impacted by logon times, which were described above, but slow logons are not the only factor in overall application launch time.
In recent versions of Citrix Virtual Apps and Desktops, for example, Citrix has optimised brokering code that allows the brokering process to complete quicker. The example below shows the brokering improvements beginning in Citrix XenApp & XenDesktop 7.11.
|Metric (90ms latency)|Before XA/XD 7.11|XA/XD 7.11 and later|
|Brokering requests per sec|3.7|12.6|
|Time to launch 10k users|44m 55s|13m 10s|
Other factors that cause slow application launches, aside from slow logons and latency, are virtual apps and desktops that frequently experience high CPU and RAM consumption, overloaded hypervisors, slow storage, or incorrectly configured operating system settings.
Session Prelaunch can help improve application launch times, but it should not be used as a workaround for slow application launches, so it is essential to monitor the times and take appropriate action.
#6 Sessions That Have Been Disconnected a Long Time
Disconnected sessions that exceed the idle and disconnected timers set by Citrix policies or Group Policies may indicate a hung session that needs to be cleared before affected users can log on again. With Citrix Director and Studio, you can quickly determine how many sessions are in a disconnected state and how long they have been in that state.
In many environments, Group Policy or Citrix policies will be configured to disconnect idle sessions after a period of time and, afterwards, reset those sessions. Most of the time, sessions will reset. However, I have seen instances in certain environments where a disconnected session has not reset when it should have. The session then remains disconnected indefinitely until a Citrix administrator intervenes, often by rebooting the virtual desktop or server. Sessions can remain disconnected because of hung processes that will not end, or logoffs that have hung or have not processed properly.
When a session is in this state, affected users will not be able to log on, so it is imperative that you keep an eye on these disconnected sessions and clear them before end-users are impacted. This is a short-term action, and the long term should be to investigate why sessions end up in this state in the first place.
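The check described here amounts to finding sessions that have stayed disconnected for longer than the policy timer should allow. The sketch below is an illustration only: the `Session` record, its field names, and the 15-minute grace period are assumptions, and the session list would be exported from your own tooling.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

class Session(NamedTuple):
    user: str
    machine: str
    state: str                # e.g. "Active" or "Disconnected"
    state_changed: datetime   # when the session entered its current state

def stale_disconnected(sessions, now, disconnect_timer,
                       grace=timedelta(minutes=15)):
    """Return sessions disconnected for longer than the timer plus a grace period."""
    limit = disconnect_timer + grace
    return [s for s in sessions
            if s.state == "Disconnected" and now - s.state_changed > limit]

# Hypothetical sessions checked against a 2-hour disconnected-session timer.
now = datetime(2024, 1, 15, 9, 0)
sessions = [
    Session("alice", "VDA01", "Disconnected", datetime(2024, 1, 14, 17, 0)),
    Session("bob", "VDA02", "Active", datetime(2024, 1, 15, 8, 0)),
    Session("carol", "VDA03", "Disconnected", datetime(2024, 1, 15, 8, 30)),
]
stale = stale_disconnected(sessions, now, timedelta(hours=2))
print([s.user for s in stale])  # ['alice']
```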
#7 Round Trip Time (RTT)
ICA round trip time is measured from when a user presses a key or clicks within their virtual desktop or application until the response is displayed back to the user on their endpoint. High ICA round trip time can indicate poor in-session performance. Poor in-session performance is one of the most frustrating experiences for an end-user, and unfortunately can be common in a virtual end-user computing environment.
At times, RTT will be impacted by the end-user network (WAN), but other times it can be impacted by an underperforming Internet Service Provider, datacentre network, Citrix Gateway, VDAs, or even the end-user device.
When a session experiences high latency, the end-user will notice that in simple terms their keystrokes and mouse clicks will appear lagged. This results in a frustrating experience for the end-user and reduces end-user productivity.
Your users will most likely experience high latency when working remotely. It is impossible as a Citrix administrator to control the network an end-user uses to connect to Gateway and the hops an ICA connection takes to reach Gateway. Citrix has released some features, such as Adaptive Throughput, Adaptive Display v2 and Adaptive Transport, that can help with the end-user experience under the most challenging network environments. However, not every Citrix customer will run the required Virtual Apps and Desktops release to make use of these features or turn them on. That said, these features are there to help, but certainly not eliminate, a problem such as high latency.
High round-trip time can also be caused by overloaded VDAs, among many things. If a virtual machine is slow to respond, the round-trip time will effectively increase. If bandwidth is constrained or saturated, round-trip time will understandably also increase. Even if a user sits too far away from their router when using wireless, or their wireless network adapter is experiencing issues, latency can be the undesirable result.
Round-trip time can be monitored by Citrix Director, or Citrix Application Delivery Management if dealing with Citrix Gateway and remote workers. It is important you closely monitor latency metrics, especially if multiple users begin to experience high round trip time, which could be caused by an issue within the datacentre.
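The distinction between one user with a poor connection and many users affected at once can be expressed as a simple classification. The following is a sketch under assumptions: the 150 ms threshold, the 30% "widespread" fraction, and the function name are hypothetical values chosen for illustration, not Citrix recommendations.

```python
def classify_rtt(rtt_by_user_ms, high_ms=150, widespread_fraction=0.3):
    """Return (scope, affected_users) for a snapshot of per-user ICA RTT.

    scope is "no-data", "ok", "isolated" (likely user-side), or
    "widespread" (worth checking the datacentre side).
    """
    if not rtt_by_user_ms:
        return "no-data", []
    affected = sorted(u for u, rtt in rtt_by_user_ms.items() if rtt > high_ms)
    if not affected:
        return "ok", []
    share = len(affected) / len(rtt_by_user_ms)
    scope = "widespread" if share >= widespread_fraction else "isolated"
    return scope, affected

# Hypothetical snapshots in milliseconds.
print(classify_rtt({"alice": 40, "bob": 300, "carol": 55, "dave": 60}))
# ('isolated', ['bob'])
print(classify_rtt({"alice": 200, "bob": 300, "carol": 55}))
# ('widespread', ['alice', 'bob'])
```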
#8 Session Counts on Virtual Apps VDAs
If you have a Delivery Group that evenly balances sessions across each VDA that is a member of that group, you would expect a Delivery Group of 5 VDAs and 40 concurrent sessions to result in roughly 8 users per VDA. However, what if one member VDA is hosting far fewer sessions, e.g. 1 or 2? That would potentially indicate an issue with that VDA. I have seen this before in environments, and it is hard to spot otherwise as the VDA in question remains registered and online.
However, because the VDA is still registered, online, and subsequently not in Maintenance Mode, users still have the potential to be brokered to that VDA. If a user is brokered to that VDA, they may encounter an issue where they see an indefinite black screen or their logon splash screen spins around forever. In my experience, users have experienced just that, and the only way to resolve it is for an administrator to restart the virtual desktop.
As such, it is essential to check VDA machines and their session load, and make sure that any machine encountering this issue is restarted or repaired before more users are impacted.
As mentioned before, because these desktops are difficult to identify through Citrix Director as having a problem, it potentially becomes a daily morning check to use Studio, manually browse to the Delivery Group, and review the session count, or to have some monitoring logic that can check it for you.
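Such monitoring logic could be as simple as comparing each VDA's session count with the even-split expectation. The sketch below reuses the 5-VDA, 40-session example from this section; the function name, the VDA names, and the 50% cut-off are assumptions.

```python
def underloaded_vdas(sessions_by_vda, min_fraction=0.5):
    """Return (expected_per_vda, names of VDAs well below that expectation).

    A VDA is reported when it hosts fewer than `min_fraction` of the
    sessions it would host under an even split.
    """
    if not sessions_by_vda:
        return 0.0, []
    expected = sum(sessions_by_vda.values()) / len(sessions_by_vda)
    low = sorted(name for name, count in sessions_by_vda.items()
                 if count < expected * min_fraction)
    return expected, low

# Hypothetical Delivery Group: 40 sessions across 5 VDAs.
group = {"VDA01": 10, "VDA02": 9, "VDA03": 10, "VDA04": 10, "VDA05": 1}
print(underloaded_vdas(group))  # (8.0, ['VDA05'])
```

A VDA that has just rebooted will also show a low count for a while, so a result like this is a prompt to look at the machine rather than proof of a fault.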
#9 PVS vDisk Replication
Provisioning Servers are great at streaming operating system images to Target Devices. However, there is plenty of opportunity for the streaming service to stop, such as during patching reboots, scheduled reboots, or even just faults that cause the service to crash or the server to reboot.
Because such scenarios are possible, we need to make sure that our PVS solution is highly available. We do this simply by building out more PVS servers and making sure each vDisk (image) is available for streaming from each server. This is accomplished by replicating the vDisk across the store locations that each PVS server is pointed to. These store locations can be local disks attached to the PVS servers, or network storage.
If a Provisioning Server stops streaming to a Target Device, the streams should fail over to another server. However, if a vDisk is not replicated across your multiple PVS servers, you reduce redundancy and the likelihood that another PVS server can take over streaming if one fails. Depending on how many PVS servers you have, this could result in anything from a minor to a major outage.
For this reason, you should make sure that a vDisk is replicated correctly across all PVS servers that it should be replicated to. The PVS Console can easily tell you if a disk has been replicated and is accessible from your multiple PVS servers.
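Outside the console, the same idea can be sketched as a comparison between the files each server's store should contain and the files it actually contains. This is an illustrative assumption, not how the PVS Console performs its check: the function name, server names, and file names are hypothetical, and a real check would likely also compare file sizes and timestamps.

```python
def missing_vdisks(expected_files, files_by_server):
    """Return {server: [missing file names]} for servers lacking any expected file."""
    expected = set(expected_files)
    report = {}
    for server, files in files_by_server.items():
        missing = sorted(expected - set(files))
        if missing:
            report[server] = missing
    return report

# Hypothetical store listings gathered from each PVS server.
expected = ["win10-gold.vhdx", "win10-gold.pvp"]
stores = {
    "PVS01": ["win10-gold.vhdx", "win10-gold.pvp"],
    "PVS02": ["win10-gold.vhdx"],
}
print(missing_vdisks(expected, stores))  # {'PVS02': ['win10-gold.pvp']}
```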
#10 Remote Desktop Services CALs
Remote Desktop Services Client Access Licenses (CALs) are just as important as Citrix licenses when running published apps or published shared desktops. If you exceed your CAL count, users could be denied access to their resources.
First and foremost, it is important that all your Virtual Apps servers, likely via Group Policy, point to a Remote Desktop License server that contains enough CALs to service your user base. CALs come in per-user or per-computer form. Per-computer CALs are assigned to each device and cannot be overallocated. Per-user CALs are assigned to each connecting user in Active Directory and can be overallocated; however, you breach Microsoft licensing in doing so. You can revoke 20% of per-computer CALs, but 0% of per-user CALs.
If a CAL is not available for a connecting user, particularly with per-computer licensing, the connection will abruptly close and, in some cases, no error message will be displayed to the end user. This confuses the end user, causing them to repeatedly attempt to relaunch the application or published desktop before eventually ringing the service desk and logging a ticket.
The Remote Desktop Licensing Manager allows you to create CAL usage reports on per-user CALs, but not for per-computer. Alternatively, you can check the RD Licensing Manager console to see how many CALs are issued per-user or per-computer, and how many are available. Some third-party solutions are also used to proactively monitor CAL usage.
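Proactive monitoring of CAL usage boils down to comparing issued CALs with installed CALs and warning before the pool runs out. The sketch below assumes you have obtained both counts from the licensing console or a report; the function name and the 90% warning level are hypothetical.

```python
def cal_headroom(installed, issued, warn_at=0.9):
    """Return (remaining_cals, warn).

    warn is True once issued CALs reach `warn_at` of the installed count.
    A negative remainder indicates over-allocation, which is possible
    with per-user CALs.
    """
    if installed <= 0:
        raise ValueError("installed CAL count must be positive")
    usage = issued / installed
    return installed - issued, usage >= warn_at

# Hypothetical counts.
print(cal_headroom(500, 300))  # (200, False)
print(cal_headroom(500, 460))  # (40, True)
print(cal_headroom(500, 520))  # (-20, True)
```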
I recommend that you start by tracking these metrics. There are of course many more things to monitor. Having a Citrix monitoring tool in place would probably be the best thing to do. There are various options in the market. eG Innovations also has one that can monitor and proactively alert you to problems before your Citrix end users are affected.