All posts by Vinod Mohan

Top 7 Performance Problems in .NET Applications and How to Solve Them

Microsoft .NET Framework is one of the most popular application development platforms. C# and ASP.NET are used by millions of developers to build Windows client applications, XML Web services, distributed components, client-server applications, database applications, and so on. It’s no surprise that ensuring top-notch performance of .NET applications is a top priority for most application owners and developers.

There are numerous reasons why .NET applications can be slow. These include incorrect memory sizing, GC pauses, code-level errors, excessive logging of exceptions, heavy use of synchronized blocks, IIS server bottlenecks, and so on. In this blog, we will look at some of the top performance problems in .NET applications and provide tips to troubleshoot and resolve them.

Top 7 .NET Application Performance Problems

  1. Exceptions and Logs One Too Many
  2. Overuse of Thread Synchronization and Locking
  3. The Dreadful Application Hangs
  4. Frequent Garbage Collection Pauses
  5. IIS Server Bottlenecks
  6. Slow Database Calls
  7. Infrastructure at Fault: Not a .NET Problem, But Still a Problem for .NET!

#1 Exceptions and Logs One Too Many

Many developers believe that .NET exceptions are not a bad thing; only errors are. That is true if exceptions are properly handled, i.e., thrown, caught and addressed (and NOT ignored). But just as too many cooks spoil the broth, too many exceptions spoil performance: throwing and catching an exception is relatively expensive, and a high volume of exceptions, especially unhandled ones, makes code inefficient and slows the application down. Hidden exceptions, the ones that are caught and silently swallowed, are worse: a minefield. Left unchecked, they can increase web page load times.

Another common .NET problem is logging exceptions excessively. Logging can be a great tool in your debugging arsenal for identifying abnormalities recorded during application processing. But when logging is set up to catch exceptions at every tier of the application architecture, the same exception can end up logged at the web, service and data tiers. This adds load to the application and increases response time. In production environments, be careful to log only fatal events and errors. Logging everything, including informational, debug and warning messages, can easily bloat your production log file and, in turn, slow down code processing.
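To make the "log once" idea concrete, here is a minimal sketch in Java (C# try/catch/finally behaves the same way for this purpose). The three tier methods and the in-memory `LOG` list are hypothetical stand-ins for real data, service and web layers and for a real logging library: the lower tiers add context and rethrow, and only the outermost tier writes a log entry.

```java
import java.util.ArrayList;
import java.util.List;

final class TieredRequest {
    // Hypothetical stand-in for a real logging library; collects lines in memory.
    static final List<String> LOG = new ArrayList<>();
    static int requestsCompleted = 0;

    // Data tier (hypothetical): validates and throws, but does not log.
    static String loadOrder(int id) {
        if (id < 0) {
            throw new IllegalArgumentException("invalid order id " + id);
        }
        return "order-" + id;
    }

    // Service tier (hypothetical): wraps the exception to add context, still no logging.
    static String getOrder(int id) {
        try {
            return loadOrder(id);
        } catch (IllegalArgumentException e) {
            throw new IllegalStateException("getOrder failed", e);
        }
    }

    // Web tier: the single place where the failure is logged, exactly once.
    static String handle(int id) {
        try {
            return getOrder(id);
        } catch (IllegalStateException e) {
            LOG.add("ERROR " + e.getMessage() + ": " + e.getCause().getMessage());
            return "500";
        } finally {
            // Runs on success and on failure: release per-request resources here.
            requestsCompleted++;
        }
    }
}
```

A failed request therefore produces one log line carrying the context from every tier, instead of three separate lines for the same exception.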

Helpful Troubleshooting Tips:
  • Make sure your C# code has try/catch/finally blocks to handle exceptions.
  • Leverage exception filters, available in C# 6 and above, which let you specify a conditional clause for each catch block.
  • Check for null values and use TryParse to avoid potential exceptions.
  • Pay attention to second-chance exceptions: a second-chance exception means that a first-chance exception was raised and no handler in the application dealt with it.
  • Use exception handling and logging libraries such as Enterprise Library, NLog, Serilog, or log4net to log exceptions to a file or a database.
  • Log exceptions only as much as needed, so you don’t bloat the log file.
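Java has no TryParse, but the idea behind it (validating input instead of relying on an exception for an expected failure) carries over directly. The `tryParseInt` helper below is a hypothetical sketch of that pattern, not a library API: it returns an empty `OptionalInt` for bad input rather than throwing `NumberFormatException`.

```java
import java.util.OptionalInt;

final class SafeParse {
    // Hypothetical Java counterpart to .NET's int.TryParse: invalid input is
    // rejected by validation, so the common "bad input" path never throws.
    static OptionalInt tryParseInt(String s) {
        if (s == null || s.isEmpty()) {
            return OptionalInt.empty();
        }
        char first = s.charAt(0);
        int start = (first == '-' || first == '+') ? 1 : 0;
        int digits = s.length() - start;
        // An int has at most 10 decimal digits; longer input cannot fit.
        if (digits == 0 || digits > 10) {
            return OptionalInt.empty();
        }
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalInt.empty();
            }
        }
        // A sign plus at most 10 ASCII digits always fits in a long, so this cannot throw.
        long value = Long.parseLong(s);
        if (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
            return OptionalInt.empty();
        }
        return OptionalInt.of((int) value);
    }
}
```

A caller can then write `int qty = SafeParse.tryParseInt(input).orElse(1);` and never enter a catch block for routine bad input.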

#2 Overuse of Thread Synchronization and Locking

The .NET Framework offers many thread synchronization options, such as inter-process mutexes, reader/writer locks, etc. At times, a .NET developer will write code so that only one thread is serviced at a given time, while other threads arriving in parallel have to wait in a queue. For example, a checkout application’s business logic may require it to process items one request at a time. Synchronization and locking help serialize the incoming threads for execution: when a block of code is synchronized by taking a lock on a specific object, each incoming thread must wait until that lock is available. While this strategy helps in certain situations, it should not be overused. Too much serialization increases the wait time of incoming threads and ends up slowing down user transactions.

Helpful Troubleshooting Tips:
  • Use synchronized code and locks only when necessary. Understand what the code really requires before deciding to use locks.
  • Scope locks so that they are acquired late and released early, and do not keep other threads waiting for a long time.
  • To reduce concurrency issues, consider using loose coupling. Event-delegation models can also be used to minimize lock contention.
  • Monitor the .NET code using code profiling tools to identify if thread locks are causing slow application processing.
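The "acquire late, release early" advice can be sketched in Java, where a `synchronized` block plays the role of C#’s `lock` statement. The `Checkout` class and its `price` step are hypothetical: the per-item work runs outside the lock, and only the update to the shared list is serialized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class Checkout {
    private final Object lock = new Object();
    private final List<String> processed = new ArrayList<>();

    // Hypothetical per-item work (validation, pricing) that touches no shared state.
    private static String price(String item) {
        return item.trim().toUpperCase(Locale.ROOT);
    }

    String process(String item) {
        // Acquire late: the thread-safe work runs without holding the lock.
        String priced = price(item);
        synchronized (lock) {
            // Only the shared-list update is serialized.
            processed.add(priced);
        }
        // Release early: nothing after this point holds the lock.
        return priced;
    }

    int processedCount() {
        synchronized (lock) {
            return processed.size();
        }
    }
}
```

If `price` were moved inside the `synchronized` block, the result would be the same, but every concurrent request would queue behind the slowest pricing call.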

#3 The Dreadful Application Hangs

A single slow URL is one thing to handle. But when the IIS website hangs and all or most web pages take forever to load, it can’t get much worse. Hangs typically occur when an application is overloaded or deadlocked. There are two types of application hang scenarios that .NET applications usually encounter.

A hard hang (IIS issue): This usually happens at the beginning of the request processing pipeline, where the request is queued. Because of an application deadlock, all available threads can get blocked, so subsequent incoming requests end up in a queue waiting to be serviced. This can also happen when the number of active requests exceeds the concurrency limit configured on the IIS server. Such a hang manifests as requests timing out and clients receiving 503 Service Unavailable errors. A hard hang affects all URLs and the whole web application.

Helpful Troubleshooting Tips:
  • Constantly track the number of requests queued in the IIS server (Http Service Request Queues\CurrentQueueSize in Windows Performance Monitor), along with the rate at which they arrive (Http Service Request Queues\ArrivalRate). The queue size should never exceed the request processing limit configured for the worker process.
  • Also track how long requests wait in the queue (Http Service Request Queues\MaxQueueItemAge in Windows Performance Monitor). This helps detect whether the application is facing a potential hang.
  • Also watch out for Service Unavailable and connection timeout errors by monitoring IIS server events.
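Turning these counters into an alert needs a rule that separates a brief spike from a sustained backlog. The Java sketch below assumes the CurrentQueueSize and MaxQueueItemAge values have already been sampled from Performance Monitor (collection is not shown), and the limits passed in are hypothetical; it reports a possible hard hang only when every sample in the window breaches a limit.

```java
final class QueueHangCheck {
    // Returns true when every sample breaches a limit, i.e. the queue is
    // persistently too long or its oldest item is persistently too old.
    // Sampling the counters is assumed to happen elsewhere.
    static boolean sustainedQueueing(long[] queueSizes, long[] maxItemAgesMs,
                                     long queueLimit, long maxAgeLimitMs) {
        if (queueSizes.length != maxItemAgesMs.length) {
            throw new IllegalArgumentException("sample arrays must have equal length");
        }
        if (queueSizes.length == 0) {
            return false;
        }
        for (int i = 0; i < queueSizes.length; i++) {
            boolean breached = queueSizes[i] > queueLimit || maxItemAgesMs[i] > maxAgeLimitMs;
            if (!breached) {
                return false;   // one healthy sample: treat it as a spike, not a hang
            }
        }
        return true;
    }
}
```

Requiring the whole window to breach trades a little detection delay for far fewer false alarms; the window length is a tuning choice.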

A soft hang (ASP.NET issue): This usually happens due to bad application code in a specific segment, impacting only a few URLs rather than the full website. Typically, a hang caused by an ASP.NET controller or page happens at the ExecuteRequestHandler stage. To confirm this, you might want to break out a debugger to see exactly where the request is stuck. Check the module name, stage name and URL; the URL indicates the controller/page causing the hang.

Helpful Troubleshooting Tips:
  • Verify whether IIS is the issue by checking the Http Service Request Queues\CurrentQueueSize counter in Windows Performance Monitor. If it is 0, no requests are stuck in an IIS queue.
  • If it is not an IIS issue, it is most likely a code-level problem in an ASP.NET controller/page.
  • Identify which URL(s) are hanging and get a detailed request trace using a code profiler. Verify the module name and stage name at which the request hangs to confirm it’s an ASP.NET issue.
  • Code profiling using a transaction tracing tool can help identify the exact line of code where the problem exists.
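Breaking into a debugger on a live server is not always practical. As an analogue on the Java side, the JVM can report deadlocked threads programmatically through `ThreadMXBean.findDeadlockedThreads()`; the small `DeadlockProbe` below is a hypothetical sketch built on that call. The .NET equivalent would be a debugger or dump analysis, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class DeadlockProbe {
    // Describes each thread the JVM reports as deadlocked; an empty list means
    // no deadlock was found at the moment of the call.
    static List<String> findDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Returns null (not an empty array) when no threads are deadlocked.
        long[] ids = threads.findDeadlockedThreads();
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            // An entry is null if that thread terminated after the check.
            if (info != null) {
                report.add(info.getThreadName() + " is waiting on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }
}
```

Running such a probe periodically (or on demand when requests stall) points at the threads and locks involved, which is the same information a debugger session would give you.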

#4 Frequent Garbage Collection Pauses

Garbage collection (GC) in the .NET CLR is triggered when the memory used by allocated objects on the managed heap exceeds an acceptable threshold, which the runtime adjusts continuously as the process runs. It also runs when the system is low on physical memory or when code explicitly calls the GC.Collect method. The garbage collector then reclaims the memory occupied by dead objects. Most collections happen in Generation 0, where short-lived objects are stored. A collection of Generation 2, where long-lived objects are kept, also collects the younger generations and is called a full GC. Every collection consumes CPU and can pause application threads, so longer and more frequent GC pauses slow the application down.

Helpful Troubleshooting Tips:
  • Size GC heap memory properly and make sure GC limits are set as required.
  • Avoid allocating objects and large strings where they are not needed.
  • Track the number of GCs, the time taken for GC, and the % of time spent in GC by the CLR.
  • Look for times when a full GC happens, as this can cause application slowness.
  • Choose server GC or workstation GC judiciously, based on application needs.
  • Monitor the CLR layer end to end to identify memory usage, GC activity, CPU spikes, etc.
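The CLR reports GC counts and GC time through its own performance counters. As a comparable sketch for the JVM, the standard `java.lang.management` API exposes per-collector collection counts and accumulated collection time. The `GcStats` class is hypothetical, and its percentage is only an approximation, because concurrent collectors can overlap with application work.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {
    // Total number of collections across all collectors (e.g. young and old generation).
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {            // -1 means "undefined" for this collector
                total += count;
            }
        }
        return total;
    }

    // Accumulated collection time in milliseconds across all collectors.
    static long totalCollectionTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long timeMs = gc.getCollectionTime();
            if (timeMs > 0) {           // -1 means "undefined" for this collector
                total += timeMs;
            }
        }
        return total;
    }

    // Approximate share of JVM uptime spent in GC, as a percentage.
    static double percentTimeInGc() {
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMs <= 0) {
            return 0.0;
        }
        return 100.0 * totalCollectionTimeMs() / uptimeMs;
    }
}
```

Sampling these values at intervals and comparing the deltas shows whether GC frequency or GC time is trending upward.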

#5 IIS Server Bottlenecks

Microsoft IIS Server is a critical part of the .NET web application stack. IIS is the web server that hosts the web application or website built on .NET, and it runs the W3WP worker process (w3wp.exe) that responds to incoming requests. That worker process also hosts the Common Language Runtime (CLR), which manages resources such as threads for request processing. Because IIS has many moving parts, a bottleneck in IIS can directly hurt .NET application performance.

Commonly faced IIS Server problems:

  • Server overload due to overutilization of resources such as memory, CPU, etc.
  • High concurrent connections and connection drops
  • Application pool failure
  • Expiry of SSL certificates
  • High response time of the ASP.NET request handling service
  • High CLR wait time
  • Improper caching
  • HTTP errors including static and dynamic content errors and connection errors

Helpful Troubleshooting Tips:
  • Right-size the IIS server so there’s no resource contention or overutilization of resources.
  • Load balance with more IIS servers based on rate of incoming requests.
  • Track SSL certificate validity and get alerted proactively before a certificate expires.
  • Monitor all aspects of IIS performance, including application pools and websites, to identify improper configuration and performance deviations.
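Proactive certificate alerting comes down to comparing a certificate’s expiry time with a warning window. A minimal Java sketch, assuming the expiry instant has already been read from the certificate (for example via `X509Certificate.getNotAfter()`); fetching the certificate from the IIS server is not shown, and `CertExpiryCheck` is a hypothetical helper.

```java
import java.time.Duration;
import java.time.Instant;

final class CertExpiryCheck {
    // True when the certificate expires within the warning window
    // (or has already expired), so an alert can be raised ahead of time.
    static boolean shouldAlert(Instant notAfter, Instant now, Duration warnWindow) {
        return !now.plus(warnWindow).isBefore(notAfter);
    }

    // Whole days left before expiry (negative once expired).
    static long daysRemaining(Instant notAfter, Instant now) {
        return Duration.between(now, notAfter).toDays();
    }
}
```

For example, `CertExpiryCheck.shouldAlert(cert.getNotAfter().toInstant(), Instant.now(), Duration.ofDays(30))` returns true once the certificate is within 30 days of expiry; the 30-day window is an arbitrary example.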

#6 Slow Database Calls

It’s not always a .NET code issue that’s affecting application performance. Slow-running queries are a common cause. But it’s usually the .NET application developers who get blamed for slow application performance, because there’s no contextual visibility into how SQL performance affects .NET application processing. ADO.NET and ODP.NET connectivity issues can be one reason for slow query processing, but a more common reason is poorly formed queries. Improper execution plans, missing indexes, poorly designed schemas, small buffer pools, missing joins, improper caching, connections not being pooled properly, etc. can also slow down query processing in the database.

While DBAs are responsible for database performance and query creation, the .NET application owner needs to track down query-level issues during application processing. This helps distinguish code-level problems from database problems, so .NET developers don’t spend cycles looking for issues in the code.

Helpful Troubleshooting Tips:
  • Monitor query processing in context of application transactions to identify slow queries.
  • Plan database sizing and configuration properly to ensure consistent performance.
  • Use database monitoring tools to identify and fix missing indexes, optimize the database layout by re-indexing, etc.
  • Track database connectivity with the application to isolate any connection issues.
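Monitoring queries in the context of application transactions means recording which transaction issued a slow query, not just that the database was slow. A minimal Java sketch of the idea follows; the `QueryTimer` class, its threshold and the transaction labels are hypothetical, and the clock is injected so the timing logic can be tested without a real database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class QueryTimer {
    private final LongSupplier nanoClock;   // e.g. System::nanoTime in production
    private final long thresholdMs;
    // "transaction | elapsed | sql" entries for queries slower than the threshold.
    final List<String> slowQueries = new ArrayList<>();

    QueryTimer(LongSupplier nanoClock, long thresholdMs) {
        this.nanoClock = nanoClock;
        this.thresholdMs = thresholdMs;
    }

    // Runs the query and, if it was slow, records it tagged with the calling transaction.
    <T> T time(String transaction, String sql, Supplier<T> query) {
        long start = nanoClock.getAsLong();
        try {
            return query.get();
        } finally {
            // In finally, so slow queries are recorded even when the query throws.
            long elapsedMs = (nanoClock.getAsLong() - start) / 1_000_000;
            if (elapsedMs > thresholdMs) {
                slowQueries.add(transaction + " | " + elapsedMs + " ms | " + sql);
            }
        }
    }
}
```

In production code the clock would simply be `System::nanoTime`, e.g. `new QueryTimer(System::nanoTime, 500)`, with the real query call passed in as the `Supplier`.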

BONUS TIP: In addition to slow database calls, there could also be slowness due to external calls, such as HTTP, Web Service, WCF. .NET code profiling will help catch .NET method-level issues, database query-level problems and slow remote procedure calls.

Distributed transaction tracing providing stack trace of .NET code processing

#7 Infrastructure at Fault: Not a .NET Problem, But Still a Problem for .NET!

.NET Framework is not a standalone tier. An application using .NET Framework has many dependencies on the underlying infrastructure, such as virtualized servers, containers or cloud infrastructure, as well as backend storage devices. While these are not directly .NET problems, a problem in any of these infrastructure components can affect .NET performance just the same.

Just as IIS servers and databases can have bottlenecks, a VM could be running out of resources, a SAN array could be experiencing more IOPS than it can handle, or, if the .NET application is hosted on Azure, an App Service may not be running properly.

Network-related complaints top the list in most application environments, and there is always a blame game over whether a problem is a network issue or an application issue. Network congestion, packet drops, or device failures can impact application performance and connectivity.

Helpful Troubleshooting Tips:

Total performance assurance of a .NET application environment requires correlated visibility of dependencies between the application and the supporting infrastructure. Make sure you implement a converged application and infrastructure monitoring strategy to catch infrastructure issues.

While you focus on catching and troubleshooting all these issues, it is also important to keep in mind that writing clean and efficient code solves many problems on the .NET side. Write good code, keep your systems and infrastructure in good health, and implement the necessary tools to automate monitoring. This will help you deliver high-performing .NET applications and digital experiences.

How Can I Use Microsoft SCOM for End-to-End Performance Monitoring?

Microsoft System Center Operations Manager (SCOM) deployments have been on the rise in recent years, mainly due to SCOM’s tight integration with other Microsoft servers and applications in the enterprise. While SCOM has native support for monitoring Microsoft infrastructures, it has no out-of-the-box support for non-Microsoft technologies such as VMware, Oracle, Citrix, SAP, and so on. As a result, IT administrators are forced to take a multi-tool approach and rely on third-party tools, which leads to monitoring tool sprawl with increased costs and maintenance effort.

In this blog, we will look at the challenges involved in using multiple management packs with SCOM and explain how to make SCOM an end-to-end performance monitoring solution spanning Microsoft and non-Microsoft technologies.

The Multi-Management Pack Monitoring Challenge with Microsoft SCOM

Organizations using SCOM typically rely on management packs developed by third-party vendors and integrate them into SCOM to extend its monitoring capability beyond Microsoft workloads. While this is a widely practiced approach and there are many management packs on the market, the biggest challenge arises when there is a heterogeneous environment to monitor – say, for example, you have an environment with Java web-based applications, VMware vSphere hypervisor, EMC SAN arrays, F5 load balancer, Oracle backend databases, and Citrix VDI.

  • For each of these infrastructure tiers, Microsoft SCOM administrators would need to buy individual management packs from third-party vendors, deploy and manage them separately.
  • These management packs would be used to monitor the non-Microsoft components, gather performance metrics, and send them over to SCOM.

Ideally, SCOM administrators want to make SCOM a one-stop shop solution delivering centralized visibility of their entire environment. But despite the many management packs used, SCOM architecture does not allow for unified end-to-end monitoring. Every management pack would be managed as a separate folder in the SCOM console with little or no correlated visibility with one another or with the Microsoft tiers monitored by SCOM.

In a recent webinar, we asked our attendees about their biggest challenges when using SCOM for performance monitoring of their IT infrastructure, and the answers point to the need to use SCOM and SCOM management packs more effectively:

Biggest Challenges with Microsoft SCOM for End-to-End Performance Monitoring (Source: Poll results from eG Innovations webinar)
59% cite lack of visibility into non-Microsoft systems as their toughest challenge. Although Microsoft SCOM has built-in management packs for Microsoft applications (Active Directory, SQL Server, Microsoft Exchange, IIS, etc.) and Windows servers, it has limited support for non-Microsoft servers and applications (e.g., Citrix XenApp and XenDesktop, VMware vSphere, SAP, Nutanix, etc.).
52% feel it is cumbersome to use SCOM with multiple management packs. The need for multiple management packs stems from the lack of support for non-Microsoft applications and platforms in SCOM. To monitor every tier of the IT infrastructure, users often need to purchase multiple management packs from different vendors – one for each non-Microsoft tier – and there is no integration between them.

Addressing the Multi-Management Pack Challenge with Microsoft SCOM

eG Enterprise is an end-to-end IT performance monitoring solution that offers a Universal Management Pack for SCOM, augmenting Microsoft SCOM’s capabilities. Packed with performance insight from more than 180 applications, 10+ operating systems, 10+ virtualization platforms, and 20+ storage devices, the Universal Management Pack enables SCOM administrators to use SCOM as a one-stop shop for all their end-to-end monitoring needs.

Instead of a multiple management pack approach, SCOM administrators should consider a universal management pack strategy, wherein they use just one management pack that can monitor all the non-Microsoft tiers in their infrastructure. Because it combines the power of many management packs into one unified solution, SCOM administrators get access to health, availability and performance data from heterogeneous infrastructures and multi-vendor devices inside the SCOM console.

How a Universal Management Pack Strategy Can Make Microsoft SCOM an End-to-End Performance Monitoring Solution

One monitor to know it all: A universal management pack is a single, unified monitoring technology, so there is no need to purchase, deploy and manage additional management packs from third-party vendors. eG Enterprise’s native API integrations transform SCOM from a Microsoft system data collection console into an integrated end-to-end performance diagnosis engine.

One monitor to visualize the entire IT topology: It incorporates a unique infrastructure-wide unified management topology map that presents a correlated visualization of your heterogeneous environment on a single screen. With individual management packs, this is not possible. Problems can easily be identified from the topology and the SCOM administrator can zoom into the SCOM Health Explorer for further analysis.

One monitor for centralized alerting and correlated visibility: It adds performance alerts to SCOM that not only indicate where problems exist but are also automatically prioritized in the SCOM console, so you can see exactly how a bottleneck in one tier is impacting other tiers or the end-user experience.

One monitor to optimize and right-size the infrastructure: It provides SCOM administrators many preset and custom reports from every part of their infrastructure, across all silos. Understanding all factors that impact performance leads to smart growth and maximum IT value.

One monitor for deep-dive diagnostics: Easily drill down from SCOM into the eG Enterprise web console for deep-dive diagnostics about performance of a problematic application, server, or device. SCOM administrators can also publish customized monitoring dashboards in eG Enterprise as part of the SCOM console.

With a multi-management pack strategy, gaining these capabilities would require a new siloed management pack for each new service/component and extensive manual intervention to perform event correlation and issue diagnosis. The eG Universal Management Pack transforms SCOM into a unified, end-to-end solution for understanding and managing all performance-impacting factors across your entire infrastructure, saving you time and allowing you to proactively solve issues before they affect your end users.

Using Microsoft SCOM to Monitor Citrix XenApp and XenDesktop: What Are Your Choices?

Not surprisingly, many organizations that use Citrix technologies for remote desktop and application access are Microsoft Windows shops. Many of these organizations already have a deployment of Microsoft System Center Operations Manager (SCOM) for infrastructure monitoring.

While SCOM is a great tool for monitoring Microsoft applications, there is no native support for monitoring Citrix XenApp and XenDesktop. For organizations that need to monitor their Citrix XenApp and XenDesktop using Microsoft SCOM, there are only two management pack options available in the market. In this blog, we will compare the capabilities of both of these options.

Option 1: The Citrix SCOM Management Pack for XenApp and XenDesktop

This is a native management pack available to Citrix Platinum customers. Formerly owned by Comtrade, Citrix acquired this management pack in January 2016.

Key Capabilities:
  • Monitors all Citrix tiers such as XenApp, XenDesktop, XenServer, StoreFront, NetScaler, Provisioning Servers, License servers, etc.
  • Auto-discovers all Citrix components in your Citrix farm and adds them to SCOM
  • Organizes Citrix components in SCOM’s native folder-based topology structure
  • Displays performance alerts for Citrix components within the SCOM console
  • Offers performance reports for historical trends and analytics
Limitations:
  • The Citrix SCOM Management Pack is available at no cost to Citrix Platinum customers with active Subscription Advantage or Software Maintenance. However, if you run Enterprise, Advanced, Fundamentals, or other forms of Citrix licensing, you must upgrade to Citrix Platinum in order to integrate with SCOM.
  • Frequently, Citrix administrators are blamed for Citrix slowdowns and are forced to prove that it is caused by one of the tiers supporting Citrix (e.g., virtualization, storage, network, desktop, etc.). And, because the Citrix SCOM Management Pack provides monitoring of the Citrix tiers only, in order to monitor the other infrastructure tiers that support Citrix – e.g., VMware, Nutanix, storage, etc. – you will need to find additional third-party management packs for SCOM from other vendors.
  • Without the advantage of end-to-end monitoring across the entire Citrix infrastructure, administrators using the Citrix Management Pack for SCOM lack correlated visibility. Therefore, manual effort and expertise are required to pinpoint the root cause of Citrix issues, adding time and complexity to troubleshooting performance issues.
  • Even for monitoring just the Citrix tiers, the Citrix SCOM Management Pack is not a complete solution. As one example, it lacks the ability to simulate user logons to synthetically monitor and alert on the Citrix infrastructure.
  • The topology views available in SCOM represent groupings of the different tiers, but not the inter-dependencies between tiers, so they cannot be used to pinpoint the root cause of problems by linking the cause and effects of problems.

Option 2: The eG Universal Management Pack for SCOM

With the eG Universal Management Pack for SCOM, SCOM administrators can monitor all the Citrix tiers and the supporting infrastructure from the same console. They can see any issues detected in the infrastructure, drill down to see from which layer/server the problem originates and even diagnose further to see how the problem can be corrected quickly. Customized dashboards can also be exported from eG Enterprise into SCOM, so different stakeholders can see the KPIs of interest from inside the SCOM console itself.

This Management Pack is just one component of eG Enterprise: a comprehensive performance monitoring solution for Citrix environments, with out-of-the-box support for heterogeneous infrastructures and multi-vendor platforms. Unlike the Citrix SCOM Management Pack, there are no Citrix license restrictions on the eG Universal Management Pack for SCOM. It is available for all Citrix customers.

The eG Universal Management Pack for SCOM gathers performance insights from all Citrix tiers and non-Microsoft tiers (such as VMware, Oracle, Nutanix, SAP, etc.) and presents them inside the SCOM console, delivering single-pane-of-glass visibility for SCOM administrators. Without requiring multiple management packs from different vendors, SCOM administrators can get centralized and unified visibility of the entire Citrix and non-Citrix infrastructures, all from inside Microsoft SCOM.

Key Capabilities:

With the eG Universal Management Pack for SCOM, you can:

  • Monitor all aspects of the Citrix infrastructure (XenApp, XenDesktop, XenServer, NetScaler, StoreFront, PVS, etc.) and the non-Citrix tiers supporting Citrix access (e.g., network, virtualization, storage, database, cloud, etc.)
  • Auto-discover all Citrix and non-Citrix components, and get performance insights within the SCOM console
  • See a consolidated listing of alerts for all IT components, all available within the SCOM console. Leverage the automatic root cause diagnosis functionality of eG Enterprise to trigger intelligent alerts, thus reducing alert volume and false positives in SCOM.
  • Incorporate logon simulation and full session simulation in your arsenal of tools for Citrix monitoring, thereby enabling proactive problem detection even when no users are connected to your Citrix farm
  • Access purpose-built dashboards in the SCOM console for quick and easy problem diagnosis
  • Conveniently drill down into eG Enterprise for further analysis and diagnosis of performance problems. Also access historical performance reports for trending, forecasting, capacity planning, infrastructure optimization, right-sizing, and compliance.

Two Easy Methods for Troubleshooting Citrix Logon Issues

One of the most challenging tasks for a Citrix administrator is troubleshooting Citrix logon issues. Citrix session logon is the very first step that a user performs when connecting to a Citrix farm to access virtual desktops or applications. When logon fails or is slow, it directly impacts the productivity of the user.

52% of respondents to the recent survey of Citrix professionals by eG Innovations and DABCC ranked slow logon as the most common user complaint.

There are over a dozen steps in the Citrix logon process (this is true for both Citrix XenApp and XenDesktop), and slowness in any one of these steps can make Citrix logon slow. Furthermore, many of these steps rely on external servers and services (e.g., authentication on Active Directory, profile loading from a profile server, etc.). This makes logon troubleshooting a challenge for Citrix admins, as they must determine the root cause of the problem for effective resolution. In this article, we will look at two ways to easily detect, diagnose and triage Citrix logon issues.

#1 Reactive Troubleshooting with Real User Logon Monitoring

When a Citrix user calls in to report a logon issue, the IT staff must be able to immediately diagnose it in real time and identify the cause. An essential requirement for fast response is a Citrix real-time monitoring and historical reporting solution that provides immediate visibility into every Citrix user logon. Ideally, the monitoring solution must be able to track the overall logon time and, additionally, provide a breakdown of which step in the logon process was slow.

With this data in place, it is then possible to equip operations personnel with the details they need to troubleshoot slow Citrix logons. For example, when a user “Keith” calls, complaining of slow Citrix logons, the administrator can access a Citrix logon performance report from the web console that provides details on Keith’s recent logons (past hour or any custom period). This analysis of Keith’s historical logon details can be performed even if he is no longer logged into the Citrix farm.

There are various steps involved in enabling Citrix XenApp logon: brokering, VM start, HDX connection establishment, AD authentication, Group Policy processing, logon script execution, profile loading, interactive session processing, etc. A problem in any one of these steps will cause logon slowdown.

Using a logon performance report, the Citrix admin can view the processing time for each logon step and identify the exact step that is causing a slowdown – allowing the team to focus on troubleshooting the issue rather than spending time diagnosing it.

From the above example, we can see that the interactive session duration for the user Keith is very high in comparison to the other logon steps. This gap allows the administrator to confirm the issue reported by Keith and, further, pinpoint the exact step of the logon causing the slowdown.
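The reasoning in this example, finding the step that dominates total logon time, is simple to express in code. The Java sketch below is hypothetical: the step names and durations used with it are illustrative values, not data from any Citrix API, and a monitoring tool would supply the real per-step timings.

```java
import java.util.Map;

final class LogonBreakdown {
    // Name of the step with the largest duration, or null when there are no steps.
    static String slowestStep(Map<String, Long> stepDurationsMs) {
        String slowest = null;
        long max = Long.MIN_VALUE;
        for (Map.Entry<String, Long> e : stepDurationsMs.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }

    // Percentage of total logon time spent in the given step.
    static double shareOfTotal(Map<String, Long> stepDurationsMs, String step) {
        long total = 0;
        for (long durationMs : stepDurationsMs.values()) {
            total += durationMs;
        }
        if (total == 0) {
            return 0.0;
        }
        return 100.0 * stepDurationsMs.getOrDefault(step, 0L) / total;
    }
}
```

Reporting the share of total time alongside the slowest step helps distinguish one step that dominates from a logon that is uniformly slow.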

Going deeper into analyzing the cause of slowness, the administrator can diagnose why interactive session duration is so high. With the new FlexCast Management Architecture (FMA) in Citrix XenApp and XenDesktop 7.x, logon processing occurs in two places:

  • The Delivery Controller, which is responsible for managing user access, brokering and optimizing connections
  • The XenApp server or XenDesktop VM where the Citrix session gets created, drives get mapped, and user credentials are authenticated

To troubleshoot Citrix logon issues, visibility into logon processing actions in both the Delivery Controller and the XenApp server/XenDesktop VM is necessary.

  • The breakdown of logon time as shown above is the view obtained from the Citrix Delivery Controller.
  • The figure below shows the drilldown capability available – for viewing the metrics obtained from the XenApp server/XenDesktop VM to further troubleshoot this issue.

In this example, clearly most of the processing time is spent handling Group Policies. The details provided in this drilldown highlight that one of the client-side extensions (CSE) is responsible for the slowdown.

This example highlights how, in just a couple of clicks, the Citrix admin can identify the exact cause of slow Citrix logons. Using an effective Citrix monitoring solution, helpdesk personnel can identify logon issues without assistance, reducing the troubleshooting effort required by Citrix administrators.

#2 Proactive Troubleshooting with Synthetic Logon Monitoring

Citrix administrators are constantly looking for ways to be proactive – detecting and fixing performance issues before users notice. Real user monitoring, as shown earlier, is useful to detect issues after they happen. But to help Citrix administrators proactively identify logon issues, a Citrix logon simulator can be used.

Using synthetically simulated logon scenarios and testing them from remote locations, administrators can proactively find and fix issues before real users and business services are affected. A logon simulator tracks the time taken for each step of the logon process during the simulation and highlights the exact step causing a slowdown. This can also be used to:

  • Benchmark logon performance
  • Compare logon performance across locations
  • Test for logon issues before production updates
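Comparing logon performance across locations can be as simple as comparing each location’s median simulated logon time with a baseline. The Java sketch below assumes the simulator’s results are already available as seconds per run; `LogonBenchmark`, the location names and the baseline value are hypothetical. The median is used rather than the mean so that a single outlier run does not raise a false alarm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LogonBenchmark {
    // Median of the simulated logon times (seconds) for one location.
    static double median(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Locations whose median logon time exceeds the baseline, in name order.
    static List<String> slowLocations(Map<String, double[]> samplesByLocation, double baselineSeconds) {
        List<String> slow = new ArrayList<>();
        for (Map.Entry<String, double[]> e : new TreeMap<>(samplesByLocation).entrySet()) {
            if (median(e.getValue()) > baselineSeconds) {
                slow.add(e.getKey());
            }
        }
        return slow;
    }
}
```

The same comparison can be run before and after a production update, using the pre-update medians as the baseline.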

eG Enterprise is a Citrix Ready performance management suite that combines both real user logon monitoring and synthetic logon simulation to deliver a comprehensive solution for identifying, diagnosing and resolving Citrix logon issues.

Top 10 Java Performance Problems and How to Solve Them

Java is one of the most popular technologies for application development. Tens of thousands of enterprise applications are powered by Java, and millions of people use them daily. Java has been evolving for more than two decades, and many web frameworks, middleware products, data access technologies and protocols are built on it. Unlike C, C++ and other languages in which programmers manage memory mostly by hand, Java manages memory automatically, allocating objects and reclaiming them when they are no longer in use.

Despite this, Java applications can still suffer performance problems, and when they do, the business impact can be significant. In this blog, we look at some common problems that Java developers and administrators encounter and recommend best practices to resolve and prevent them.

Top 10 Common Java Performance Problems:
Memory
  1. Out-of-Memory Errors in the JVM
  2. Excessive Garbage Collection
  3. Improper Data Caching
Threads
  4. Thread Deadlocks and Gridlocks
Database
  5. Running Out of Database Connections
  6. Slow Database Calls
Application / Code
  7. Java Code-Level Issues
  8. Java Application Server Bottlenecks
Infrastructure
  9. Server Performance Problems
  10. Network Latency and Connectivity Issues

#1 Out-of-Memory Errors in the JVM

The dreaded java.lang.OutOfMemoryError indicates that the application is trying to add more data to memory, but there is no room left for it. Out-of-memory errors typically cause failures that the application cannot recover from, so they must be avoided at all costs.

There can be many reasons why an out-of-memory error occurs:

Under-provisioned memory: First, the heap memory configured for the JVM may not be sufficient for the application. The application may attempt to put more data into the heap than there is room for. Consider an application that reads a 256 MB file and stores it in memory: the JVM needs a heap of at least 256 MB for this to work. While specifying adequate heap memory for the JVM is important, it is equally important to ensure that the other memory spaces used by the JVM are also sized adequately. For instance, the Oracle HotSpot JVM has multiple memory spaces:

  • Eden space for all objects initially
  • Survivor space for objects that have survived garbage collection
  • Tenured space for objects that have existed for some time in the survivor space
  • Code cache where memory is used for compilation and storage of native code
  • Permanent generation, where class and method metadata is stored (replaced by Metaspace in Java 8 and later)

Each of these memory spaces has space usage limits that can be individually set. When any of these memory spaces is fully utilized, application errors will occur.

Spike in incoming traffic: Second, a spike in application load can trigger an out-of-memory error. Consider a load-balanced server cluster in which each JVM is configured to handle its normal load. When one of the nodes goes down, the remaining nodes must handle the additional workload. If the memory configured in the JVM is not sufficient for the increased workload, out-of-memory errors will occur.

Programming error: Third, a programming error can cause a memory leak. The Java garbage collector is designed to reclaim memory used by objects that are no longer referenced. But if a program keeps adding objects to the heap while still holding references to them (e.g., a hash table that grows continuously and is never pruned), the garbage collector cannot reclaim them, and an out-of-memory error is inevitable.
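To make the growing-hash-table case concrete, here is a minimal Java sketch. The `SessionRegistry` class and its methods are hypothetical: the leaky path adds entries to a static map and never removes them, so every entry stays reachable and the garbage collector cannot reclaim it, while the fixed path removes each entry when its session ends.

```java
import java.util.HashMap;
import java.util.Map;

public class SessionRegistry {
    // A static map lives as long as the class is loaded, so every entry stays reachable.
    private static final Map<String, byte[]> SESSIONS = new HashMap<>();

    // Adds ~10 KB of per-session state to the heap.
    static void open(String sessionId) {
        SESSIONS.put(sessionId, new byte[10 * 1024]);
    }

    // The fix: drop the reference when the session ends so GC can reclaim the memory.
    static void close(String sessionId) {
        SESSIONS.remove(sessionId);
    }

    static int size() {
        return SESSIONS.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            open("leaky-" + i); // never closed: the map only grows
        }
        for (int i = 0; i < 1_000; i++) {
            open("tidy-" + i);
            close("tidy-" + i); // closed: the map does not grow
        }
        System.out.println("Sessions still referenced: " + size());
    }
}
```

Running `main` prints `Sessions still referenced: 1000`: only the sessions that were never closed remain on the heap. In real applications, the same pattern often hides in static caches or listener lists.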

Helpful Troubleshooting Tips:
  • The -Xmx option sets the maximum heap size of the JVM. Make sure it is set high enough for your application to handle the expected workload.
  • The limit for the individual memory spaces also must be tuned correctly.
  • Monitor JVM memory spaces and growth patterns continuously to proactively detect situations when there is a memory shortfall.
  • When excessive memory usage is detected, take a heap dump from the JVM, analyze it using a tool like the Eclipse Memory Analyzer and identify objects that are taking up an unusual amount of memory.
  • Fix the code-level issues, such as lingering references, that keep unused objects on the heap and cause memory leaks.
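As a starting point for monitoring memory spaces from inside the JVM, the sketch below uses the standard `java.lang.management` API to print the maximum heap and the usage of each memory pool. Pool names vary with the JVM version and garbage collector, and the `percentUsed` helper is our own illustration, not part of any monitoring product.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MemoryPoolReport {
    // Returns used/max as a percentage, or -1 when the pool has no defined maximum.
    static double percentUsed(MemoryUsage usage) {
        long max = usage.getMax(); // getMax() returns -1 when the maximum is undefined
        if (max <= 0) {
            return -1;
        }
        return 100.0 * usage.getUsed() / max;
    }

    public static void main(String[] args) {
        // maxMemory() normally reflects -Xmx (or the JVM's default maximum heap).
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxHeapMb + " MB");

        // One entry per memory space, e.g. Eden, Survivor, Old Gen, Metaspace and
        // code cache pools; exact names depend on the JVM and collector in use.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            double pct = percentUsed(usage);
            String limit = pct < 0 ? "no fixed max" : String.format("%.1f%% of max", pct);
            System.out.printf("%-35s used=%,d KB (%s)%n",
                    pool.getName(), usage.getUsed() / 1024, limit);
        }
    }
}
```

Sampling this output periodically, rather than once, is what reveals growth patterns such as an old-generation pool that keeps rising after every collection.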

#2 Excessive Garbage Collection

Garbage collection (GC) is a very useful JVM process that frees up room for new data in memory. Useful as it is, it becomes a problem when it runs too often. Garbage collection can consume significant CPU and can pause the JVM's application threads, degrading application performance. The Oracle JVM supports different garbage collection algorithms: the serial collector, the parallel collector, the concurrent mark sweep (CMS) collector (deprecated in JDK 9 and removed in JDK 14) and the Garbage-First collector (G1). The choice of garbage collector can have an impact on performance.

  • For best performance, garbage collection should be taking a small percentage of CPU time (< 10%).
  • If more than 20% of CPU time is used for garbage collection, it means that the application has a significant memory related performance problem that must be corrected.

Configuring the JVM's memory to be too large can also hurt performance: with more memory to process, each garbage collection, especially a full GC, can take a very long time to complete.

Helpful Troubleshooting Tips:
  • Track instances of GC, time taken for GC, and % of GC time spent by the JVM.
  • Look for full GC events; full collections pause the application and can cause noticeable slowness.
  • High CPU usage of the JVM can be caused by excessive garbage collection. When you don’t see your application threads taking CPU, check the performance of garbage collection. A memory issue can manifest as high CPU usage, making performance diagnosis difficult.
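A rough way to track GC counts, GC time and the share of time spent in GC is to read the `GarbageCollectorMXBean`s exposed by the JVM. This sketch rests on two assumptions: it compares accumulated collection time with JVM uptime (elapsed time, not CPU time, so it only approximates the CPU percentages mentioned above), and for concurrent collectors the reported time may not match application pauses exactly. The 10% and 20% thresholds are the guidelines from this section.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    // Percentage of elapsed time spent in GC; returns 0 when no time has elapsed.
    static double gcPercent(long gcMillis, long elapsedMillis) {
        return elapsedMillis <= 0 ? 0.0 : 100.0 * gcMillis / elapsedMillis;
    }

    public static void main(String[] args) {
        long totalCount = 0;
        long totalMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            long millis = gc.getCollectionTime();  // accumulated, approximate
            System.out.printf("%-25s collections=%d time=%d ms%n", gc.getName(), count, millis);
            totalCount += Math.max(count, 0);
            totalMillis += Math.max(millis, 0);
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        double pct = gcPercent(totalMillis, uptime);
        System.out.printf("GC: %d collections, %.2f%% of %d ms uptime%n", totalCount, pct, uptime);
        if (pct > 20) {
            System.out.println("Above 20%: likely a memory-related problem to investigate");
        } else if (pct > 10) {
            System.out.println("Above the ~10% guideline: keep an eye on GC");
        }
    }
}
```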

#3 Improper Data Caching

While caching is essential for reading data quickly from memory (as opposed to making a database call across the network), allocating excessive memory to caching is counterproductive. A poorly sized cache leads to more GC pauses and, in turn, slower application processing.

Cache misconfiguration also leads to problems. Cached objects are stateful in nature, unlike pooled objects, which are stateless. When the cache is not configured properly, a recently used object can be mistakenly evicted to make room for a new object, resulting in a “cache miss” the next time that object is requested. In addition to memory allocation, the settings that determine cache hit and miss behavior, such as the eviction policy, must also be set properly.

Helpful Troubleshooting Tips:
  • Monitor cache size continuously and get alerted when it falls below or exceeds your accepted threshold.
  • Monitor cache hit and miss ratios to track how effective the cache is.
  • Ensure that distributed caches stay properly synchronized across multiple servers.
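One common way to keep a cache bounded while tracking its effectiveness is a least-recently-used (LRU) cache. The sketch below is a hypothetical `LruCache` built on `LinkedHashMap` in access order: it evicts the least recently used entry once capacity is exceeded and counts hits and misses. LRU is just one possible eviction policy, and this class is not thread-safe; production caches usually need synchronization or a dedicated concurrent cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> {
    private final int capacity;
    private final LinkedHashMap<K, V> map;
    private long hits;
    private long misses;

    public LruCache(int capacity) {
        this.capacity = capacity;
        // accessOrder=true orders entries from least to most recently accessed,
        // so the "eldest" entry is the least recently used one.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LruCache.this.capacity; // evict only when over capacity
            }
        };
    }

    public V get(K key) {
        V value = map.get(key);
        if (value == null) {
            misses++;
        } else {
            hits++;
        }
        return value;
    }

    public void put(K key, V value) {
        map.put(key, value);
    }

    public double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");       // hit; "a" becomes the most recently used entry
        cache.put("c", "3");  // over capacity: evicts "b", the least recently used
        System.out.println(cache.get("b")); // miss
        System.out.println(cache.get("a")); // hit
        System.out.printf("hit ratio: %.2f%n", cache.hitRatio());
    }
}
```

Here `main` prints `null`, then `1`, then `hit ratio: 0.67`: `b` was evicted because `a` had been read more recently. Exporting the hit ratio to your monitoring system is what makes the alerting tips above possible.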

#4 Thread Deadlocks and Gridlocks

Java applications, especially web-based applications, are often multi-threaded. Multi-threading helps with scalability, but when multiple threads need to access shared JVM resources (often memory), locking is used to ensure that only one thread at a time can access a shared resource. When one thread locks a resource, other threads wait for the lock to be released. The Java programming language makes it easy for developers to implement synchronization between threads: the synchronized keyword can be used to create a synchronized block of code, and methods can also be declared synchronized.

Because synchronized blocks are so easy to create, developers often create them without understanding their performance implications. When hundreds of threads synchronize on the same lock, the application's processing of requests is severely affected and users experience excessive slowness. When this happens in production with hundreds of threads running in the JVM, it is very difficult to determine which lock caused the slowness and which block of code is the culprit.

Deadlocks are another problem with thread locking. For example, thread A holds one lock and waits for a lock held by thread B, while thread B waits for the lock held by thread A. Neither thread can ever proceed, which causes application hangs or crashes.

Too much synchronization also takes a toll on performance. Over-synchronizing threads can lead to thread gridlock, where many threads compete for the same lock and spend most of their time waiting for it to be released.
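A frequent source of gridlock is a single coarse lock guarding a shared structure. The sketch below contrasts a hypothetical page-hit counter guarded by one `synchronized` lock with a version built on `ConcurrentHashMap`, whose `merge` updates an entry atomically without locking the whole map. It illustrates the design choice rather than benchmarking it; whether such a change helps a particular application should be confirmed by measurement.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PageHitCounter {
    // Coarse-grained: every caller contends for the single lock on this object.
    static class SynchronizedCounter {
        private final Map<String, Long> hits = new HashMap<>();

        synchronized void record(String page) {
            hits.merge(page, 1L, Long::sum);
        }

        synchronized long get(String page) {
            return hits.getOrDefault(page, 0L);
        }
    }

    // Finer-grained: ConcurrentHashMap.merge is atomic and does not lock the whole map.
    static class ConcurrentCounter {
        private final ConcurrentHashMap<String, Long> hits = new ConcurrentHashMap<>();

        void record(String page) {
            hits.merge(page, 1L, Long::sum);
        }

        long get(String page) {
            return hits.getOrDefault(page, 0L);
        }
    }

    // Records `requests` hits spread over four pages using a pool of worker threads.
    static long runConcurrently(int requests, int threads) {
        ConcurrentCounter counter = new ConcurrentCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            String page = "/page/" + (i % 4);
            pool.submit(() -> counter.record(page));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get("/page/0");
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(10_000, 8));
    }
}
```

`runConcurrently(10_000, 8)` spreads 10,000 updates over four pages, so `main` prints `2500` for `/page/0`.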

Helpful Troubleshooting Tips:
  • Monitor the status of threads in the JVM and determine the count of threads in running, blocked and deadlocked state.
  • Use Java performance monitoring tools to help automatically detect blocked threads and deadlocks.
  • Identify the exact module and the line of code at which the locking is happening.
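The JVM can detect deadlocks itself through `ThreadMXBean.findDeadlockedThreads()`, and `dumpAllThreads` reports the state of every live thread. The sketch below deliberately reproduces the thread A / thread B scenario with two locks acquired in opposite order, then reports the deadlocked threads and counts threads by state. The class and method names are illustrative; monitoring tools typically collect the same JVM data over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

public class DeadlockDetector {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    // Counts live threads by state (RUNNABLE, BLOCKED, WAITING, ...).
    static Map<Thread.State, Integer> countThreadStates() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : THREADS.dumpAllThreads(false, false)) {
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    // Returns how many threads are deadlocked and prints who waits on whom.
    static int countDeadlockedThreads() {
        long[] ids = THREADS.findDeadlockedThreads(); // null when there is no deadlock
        if (ids == null) {
            return 0;
        }
        for (ThreadInfo info : THREADS.getThreadInfo(ids)) {
            if (info != null) {
                System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return ids.length;
    }

    // Polls for up to timeoutMillis until a deadlock is reported.
    static int waitForDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        int found = countDeadlockedThreads();
        while (found == 0 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            found = countDeadlockedThreads();
        }
        return found;
    }

    // Reproduces the A/B scenario: two threads take the same two locks in opposite order.
    static void createDeadlock() {
        Object lock1 = new Object();
        Object lock2 = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread a = new Thread(() -> lockBoth(lock1, lock2, bothHoldFirstLock), "thread-A");
        Thread b = new Thread(() -> lockBoth(lock2, lock1, bothHoldFirstLock), "thread-B");
        a.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        b.setDaemon(true);
        a.start();
        b.start();
    }

    private static void lockBoth(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // make sure the other thread already holds its first lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) { // blocks forever: the other thread holds 'second'
                System.out.println("never printed");
            }
        }
    }

    public static void main(String[] args) {
        createDeadlock();
        System.out.println("Deadlocked threads: " + waitForDeadlock(5_000));
        System.out.println("Thread states: " + countThreadStates());
    }
}
```

The usual fix for this kind of deadlock is to make every thread acquire locks in the same global order.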

#5 Running Out of Database Connections

Most Java applications use database servers for persistent storage of data. Connections to the database server are used to store and retrieve data. Because establishing a database connection for each request is expensive, a connection pool is often used. The connection pool has an initial setting for the number of connections that will be pre-established when the application starts. When additional connections are required, the pool is dynamically grown subject to a maximum specified limit.

If the number of connections in use reaches the maximum limit, new requests have to wait until existing database requests complete. It is important for developers and DBAs to have a fair estimate of the application workload and set the configuration accordingly. At the same time, specific application modules or web pages may have connection leaks, i.e., a connection is obtained from the pool but never released back into it. Such connection leaks will ultimately result in application errors being reported to users. High connection pool usage can also occur when the database server has slowed down its processing. Hence, it is important to differentiate performance issues caused by database connection leaks from those caused by a database server bottleneck.
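The hypothetical `TinyPool` below is not a real connection pool; it uses a `Semaphore` to model a pool with a maximum size so that the difference between a returned and a leaked connection is easy to see. Wrapping each lease in try-with-resources guarantees that it is returned even if the query throws, while leases that are never closed eventually leave new requests waiting until they time out.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class TinyPool {
    private final Semaphore permits;

    TinyPool(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    // Stands in for a pooled connection; close() returns its slot to the pool once.
    final class Lease implements AutoCloseable {
        private boolean closed;

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                permits.release();
            }
        }
    }

    // Waits up to timeoutMillis for a free slot; returns null if the pool stays exhausted.
    Lease acquire(long timeoutMillis) {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS) ? new Lease() : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        TinyPool pool = new TinyPool(2);

        // Correct usage: try-with-resources returns the lease even if the work throws.
        for (int i = 0; i < 5; i++) {
            try (Lease lease = pool.acquire(100)) {
                // run the query with the leased connection here (null would mean a timeout)
            }
        }
        System.out.println("Free slots after correct use: " + pool.available());

        // Leak: two leases are taken and never closed, exhausting the pool.
        pool.acquire(100);
        pool.acquire(100);
        System.out.println("Third request gets: " + pool.acquire(100));
    }
}
```

With a maximum of two connections, five correctly closed leases leave both slots free, but the two leaked leases make the third request wait 100 ms and then get `null`.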

Helpful Troubleshooting Tips:
  • Continuously monitor connections to the database: total connections, active connections, etc.
  • Track connection pool metrics, such as allocated, freed, created, closed and managed connections.
  • Correlate application access patterns with database connection pool usage to identify the cause of connection leaks.
  • Get visibility into waiting requests and connection delays, analyze these metrics along with health indicators of the database servers and determine times when database server bottlenecks are affecting application performance.

#6 Slow Database Calls

The database is an integral part of the application architecture, and application performance depends heavily on how quickly the database responds and executes queries. According to a DZone performance monitoring survey, database problems rank as the second most likely cause of application performance problems. Moreover, application developers are often wrongly blamed for application issues that are in fact database query issues that should be addressed by a DBA. Slow database calls can affect application transaction processing in several ways:

Slow queries: When developing an application, developers are focused on getting the functionality right rather than on performance. While their database queries may be returning the correct results, these queries may not be designed optimally. For a query to be optimally designed, it must avoid full table scans at the database level. It must make use of database indexes, so the results are returned in the fastest possible manner. Getting queries to be optimally designed often requires the involvement of a database administrator. The DBA can analyze a query’s explain plan and provide recommendations on how to tune it for optimal performance. These could include redesigning the query, using existing indexes, recommendations for new indexes, addition of hints, etc.

Unused indexes: While it is good practice to index every foreign key in a table, you must also bear in mind what kinds of queries are actually executed. An index may not be needed if no query uses that column. Unused indexes occupy disk space, and the database must update them every time records are inserted or deleted, which slows down processing.

Insufficient database resources: When the database server runs short of resources such as CPU, memory and disk, query execution suffers.

Helpful Troubleshooting Tips:
  • Analyze database queries issued by the application and identify web pages and corresponding queries that are taking time.
  • Plan database sizing and configuration properly to ensure consistent performance.
  • Use database monitoring tools to identify and fix missing indexes, optimize the database layout by re-indexing, etc.
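To find which pages trigger slow queries, one lightweight approach is to time each database call at the application level. The sketch below is a hypothetical `SlowQueryLog` helper that wraps a call, measures it with `System.nanoTime()` and records it when it reaches a threshold. The labels and the `sleepThenReturn` helper stand in for real pages and queries; an APM tool would collect the same information automatically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SlowQueryLog {
    private final long thresholdMillis;
    private final List<String> slowCalls = new ArrayList<>();

    SlowQueryLog(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    // Runs the call, measures its duration and records it if it reaches the threshold.
    <T> T time(String label, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMillis >= thresholdMillis) {
                slowCalls.add(label + " took " + elapsedMillis + " ms");
            }
        }
    }

    List<String> slowCalls() {
        return slowCalls;
    }

    // Simulates database time for the example (hypothetical stand-in for a real query).
    static int sleepThenReturn(long millis, int result) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        SlowQueryLog log = new SlowQueryLog(50);
        log.time("/orders: order history query", () -> sleepThenReturn(120, 42));
        log.time("/home: banner query", () -> sleepThenReturn(0, 1));
        log.slowCalls().forEach(System.out::println); // only the /orders call is listed
    }
}
```

The queries that show up repeatedly in such a log are the natural candidates for an explain-plan review with the DBA.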

#7 Java Code-Level Issues

The DZone performance monitoring survey referenced earlier cites code-level problems as the top cause of application performance issues. Most code-level issues are due to poor code constructs, such as long waits, poor iteration, inefficient algorithms and bad choices of data structures. For example, searching a Vector of hundreds of thousands of records for a given key is inefficient, because every lookup scans the list; a HashMap, which offers constant-time lookups on average, is usually a better fit for this task. In most cases, code-level issues manifest as loops that consume CPU cycles in the JVM.
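The Vector-versus-HashMap point can be illustrated as follows. The `Customer` type and both lookup helpers are hypothetical; the scan costs O(n) per lookup, while the map costs one O(n) pass to build and then offers O(1) lookups on average, which pays off when the same collection is searched repeatedly.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Vector;

public class LookupDemo {
    static final class Customer {
        final int id;
        final String name;

        Customer(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // O(n) per call: walks the list until it finds a matching id.
    static Customer findByScan(List<Customer> customers, int id) {
        for (Customer c : customers) {
            if (c.id == id) {
                return c;
            }
        }
        return null;
    }

    // One O(n) pass to build; each later lookup is O(1) on average.
    static Map<Integer, Customer> indexById(List<Customer> customers) {
        Map<Integer, Customer> index = new HashMap<>();
        for (Customer c : customers) {
            index.put(c.id, c);
        }
        return index;
    }

    public static void main(String[] args) {
        List<Customer> customers = new Vector<>();
        for (int i = 0; i < 300_000; i++) {
            customers.add(new Customer(i, "customer-" + i));
        }
        Map<Integer, Customer> index = indexById(customers);

        for (int id : new int[] {299_999, 150_000, 7}) {
            // Same result either way; the map avoids scanning up to 300,000 elements per lookup.
            System.out.println(findByScan(customers, id).name + " = " + index.get(id).name);
        }
    }
}
```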

Then there may be performance bugs in the third-party frameworks used in application development. Ideally, all code-level issues should be caught by the QA team and fixed by the development team before production rollout, but this is not always the case.

Helpful Troubleshooting Tips:
  • Incorporate best practices during the entire software development lifecycle – from design, development, testing, to rollout. Development teams must be skilled to avoid code-level mistakes, and QA teams should be adept to catch issues proactively.
  • Incorporate code optimization practices to ensure the application code meets expected standards.
  • Use transaction profiling tools to isolate code-level problems automatically.
  • Reviewing application logs can also provide useful insights for debugging.

#8 Java Application Server Bottlenecks

The application server is a critical component of a Java application architecture. Popular Java application servers are Oracle WebLogic, IBM WebSphere, JBoss, WildFly, Tomcat, etc. A bottleneck in the application server would directly impact the business transactions and affect application performance and end-user experience. Problems in servlet execution, bean caching, queuing, JDBC connectivity, etc. will affect performance.

Transaction rollbacks are another issue to deal with. An application rollback is typically the result of deliberately designed business logic. A non-application rollback, however, is a serious issue and needs to be addressed immediately: the application throws an exception and the transaction is rolled back, so the end user's request cannot be processed. There are three types of non-application rollbacks:

  • system rollback happens due to a problem in the Java application server
  • time-out rollback happens because a process within the Java container times out
  • resource rollback happens when there is a problem in resource management of the Java container

Helpful Troubleshooting Tips:
  • Use application performance monitoring tools to monitor the health, availability and performance of Java application servers from end to end.
  • Track key application server metrics to understand anomalies and antipatterns.

#9 Server Performance Problems

All applications, along with their supporting middleware and back ends, run on enterprise servers. These could be physical servers running a server operating system, virtual machines or cloud instances. A problem in the server hardware or resources will affect the performance of the applications running on it. Insufficient CPU, memory and disk, operating system errors, server hardware faults, high-CPU processes, zombie processes and corrupt services are some common problems administrators deal with. Problems are even harder to pinpoint when the server is virtualized: a resource contention issue on the virtualization host can impact all the guest virtual machines and the applications running on them. The same applies to containers and cloud-based application workloads.

Helpful Troubleshooting Tips:
  • Constantly monitor the performance of servers and operating system.
  • Size servers optimally for maximum application performance.
  • Understand the dependencies between servers (physical, virtual or cloud) and the applications running on them to perform causal analysis of application slowdowns.
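From inside a Java process, the standard `OperatingSystemMXBean` gives a quick view of the server's CPU situation. The sketch below divides the one-minute system load average by the number of available processors. Treating a value above 1.0 per core as a sign of CPU shortage is a rough heuristic we assume here, and `getSystemLoadAverage()` returns a negative value on platforms that do not provide it (such as Windows). This complements, rather than replaces, OS-level server monitoring.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class ServerLoadCheck {
    // Load average per CPU core; returns -1 when no load average is available.
    static double loadPerCore(double loadAverage, int cores) {
        if (loadAverage < 0 || cores <= 0) {
            return -1;
        }
        return loadAverage / cores;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cores = os.getAvailableProcessors();
        double load = os.getSystemLoadAverage(); // last minute; negative if unavailable
        double perCore = loadPerCore(load, cores);
        System.out.printf("%s %s, %d cores%n", os.getName(), os.getArch(), cores);
        if (perCore < 0) {
            System.out.println("Load average not available on this platform");
        } else {
            System.out.printf("Load per core: %.2f%s%n", perCore,
                    perCore > 1.0 ? " (more runnable work than cores: possible CPU shortage)" : "");
        }
    }
}
```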

#10 Network Latency and Connectivity Issues

Bandwidth congestion in the network, high latency and packet loss, misconfiguration in a router, DNS failure, etc. could affect the performance of applications. Usually there is a lot of finger-pointing between the application team and network team as to where the root cause of an application problem is and who needs to resolve it. When it is actually a network issue, the application team could be chasing a red herring on the server side.

Helpful Troubleshooting Tips:
  • Continuously monitor the network devices, traffic and configurations.
  • Compare and correlate network performance with application problems to easily know the impact of network performance on the applications and isolate the root cause.
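A simple way to correlate network conditions with application slowness is to measure TCP connection time from the application host to a dependency. The `ConnectLatencyProbe` below is a minimal sketch using `java.net.Socket`; the default host and port are placeholders, and connect time captures only one aspect of network health (it does not directly measure bandwidth or packet loss).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectLatencyProbe {
    // Measures how long a TCP connect takes; returns -1 if it fails or times out.
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            return -1; // refused, unreachable, unknown host or timed out
        }
    }

    public static void main(String[] args) {
        // Placeholder target; pass a real database or service host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;
        long ms = connectMillis(host, port, 2_000);
        System.out.println(ms < 0 ? "Connect failed or timed out" : "TCP connect took " + ms + " ms");
    }
}
```

Recording these timings alongside application response times makes it easier to tell whether a slowdown follows the network or the server.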

It is important for developers to code efficiently and for testers to catch and report issues, no doubt! IT teams should also be proactive and have the necessary performance management measures in place. Monitoring should not be an afterthought for organizations developing, hosting, implementing and using Java-based applications. Constant monitoring of user experience, application transactions, application code and the supporting infrastructure is vital to detect and resolve problems before they become business-impacting.

Citrix Troubleshooting 101: Frequently Asked Questions

Citrix Virtual Apps and Desktops deployments are performance sensitive. Many components, both datacenter-side and client-side, must perform well together to deliver a consistent, well-performing virtualized apps and desktops solution. With so many components in play, determining the impact or cause of a Citrix-related problem is often a challenging task for a Citrix administrator.

From the 2018 Migration Survey conducted by eG Innovations, some interesting statistics surfaced:
  • 59% of 795 Citrix professionals voted that slow logons were the number one problem for them.
  • 44% voted that frozen sessions were a problem.
  • 33% voted that slow application launches were a common problem.

At eG Innovations, we recently joined forces with Citrix CTP George Spiers to deliver a webinar titled “Citrix Troubleshooting 101.” More than 650 people participated in this engaging live webinar. The webinar was hugely popular because, as mentioned, Citrix administrators want to diagnose issues within their environments quickly and efficiently.

Citrix issues ultimately result in lost productivity and company revenue. The severity of lost productivity and revenue is mainly determined by the time it takes to resolve an issue. For an administrator to be successful in Citrix troubleshooting, the process of elimination is key. The process of elimination can be applied to three particular troubleshooting tactics that were highlighted in the webinar. Following these tactics will help you to become more efficient at diagnosing Citrix problems:

  1. Determine the scope of the problem: Does the user face an issue with one particular task, or with all tasks?
  2. Determine the magnitude of the problem: How many users are impacted?
  3. Determine the source of the problem: Does the issue reside client-side or within the corporate infrastructure?


We compiled the dozens of participant questions into three groups below: Citrix Troubleshooting, Citrix Optimization and Citrix Monitoring. George Spiers provided all answers based on his real-world Citrix consulting projects. We encourage you to save and regularly reference this Citrix troubleshooting guide for any issue you may encounter.

Questions & Answers for Citrix Troubleshooting 101

Citrix Troubleshooting

1. Are there any tips to improve remote access performance?

  1. Firstly, on Citrix ADC (formerly NetScaler), bind the TCP profile “nstcp_default_XA_XD_profile” to your Gateway virtual server.
  2. Secondly, edit profile “nstcp_default_XA_XD_profile” on ADC and uncheck “Use Nagle’s algorithm”.
  3. Take a look at the “Optimized for WAN” Citrix policy template within Citrix Studio, which will give pointers to configuring policy settings that can help improve performance over WAN.
  4. Consider preparing your ADC Gateway virtual servers and end-user devices to support Adaptive Transport. You can read more here: https://www.jgspiers.com/hdx-enlightened-data-transport/

2. Can you configure Citrix Director’s application probing for published desktops?

No, currently, Citrix Director’s application probing only supports published applications. You may want to consider logon simulators and full session simulators available in the market. See the following links:

3. I was using Citrix Director and could not log off or disconnect a user’s session. What would the next step be?

The next step would be to log on to the VDA and attempt to end the user’s session from there. That may involve you killing hung processes. If that does not work, see https://www.jgspiers.com/user-stuck-citrix-desktop-force-log-off/

4. Regarding brokering times across versions, have you seen a significant difference between 7.15 and 7.18?

I haven’t personally seen any significant differences, nor have I come across any Citrix publication regarding this.

5. Our user logon times are about 30 seconds, with Internet Explorer initialization taking the most time. What would you advise to help us make logons faster?

To improve logon times on Citrix Virtual Apps and Desktops, you can use several optimization scripts for Windows server and desktop operating systems that I have created (see https://www.jgspiers.com/category/scripts/). Besides this, other common practices for reducing logon times include Group Policy housekeeping, best-practice configuration of profile management and Write Cache, auto-logon and so on.

You can refer to this webinar “How to Make Citrix Logons 75% Faster” for additional details.

6. How do you quantify slow logon? Is a 30-second logon considered slow?

30 seconds and below is what I like to achieve in all my deployments. I can accept 40 seconds or less, but 30 seconds downwards is the real goal.

7. We are using Citrix Workspace Environment Management (WEM) in our infrastructure. Are there any disadvantages of using WEM over CPM? Also, is WEM available for on-premises XenDesktop?

WEM and CPM are different products and they both have different uses. It can actually be beneficial to run both, as they work together well. Citrix WEM applies printers, mapped drives, registry settings and other actions to a user’s desktop session. Profile Management captures and roams the user profile between desktop or virtual application sessions.

WEM is available with XenDesktop Enterprise (now Virtual Apps & Desktops Advanced) and above subscriptions.

8. We have XenApp 6.5/XenApp 7.6, and we have published the same apps on Windows 2008 and Windows 2016 respectively. But performance is slow on XenApp 7.13/Windows 2016. Why do you think this would be?

One reason for this could be that you have not optimized the Windows Server 2016 image; the operating system's default settings are not the best. Please refer to my optimization script: https://www.jgspiers.com/windows-server-2016-optimisation-script/ Also keep in mind that, out of the box, Windows Server 2016 requires more resources than Windows Server 2008. So, assign an extra 1-2 GB of RAM and another 1-2 vCPUs and see whether there is still much difference in performance between the two environments.

9. Often our users get the “session interrupted” notification on the corner of their session. Would this be network related? Or is it an issue on the client side?

“Session interrupted” notifications generally occur when there is a network issue between the Virtual Apps server or desktop and the client terminal. I would run through a process of elimination to see if the issue only happens at a specific user location, with specific endpoint clients, with specific Receiver versions and so on.

You could also monitor the VDAs and check to see if there are TCP connection drops being reported. Have your network team run tests on the networking devices that are client-side to see if there is any packet loss.

10. Do you have any thoughts on what’s the main cause of PVS target retries and how to troubleshoot them effectively?

This can be caused by network blips such as spikes in latency/packet loss. A slow performing/saturated storage array where the Target Device is stored or the PVS vDisks are stored can also be the cause of retries.

11. How often do you recommend we reboot XenServer hypervisors?

I only recommend rebooting XenServer hypervisors either during disaster recovery testing phases, or when applying hotfixes to XenServer.

12. How often should VDAs be rebooted?

I typically like to reboot my virtual apps workers once every 1-3 days. However, this depends on how often the VDAs are used and how many resources are assigned to them.

13. Is there any tool that can identify slow printing in a Citrix session?

Third-party products can monitor print servers, VDAs, and the network to inform you if there are problems. Often slow printing can be the result of bad printer routing e.g. a printer and print server with a lot of latency between them or too many hops in the communication path. For this, take a look at the Citrix policy setting “Direct connections to print servers” which is explained in detail here: https://www.jgspiers.com/citrix-universal-printing/.

Slow printing can also be caused by outdated or problematic print drivers, or by a lack of bandwidth or prioritization for the printing virtual channel (ICA).

14. What UDP port is needed for EDT?

UDP ports 1494 and 2598 are required. If you are providing EDT access via Citrix Gateway, then only UDP 2598 is required to be open from the Internet to Gateway.

15. Is there any way of easily finding bandwidth issues with NetScaler? We have a VPX 200 and are wondering whether it is a bottleneck for external users.

The “Packet CPU Usage” counter on the Dashboard of Citrix ADC will show you if the ADC device is reaching its bandwidth limit or not. You can run reports from the Reporting tab. A built-in report named “CPU vs. Memory Usage and HTTP Requests Rate” can help.

Most likely, though, the Citrix Gateway is not the bottleneck. If you are connecting from a high-speed broadband link and still see latency, that would be cause for concern and could point to a Gateway or DMZ issue.

Citrix ADM (Application Delivery Management – formerly known as NetScaler Management and Analytics System/MAS) can help track HDX Insight data and get reports on WAN latency, ICA RTT, datacenter latency and so on.

See https://www.jgspiers.com/citrix-netscaler-management-analytics-system/

16. Is it possible to enable Receiver logging on a thin client, i.e. HP thin client or Dell thin client?

Citrix documents procedures for enabling logging in the Windows, Linux and other editions of Workspace app (Receiver). You should consult your thin client vendor on how these procedures can be carried out on the thin client.

17. Error 1102: The Citrix Broker Service failed to broker a connection for user ‘Domain.com\user’ to resource ‘Desktop1234’. The virtual machines ‘WIN10-091.domain.com’ rejected the request to prepare itself for a connection. This problem usually indicates that the virtual machine is engaged in an activity such as restarting, entering a suspended state, or processing a recent disconnection or logoff. Do you have any guidance to troubleshoot this?

Determine if this issue only happens to particular VDAs, particular VDA versions etc.

Also, check how many users are currently connected and whether you have enough VDAs and resources to handle more users.

If this issue is only experienced during logon storms such as in the morning, then there might be a lack of VDAs to handle the concurrent logon rate (which can be adjusted via policy).

18. I would really like to know if Citrix offers a documented protocol for troubleshooting the software stack. Something starting with what baselines to get when things are working and what corrective actions to take when a given part of the stack is not meeting those baselines at the time people are complaining.

You can capture baselines yourself at the beginning of a deployment, which helps when comparing them with measurements taken once users have been loaded onto the environment. However, you really need third-party monitoring solutions that can alert you when parts of the infrastructure are under stress or down. You can configure alerts for when metrics, such as logon times, breach defined thresholds. Citrix Director has some of this capability, but the capability is dictated by the license you own and is focused more on monitoring Citrix VDAs and sessions than on the supporting infrastructure.

19. We have seen many issues with respect to degraded performance and session disconnected issues. We are supporting multiple versions of XenApp & XenDesktop both on-premises and in the cloud. Performance degrades both in published application and VDI. We have not seen any network issues. Users can’t launch the session if the session is in a disconnected state (not all times). I have seen this issue often in the cloud. Do you have any tips to investigate and address this?

It is a problem that could be caused by many things. I would try to eliminate the possibilities: the image, high CPU/RAM consumption on VDAs, the Workspace app version, the client used, the network location used, the VDA version used and so on. If it happens in the cloud, I assume you have VDAs in Azure or AWS. Degraded performance and disconnects typically relate to network issues, but the VDA itself could also be hanging. You will have to start troubleshooting at a high level and work your way down as you rule factors out one by one.

20. How to troubleshoot TDIca.sys BSOD? This is on a Windows Server 2008 R2 image and our current version of Citrix is 7.15.3000. We have them hosted on VMware 5.5 using MCS. When I first created the Site, all was working fine without any issues. I didn’t even have to do a weekly reboot, but now this seems to happen on a weekly basis. Any tips to triage this issue?

I would look at what has changed in the environment. Sometimes it is quicker to build a fresh new image considering the problems you face and the time it may take to troubleshoot them.

21. Recently we upgraded XenApp 7.6 to 7.15 CU3 after which some features inside the published apps are not functioning when users launch URLs from 7.6 dedicated Windows 7 VDI. When they launch the same URLs outside of a VDI desktop (local computer), RDP, vSphere, all app features are opening as expected. Only VDI users are getting this problem. What could be the issue?

You have a XenDesktop 7.6 VDI site running Windows 7 VDAs, and those users launch published apps from a XenApp 7.15 CU3 site. Now some features inside the published apps no longer work, whereas they worked when the apps ran on XenApp 7.6.

Has anything else changed on the Windows 7 image such as an upgrade of Receiver for Windows?

22. When I undock my laptop from the desk (wired), work on wireless LAN in the conference room for an hour, and then dock it back on the base station (wired), the Citrix virtualized application session that was running stops responding. What is the likely cause, and how do I troubleshoot it? Is there any tool available to detect or even auto-fix the issue?

Have you tried updating to the latest Workspace app, or tried the same scenario using a newer VDA version? If that does not work, I would suggest you contact Citrix support.

23. Launching a virtual application takes forever: it crawls through each of the launch stages, taking 5 minutes or more, even though no network bottleneck was apparent for traffic outside Citrix. On other occasions, the same application launches within 1-2 minutes. What do you think would be causing the delay?

Monitoring tools such as eG Innovations can give detailed insight into the Citrix logon process and what is causing the various steps to take so long.

You should put one of the affected VDAs into an isolated Active Directory Organizational Unit with no Group Policies applied, to rule out Group Policy processing as the cause.

Other things to try are testing the logon time through a console session rather than ICA, disabling profile management (if in use) and so on.

The process of elimination will help find the root cause quicker.

24. Is there a script available to auto-delete user profiles from a profile server, so that admins don’t have to do it manually?

Profile Management can auto-delete profiles from a VDA using the policy setting “Delete locally cached profiles on logoff”.

When it comes to deleting profiles from a profile server automatically, there isn’t an off-the-shelf script to do that. A generic script wouldn’t know which profiles to delete and when.

25. Does Citrix Cloud make troubleshooting any easier?

The answer is both yes and no! You have less to troubleshoot because management of the control plane (Delivery Controllers, SQL servers, etc.) is done by Citrix. That said:

– You are still responsible for monitoring and managing the virtual apps servers and virtual desktops;

– And you still have ownership of the overall service performance. When there is slowness, you will still need to understand how to pinpoint where an issue lies. If the issue is with Citrix Cloud, then you must depend on Citrix to fix it.

The eG Innovations webinar “Does Using Citrix Cloud services make performance monitoring easier?” may be something you want to review for more details.

Citrix Optimization

26. I’m optimizing a Windows 10 image using App Layering. Which layer should I remove UWP applications from?

You should remove these applications from the OS Layer.

27. What about antivirus solutions? Do you recommend installing antivirus on the main image, or deploying it on Delivery Group desktops?

Antivirus agents should be installed on the gold image and all other infrastructure components such as your Delivery Controllers and StoreFront servers.

Hypervisor introspection is a technology that allows lightweight agents, or no agents at all, to be placed on the VDA to reduce its footprint. This technology can help with scalability.

28. What is the ideal spec for VDA?

There is no ideal specification as it depends on the workloads of your users. Typical “Task Worker”, “Knowledge Worker”, and “Power User” Windows 10 workloads may be able to use “2vCPU/2GB RAM”, “2vCPU/4GB RAM”, and “4vCPU/8GB RAM” configurations respectively, but you need to test these numbers in your own environment.

29. For best performance, you recommended, ‘Have enough DDCs to handle requests.’ What numbers would you recommend?

A Delivery Controller can support up to 5000 VDAs. You should always follow the N+1 model, which allows you to endure a Delivery Controller failure without impact. For example, with 10,000 VDAs, two controllers carry the load and a third provides the spare, so deploy 3 DDCs at minimum.
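The N+1 arithmetic can be written as a tiny helper. This is a sketch that treats the 5000-VDAs-per-controller figure from the answer as a hard ceiling; in a real design you would size against the capacity you have tested in your own environment, and the function name is purely illustrative.

```python
import math

VDAS_PER_CONTROLLER = 5000  # upper limit quoted in the answer


def controllers_needed(vda_count, per_controller=VDAS_PER_CONTROLLER):
    """Return the number of Delivery Controllers needed under the N+1 model."""
    if vda_count <= 0:
        raise ValueError("vda_count must be positive")
    n = math.ceil(vda_count / per_controller)  # controllers needed to carry the load
    return n + 1  # one spare so a single controller failure has no impact


print(controllers_needed(10000))  # 3
```

Rounding up with `math.ceil` matters: 12,000 VDAs need three controllers for the load (not two), and therefore four in total with the spare.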

30. I have seen dramatic differences in the use of Write Cache space if a machine vDisk is ‘optimized’ after imaging or updating a master VM. Can you explain why this is so?

An optimized image is leaner, so there is less going on. As a result, it should write to the Write Cache less than a bulky image with everything turned on. I suggest regularly defragmenting your vDisks, as that also drives down Write Cache usage.

31. Will session pre-launch utilize system resources even if the user has not launched the application?

Yes. Some processes on the VDA will be running, ultimately consuming resources. However, the resource utilization should be low given the session will be idle.

Citrix Monitoring

32. For monitoring AppFlow, you mentioned a Premium ADC license is required. Does an Advanced ADC license not give AppFlow monitoring? Is there any other option there to monitor AppFlow?

For HDX Insight, a Premium license offers historical capture of this data. An Advanced license only retains 1 hour of data, so it is essentially real-time capture. Web Insight is different and does not have a licensing requirement.

eG Innovations and other Citrix Ready monitoring partners offer AppFlow monitoring capabilities that will work if you have an Advanced ADC license.

33. I have some users who report that their session was slow outside of general business hours and Citrix Director doesn’t show if anything was wrong at that time. What tools can I use to capture historic statistics about each virtual desktop?

Citrix ADM (Application Delivery Management) is useful if the affected users are remote workers and your bottleneck is in the network.

Citrix Director provides visibility into specific parts of the infrastructure. To get complete end-to-end visibility into the Citrix tiers (StoreFront, Virtual app servers, license servers, Delivery Controllers, ADCs, PVS, WEM, etc.) and the supporting infrastructure, you can look at eG Innovations and other third-party monitoring solution vendors.

34. Can eG Enterprise detect issues with NetScaler? What if there are session drops? Can you monitor NetScaler devices and flows?

Yes, eG Enterprise monitors Citrix ADC/NetScaler in-depth.

See https://www.eginnovations.com/citrix-monitoring/netscaler-monitoring. All the key metrics of NetScaler can be monitored without agents. AppFlow data from NetScalers can also be exported to eG Enterprise and analyzed.

35. From a security standpoint, I don’t want to send monitoring data outside my datacenter. Is that possible with eG Enterprise?

Yes, eG Enterprise offers an on-premises solution. The management server, reporting engine and agents can all be deployed on-premises, and no data is sent to the cloud.

36. Can Smart Tools only be used if you have an active Citrix Cloud platform?

Smart Tools is available to Citrix Cloud customers and to on-premises Virtual Apps and Desktops customers that hold a “Customer Success Services – Select” agreement. See https://www.citrix.co.uk/products/smart-tools/feature-matrix.html

37. Which tool(s) can provide a logon breakdown: GPO, a full breakdown of interactive session, etc.?

Citrix Director can provide statistics for Profile Load, Brokering time, GPO Processing time and so on. Version 7.18 added a further enhancement that breaks Interactive Session time down into three sub-sections.

Third-party monitoring tools such as eG Enterprise have been providing breakdowns of Citrix logon time, including details of interactive session time. This helps administrators quickly determine which logons are slow because of profile loading and which GPOs are slowing down logons.

See https://www.eginnovations.com/citrix-monitoring/citrix-logon-monitoring

38. Can eG Enterprise be used in conjunction with 3rd party profile management tools such as FSLogix or Liquidware?

eG Enterprise is compatible with all third-party Citrix profile management solutions.

39. In the demo of eG Enterprise, you showed some use cases for detecting slow logon issues, virtualization issues, and non-corporate apps being the cause of resource depletion. What other Citrix problems can eG Enterprise help troubleshoot that Citrix Director cannot?

There are many ways in which eG Enterprise brings value to Citrix customers. These include:

  • Ability to monitor the user experience using synthetic monitoring (logon simulation and full session simulation)
  • More granular insights into why Citrix logons are slow.
  • Real-time monitoring of application launch times and proactive alerting through auto-baselining.
  • Unified visibility into all the Citrix tiers – StoreFront, WEM, PVS, ADC, License server, Delivery Controllers, Virtual App servers and Virtual Desktops
  • Integrated monitoring of all the supporting tiers including network, virtualization, cloud, storage, Active Directory and so on.
  • Embedded auto-correlation and root-cause diagnosis technology that helps easily determine if slowness is due to a Citrix problem or not.

Refer to this blog post for a detailed comparison of Citrix Director and eG Enterprise: https://www.eginnovations.com/blog/citrix-director-end-to-end-citrix-workspace-monitoring/

40. We end up chasing Citrix issues, only to find that it is a problem with a user’s network connection. Can eG Enterprise help us identify these types of problems?

Yes, eG Enterprise monitors ICA round trip time (ICA RTT), which is the latency that the user perceives. In addition, it can report the network latency between the user terminal and the server farm. By comparing these two values, administrators can easily identify if there is a network issue that is affecting Citrix performance. See this short video on how eG Enterprise makes Citrix troubleshooting simple.
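The comparison of ICA RTT with network latency can be sketched as follows. This is a rough illustration, assuming you already have both values in milliseconds for a session; the `classify_latency` function and its 0.8 cut-off are hypothetical choices for this example, not an eG Enterprise or Citrix rule. The reasoning: ICA RTT covers the network plus the time the server side takes to respond, so if network latency makes up most of ICA RTT, the network is the likely bottleneck; if ICA RTT is much larger than network latency, the extra time is being spent beyond the network (for example, on a busy VDA).

```python
def classify_latency(ica_rtt_ms, network_latency_ms, network_share=0.8):
    """Attribute perceived ICA latency to the network or the server side.

    network_share is a hypothetical cut-off: if network latency makes up at
    least this fraction of ICA RTT, the network is treated as the bottleneck.
    Returns (verdict, time spent beyond the network in ms).
    """
    if ica_rtt_ms <= 0:
        raise ValueError("ica_rtt_ms must be positive")
    share = min(network_latency_ms / ica_rtt_ms, 1.0)
    server_side_ms = max(ica_rtt_ms - network_latency_ms, 0.0)
    verdict = "network" if share >= network_share else "server/session"
    return verdict, server_side_ms


print(classify_latency(300, 270))  # network latency dominates the RTT
print(classify_latency(300, 40))   # most of the RTT is spent beyond the network
```

In practice you would look at these values over time rather than a single sample, since both fluctuate.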

41. How can we monitor and identify what is causing long VDI logons? We have been using VDIs for about a year now, but since last December something changed (we don’t know what), and our logon time has increased from about 45 seconds to 2 minutes or more. I know we can use Citrix Director, but I wanted to see if there is another tool that can show more details.

Third-party solutions such as eG Enterprise will provide you with more detail around the logon steps and what is causing them to take a long time. Although you don’t know what changed, it would be worth reviewing whether any Group Policy settings have been added, or which Windows updates have been installed since December. Other things to investigate are drive and printer mappings via Group Policy. Are the printer and drive map locations still available? Does Event Viewer on the VDAs give any hints?

42. I have no data being written to the monitoring database, so I see no data in Director. This is after migrating the database from the local SQL Express database to SQL Server 2016 on another server. What should I check to troubleshoot this?

Check that the connection strings have been correctly updated. You may refer to the following blog for assistance: https://www.citrix.com/blogs/2014/02/05/xendesktop-7-x-database-migration/


If you have any further questions on the topic of Citrix troubleshooting, or you’d like to let us know how some of the tips shared in this webinar had a positive effect on your ability to troubleshoot, send us an email at info@eginnovations.com.

You can watch the recording of the Citrix Troubleshooting 101 webinar at your convenience: https://www.eginnovations.com/webinar/citrix-troubleshooting.

Helpful Resources:

Citrix Director: What It Is and How it Works

Read Part 2: Is Citrix Director Sufficient for End-to-End Monitoring?

Understanding the Monitoring Capabilities of Director and How to Use It

Citrix Director is a web-based monitoring console for the Citrix XenApp and XenDesktop virtualization platforms that allows administrators to control and monitor virtual applications and desktops. Starting with version 7, Citrix Director is the default monitoring tool, replacing the erstwhile Citrix EdgeSight.

In this blog, we will look at the key capabilities of Citrix Director: what it does and how far it goes for Citrix monitoring. In a subsequent post, we will analyze when, and for what use cases, you may need to look beyond Citrix Director for your performance monitoring needs.


Why Use Citrix Director?

  • Because it’s free: Citrix Director is built into Citrix XenApp and XenDesktop.
  • It doesn’t require any external agents to be deployed: Citrix Director uses instrumentation built into the Citrix FlexCast Management Architecture (FMA). No additional agents need to be deployed for it to work.
  • It integrates with Citrix NetScaler MAS. Director mainly provides insights into server and session performance. If an administrator is interested in network-level visibility into Citrix traffic and has NetScaler MAS already in place, they can see NetScaler MAS metrics in the Director console itself.
  • It can be used for monitoring both on-premises deployments of Citrix XenApp and XenDesktop and Citrix Cloud deployments. The actual console may differ between the two, but administrators benefit from a similar look and feel in the user interface.


What Does Citrix Director Offer?

Starting off as mainly a helpdesk tool, Citrix Director has gained capabilities over subsequent releases of Citrix XenApp & XenDesktop 7.x.

The key capabilities in Citrix Director include:

 

Monitoring of Real User Logons and Breakdown of Logon Times: From the Director console, administrators can track the logon times that users are experiencing and see which parts of the logon process are causing slowness – from brokering to interactive session. Citrix Director pulls these metrics from the Delivery Controller.
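To show how a logon breakdown points at the slow part of a logon, here is a small sketch that ranks phases by duration and reports each one's share of the total. The phase names mirror those mentioned in this post, and the durations are made-up sample values; in practice the numbers would come from Director or another monitoring tool, and no specific export format or API is assumed here.

```python
def slowest_phases(breakdown, top=2):
    """Rank logon phases by duration, longest first.

    Returns (phase, seconds, share_of_total) tuples for the `top` phases.
    """
    total = sum(breakdown.values())
    if total <= 0:
        raise ValueError("breakdown must contain a positive total duration")
    ranked = sorted(breakdown.items(), key=lambda item: item[1], reverse=True)
    return [(phase, seconds, seconds / total) for phase, seconds in ranked[:top]]


# Hypothetical breakdown of one slow logon, in seconds.
logon = {
    "Brokering": 2.0,
    "GPO Processing": 38.0,
    "Profile Load": 14.0,
    "Interactive Session": 26.0,
}
for phase, seconds, share in slowest_phases(logon):
    print(f"{phase}: {seconds:.0f}s ({share:.0%} of logon)")
```

Looking at share rather than raw seconds makes it easier to spot a phase that has grown out of proportion even when the overall logon time varies.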

Monitoring of Citrix Connection Failures: Different types of user, desktop and machine connection failures can be tracked from the Director console. (Refer: https://www.citrix.com/blogs/2014/03/25/director-dashboard-explained/)

Monitoring of Server Resources: Key server resources such as CPU and memory of the server OS can be tracked from the Director console. For environments configured with NVIDIA GPUs, Director can also track GPU utilization. (Refer: https://support.citrix.com/article/CTX223925).

Monitoring of Application Failures: Different types of application failures are also reported from the Citrix Director console. (Refer: https://www.citrix.com/blogs/2017/08/23/application-related-session-failure-reporting-in-citrix-director-7-15/).

Monitoring of Sessions: By searching for a specific user, administrators can drill down into a user’s session. The details of the user’s session, endpoint information and virtual channel details are all accessible to the administrator, providing visibility into the complete list of processes being executed by the user.

Control Actions: From the Director console, administrators can control a user’s session. Actions that can be taken include logging off a session, disconnecting it, sending a message to a user, shadowing a session, etc. Helpdesk personnel may find these actions quite useful.

Active Application Probes: Recently, Citrix Director, as part of XenApp & XenDesktop 7.18, included the capability to initiate active synthetic tests against the Citrix StoreFront servers to determine if applications are available or not. Given a StoreFront URL, an active probe initiates a check at a pre-defined time to determine if an application is working or not. One thing to note is that this probing functionality is only available for published applications, not desktops. It is also a Platinum license feature and is not available to other Citrix customers. (Refer: https://www.citrix.com/blogs/2018/06/18/application-probing-your-proactive-application-monitoring-solution-from-citrix-director/)

Alerting: Starting with Citrix XenApp & XenDesktop 7.7, threshold policies can be set for metrics and email alerts initiated to inform administrators about abnormal conditions. Alerting is only enabled with Platinum license and is not available to other Citrix XenApp & XenDesktop license levels. (Refer: https://www.citrix.com/blogs/2016/01/13/configuring-managing-alerts-and-notifications-using-director/)

Trending: Trending capabilities in Director provide graphical analytics of historical performance over time. But only 7 days’ worth of historical data is typically stored. For longer data retention periods, Citrix customers must be on Enterprise (1-month storage) or Platinum (1-year storage). Data retention for Citrix Director on Citrix Cloud is 90 days.

Director also has predictive capabilities, which might be helpful for capacity planning and forecasting.

How is Citrix Director Licensed?

Director is available with all editions of Citrix XenApp and XenDesktop. Basic functionalities of Director are available for Citrix Advanced license customers. However, additional functionalities like alerting, application probing, desktop usage reporting, NetScaler MAS integration for HDX Insight data, SCOM integration, long-term historical reporting are available only with XenApp and XenDesktop Platinum license. (Refer: https://www.citrix.com/blogs/2016/08/10/citrix-director-features-by-edition-and-version/)

To summarize, this blog provided insights into what Citrix Director can do, its features and its licensing. As a built-in tool available with XenApp and XenDesktop deployments, Director is certainly good for high-level monitoring of user session information. But how far does Director really go in delivering deep performance visibility and detailed drilldowns, and in facilitating triage and troubleshooting of complex performance problems? We will discuss this in the next post and analyze how third-party Citrix monitoring tools can complement Director, extending monitoring beyond the Citrix tiers to deliver end-to-end performance insight.

Is Citrix Director Sufficient for End-to-End Monitoring?