Category Archives: Application Performance Monitoring (APM)

What is Application Performance Monitoring?

In today’s digital economy, speed is everything. When applications and websites accessed by end users are slow, the slowness has a direct, detrimental impact on business productivity, profits and even the brand itself.

If an ecommerce application loads slowly or experiences errors, it will translate into loss of business, and the customer might end up switching to another website – possibly your competitor.

Whether it is a custom web application, VDI application, mobile application, or a packaged enterprise application such as SAP, Siebel, or SharePoint, ensuring high performance and a good user experience in production defines the success of the application.

Why is My Application Slow? Best Practices for Troubleshooting

Given the complexity of today’s application infrastructures, this is not easy to achieve. Mobile app development platforms, cloud-native infrastructures, virtualized and containerized servers, dynamic and ephemeral application architectures, IoT, etc. make performance management a challenge. With the heterogeneous nature of the IT landscape and the various interdependencies between components, it is difficult to identify the cause of application slowness. One of the toughest questions application owners, developers and IT managers face is “Why is my application slow?”

Slow is the New Down
  • Over 73% of organizations experience loss of productivity due to performance slowdown in the application
  • A 1-second delay in website loading time can result in a 7% reduction in conversions and up to a 16% decrease in customer satisfaction

When an application is slow, you’ll need to discover the following: why is it slow, since when, and what is causing slowness. This is where monitoring techniques come into play. Some of the traditional approaches to performance monitoring include executing ping tests on applications suspected to be slow, Telnetting to application ports for diagnosis, measuring server level metrics (CPU, memory, disk, etc.), and so on. But these are not sufficient to locate the bottlenecks in today’s distributed application environments. Organizations need to be able to connect the user journey with the application infrastructure and understand when, where, and why user experience is affected during application access.

Enter Application Performance Monitoring

Measuring the availability, response time and behaviour of each and every business transaction is key to understanding the user journey. When a user performs a transaction on a digital business service, the application owner needs to know:

  • If the application is responding as it should
  • Whether all the backend processes are being executed as they should
  • If there is any slowness in transaction processing, which part of the application architecture is causing it
  • Whether there is an error/bug in the application code, a problem in the application server or in the web front end, a query executing slowly, a hotspot in the backend database, slow network, etc.

So, monitoring must evolve from just looking at hardware metrics to analyzing application code and business transactions. The performance of an application should be measured with a user-centric view. This forms the basis of application performance monitoring (APM).

What is Application Performance Monitoring?

Application performance monitoring is the strategy and practice of continuously monitoring and tracking the performance of business applications and the user experience of end users as they access the applications to understand trends, isolate anomalies, and get actionable insight for problem resolution and code optimization.

Various Aspects of Application Performance Monitoring

An APM strategy should ideally comprise the following:

#1 Digital user experience monitoring: This deals with tracking the experience of application users and identifying times when they experience slowness, errors, or downtime. There are two popular approaches to doing this: synthetically simulating user transactions and testing them proactively from different locations, and passively monitoring the experience of real users as they access applications in real time.

#2 Transaction profiling: This involves instrumenting byte-code during application runtime and analyzing the transaction flow through every tier of the application architecture to isolate where slowness is being caused. Using a tag-and-follow approach, business transactions can be traced through the front end, across the middleware, all the way to the backend database.

#3 Application code-level diagnostics: When transaction tracing reveals that slowness is happening at the application server, the application developer needs to know if there is a problem in the application code. According to a DZone performance survey, 43% of application performance issues are due to an issue in the application code. Transaction profiling usually provides the capability for developers and app owners to drill down into the code and get a method-level breakdown of processing time. When a user transaction is slow, it is then possible to pinpoint the exact line of code, database query or third-party call that is taking a long time to process.
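
To make the idea of a per-step processing time breakdown concrete, here is a minimal sketch. It is not how APM products do it (they instrument byte-code automatically, as described in #2); it times named steps by hand. The class name StepTimer and the step names are hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: records how long each named step of a transaction takes.
class StepTimer {
    private final Map<String, Long> nanosByStep = new LinkedHashMap<>();

    // Runs the work, adds its elapsed time to the step's total, and returns the result.
    <T> T time(String step, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByStep.merge(step, System.nanoTime() - start, Long::sum);
        }
    }

    // Elapsed nanoseconds per step, in the order the steps were first seen.
    Map<String, Long> nanosByStep() {
        return nanosByStep;
    }

    // Name of the step with the largest accumulated time, or null if nothing was timed.
    String slowestStep() {
        return nanosByStep.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}
```

Wrapping the database call, the rendering step and any third-party call of a request in time(...) gives a rough breakdown of where the request spends its time, which is the same question a profiler answers at method level.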

#4 Application deep-dive analysis: When there’s a problem in the application infrastructure – say in the application server when the connection pool is exhausted or there is high wait time for threads, or in the JVM or .NET CLR level when Garbage Collection is happening frequently or there is insufficient heap memory – it will affect application performance. Detailed visibility into the application infrastructure is a must.

Performance troubleshooting for a Java web application

#5 Infrastructure visibility in context of application performance: Many application issues occur due to slow network connectivity, a memory leak in the server, virtualization bottlenecks, storage hotspots, etc. So, monitoring the health and availability of the supporting infrastructure is paramount to ensuring application performance success. Infrastructure monitoring should be in context of application monitoring, and ideally be integrated into an APM solution.

To answer the question “Why is the application slow?”, one needs to gain correlated insight into all aspects of user experience, business transactions, application performance and infrastructure health. A converged application and infrastructure monitoring strategy should be adopted to automate root cause diagnosis and simplify performance troubleshooting. From a single pane of glass, you can get your single source of truth and easily triage problems and ensure high performance of applications.

Application performance monitoring can be a breeze when you know how to do it right. Try eG Enterprise APM and experience the power of converged application and infrastructure monitoring incorporating all the aspects of APM discussed above!

Top 10 Java Performance Problems and How to Solve Them

Java is one of the most popular technologies for application development. Tens of thousands of enterprise applications are powered by Java and millions of people use them daily. Java has been evolving for decades, and many web frameworks, middleware products, data access technologies and protocols are built on it. Compared to C, C++, and other languages where memory management is mostly done manually by the programmer, Java manages memory (allocation and reclamation) automatically.

Despite this, performance problems can also occur in Java-based applications and when a problem happens, it can be business-impacting. In this blog, we will look at some popular problems that Java developers and administrators encounter and recommend some best practices to resolve and prevent them.

Top 10 Common Java Performance Problems:
  1. Out-of-Memory Errors in the JVM
  2. Excessive Garbage Collection
  3. Improper Data Caching
  4. Thread Deadlocks and Gridlocks
  5. Running Out of Database Connections
  6. Slow Database Calls
  7. Java Code-Level Issues
  8. Java Application Server Bottlenecks
  9. Server Performance Problems
  10. Network Latency and Connectivity Issues

#1 Out-of-Memory Errors in the JVM

The dreaded java.lang.OutOfMemoryError is an indication that the application is attempting to add more data to memory, but there is no additional room for it. Out-of-memory errors result in failures that the application cannot recover from and hence must be avoided at all costs.

There can be many reasons why an out-of-memory error occurs:

Under-provisioned memory: First, the configured heap memory in the JVM may not be sufficient for the application. The application may attempt to put more data into the heap, but there is no more room for it. Consider the case of an application attempting to read and store a 256 MB file in memory. The JVM needs to be configured with a heap size of more than 256 MB for this to work, because the heap must also hold the application’s other objects. While specifying adequate heap memory for the JVM is important, it is equally important to ensure that the other memory spaces used by the JVM also have sufficient memory. For instance, the Oracle JVM has multiple memory spaces:

  • Eden space for all objects initially
  • Survivor space for objects that have survived garbage collection
  • Tenured space for objects that have existed for some time in the survivor space
  • Code cache where memory is used for compilation and storage of native code
  • Permanent generation where class and method objects are stored (replaced by Metaspace in Java 8 and later)

Each of these memory spaces has space usage limits that can be individually set. When any of these memory spaces is fully utilized, application errors will occur.

Spike in incoming traffic: Second, a spike in application load can trigger an out-of-memory exception. Consider a load-balanced server cluster where each of the JVMs is configured to handle its normal load. When one of the nodes goes down, the other nodes will need to handle the additional workload. When the memory configured in the JVM is not sufficient to handle the increased workload, out-of-memory exceptions will occur.

Programming error: Third, a memory leak in the application can be caused by a programming error. The Java garbage collector is designed to reclaim the memory consumed by unused objects. But if a program keeps adding objects to the heap and never releases its references to them (e.g., a continuously growing hash table), an out-of-memory error is inevitable.

Helpful Troubleshooting Tips:
  • The -Xmx setting controls the maximum heap size of the JVM. Make sure it is set high enough for your application to handle the expected workload.
  • The limit for the individual memory spaces also must be tuned correctly.
  • Monitor JVM memory spaces and growth patterns continuously to proactively detect situations when there is a memory shortfall.
  • When excessive memory usage is detected, take a heap dump from the JVM, analyze the dump using a tool like the Eclipse Memory Analyzer and identify objects that are taking up an unusual amount of memory.
  • Fix the code-level issues that the dump reveals, such as references that keep unused objects on the heap and cause memory leaks.
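
As a starting point for monitoring the individual memory spaces, the sketch below reads them through the standard java.lang.management API. The class name MemoryPoolReport is hypothetical, and the pool names printed depend on the JVM version and garbage collector in use, so treat the output as illustrative rather than fixed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports how full each JVM memory pool is.
class MemoryPoolReport {

    // Used memory as a percentage of the pool's maximum; -1 when no maximum is defined.
    static Map<String, Double> usagePercentByPool() {
        Map<String, Double> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 if the maximum is undefined
            double percent = max > 0 ? 100.0 * usage.getUsed() / max : -1.0;
            result.put(pool.getName(), percent);
        }
        return result;
    }

    public static void main(String[] args) {
        // The maximum heap the JVM will attempt to use, which reflects -Xmx when it is set.
        System.out.println("Max heap (bytes): " + Runtime.getRuntime().maxMemory());
        usagePercentByPool().forEach((name, pct) ->
                System.out.println(name + ": "
                        + (pct < 0 ? "no maximum defined" : String.format("%.1f%% used", pct))));
    }
}
```

Sampling these values periodically and charting them over time is one way to spot the steady growth pattern that usually precedes an out-of-memory error.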

#2 Excessive Garbage Collection

Garbage collection (GC) is a very useful process in the JVM that frees up room to add new data in memory. Useful as it is, it becomes a problem when it happens too often. While garbage collection runs, it can consume significant CPU and pause the JVM’s processing, which can choke the performance of the application. The Oracle JVM supports different garbage collection algorithms: the serial collector, the parallel collector, the concurrent mark sweep (CMS) collector and the Garbage-First (G1) collector. The choice of garbage collector can have an impact on performance.

  • For best performance, garbage collection should be taking a small percentage of CPU time (< 10%).
  • If more than 20% of CPU time is used for garbage collection, it means that the application has a significant memory related performance problem that must be corrected.

Configuring your JVM’s memory to be too large can also be detrimental to performance. In such a case, garbage collection can take a very long time to complete, affecting performance.

Helpful Troubleshooting Tips:
  • Track the number of GC runs, the time taken by each, and the percentage of time the JVM spends in GC.
  • Look for times when full GC happens. This can cause application slowness.
  • High CPU usage of the JVM can be caused by excessive garbage collection. When you don’t see your application threads taking CPU, check the performance of garbage collection. A memory issue can manifest as high CPU usage, making performance diagnosis difficult.
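
The first tip can be approximated from inside the JVM with the standard GarbageCollectorMXBean API. The sketch below is an assumption-laden shortcut: it compares accumulated collection time with JVM uptime, which is not the same as the share of CPU time quoted above, but it is a quick indicator of GC pressure. The class name GcOverhead is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates how much of the JVM's lifetime was spent in GC.
class GcOverhead {

    // Approximate accumulated collection time in milliseconds, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Total number of collections, summed over all collectors.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Accumulated GC time as a percentage of JVM uptime (wall-clock, not CPU time).
    static double gcPercentOfUptime() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis > 0 ? 100.0 * totalGcMillis() / uptimeMillis : 0.0;
    }

    public static void main(String[] args) {
        System.out.printf("GC runs: %d, GC time: %d ms, share of uptime: %.2f%%%n",
                totalGcCount(), totalGcMillis(), gcPercentOfUptime());
    }
}
```

Because the figures are cumulative since JVM start, a monitoring loop would normally sample them at intervals and look at the difference between samples.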

#3 Improper Data Caching

While caching is essential for fast, in-memory reading of data (as opposed to making a database call across the network), allocating excessive memory for caching is counterproductive. A sub-optimal memory configuration for caching will lead to more GC pauses and subsequently affect application processing.

Misconfiguration in caching will also lead to problems. Cached objects are stateful in nature, unlike pools that have stateless objects. When caching is not properly set up, a recently used object could be mistakenly removed from the cache to make room for a new object, resulting in a “cache miss” the next time that object is requested. In addition to memory configuration, the settings that determine cache hits and misses are also vital to get right.

Helpful Troubleshooting Tips:
  • Monitor cache size continuously and get alerted when it falls below or exceeds your accepted threshold.
  • Monitor cache hit and miss ratios to track the success of the caching process.
  • Ensure that distributed caches are properly synchronized across servers.
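
To illustrate bounded cache size and hit/miss tracking, here is a minimal sketch of a least-recently-used (LRU) cache built on the standard LinkedHashMap. LRU is only one possible eviction policy and is an assumption here, as is the class name LruCache; production caches usually come from a caching library rather than hand-written code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache that evicts the least recently used entry and counts hits and misses.
class LruCache<K, V> {
    private final LinkedHashMap<K, V> map;
    private long hits;
    private long misses;

    LruCache(int capacity) {
        // accessOrder=true keeps entries ordered from least to most recently accessed.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict once the bound is exceeded
            }
        };
    }

    // Returns the cached value or null, recording a hit or a miss.
    synchronized V get(K key) {
        V value = map.get(key);
        if (value == null) {
            misses++;
        } else {
            hits++;
        }
        return value;
    }

    synchronized void put(K key, V value) {
        map.put(key, value);
    }

    synchronized int size() {
        return map.size();
    }

    // Fraction of lookups served from the cache; 0.0 before any lookup.
    synchronized double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

A hit ratio that stays low under normal load suggests the cache is too small or is evicting entries that are still in use.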

#4 Thread Deadlocks and Gridlocks

Java applications, especially web-based applications, are often multi-threaded. Multi-threading helps with scalability, but when multiple threads need to access shared JVM resources (often memory), locking is used to ensure that only one thread at a time has access to a shared resource. When one thread locks a resource, other threads wait for the lock to be released. The Java programming language makes it easy for developers to implement synchronization between threads. The synchronized keyword can be used to create a block of code that is synchronized. Methods can also be synchronized.
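
The two forms of synchronization just mentioned look like this in practice. This is a minimal sketch with a hypothetical SharedCounter class: one counter is protected by a synchronized block on a private lock object, the other by a synchronized method that locks on the instance itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of a synchronized block and a synchronized method.
class SharedCounter {
    private final Object lock = new Object();
    private long blockCount;  // guarded by lock
    private long methodCount; // guarded by this

    // Synchronized block: only the critical section holds the lock.
    void incrementInBlock() {
        synchronized (lock) {
            blockCount++;
        }
    }

    // Synchronized method: the whole method body holds the lock on this instance.
    synchronized void incrementInMethod() {
        methodCount++;
    }

    long blockCount() {
        synchronized (lock) {
            return blockCount;
        }
    }

    synchronized long methodCount() {
        return methodCount;
    }

    // Starts several threads that increment both counters, then waits for them to finish.
    static SharedCounter runConcurrently(int threadCount, int incrementsPerThread) {
        SharedCounter counter = new SharedCounter();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.incrementInBlock();
                    counter.incrementInMethod();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter;
    }
}
```

Both counters end up with the correct totals, but every increment makes the other threads wait their turn, which is exactly the cost that grows as more threads contend for the same lock.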

Because it is so easy to create synchronized blocks, developers often create them without understanding their performance implications. When hundreds of threads synchronize on the same lock, the Java application’s processing of requests is severely affected and users will experience excessive slowness. When such a situation happens in production and there are hundreds of threads running in the JVM, it is very difficult to determine which lock caused the slowness and which block of code is the culprit.

Another issue with thread locking is deadlocks. For example, thread A holds a lock on one object and waits for a lock held by thread B, while thread B waits for the lock held by thread A. Both threads are now deadlocked and will never proceed, causing application hangs or crashes.

Too much synchronization also takes a toll on performance. By over-synchronizing threads, one could end up facing the problem of a thread gridlock, where many threads are using the same lock, and waiting until the lock gets released.

Helpful Troubleshooting Tips:
  • Monitor the status of threads in the JVM and determine the count of threads in the running, blocked and deadlocked states.
  • Use Java performance monitoring tools to help automatically detect blocked threads and deadlocks.
  • Identify the exact module and the line of code at which the locking is happening.
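
The first two tips can be sketched with the JVM’s standard ThreadMXBean, which exposes thread states and built-in deadlock detection. The class name ThreadStateReport is hypothetical; dedicated monitoring tools gather the same data continuously and add stack traces to locate the offending code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: counts live threads by state and checks for deadlocks.
class ThreadStateReport {

    // Number of live threads in each state (RUNNABLE, BLOCKED, WAITING, ...).
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    // Number of threads deadlocked on monitors or ownable synchronizers; 0 if none.
    static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("Threads by state: " + countByState());
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```

A rising BLOCKED count alongside flat throughput is a typical sign of lock contention, while any non-zero deadlock count calls for a thread dump to see which locks are involved.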

#5 Running Out of Database Connections

Most Java applications use database servers for persistent storage of data. Connections to the database server are used to store and retrieve data. Because establishing a database connection for each request is expensive, a connection pool is often used. The connection pool has an initial setting for the number of connections that will be pre-established when the application starts. When additional connections are required, the pool is dynamically grown subject to a maximum specified limit.

If the number of connections in use reaches the maximum limit, newer requests will have to wait until processing of existing database requests is completed. It is important for developers and DBAs to have a fair estimation of the application workload and set the configuration accordingly. At the same time, specific application modules or web pages may have connection leaks – i.e., a connection is obtained from the pool, but it is not released back into the pool. Such connection leaks will ultimately result in application errors being reported to users. High connection pool usage can also occur during times when the database server has slowed down its processing. Hence, it is important to differentiate performance issues that are a result of database connection leaks as compared to ones that are a result of a database server bottleneck.

Helpful Troubleshooting Tips:
  • Continuously monitor connections to the database: total connections, active connections, etc.
  • Track connection pool metrics, such as allocated, freed, created, closed and managed connections.
  • Correlate application access patterns with database connection pool usage to identify the cause of connection leaks.
  • Get visibility into waiting requests and connection delays, analyze these metrics along with health indicators of the database servers and determine times when database server bottlenecks are affecting application performance.
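
A connection leak and its fix can be illustrated without a real database. The sketch below uses a hypothetical BoundedPool whose Lease objects stand in for pooled connections; it is not a real pool implementation. The point it shows is that try-with-resources returns the lease even when the code inside throws, which is the same pattern used with a real JDBC Connection obtained from a DataSource.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a connection pool with a fixed maximum size.
class BoundedPool {
    private final int max;
    private final Semaphore permits;

    BoundedPool(int max) {
        this.max = max;
        this.permits = new Semaphore(max, true);
    }

    // Stand-in for a pooled connection; closing it returns the slot to the pool.
    final class Lease implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        @Override
        public void close() {
            if (closed.compareAndSet(false, true)) {
                permits.release(); // release only once, even if close() is called twice
            }
        }
    }

    // Waits up to timeoutMillis for a free slot; returns null if the pool is exhausted.
    Lease acquire(long timeoutMillis) {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS) ? new Lease() : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Number of leases currently checked out.
    int inUse() {
        return max - permits.availablePermits();
    }
}
```

Code that calls acquire without closing the lease steadily drives inUse() toward the maximum, after which every new request times out; that is the leak signature to look for when pool usage climbs while the database itself is healthy.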

#6 Slow Database Calls

The database is an integral part of the application architecture. Performance of the application greatly depends on how fast the database responds and executes queries. According to a DZone performance monitoring survey, database problems rank second among the most likely causes of application performance problems. Not just that, but application developers wrongly get blamed for application issues when it is in fact a database query issue that should be addressed by a DBA. There are several ways in which database problems can slow down application transaction processing:

Slow queries: When developing an application, developers are focused on getting the functionality right rather than on performance. While their database queries may be returning the correct results, these queries may not be designed optimally. For a query to be optimally designed, it must avoid full table scans at the database level. It must make use of database indexes, so the results are returned in the fastest possible manner. Getting queries to be optimally designed often requires the involvement of a database administrator. The DBA can analyze a query’s explain plan and provide recommendations on how to tune it for optimal performance. These could include redesigning the query, using existing indexes, recommendations for new indexes, addition of hints, etc.

Unused indexes: While it is good practice to have an index on every foreign key in a table, you must also bear in mind what kind of queries are being executed. An index may not be needed when you don’t use a specific column in your queries. Unused indexes occupy space on the disk, and the database needs to update them every time records are inserted or deleted. This will slow down query processing.

Insufficient database resources: When the database is running out of server resources, such as CPU, memory and disk, it will have an adverse impact on query execution.

Helpful Troubleshooting Tips:
  • Analyze database queries issued by the application and identify web pages and corresponding queries that are taking time.
  • Plan database sizing and configuration properly to ensure consistent performance.
  • Use database monitoring tools to identify and fix missing indexes, optimize the database layout by re-indexing, etc.

#7 Java Code-Level Issues

The DZone performance monitoring survey referenced earlier cites code-level problems as the top cause of application performance issues. Most code-level issues are due to bugs in the code constructs, such as long waits, poor iteration, inefficient algorithms, bad choice of data structures, etc. For example, repeatedly searching a Vector of hundreds of thousands of records by iterating through it will be inefficient. A HashMap keyed on the search field may be a more efficient data structure for this task. In most cases, code-level issues manifest as loops that take up CPU cycles in the JVM.

Then, there could be performance bugs in third-party frameworks used in application development. Ideally, all code-level issues should be captured by the QA team and fixed by the development team before production rollout. But this is not always the case.

Helpful Troubleshooting Tips:
  • Incorporate best practices during the entire software development lifecycle – from design, development and testing to rollout. Development teams must be skilled enough to avoid code-level mistakes, and QA teams should be adept at catching issues proactively.
  • Incorporate code optimization practices to ensure the application code meets expected standards.
  • Use transaction profiling tools to isolate code-level problems automatically.
  • Review application logs, which can also provide good insights for debugging.
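
The Vector versus HashMap point made earlier in this section can be sketched as follows. The Customer type, its fields and the class name LookupComparison are hypothetical. The scan inspects elements one by one until it finds a match, while the map is built once and then answers each lookup by key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

// Hypothetical comparison of a linear scan with a keyed lookup.
class LookupComparison {

    static final class Customer {
        final int id;
        final String name;

        Customer(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Linear scan: cost grows with the number of records for every single lookup.
    static Customer findByScan(Vector<Customer> customers, int id) {
        for (Customer c : customers) {
            if (c.id == id) {
                return c;
            }
        }
        return null;
    }

    // One pass to build the index; later lookups go straight to the matching key.
    static Map<Integer, Customer> indexById(Vector<Customer> customers) {
        Map<Integer, Customer> byId = new HashMap<>(customers.size() * 2);
        for (Customer c : customers) {
            byId.put(c.id, c);
        }
        return byId;
    }

    // Generates n sample customers with ids 0..n-1.
    static Vector<Customer> sampleCustomers(int n) {
        Vector<Customer> customers = new Vector<>(n);
        for (int i = 0; i < n; i++) {
            customers.add(new Customer(i, "customer-" + i));
        }
        return customers;
    }
}
```

Both approaches return the same record; the difference only shows up as CPU time when the lookup sits inside a hot loop or is called for every request.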

#8 Java Application Server Bottlenecks

The application server is a critical component of a Java application architecture. Popular Java application servers are Oracle WebLogic, IBM WebSphere, JBoss, WildFly, Tomcat, etc. A bottleneck in the application server would directly impact the business transactions and affect application performance and end-user experience. Problems in servlet execution, bean caching, queuing, JDBC connectivity, etc. will affect performance.

Transaction rollback is another issue to deal with. An application rollback is typically the result of pre-designed business logic. But a non-application rollback is a serious issue and needs to be addressed immediately. The application throws an exception and the transaction rolls back, so the end user’s request is not processed. There are three types of non-application rollbacks:

  • A system rollback happens due to a problem in the Java application server.
  • A time-out rollback happens because a process within the Java container times out.
  • A resource rollback happens when there is a problem in resource management of the Java container.

Helpful Troubleshooting Tips:
  • Use application performance monitoring tools to monitor the health, availability and performance of Java application servers from end to end.
  • Track key application server metrics to understand anomalies and antipatterns.

#9 Server Performance Problems

All applications and their supporting middleware and back ends run on enterprise servers. This could be a server operating system on a physical box, a virtual machine, or a cloud instance. A problem in the server hardware or resources will affect the performance of the application running on it. Insufficient CPU, memory, and disk, operating system errors, server hardware faults, high-CPU processes, zombie processes, corrupt services, etc. are some common problems administrators deal with. When the server is virtualized, it is even more difficult to pinpoint problems. A resource contention issue at the virtualized host server can impact all the guest virtual machines and the applications running on them. The same applies to containers and cloud-based application workloads.

Helpful Troubleshooting Tips:
  • Constantly monitor the performance of servers and operating system.
  • Size servers optimally for maximum application performance.
  • Understand dependencies between the server (physical/virtual/cloud) and the applications running on it to support causal analysis of application slowdowns.

#10 Network Latency and Connectivity Issues

Bandwidth congestion in the network, high latency and packet loss, misconfiguration in a router, DNS failure, etc. could affect the performance of applications. Usually there is a lot of finger-pointing between the application team and network team as to where the root cause of an application problem is and who needs to resolve it. When it is actually a network issue, the application team could be chasing a red herring on the server side.

Helpful Troubleshooting Tips:
  • Continuously monitor the network devices, traffic and configurations.
  • Compare and correlate network performance with application problems to easily know the impact of network performance on the applications and isolate the root cause.

It is important for developers to code efficiently and for testers to catch and report issues – no doubt! IT teams should also act proactively and have the necessary performance management measures in place. Monitoring should not be an afterthought for organizations developing, hosting, implementing and using Java-based applications. Constant monitoring of user experience, application transactions, application code, and the supporting infrastructure is vital to detect and resolve problems before they become business-impacting.