Java is one of the most popular technologies for application development. Tens of thousands of enterprise applications are powered by Java, and millions of people use them daily. Java has been evolving for more than two decades, and a large ecosystem of web frameworks, middleware, data access technologies and protocols has been built on it. Unlike C, C++, and other languages where programmers mostly manage memory manually, Java manages memory allocation and reclamation automatically.
Despite this, performance problems still occur in Java-based applications, and when they do, they can be business-impacting. In this blog, we will look at some common problems that Java developers and administrators encounter and recommend best practices to resolve and prevent them.
Top 10 Common Java Performance Problems:
#1 Out-of-Memory Errors in the JVM
The dreaded java.lang.OutOfMemoryError indicates that the application is attempting to add more data to memory, but there is no room left for it. Out-of-memory errors typically result in failures that the application cannot recover from and hence must be avoided at all costs.
There can be many reasons why an out-of-memory error occurs:
Under-provisioned memory: First, the configured heap memory in the JVM may not be sufficient for the application. The application may attempt to put more data into the heap, but there is no more room for it. Consider the case of an application attempting to read and store a 256 MB file in memory. The JVM needs a heap comfortably larger than 256 MB for this to work, because the file contents share the heap with every other object the application allocates. While specifying adequate heap memory for the JVM is important, it is equally important to ensure that the other memory spaces used by the JVM also have sufficient memory. For instance, the Oracle JVM has multiple memory spaces:
- Eden space for all objects initially
- Survivor space for objects that have survived garbage collection
- Tenured space for objects that have existed for some time in the survivor space
- Code cache where memory is used for compilation and storage of native code
- Permanent generation where class and method objects are stored (replaced by Metaspace in Java 8 and later)
Each of these memory spaces has space usage limits that can be individually set. When any of these memory spaces is fully utilized, application errors will occur.
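To see which memory spaces a particular JVM actually exposes, the standard java.lang.management API can list them at runtime. The following is a minimal sketch; the class name MemoryPoolReport is our own, and the pool names it prints depend on the JVM version and the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
final class MemoryPoolReport {

    /** Builds one line per JVM memory pool: name, type, used size and configured maximum. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 means no maximum is defined for this pool
            lines.add(String.format("%-35s %-16s used=%d KB max=%s",
                    pool.getName(), pool.getType(), usage.getUsed() / 1024,
                    max < 0 ? "undefined" : (max / 1024) + " KB"));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

Running this on the JVM that hosts your application shows which spaces have an explicit limit and how close each one is to it.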
Spike in incoming traffic: Second, a spike in application load can trigger an out-of-memory error. Consider a load-balanced server cluster where each of the JVMs is configured to handle its normal load. When one of the nodes goes down, the remaining nodes need to handle the additional workload. When the memory configured in a JVM is not sufficient to handle the increased workload, out-of-memory errors will occur.
Programming error: Third, a programming error can cause a memory leak in the application. The Java garbage collector is designed to reclaim the memory consumed by unused objects, but it cannot reclaim objects that are still referenced. If a program keeps adding objects to the heap and never releases them (e.g., a continuously growing hash table), an out-of-memory error is inevitable.
- The -Xmx option controls the maximum heap size of the JVM. Set it high enough for your application to handle the expected workload.
- The limit for the individual memory spaces also must be tuned correctly.
- Monitor JVM memory spaces and growth patterns continuously to proactively detect situations when there is a memory shortfall.
- When excessive memory usage is detected, take a heap dump from the JVM, analyze the dump using a tool like the Eclipse Memory Analyzer and identify objects that are taking up an unusual amount of memory.
- Fix the code-level issues this analysis reveals, such as memory leaks that let unused objects use up heap memory.
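As a rough illustration of monitoring heap usage against the -Xmx ceiling, the sketch below computes the fraction of the maximum heap currently in use. The class name HeapHeadroom and the 80% warning threshold are assumptions chosen for the example, not recommended values.

```java
// Hypothetical helper class, named for illustration only.
final class HeapHeadroom {

    /** Fraction of the maximum heap (the -Xmx ceiling) currently in use. */
    static double usedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    /** True when heap usage has crossed the given fraction, e.g. 0.8 for 80%. */
    static boolean isAbove(double threshold) {
        return usedFraction() > threshold;
    }

    public static void main(String[] args) {
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.printf("Max heap: %d MB, in use: %.1f%%%n", maxMb, usedFraction() * 100);
        if (isAbove(0.8)) { // assumed threshold for this sketch
            System.out.println("Heap usage is high; consider taking a heap dump for analysis.");
        }
    }
}
```

A single reading says little on its own. Sampling this value at intervals and watching whether it keeps climbing after garbage collections is what points to a leak.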
#2 Excessive Garbage Collection
Garbage collection (GC) is a very useful process in the JVM that frees up room for new data in memory. As useful as it is, it becomes a problem if it happens too often. When garbage collection runs, it can hog CPU and pause the JVM’s processing, which may choke the performance of the application. The Oracle JVM supports different garbage collection algorithms: the serial collector, the parallel collector, the concurrent mark sweep (CMS) collector and the Garbage-First collector (G1GC). The choice of garbage collector can have an impact on performance.
- For best performance, garbage collection should be taking a small percentage of CPU time (< 10%).
- If more than 20% of CPU time is used for garbage collection, it means that the application has a significant memory related performance problem that must be corrected.
Configuring your JVM’s memory to be too large can also be detrimental to performance. In such a case, garbage collection can take a very long time to complete, affecting performance.
- Track instances of GC, time taken for GC, and % of GC time spent by the JVM.
- Look for times when full GC happens. This can cause application slowness.
- High CPU usage of the JVM can be caused by excessive garbage collection. When you don’t see your application threads taking CPU, check the performance of garbage collection. A memory issue can manifest as high CPU usage, making performance diagnosis difficult.
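The GC counts and times mentioned above are available through the standard GarbageCollectorMXBean interface. The sketch below uses accumulated collection time relative to JVM uptime as an approximation of GC overhead; this is elapsed time, not the CPU-time percentage itself, and the class name GcOverhead is ours.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, named for illustration only.
final class GcOverhead {

    /** Approximate total time in milliseconds spent in all collectors since JVM start. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Accumulated GC time as a percentage of JVM uptime. */
    static double gcPercentOfUptime() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : 100.0 * totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("GC time is about %.2f%% of JVM uptime%n", gcPercentOfUptime());
    }
}
```

Because the figure is cumulative since startup, a monitoring tool would normally sample it periodically and compare successive readings to spot bursts of GC activity.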
#3 Improper Data Caching
While caching is an essential process for faster reading of data in-memory (as opposed to making a database call across the network), it is counterproductive to allocate excessive memory for caching. Sub-optimal memory configuration for caching will lead to more GC pauses and subsequently affect application processing.
Misconfiguration in caching will also lead to problems. Cached objects are stateful in nature, unlike pools that have stateless objects. When caching is not properly configured, a recently used object could be mistakenly removed from the cache to make room for a new object, resulting in a “cache miss” the next time that object is requested. In addition to memory configuration, the settings that drive cache hit and miss rates, such as cache size and eviction policy, are also vital to set properly.
- Monitor cache size continuously and get alerted when it falls below or exceeds your accepted threshold.
- Monitor cache hit and miss ratios to track the success of the caching process.
- Ensure that distributed caches are properly synchronized across multiple servers.
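To make eviction and hit/miss tracking concrete, here is a minimal sketch of a bounded cache that evicts the least recently used entry and counts hits and misses. It is built on java.util.LinkedHashMap in access order; the class name CountingLruCache is our own, and production systems would more likely rely on a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class, named for illustration only.
final class CountingLruCache<K, V> {
    private final Map<K, V> map;
    private long hits;
    private long misses;

    CountingLruCache(final int capacity) {
        // accessOrder=true moves an entry to the end on every access,
        // so the eldest entry is always the least recently used one.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached value or null, and records a hit or a miss. */
    synchronized V get(K key) {
        V value = map.get(key);
        if (value == null) {
            misses++;
        } else {
            hits++;
        }
        return value;
    }

    synchronized void put(K key, V value) {
        map.put(key, value);
    }

    synchronized int size() {
        return map.size();
    }

    /** Hits divided by total lookups; 0.0 before the first lookup. */
    synchronized double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        CountingLruCache<String, String> cache = new CountingLruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // hit; "a" becomes the most recently used entry
        cache.put("c", "3"); // capacity exceeded; "b" is evicted
        cache.get("b");      // miss
        System.out.println("size=" + cache.size() + " hitRatio=" + cache.hitRatio());
        // prints "size=2 hitRatio=0.5"
    }
}
```

The capacity bound keeps the cache from growing into a memory problem, and the hit ratio is the number to watch when judging whether the cache is sized sensibly.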
#4 Thread Deadlocks and Gridlocks
Java applications, especially web-based applications, are often multi-threaded. Multi-threading helps with scalability, but when multiple threads need to access shared JVM resources (often memory), locking is used to ensure that only one thread at a time has access to the shared resource. When one thread locks a resource, other threads wait for the lock to be released. The Java programming language makes it easy for developers to implement synchronization between threads. The synchronized keyword can be used to create a block of code that is synchronized. Methods can also be synchronized.
Because it is so easy to create synchronized blocks, developers often create them without understanding their performance implications. When hundreds of threads synchronize on the same lock, the Java application’s processing of requests is severely affected and users will experience excessive slowness. When such a situation happens in production and there are hundreds of threads running in the JVM, it is very difficult to determine which lock caused the slowness and which block of code is the culprit.
Another issue with thread locking is deadlocks. For example, thread A holds a lock and waits for a lock held by thread B, while thread B in turn waits for the lock held by thread A. Both threads are now deadlocked and will never proceed, causing application hangs or crashes.
Too much synchronization also takes a toll on performance. By over-synchronizing threads, one could end up facing the problem of a thread gridlock, where many threads are using the same lock, and waiting until the lock gets released.
- Monitor the status of threads in the JVM and determine the count of threads in running, blocked and deadlocked state.
- Use Java performance monitoring tools to help automatically detect blocked threads and deadlocks.
- Identify the exact module and the line of code at which the locking is happening.
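The JVM exposes thread states and deadlock detection through the standard ThreadMXBean interface, which is what many monitoring tools build on. The sketch below is one possible way to collect the counts described above and to report, for each blocked thread, the lock it is waiting for and the top stack frame; the class name ThreadHealth is our own.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, named for illustration only.
final class ThreadHealth {

    /** Counts live threads per state (RUNNABLE, BLOCKED, WAITING, ...). */
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Number of threads the JVM reports as deadlocked; 0 when there are none. */
    static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    /** One line per BLOCKED thread: the lock it waits for, the owner, and the top frame. */
    static List<String> describeBlocked() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds(), 1)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            lines.add(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName()
                    + (stack.length > 0 ? " at " + stack[0] : ""));
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("Threads by state: " + countByState());
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
        describeBlocked().forEach(System.out::println);
    }
}
```

On a healthy JVM the deadlock count is zero and the blocked list is short or empty. A steadily growing number of BLOCKED threads waiting on the same lock is the gridlock pattern described above.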
#5 Running Out of Database Connections
Most Java applications use database servers for persistent storage of data. Connections to the database server are used to store and retrieve data. Because establishing a database connection for each request is expensive, a connection pool is often used. The connection pool has an initial setting for the number of connections that will be pre-established when the application starts. When additional connections are required, the pool is dynamically grown subject to a maximum specified limit.
If the number of connections in use reaches the maximum limit, newer requests will have to wait until processing of existing database requests is completed. It is important for developers and DBAs to have a fair estimation of the application workload and set the configuration accordingly. At the same time, specific application modules or web pages may have connection leaks – i.e., a connection is obtained from the pool, but it is not released back into the pool. Such connection leaks will ultimately result in application errors being reported to users. High connection pool usage can also occur during times when the database server has slowed down its processing. Hence, it is important to differentiate performance issues that are a result of database connection leaks as compared to ones that are a result of a database server bottleneck.
- Continuously monitor connections to the database: total connections, active connections, etc.
- Track connection pool metrics, such as allocated, freed, created, closed and managed connections.
- Correlate application access patterns with database connection pool usage to identify the cause of connection leaks.
- Get visibility into waiting requests and connection delays, analyze these metrics along with health indicators of the database servers and determine times when database server bottlenecks are affecting application performance.
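A connection leak is easiest to see in miniature. The sketch below uses a toy pool (TinyPool and Lease are hypothetical stand-ins, not a real JDBC pool) to contrast a borrowed slot that is never returned with one that is released automatically by try-with-resources. With a real pool, the analogous pattern is to obtain the java.sql.Connection inside a try-with-resources statement, since Connection is AutoCloseable.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy stand-in for a connection pool, for illustration only.
final class TinyPool {

    /** A borrowed slot; closing it hands the slot back to the pool. */
    final class Lease implements AutoCloseable {
        private boolean closed;

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                permits.release();
            }
        }
    }

    private final Semaphore permits;
    private final int max;

    TinyPool(int max) {
        this.max = max;
        this.permits = new Semaphore(max);
    }

    /** Waits up to timeoutMs for a free slot; returns null when the pool is exhausted. */
    Lease borrow(long timeoutMs) {
        try {
            return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS) ? new Lease() : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    int inUse() {
        return max - permits.availablePermits();
    }

    public static void main(String[] args) {
        TinyPool pool = new TinyPool(2);
        pool.borrow(10);                      // leak: this lease is never closed
        try (Lease lease = pool.borrow(10)) { // released automatically at the end of the block
            System.out.println("in use inside block: " + pool.inUse()); // prints 2
        }
        System.out.println("in use after block: " + pool.inUse());      // prints 1
    }
}
```

The slot that stays in use after the block is the leak. In a long-running application each leaked slot reduces the pool's capacity until requests start timing out, which is why tracking in-use counts over time helps separate leaks from a slow database.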
#6 Slow Database Calls
The database is an integral part of the application architecture. Performance of the application greatly depends on how fast the database responds and executes queries. According to a DZone performance monitoring survey, database problems rank second among the most likely causes of application performance problems. Not only that, application developers often get wrongly blamed for application issues when the real cause is a database query issue that should be addressed by a DBA. There are several reasons why database calls can slow down application transaction processing:
Slow queries: When developing an application, developers are focused on getting the functionality right rather than on performance. While their database queries may be returning the correct results, these queries may not be designed optimally. For a query to be optimally designed, it must avoid full table scans at the database level. It must make use of database indexes, so the results are returned in the fastest possible manner. Getting queries to be optimally designed often requires the involvement of a database administrator. The DBA can analyze a query’s explain plan and provide recommendations on how to tune it for optimal performance. These could include redesigning the query, using existing indexes, recommendations for new indexes, addition of hints, etc.
Unused indexes: While it is good practice to have an index on every foreign key in a table, you must also bear in mind what kind of queries are being executed. An index may not be needed when you don’t use a specific column in your queries. Unused indexes occupy space on disk, and the database needs to update them every time records are inserted or deleted, which slows down those operations.
Insufficient database resources: When the database is running out of server resources, such as CPU, memory and disk, query execution will be adversely affected.
- Analyze database queries issued by the application and identify web pages and corresponding queries that are taking time.
- Plan database sizing and configuration properly to ensure consistent performance.
- Use database monitoring tools to identify and fix missing indexes, optimize the database layout by re-indexing, etc.
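One simple way to find out which calls are taking time is to wrap each database call in a timer and record those that exceed a threshold. The sketch below is a generic illustration; the class SlowCallLog, its method names and the threshold are all assumptions, and the wrapped call would be your own query code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper class, named for illustration only.
final class SlowCallLog {
    private final long thresholdMs;
    private final List<String> slowCalls = new ArrayList<>();

    SlowCallLog(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    /** Runs the call, returns its result, and records it if it took thresholdMs or longer. */
    <T> T timed(String label, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMs >= thresholdMs) {
                synchronized (slowCalls) {
                    slowCalls.add(label + " took " + elapsedMs + " ms");
                }
            }
        }
    }

    /** Snapshot of the slow calls recorded so far. */
    List<String> slowCalls() {
        synchronized (slowCalls) {
            return new ArrayList<>(slowCalls);
        }
    }

    public static void main(String[] args) {
        SlowCallLog log = new SlowCallLog(20); // assumed threshold of 20 ms
        log.timed("order lookup", () -> {
            // Placeholder for a real query; the sleep simulates a slow database call.
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "result";
        });
        log.slowCalls().forEach(System.out::println);
    }
}
```

Labelling each call with the page or module that issued it makes it possible to hand the DBA a short list of specific queries rather than a general complaint about slowness.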
#7 Java Code-Level Issues
The DZone performance monitoring survey referenced earlier cites code-level problems as the top cause of application performance issues. Most code-level issues are due to bugs in the code constructs, such as long waits, poor iteration, inefficient algorithms, bad choice of data structures, etc. For example, finding a record by iterating through a Vector with hundreds of thousands of records is inefficient. A HashMap that supports lookup by key may be a more efficient data structure for this task. In most cases, code-level issues manifest as loops that take up CPU cycles in the JVM.
Then, there could be performance bugs in third-party frameworks used in application development. Ideally, all code-level issues should be captured by the QA team and fixed by the development team before production rollout. But this is not always the case.
- Incorporate best practices during the entire software development lifecycle, from design and development through testing to rollout. Development teams must be skilled at avoiding code-level mistakes, and QA teams should be adept at catching issues proactively.
- Incorporate code optimization practices to ensure the application code meets expected standards.
- Use transaction profiling tools to isolate code-level problems automatically.
- Review application logs, which can also provide good insights for debugging.
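The Vector versus HashMap point made in this section can be sketched as follows. The class name LookupDemo and the record naming are made up for the example, and the timings it prints will vary from machine to machine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

// Hypothetical demo class, named for illustration only.
final class LookupDemo {

    /** Linear search: compares against each element in turn, O(n) per lookup. */
    static int scan(Vector<String> ids, String wanted) {
        for (int i = 0; i < ids.size(); i++) {
            if (ids.get(i).equals(wanted)) {
                return i;
            }
        }
        return -1;
    }

    /** Builds a key-to-position index once so later lookups take roughly constant time. */
    static Map<String, Integer> buildIndex(Vector<String> ids) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < ids.size(); i++) {
            index.putIfAbsent(ids.get(i), i); // keep the first position, matching scan()
        }
        return index;
    }

    public static void main(String[] args) {
        Vector<String> ids = new Vector<>();
        for (int i = 0; i < 200_000; i++) {
            ids.add("record-" + i);
        }
        Map<String, Integer> index = buildIndex(ids);
        String wanted = "record-199999"; // worst case for the scan: the last element

        long t0 = System.nanoTime();
        for (int i = 0; i < 200; i++) {
            scan(ids, wanted);
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < 200; i++) {
            index.get(wanted);
        }
        long t2 = System.nanoTime();
        System.out.printf("200 scans: %d ms, 200 map lookups: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

The map costs extra memory and a one-time build, so it pays off when the same collection is searched repeatedly, which is the typical case in request-handling code.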
#8 Java Application Server Bottlenecks
The application server is a critical component of a Java application architecture. Popular Java application servers are Oracle WebLogic, IBM WebSphere, JBoss, WildFly, Tomcat, etc. A bottleneck in the application server would directly impact the business transactions and affect application performance and end-user experience. Problems in servlet execution, bean caching, queuing, JDBC connectivity, etc. will affect performance.
Transaction rollback is another issue to deal with. An application rollback is typically the result of pre-designed business logic. But a non-application rollback is a serious issue and needs to be addressed immediately: the application throws an exception and the transaction rolls back, so the end user’s request cannot be processed. There are three types of non-application rollbacks:
- A system rollback happens due to a problem in the Java application server
- A time-out rollback happens because a process within the Java container times out
- A resource rollback happens when there is a problem in resource management of the Java container
- Use application performance monitoring tools to monitor the health, availability and performance of Java application servers from end to end.
- Track key application server metrics to understand anomalies and antipatterns.
#9 Server Performance Problems
All applications and their supporting middleware and back ends run on enterprise servers. This could be a server operating system on a physical box, a virtual machine, or a cloud instance. A problem in the server hardware or resources will affect the performance of the application running on it. Insufficient CPU, memory, and disk, operating system errors, server hardware faults, high-CPU processes, zombie processes, corrupt services, etc. are some common problems administrators deal with. When the server is virtualized, it is even more difficult to pinpoint problems. A resource contention issue at the virtualization host can impact all the guest virtual machines and the applications running on them. The same applies to containers and cloud-based application workloads.
- Constantly monitor the performance of servers and operating system.
- Size servers optimally for maximum application performance.
- Understand dependencies between the server (physical/virtual/cloud) and the applications running on it to support causal analysis of application slowdowns.
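Even from inside the JVM, a few basic host-level indicators are available through the standard OperatingSystemMXBean. The sketch below is only a starting point; the class name HostSnapshot is ours, and treating a load average above the processor count as "overloaded" is a rough rule of thumb assumed for the example, not a fixed rule.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper class, named for illustration only.
final class HostSnapshot {

    /** One-line summary of the host as seen by the JVM. */
    static String describe() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative when the platform does not provide it
        return String.format("%s %s, %d processors, load average (1 min): %s",
                os.getName(), os.getVersion(), os.getAvailableProcessors(),
                load < 0 ? "n/a" : String.format("%.2f", load));
    }

    /** Assumed heuristic: more runnable work than processors suggests CPU contention. */
    static boolean isOverloaded(double loadAverage, int processors) {
        return loadAverage > processors;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        System.out.println(describe());
        if (isOverloaded(os.getSystemLoadAverage(), os.getAvailableProcessors())) {
            System.out.println("Host looks CPU-constrained; check for competing processes.");
        }
    }
}
```

In a virtualized or containerized environment these numbers describe only what the guest can see, so they need to be read alongside metrics from the host or orchestration layer.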
#10 Network Latency and Connectivity Issues
Bandwidth congestion in the network, high latency and packet loss, misconfiguration in a router, DNS failure, etc. could affect the performance of applications. Usually there is a lot of finger-pointing between the application team and network team as to where the root cause of an application problem is and who needs to resolve it. When it is actually a network issue, the application team could be chasing a red herring on the server side.
- Continuously monitor the network devices, traffic and configurations.
- Compare and correlate network performance with application problems to easily know the impact of network performance on the applications and isolate the root cause.
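When the application team wants a quick data point before the finger-pointing starts, timing a plain TCP connection from the application server to a dependency is a simple check. The sketch below measures connect time with java.net.Socket; the class name ConnectLatency is ours, the default host, port and timeout are placeholders, and the measured time includes name resolution when a host name is used.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper class, named for illustration only.
final class ConnectLatency {

    /** Returns the TCP connect time in milliseconds, or -1 if the connection fails or times out. */
    static long connectMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            return -1; // unreachable, refused, unresolved host or timed out
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the host and port of a real dependency as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 80;
        long millis = connectMillis(host, port, 2000);
        System.out.println(millis < 0
                ? "Could not connect to " + host + ":" + port
                : "Connected to " + host + ":" + port + " in " + millis + " ms");
    }
}
```

A connect time that is consistently high or erratic points toward the network or DNS, while a fast connect followed by slow responses points back toward the server side.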
It is important for developers to code efficiently and for testers to catch and report issues, no doubt! IT teams should also act proactively and have the necessary performance management measures in place. Monitoring should not be an afterthought for organizations developing, hosting, implementing and using Java-based applications. Constant monitoring of user experience, application transactions, application code, and the supporting infrastructure is vital to detect and resolve problems before they become business-impacting.