What is Garbage Collection in Java and Why is it Important?

For many, the world of Java is shrouded in mystery, and garbage collection is one of its least understood corners. Opinions on garbage collection vary widely: whether it is good or bad, when it is needed, how often it should run, how to tune it, how to know when it is not operating as expected, and so on. In this post, we will try to clear the air on Java garbage collection and make it easier for developers and administrators to deal with.

What is Java Garbage Collection?

Java applications allocate objects in memory as needed. It is the task of garbage collection (GC) in the Java virtual machine (JVM) to automatically determine what memory is no longer being used by a Java application and to recycle this memory for other uses. Because memory is automatically reclaimed in the JVM, Java application developers are not burdened with having to explicitly free objects that are no longer in use. GC is built on the premise that most objects created by Java code are short-lived and can be reclaimed shortly after their creation. Because unreferenced objects are automatically removed from heap memory, that memory becomes available again for new allocations.
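To make "no longer being used" concrete, here is a minimal sketch using a weak reference, which lets us observe an object without keeping it alive. The class name `ReachabilityDemo` and the 1 MB payload are purely illustrative. Note that `System.gc()` is only a request to the JVM, so the second line of output is not guaranteed to report that the object was cleared.

```java
import java.lang.ref.WeakReference;

public class ReachabilityDemo {

    /** True once the collector has cleared the weakly referenced object. */
    static boolean isCleared(WeakReference<?> ref) {
        return ref.get() == null;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1024 * 1024];
        // A weak reference does not keep its target alive on its own.
        WeakReference<byte[]> ref = new WeakReference<>(payload);

        System.out.println("While referenced, cleared? " + isCleared(ref));

        payload = null; // drop the only strong reference
        System.gc();    // a request, not a guarantee that a collection runs

        System.out.println("After dropping the reference, cleared? " + isCleared(ref));
    }
}
```

Nothing in the program frees the array explicitly. Once no strong reference remains, the JVM is free to reclaim it whenever it next collects.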

Garbage collection frees the programmer from manually dealing with memory deallocation. As a result, certain categories of application program bugs are eliminated or substantially reduced by GC:

  • Dangling pointer bugs, which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is dereferenced. By then the memory may have been reassigned to another use with unpredictable results.
  • Double free bugs, which occur when the program tries to free a region of memory that has already been freed and perhaps already been allocated again.
  • Certain kinds of memory leaks, in which a program fails to free memory occupied by objects that have become unreachable, which can lead to memory exhaustion.

There are two types of garbage collection activity that usually happen in Java:

  • A minor or incremental garbage collection is said to have occurred when unreachable objects in the young generation heap memory are removed.
  • A major or full garbage collection is said to have occurred when objects that survived minor garbage collections and were copied into the old generation are removed. Before Java 8, a full collection also covered the permanent generation, which Java 8 replaced with Metaspace in native memory. Garbage collection happens less frequently in the old generation than in the young generation.

To free up memory, the JVM must stop the application from running for at least a short time and execute GC. This process is called “stop-the-world.” It means that all threads except the GC threads stop executing until the GC work is complete and the garbage collector has freed the unused objects.

Modern GC implementations try to minimize blocking “stop-the-world” stalls by doing as much work as possible in the background (i.e. on separate threads), for example by identifying which objects are still reachable while the application continues to run.

Garbage collection consumes CPU resources to decide which memory to free. Various garbage collectors have been developed over time to reduce the application pauses that occur during garbage collection and, at the same time, to reduce the performance overhead associated with it.

The traditional Oracle HotSpot JVM has four ways of performing GC activity:

  • Serial, where a single thread executes the GC
  • Parallel, where multiple GC threads run simultaneously, each performing a part of the GC work
  • Concurrent Mark Sweep (CMS), which also uses multiple threads but does much of its work while application threads continue to run, reducing stop-the-world pauses (CMS was deprecated in JDK 9 and removed in JDK 14)
  • G1, which also runs in parallel and concurrently but, unlike CMS, divides the heap into regions and collects the regions containing the most garbage first

Many JVMs, such as Oracle HotSpot, JRockit, OpenJDK, IBM J9, and SAP JVM, use stop-the-world GC techniques. Some newer JVMs, such as Azul Zing, use the Continuously Concurrent Compacting Collector (C4), which is designed to eliminate the stop-the-world GC pauses that limit scalability in conventional JVMs.
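If you are not sure which collector a given JVM is actually running, the standard `java.lang.management` API can tell you. The sketch below lists each collector the JVM reports, along with the memory pools it manages. The class name `ActiveCollectors` is illustrative, and the names printed depend on the JVM vendor, version, and flags. On HotSpot the collector is typically chosen with `-XX:+UseSerialGC`, `-XX:+UseParallelGC`, or `-XX:+UseG1GC`.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {

    /** Names of the collectors reported by the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Each collector bean lists the memory pools it is responsible for.
            System.out.println(gc.getName() + " manages: "
                    + String.join(", ", gc.getMemoryPoolNames()));
        }
    }
}
```

On a generational collector you would normally see one bean for young generation collections and another for old generation collections, which maps onto the minor and major collections described earlier.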

Why is Monitoring Garbage Collection Important?

Garbage collection can impact the performance of a Java application in unpredictable ways. Frequent GC activity adds a lot of CPU load and slows down application processing. In turn, this leads to slow execution of business transactions and ultimately affects the experience of end users accessing the Java application.

Excessive garbage collection activity can occur due to a memory leak in the Java application. Insufficient memory allocation to the JVM can also result in increased garbage collection activity. And when excessive garbage collection activity happens, it often manifests as increased CPU usage of the JVM!

For optimal Java application performance, it is critical to monitor a JVM’s GC activity. For good performance, full GCs should be few and far between. The time spent on GC should be low, typically less than 5% of total run time, and the share of CPU spent on garbage collection should be very low, so that application threads can use almost all the available CPU resources.
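As a rough way to check the 5% guideline from inside the application, you can compare the accumulated collection time reported by the JVM with its uptime. This is a sketch, not a substitute for a monitoring tool: the class name `GcOverhead` is illustrative, and `getCollectionTime()` is an approximate figure that, for concurrent collectors, may not correspond purely to pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {

    /** Sum of the approximate collection time, in milliseconds, across all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** GC time as a percentage of uptime; 0 when uptime is not yet measurable. */
    static double percent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        double share = percent(totalGcMillis(), uptime);
        System.out.printf("GC time so far: %.2f%% of uptime%n", share);
    }
}
```

For example, 50 ms of collection time over 1,000 ms of uptime works out to 5%, right at the threshold mentioned above.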

What Are the Key Garbage Collection Metrics to Monitor?

To know if garbage collection is creating Java performance problems, you need to track all aspects of the garbage collection activity in the JVM:

  • When garbage collection happened
  • How often garbage collection is happening
  • How much memory is being collected each time
  • How long each garbage collection runs
  • Percentage of time spent by the JVM on garbage collection
  • What type of garbage collection happened – minor or full GC?
  • JVM heap and non-heap memory usage
  • CPU utilization of the JVM

This allows you to identify when garbage collection is taking too long and impacting performance, which will help you to determine the optimal settings for each application based on historical patterns and trends.
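Several of the metrics listed above can be sampled with the standard management beans. The sketch below takes two snapshots a fixed interval apart and prints the differences, which gives collection frequency and time spent over that interval together with heap and non-heap usage. The class name `GcSnapshot` and the one-second interval are assumptions for illustration. A real monitor would sample continuously and store the history.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class GcSnapshot {
    final long collections; // collections reported so far, all collectors
    final long gcMillis;    // approximate accumulated collection time
    final long heapUsed;    // bytes of heap currently in use
    final long nonHeapUsed; // bytes of non-heap memory currently in use

    GcSnapshot(long collections, long gcMillis, long heapUsed, long nonHeapUsed) {
        this.collections = collections;
        this.gcMillis = gcMillis;
        this.heapUsed = heapUsed;
        this.nonHeapUsed = nonHeapUsed;
    }

    static GcSnapshot take() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return new GcSnapshot(count, millis,
                memory.getHeapMemoryUsage().getUsed(),
                memory.getNonHeapMemoryUsage().getUsed());
    }

    public static void main(String[] args) throws InterruptedException {
        GcSnapshot before = take();
        Thread.sleep(1000); // sampling interval, chosen arbitrarily for this sketch
        GcSnapshot after = take();

        System.out.println("Collections in interval: " + (after.collections - before.collections));
        System.out.println("GC time in interval (ms): " + (after.gcMillis - before.gcMillis));
        System.out.println("Heap used (bytes): " + after.heapUsed);
        System.out.println("Non-heap used (bytes): " + after.nonHeapUsed);
    }
}
```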

[Figure: Tracking time taken for garbage collection]

When GC activity is excessive, take heap dumps of the JVM’s memory and analyze the top memory-consuming objects. Unusually large objects, or unusually large numbers of objects of one type, can indicate a memory leak in the application code. On the other hand, if nothing occupies an unusually large amount of memory and the percentage of memory used by any of the JVM’s memory pools is close to 100%, the JVM’s memory configuration may be insufficient. In this case, you may need to increase the size of the corresponding JVM memory pool to improve application performance.
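To check whether a memory pool is close to 100%, you can ask the JVM for per-pool usage. The following is a minimal sketch using the standard `MemoryPoolMXBean`. The class name `PoolUsage` is illustrative, and pool names differ between collectors and JVM versions. Pools that have no defined maximum are reported as such rather than given a percentage.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class PoolUsage {

    /** Used memory as a percentage of the maximum, or -1 when no maximum is defined. */
    static double usedPercent(long used, long max) {
        return max > 0 ? 100.0 * used / max : -1.0;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            double percent = usedPercent(usage.getUsed(), usage.getMax());
            String shown = percent < 0 ? "no maximum defined" : String.format("%.1f%%", percent);
            System.out.println(pool.getName() + " (" + pool.getType() + "): " + shown);
        }
    }
}
```

A pool that stays near its maximum even after full collections is the signal described above. For the heap dump itself, the JDK ships tools such as `jcmd <pid> GC.heap_dump <file>`.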

Summary

Now that we have a fair understanding of Java garbage collection, let’s summarize by answering some of the questions we started this post with:

  • Is garbage collection good or bad? Definitely good. But, as the adage goes, too much of anything is a bad thing. So, you need to make sure Java heap memory is properly configured and managed so GC activity is optimized.
  • When is GC needed? It is needed when there are unreferenced objects to be cleared out. Since it is not a manual activity, the JVM takes care of this for you automatically.
  • How to tune garbage collection? There are two common ways to do this:
    1. Keep the number of objects passed to the old generation area to a minimum
    2. Keep the time spent in major (or full) GC low

    Some critical JVM parameters for right-sizing the JVM’s memory are -Xms (initial heap size), -Xmx (maximum heap size), and -XX:NewRatio (the ratio of old generation size to young generation size).

  • How to know when GC is not operating as expected? JVM monitoring is key. Make sure to track vital JVM metrics and be alerted when GC activity is deviating from the norm.
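As a quick check on the sizing parameters mentioned above, the sketch below prints which of them the running JVM was started with and the maximum heap it will actually use. The class name `HeapSettings` is illustrative. You could launch it with something like `java -Xms512m -Xmx2g -XX:NewRatio=2 HeapSettings`, where the values are placeholders rather than recommendations.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class HeapSettings {

    /** Picks the heap sizing flags out of a list of JVM arguments. */
    static List<String> sizingFlags(List<String> jvmArgs) {
        List<String> found = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xms") || arg.startsWith("-Xmx")
                    || arg.startsWith("-XX:NewRatio=")) {
                found.add(arg);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Sizing flags in effect: " + sizingFlags(jvmArgs));

        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Maximum heap the JVM will attempt to use (MB): " + maxHeapMb);
    }
}
```

If no sizing flags are printed, the JVM is running with its default heap settings, which are derived from the memory available on the machine.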