Garbage Collection in Java

Java Virtual Machine

A Java program written in a .java file is compiled into bytecode, which is stored in a .class file. This bytecode is executed by the Java Virtual Machine (JVM), not directly by the operating system. Because a JVM is distributed for each operating system, we don’t have to worry about the platform on which a Java application will run; this property is known as platform independence.
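As a small illustration of platform independence, the sketch below reads a few standard system properties. The class name PlatformInfo is hypothetical. The idea is that the same compiled PlatformInfo.class, run unchanged on different operating systems, reports whichever platform-specific JVM is executing it.

```java
// A minimal sketch: the same compiled bytecode (.class file) reports
// details of whichever platform-specific JVM happens to execute it.
public class PlatformInfo {

    // Builds a short description from standard system properties.
    static String describe() {
        String vm = System.getProperty("java.vm.name");
        String version = System.getProperty("java.version");
        String os = System.getProperty("os.name");
        return vm + " " + version + " on " + os;
    }

    public static void main(String[] args) {
        // Compile once with `javac PlatformInfo.java`, then run the resulting
        // PlatformInfo.class with `java PlatformInfo` on any supported OS.
        System.out.println(describe());
    }
}
```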

The most well-known JVM implementation is HotSpot, which consists of the following components: the class loader, the runtime data areas, and the execution engine.
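The class loader component can be observed from ordinary Java code. The sketch below (class name LoaderInfo is hypothetical) relies on two documented behaviors of java.lang.Class and java.lang.ClassLoader: core classes such as String are loaded by the bootstrap class loader, which getClassLoader() reports as null, and every other loader can be walked through getParent(). The exact loaders printed depend on the JDK version and how the program is launched.

```java
// A minimal sketch of the class loader hierarchy as seen from user code.
public class LoaderInfo {

    // Core classes such as String are loaded by the bootstrap class loader,
    // which getClassLoader() reports as null.
    static boolean coreClassUsesBootstrapLoader() {
        return String.class.getClassLoader() == null;
    }

    // Walks the parent chain from the loader that loaded this class,
    // printing each loader and counting them (the bootstrap loader is not counted).
    static int loaderChainLength() {
        int length = 0;
        for (ClassLoader cl = LoaderInfo.class.getClassLoader(); cl != null; cl = cl.getParent()) {
            System.out.println(cl);
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        System.out.println("String loaded by bootstrap loader: " + coreClassUsesBootstrapLoader());
        System.out.println("Loaders above bootstrap: " + loaderChainLength());
    }
}
```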

Runtime Data Areas: JVM Stack and Heap

The JVM defines various runtime data areas. Some of them are created when the JVM starts and destroyed when it exits, while others are per-thread: they are created when a thread starts and destroyed when that thread exits.

The Java Virtual Machine stack (JVM stack) is a per-thread data area. It holds local variables and partial results, and plays a part in method invocation and return. If a thread requires a larger JVM stack than is permitted, the JVM throws a StackOverflowError. Additionally, if a JVM stack attempts to expand but not enough memory is available, the JVM throws an OutOfMemoryError.
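The StackOverflowError case is easy to reproduce. In this sketch (class and method names are hypothetical), each recursive call pushes a new frame onto the current thread's JVM stack until the permitted size is exceeded. The reported depth varies between runs and JVMs, since frame sizes depend on the JIT compiler.

```java
// A minimal sketch: unbounded recursion adds a frame to the current
// thread's JVM stack on every call until the stack limit is exceeded.
public class StackDepthDemo {

    private static int depth = 0;

    // Each call pushes a new frame; there is intentionally no base case.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many nested calls succeeded before StackOverflowError was thrown.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time control reaches this handler, the recursive frames have been popped.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Calls before StackOverflowError: " + measureDepth());
    }
}
```

Running it with a smaller thread stack, for example `java -Xss512k StackDepthDemo`, should generally report a lower depth.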

The heap, on the other hand, is created when the JVM starts and is shared among all threads. The heap is the data area where objects (class instances and arrays) are allocated, and it is managed by the garbage collector selected when the JVM starts. If a computation requires more heap than the automatic storage management system can make available, the JVM throws an OutOfMemoryError. Most tuning options relate to sizing the heap and choosing an appropriate garbage collector.
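The split between per-thread stacks and the shared heap can be seen in the sketch below (SharedHeapDemo is a hypothetical name). A single AtomicInteger object is allocated on the heap; the main thread and two worker threads each hold only a reference to it in their own stack frames, yet all of them update the same object.

```java
import java.util.concurrent.atomic.AtomicInteger;

// A minimal sketch: one object on the heap, reachable from the stacks of several threads.
public class SharedHeapDemo {

    static int incrementFromTwoThreads(int perThread) {
        // The AtomicInteger instance lives on the shared heap; the local
        // variable `counter` holding a reference to it lives on this thread's stack.
        AtomicInteger counter = new AtomicInteger();

        Runnable task = () -> {
            // `i` is a local variable on the worker thread's own JVM stack.
            for (int i = 0; i < perThread; i++) {
                counter.incrementAndGet();
            }
        };

        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(incrementFromTwoThreads(1000)); // prints 2000
    }
}
```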

Generational Garbage Collection

An automatic garbage collection process involves two steps: marking and deleting. First, the garbage collector inspects the heap to identify which objects are still in use, that is, still reachable by following references from the application’s roots (such as local variables on thread stacks and static fields). Reachable objects are marked as live. Then, the garbage collector deletes the unmarked objects, leaving the live objects and a record of pointers to the free space. Some implementations also compact the heap after deletion, moving live objects together to reduce fragmentation.
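The two steps can be sketched as a toy mark-and-sweep collector over a simulated heap. Everything here (MarkSweepDemo, ToyObject, ToyHeap and their methods) is hypothetical and only models the idea; it is not how HotSpot is implemented. Note that object c refers to itself but is still collected, because it is not reachable from any root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// A toy mark-and-sweep collector over a simulated heap (hypothetical classes, not JVM APIs).
public class MarkSweepDemo {

    static final class ToyObject {
        final String name;
        final List<ToyObject> references = new ArrayList<>();
        boolean marked;

        ToyObject(String name) { this.name = name; }
    }

    static final class ToyHeap {
        final List<ToyObject> objects = new ArrayList<>();
        final List<ToyObject> roots = new ArrayList<>();

        ToyObject allocate(String name) {
            ToyObject o = new ToyObject(name);
            objects.add(o);
            return o;
        }

        // Mark phase: trace references from the roots and mark every reachable object.
        void mark() {
            Deque<ToyObject> pending = new ArrayDeque<>(roots);
            while (!pending.isEmpty()) {
                ToyObject o = pending.pop();
                if (o.marked) continue;   // already visited (handles cycles)
                o.marked = true;
                pending.addAll(o.references);
            }
        }

        // Sweep phase: delete unmarked objects, then clear marks for the next cycle.
        int sweep() {
            int before = objects.size();
            objects.removeIf(o -> !o.marked);
            objects.forEach(o -> o.marked = false);
            return before - objects.size();
        }

        // Runs one full collection and returns the number of objects reclaimed.
        int collect() {
            mark();
            return sweep();
        }
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        ToyObject a = heap.allocate("a");
        ToyObject b = heap.allocate("b");
        ToyObject c = heap.allocate("c");
        heap.allocate("d");      // never referenced
        a.references.add(b);     // a -> b
        c.references.add(c);     // c refers only to itself
        heap.roots.add(a);       // only a is a root

        System.out.println("Collected: " + heap.collect()); // Collected: 2
    }
}
```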

Scanning the entire heap is time-consuming. Moreover, empirical studies have shown that most objects are referenced only for a short time. It is therefore more efficient to divide the heap into smaller parts, allocate objects to them according to their lifetime, and scan the short-lived part more frequently; this is the idea behind generational garbage collection.

In generational garbage collection, the heap is divided into three parts: the young generation, the old generation, and the permanent generation.
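A running JVM exposes its memory areas through the java.lang.management API, which makes the generational layout visible. The sketch below (MemoryPools is a hypothetical name) lists pools by type. Pool names depend on the JVM and the selected collector; with G1 on HotSpot, heap pools are typically named along the lines of "G1 Eden Space", "G1 Survivor Space" and "G1 Old Gen", and "Metaspace" appears among the non-heap pools on JDK 8 and later.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// A minimal sketch that lists the memory pools reported by the running JVM.
public class MemoryPools {

    // Prints each valid pool of the given type with its current usage and returns the count.
    static int printPools(MemoryType type) {
        int count = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != type || !pool.isValid()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage(); // may be null if the pool became invalid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            System.out.printf("%-30s used=%d KB%n", pool.getName(), usedKb);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Heap pools:");
        printPools(MemoryType.HEAP);
        System.out.println("Non-heap pools:");
        printPools(MemoryType.NON_HEAP);
    }
}
```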

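Every new object is allocated in the young generation, and when the young generation fills up, a minor garbage collection occurs. Each object that survives a minor collection ages, and once its age reaches a configured threshold, the object is promoted to the old generation. When the old generation eventually needs to be collected, a major garbage collection is performed. The permanent generation stores metadata required by the JVM to describe the classes and methods used by the application, and it is collected during a full garbage collection. In JDK 8 and later, the permanent generation has been removed, and class metadata is stored in Metaspace, which is allocated in native memory instead.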
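The allocation, aging and promotion cycle can be modeled with a small simulation. This is a hypothetical toy (GenerationalDemo, Obj and the liveness predicate are illustrative assumptions): it skips HotSpot's Eden and survivor spaces and simply treats the young generation as a fixed-capacity list. HotSpot's related real-world knob is -XX:MaxTenuringThreshold, which caps the promotion age.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// A toy model of generational collection (hypothetical; Eden and survivor spaces omitted).
// Objects start young, age on each minor GC they survive, and are promoted at a threshold.
public class GenerationalDemo {

    static final class Obj {
        final String name;
        int age;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    final int youngCapacity;
    final int tenuringThreshold;
    final Predicate<Obj> isLive;   // stands in for reachability analysis
    int minorCollections;

    GenerationalDemo(int youngCapacity, int tenuringThreshold, Predicate<Obj> isLive) {
        this.youngCapacity = youngCapacity;
        this.tenuringThreshold = tenuringThreshold;
        this.isLive = isLive;
    }

    // Allocates into the young generation, running a minor GC first if it is full.
    void allocate(Obj o) {
        if (young.size() >= youngCapacity) {
            minorGc();
        }
        young.add(o);
    }

    // Minor GC: reclaim dead young objects, age survivors, promote those old enough.
    void minorGc() {
        minorCollections++;
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!isLive.test(o)) continue;  // unreachable: reclaimed
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o);                 // promoted to the old generation
            } else {
                survivors.add(o);
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    public static void main(String[] args) {
        // Only "cache" stays reachable; the "tmp" objects die immediately.
        GenerationalDemo heap = new GenerationalDemo(4, 2, o -> o.name.equals("cache"));
        heap.allocate(new Obj("cache"));
        for (int i = 0; i < 10; i++) {
            heap.allocate(new Obj("tmp" + i));
        }
        System.out.println("minor GCs: " + heap.minorCollections); // minor GCs: 2
        System.out.println("old gen:   " + heap.old.size());       // old gen:   1
    }
}
```

The long-lived "cache" object is copied through the young generation only until it reaches the threshold; afterwards, minor collections no longer need to examine it.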

These garbage collections are “Stop the World” events, which means that all application threads are paused until the operation completes.

Garbage Collectors

The following are common options related to heap sizing:

  • -Xms: The initial heap size.
  • -Xmx: The maximum heap size.
  • -Xmn: The size of the young generation.
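  • -XX:PermSize: The initial size of the permanent generation (JDK 7 and earlier).
  • -XX:MaxPermSize: The maximum size of the permanent generation (JDK 7 and earlier; JDK 8 replaced the permanent generation with Metaspace, which is bounded by -XX:MaxMetaspaceSize).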
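The effect of -Xms and -Xmx can be checked with java.lang.Runtime. In this sketch (HeapSizes is a hypothetical name), maxMemory() is usually close to, and sometimes slightly below, the -Xmx value, while totalMemory() typically starts near -Xms and grows as the heap expands.

```java
// A minimal sketch that reports heap sizes through java.lang.Runtime.
// Try running it with different options, for example:
//   java -Xms64m -Xmx256m HeapSizes
public class HeapSizes {

    private static final long MB = 1024L * 1024L;

    // Formats the heap figures reported by Runtime, in megabytes.
    static String report() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();     // most the JVM will attempt to use (close to -Xmx)
        long total = rt.totalMemory(); // memory currently held for the heap (typically starts near -Xms)
        long free = rt.freeMemory();   // unused part of total
        return String.format("max=%d MB, total=%d MB, free=%d MB, used=%d MB",
                max / MB, total / MB, free / MB, (total - free) / MB);
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```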

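Two well-known garbage collectors are the serial collector and the parallel collector, which used to be the default collectors. Both stop all application threads while they work; the serial collector uses a single thread, whereas the parallel collector uses multiple threads to shorten the pauses.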

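In contrast, the G1 collector and the CMS collector perform much of their work concurrently with the application threads, which shortens, but does not eliminate, the “Stop the World” pauses. The G1 collector has been the default collector since JDK 9, and the CMS collector was removed in JDK 14.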
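To see which collector a JVM actually selected, the sketch below (GcInfo is a hypothetical name) lists the GarbageCollectorMXBean instances along with their collection counts and accumulated times. The reported names vary by collector and JDK version; with G1 on recent HotSpot builds they typically include "G1 Young Generation" and "G1 Old Generation".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// A minimal sketch that lists the garbage collectors used by the running JVM.
public class GcInfo {

    // Prints each collector's statistics and returns the collector names.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
            // getCollectionCount() and getCollectionTime() return -1 if undefined.
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(collectorNames());
    }
}
```

Running it with `java -XX:+UseSerialGC GcInfo`, `java -XX:+UseParallelGC GcInfo`, or `java -XX:+UseG1GC GcInfo` should show a different set of collector names for each choice.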