You're reading from Java Coding Problems - Second Edition

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781837633944
Edition: 2nd Edition
Author: Anghel Leonard

Anghel Leonard is a Chief Technology Strategist and independent consultant with 20+ years of experience in the Java ecosystem. In his daily work, he focuses on architecting and developing Java distributed applications that empower robust architectures, clean code, and high performance. He is also passionate about coaching, mentoring, and technical leadership. He is the author of several books, videos, and dozens of articles related to Java technologies.

Garbage Collectors and Dynamic CDS Archives

This chapter includes 15 problems covering garbage collectors and Application Class Data Sharing (AppCDS).

By the end of this chapter, you’ll have a profound understanding of how a garbage collector (GC) works and how you can tune it for maximum performance. Moreover, you’ll have a good understanding of how AppCDS can boost your application startup.

Problems

Use the following problems to test your advanced programming prowess in garbage collectors and application class data sharing in Java. I strongly encourage you to give each problem a try before you turn to the solutions and download the example programs:

  1. Hooking the garbage collector goal: Introduce Java garbage collectors quickly. Highlight the main objectives (advantages) and disadvantages of a garbage collector.
  2. Handling the garbage collector stages: List and briefly describe the most common stages of a garbage collector.
  3. Covering some garbage collector terminology: A garbage collector has specific terminology. Provide here the main terms used in conjunction with garbage collectors.
  4. Tracing the generational GC process: Exemplify and explain a hypothetical scenario containing several consecutive runs of a generational garbage collector.
  5. Choosing the correct garbage collector: List and explain the three main factors that should be considered...

243. Hooking the garbage collector goal

Every programming language has to manage memory usage. Some programming languages delegate this task to programmers, while others leverage different mechanisms to partially control how memory is used. Java programmers can focus 100% on the functionalities of the application and let the garbage collector manage how memory is used.

The name garbage collector suggests an entity capable of finding and collecting garbage from memory. In practice, a garbage collector is a very complex process at the core of Java memory management: it tracks every object on the heap, then identifies and removes the ones that are no longer referenced by the application. The main advantages of a garbage collector include:

  • The Java programmer doesn’t need to manually handle the allocation/deallocation of memory.
  • The Java programmer doesn’t need to deal with dangling and wild pointers (https://en.wikipedia.org...

244. Handling the garbage collector stages

During its work, GC passes through different stages or steps. It can pass through one or more of the following stages:

  • Mark – In this stage, the GC identifies and marks (or paints) all pieces of memory (blocks) that are still in use (have references). The marked (painted) blocks are called live objects, while the rest (blocks with no references) are called non-live objects. Imagine that you go to the pantry and identify all the fresh fruits and vegetables and separate them from the spoiled ones.
  • Sweep – In this stage, the GC removes all non-live objects from memory. Next, you take all the spoiled fruits and vegetables out of the pantry and throw them away.
  • Compact – In this stage, the GC attempts to group the live objects closer together – in other words, it arranges the live objects at the start of the heap in a continuous sequence of memory blocks. So, compacting involves defragmentation...
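To make the three stages above concrete, here is a minimal toy model, not a description of how HotSpot implements them. The class name `ToyHeap`, the slot array, and the list of referenced ids are all assumptions made for illustration; a real collector discovers live objects by tracing references from roots rather than receiving a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the mark, sweep, and compact stages (illustrative only). */
final class ToyHeap {
    // Each slot holds a hypothetical object id, or null for a free block.
    private final Integer[] slots;

    ToyHeap(Integer... slots) {
        this.slots = slots.clone();
    }

    /** Mark: flag every slot whose object id is still referenced. */
    boolean[] mark(List<Integer> referencedIds) {
        boolean[] live = new boolean[slots.length];
        for (int i = 0; i < slots.length; i++) {
            live[i] = slots[i] != null && referencedIds.contains(slots[i]);
        }
        return live;
    }

    /** Sweep: free every slot that was not marked as live. */
    void sweep(boolean[] live) {
        for (int i = 0; i < slots.length; i++) {
            if (!live[i]) {
                slots[i] = null;
            }
        }
    }

    /** Compact: slide the surviving objects to the start of the heap. */
    void compact() {
        int next = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null) {
                Integer id = slots[i];
                slots[i] = null;
                slots[next++] = id;
            }
        }
    }

    /** Returns a copy of the current slot layout. */
    List<Integer> snapshot() {
        return new ArrayList<>(Arrays.asList(slots));
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap(1, 4, 5, null, 2, 6, 3);
        boolean[] live = heap.mark(List.of(1, 4, 6));
        heap.sweep(live);
        System.out.println("After sweep:   " + heap.snapshot());
        heap.compact();
        System.out.println("After compact: " + heap.snapshot());
    }
}
```

After the sweep, the free slots are scattered between the survivors; after compaction, the survivors sit in one contiguous run at the start, which is the defragmentation effect described in the Compact stage.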

245. Covering some garbage collector terminology

Garbage collection has its own terminology, which is essential to know in order to better understand how it works. Some of these terms are presented here; we start with epoch, single pass, and multiple passes.

Epoch

A GC works in cycles. A complete cycle of a GC is known as an epoch.

Single and multiple passes

A GC can handle its internal steps in a single pass (single-pass) or multiple passes (multi-pass). In the case of single-pass, the GC groups multiple steps and handles them in a single run. On the other hand, in the case of multi-pass, the GC handles multiple steps in a sequence of several passes.

Serial and parallel

A GC is considered serial if it uses a single thread. On the other hand, a GC is considered parallel if it uses multiple threads.

Stop-the-World (STW) and concurrent

A GC is of the type Stop-the-World (STW) if it has to stop (temporarily suspend) the application execution in order...

246. Tracing the generational GC process

In this problem, let’s start from an arbitrary initial state of a generational GC and follow a few hypothetical epochs (generally, all generational GCs work more or less as you’ll see in this problem). We start with the following diagram:

Figure 12.2: GC initial state

In its initial state, the GC has an almost full Eden space (it stores objects 1, 4, 5, 2, 6, and 3, plus some free space, represented by the white gaps between objects) and empty Survivor and Tenured spaces. Moreover, object 7 should be added to the Eden space, but there is not enough memory for it. When the Eden space cannot accommodate more objects, the GC triggers a MinorGC event. First, the non-live objects are identified. Here (as you can see in the following diagram), we have three objects (5, 2, and 3) that should be collected as garbage:

Figure 12.3: Identify the non-live objects from the Eden space

These three objects are...
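The scenario can be replayed with a small toy model. This is only a sketch under stated assumptions: the class `ToyGenerationalHeap`, the Eden capacity of 6, the tenuring threshold of 3, and the explicit set of live ids are all hypothetical, and the steps after the first MinorGC follow the usual generational scheme (live Eden objects are copied to the Survivor space, and objects that survive enough collections are promoted to the Tenured space) rather than the remainder of this problem.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy generational heap: Eden -> Survivor -> Tenured (illustrative only). */
final class ToyGenerationalHeap {
    private final int edenCapacity;        // hypothetical capacity, in objects
    private final int tenuringThreshold;   // hypothetical promotion age
    final List<Integer> eden = new ArrayList<>();
    final Map<Integer, Integer> survivor = new LinkedHashMap<>(); // object id -> age
    final List<Integer> tenured = new ArrayList<>();

    ToyGenerationalHeap(int edenCapacity, int tenuringThreshold) {
        this.edenCapacity = edenCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    /** Allocates in Eden; a full Eden triggers a minor GC first. */
    void allocate(int id, Set<Integer> liveIds) {
        if (eden.size() >= edenCapacity) {
            minorGc(liveIds);
        }
        eden.add(id);
    }

    /** Collects the young generation, given the ids that are still referenced. */
    void minorGc(Set<Integer> liveIds) {
        Map<Integer, Integer> nextSurvivor = new LinkedHashMap<>();
        // Live survivors get one collection older; old enough ones are promoted.
        for (Map.Entry<Integer, Integer> entry : survivor.entrySet()) {
            if (!liveIds.contains(entry.getKey())) {
                continue; // non-live: dropped
            }
            int age = entry.getValue() + 1;
            if (age >= tenuringThreshold) {
                tenured.add(entry.getKey());
            } else {
                nextSurvivor.put(entry.getKey(), age);
            }
        }
        // Live Eden objects move to the Survivor space with age 1.
        for (Integer id : eden) {
            if (liveIds.contains(id)) {
                nextSurvivor.put(id, 1);
            }
        }
        survivor.clear();
        survivor.putAll(nextSurvivor);
        eden.clear();
    }

    public static void main(String[] args) {
        ToyGenerationalHeap heap = new ToyGenerationalHeap(6, 3);
        for (int id : new int[] {1, 4, 5, 2, 6, 3}) {
            heap.allocate(id, Set.of());
        }
        // Eden is full, so adding object 7 triggers a minor GC; 5, 2, and 3 are garbage.
        heap.allocate(7, Set.of(1, 4, 6));
        System.out.println("Eden=" + heap.eden + " Survivor=" + heap.survivor
                + " Tenured=" + heap.tenured);
    }
}
```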

247. Choosing the correct garbage collector

As you’ll see in the next problem, Java allows us to choose between several garbage collectors. There is no silver bullet, so choosing the correct garbage collector for your particular application is an important decision that should be made based on three factors: throughput, latency, and footprint.

Figure 12.14: The factors that affect the choice of GC

Throughput represents the share of the total running time that is spent executing application code rather than running the GC. For instance, if your application code runs for 97% of the total time, then you have a throughput of 97%. The remaining 3% is the time spent running the GC.
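The throughput arithmetic can be sketched in a few lines. The class and method names below are hypothetical. The estimate in `main` relies on the standard `java.lang.management` beans, and it is only rough: for concurrent collectors, the reported collection time is not the same thing as application pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: throughput as the percentage of time not spent in the GC. */
final class GcThroughput {

    /** Share of total time spent in application code, as a percentage. */
    static double throughputPercent(long totalMillis, long gcMillis) {
        if (totalMillis <= 0 || gcMillis < 0 || gcMillis > totalMillis) {
            throw new IllegalArgumentException("expected 0 <= gcMillis <= totalMillis, totalMillis > 0");
        }
        return 100.0 * (totalMillis - gcMillis) / totalMillis;
    }

    /** Accumulated collection time reported by all collectors of this JVM. */
    static long totalGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                sum += millis;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // The example from the text: 3% of the time in the GC gives 97% throughput.
        System.out.println(throughputPercent(1000, 30));

        // Rough estimate for the current JVM.
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        long gc = Math.min(totalGcMillis(), uptime);
        System.out.printf("Estimated throughput: %.2f%%%n", throughputPercent(uptime, gc));
    }
}
```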

Latency measures how much the execution of the application is delayed by pauses caused by the GC. This is important because latency can affect the application’s responsiveness. These pauses may lead, at the interactivity level, to an unpleasant experience for the end users.

Footprint represents the extra memory needed by...

248. Categorizing garbage collectors

Garbage collectors have evolved exactly as Java itself has evolved. Today (JDK 21), we distinguish between several GC types, as follows:

  • Serial garbage collector
  • Parallel garbage collector
  • Garbage-First (G1) collector
  • Z Garbage Collector (ZGC)
  • Shenandoah Garbage Collector (not generational)
  • Concurrent Mark Sweep (CMS) collector (deprecated in JDK 9 and removed in JDK 14)

Let’s tackle the main aspects of each GC type.

Serial garbage collector

The serial garbage collector is an STW single-threaded generational collector. Before running its own algorithms, this GC freezes/pauses all the application threads. This means that this GC is not suitable for multi-threaded applications such as server-side components. However, being focused on a very small footprint (useful for small heaps), this collector is a good fit for single-threaded applications (and single-processor machines) that can easily accommodate and tolerate...

249. Introducing G1

The G1 Garbage Collector is probably the most mature, maintained, and improved GC in Java. It was introduced in JDK 7 update 4, and from JDK 9, it became the default GC. This GC sustains high throughput and low latency (a few hundred milliseconds), being known for its balanced performance.

Internally, G1 splits the heap into small, equally sized chunks (max size of 32 MB), which are independent of each other and can be allocated dynamically to Eden, Survivor, or Tenured spaces. Each such chunk is called a G1 heap region. So, G1 is a region-based GC.

Figure 12.15: G1 splits the memory heap into equal small chunks

This architecture has a significant number of advantages. Probably the most important one is that the Old generation can be cleaned up efficiently in parts rather than all at once, which sustains low latency.

For a heap size smaller than 4 GB, G1 will create regions of 1 MB. For heaps between 4 and 8 GB, G1 will create regions...
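The relationship between heap size and region size can be approximated with a short calculation. This is a sketch of the commonly documented ergonomic heuristic, not G1's actual source code: it assumes a target of about 2,048 regions, a region size that is a power of two, and bounds of 1 MB and 32 MB. The class and method names are hypothetical. The real value can also be set explicitly with `-XX:G1HeapRegionSize`.

```java
/** Sketch of how G1's default region size scales with the heap size (approximation). */
final class G1RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long TARGET_REGION_COUNT = 2048; // assumed ergonomic target
    static final long MIN_REGION_BYTES = MB;
    static final long MAX_REGION_BYTES = 32 * MB;

    /** Approximate default region size, in bytes, for the given heap size. */
    static long regionSizeBytes(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGION_COUNT, MIN_REGION_BYTES);
        long powerOfTwo = Long.highestOneBit(raw); // round down to a power of two
        return Math.min(powerOfTwo, MAX_REGION_BYTES);
    }

    public static void main(String[] args) {
        long gb = 1024 * MB;
        for (long heapGb : new long[] {2, 6, 16, 128}) {
            System.out.println(heapGb + " GB heap -> "
                    + regionSizeBytes(heapGb * gb) / MB + " MB regions");
        }
    }
}
```

Under these assumptions, heaps below 4 GB get 1 MB regions, which matches the first case mentioned above, and very large heaps are capped at 32 MB regions.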

250. Tackling G1 throughput improvements

G1 has made major progress from JDK 8 to JDK 20. Some of these improvements have been reflected in throughput. Of course, this throughput improvement is dependent on a lot of factors (application, machine, tuning, and so on) but you may expect at least 10% higher throughput in JDK 18/20 than in JDK 8.

In order to increase throughput, G1 has passed through several changes, as follows.

Delaying the start of the Old generation

Starting with JDK 9, G1 is heavily focused on collecting garbage from the Young generation while delaying the start (initialization, resource allocation, and so on) of the Old generation to the last moment (it anticipates when the Old generation should be started).

Focusing on easy pickings

By easy pickings, we mean objects that are short-lived (for instance, temporary buffers), occupy a significant amount of heap, and can be collected easily at low cost with important benefits. Starting with JDK 9, G1...

251. Tackling G1 latency improvements

G1 GC latency has also recorded some improvements from JDK 8 to JDK 20 (which are obviously reflected in G1 GC throughput as well).

In order to decrease latency, G1 has passed through several changes, as follows.

Merge parallel phases into a larger one

Starting with JDK 8, many aspects of G1 have been parallelized. In other words, at any moment in time, multiple parallel phases may be executing. Starting with JDK 9, these parallel phases can be merged into a single larger one. In practice, this means less synchronization and less time spent creating/destroying threads. As a result, this improvement speeds up parallel processing, leading to lower latency.

Reduction of metadata

Reduction of metadata was added in JDK 11. Practically, G1 attempts to manage less metadata by reducing its amount as much as possible. Less data to manage means better latency. Of course, this means a smaller footprint as well.

...

252. Tackling G1 footprint improvements

Between JDK 8 and JDK 20, the G1 footprint has been improved by focusing on efficient metadata and freeing the memory as quickly as possible.

In order to optimize its footprint, G1 has passed through several changes, as follows.

Maintain only the needed metadata

In order to maintain only the needed metadata, JDK 11 is capable of concurrently (re)creating the needed data and freeing it as fast as possible. In JDK 17, the focus on the needed metadata has been reiterated and only the absolutely required data is kept around. Moreover, JDK 18 comes up with a denser representation of data. All these improvements are reflected in a smaller footprint.

Release memory

Starting with JDK 17, the G1 GC is capable of concurrently releasing memory (giving it back to the OS). This means that memory can be optimally reused and is available to serve other tasks.

253. Introducing ZGC

Z Garbage Collector (ZGC) was first introduced, as an experimental feature, in JDK 11. It was promoted to production-ready status in JDK 15 under JEP 377. It continues to be improved: in JDK 21, ZGC sustains application performance by maintaining separate generations for young and old objects. Basically, this minimizes allocation stalls and heap memory overhead. This work is tracked by JEP 439 (Generational ZGC), whose status was promoted from Targeted to Completed in JDK 21.

ZGC is concurrent (it works at the same time as the application, based on low-level concurrency primitives such as load barriers and colored pointers), tracing (it traverses the object graph to identify live and non-live objects), and compacting (it fights against fragmentation). It is also NUMA-aware and region-based.

ZGC was specially designed as a low-latency, highly scalable GC capable of handling from small (a few megabytes; the documentation states 8 MB) to massive...

254. Monitoring garbage collectors

Monitoring the activity of your GC and how it evolves over time is essential for identifying potential performance issues. For instance, you may be interested in monitoring pause times, identifying the frequency and types of GC events, what spaces are filled up by the triggered GC events, and so on. The main goal is to collect as much information as possible that can be helpful in troubleshooting performance issues related to heap memory and GC evolution.
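Besides graphical tools, some of this information can be read programmatically. The following is a minimal sketch based on the standard `java.lang.management` API; the class name `GcSnapshot` and the output format are assumptions, and the collector names that appear depend on which GC the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Sketch: a point-in-time snapshot of heap usage and GC activity. */
final class GcSnapshot {

    /** Returns one line for the heap, followed by one line per collector. */
    static List<String> collect() {
        List<String> lines = new ArrayList<>();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        lines.add("heap used=" + heap.getUsed() / 1024 + " KB, committed="
                + heap.getCommitted() / 1024 + " KB");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start (-1 if not reported).
            lines.add(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        collect().forEach(System.out::println);

        // Create some short-lived garbage, then request a collection.
        for (int i = 0; i < 1_000_000; i++) {
            String garbage = "temp_" + i;
        }
        System.gc(); // only a request; the JVM may ignore it

        System.out.println("--- after System.gc() ---");
        collect().forEach(System.out::println);
    }
}
```

Comparing two snapshots taken some time apart gives the number of collections and the time spent in them during that interval.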

Any modern IDE provides profilers that contain (among other related things) information and real-time graphs about the GC epochs/cycles. For instance, the following figure is from the NetBeans IDE, which displays the GC evolution (heap status) as an item of the toolbar (by simply clicking on that area, you can force the GC to perform garbage collection):

Figure 12.22: NetBeans displays the GC evolution on the toolbar

Of course, a more detailed view is available via the NetBeans...

255. Logging garbage collectors

Analyzing the GC logs is another approach that can be useful for finding memory issues. Since GC logs don’t add significant overhead, they can be enabled in production for debugging purposes, so you should definitely use them!

Let’s consider some simple Java code that adds strings to and removes strings from a List<String>. Along the way, the code requests a full GC via System.gc():

private static final List<String> strings = new ArrayList<>();
...
logger.info("Application started ...");
String string = "prefixedString_";
// Add 5 million String instances to the heap
for (int i = 0; i < 5_000_000; i++) {
  String newString = string + i;
  strings.add(newString);
}
logger.info(() -> "List size: " + strings.size());
// Force GC execution
System.gc();
// Remove 10_000 out of 5 million
for (int i = 0; i < 10_000; i++) {
  String newString = string...
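Since the listing above is cut off in this excerpt, here is a self-contained sketch in the same spirit that can be run to produce GC logs. It is not the author's code: the class name `GcLogDemo`, the `churn` method, and the choice to remove elements from the end of the list are assumptions. GC logging itself is enabled from the command line with the unified logging option available since JDK 9, for example `-Xlog:gc*` (console) or `-Xlog:gc*:file=gc.log` (file).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: allocate many strings, request a GC, drop some, request a GC again.
 * Run, for example, with: java -Xlog:gc* GcLogDemo.java
 */
final class GcLogDemo {

    /** Adds toAdd strings, removes up to toRemove of them, and returns the final size. */
    static int churn(int toAdd, int toRemove) {
        List<String> strings = new ArrayList<>();
        String prefix = "prefixedString_";
        for (int i = 0; i < toAdd; i++) {
            strings.add(prefix + i);
        }
        // Only a request: the JVM may ignore it (for instance, with -XX:+DisableExplicitGC).
        System.gc();
        // Removing from the end avoids shifting the remaining elements.
        for (int i = 0; i < toRemove && !strings.isEmpty(); i++) {
            strings.remove(strings.size() - 1);
        }
        System.gc();
        return strings.size();
    }

    public static void main(String[] args) {
        // Same orders of magnitude as in the text; this needs a few hundred MB of heap.
        int remaining = churn(5_000_000, 10_000);
        System.out.println("List size: " + remaining);
    }
}
```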

256. Tuning garbage collectors

Garbage collectors are complex machinery whose performance depends heavily on their settings (startup parameters) in the context of the current JVM, current application, and hardware. Since the GC consumes and shares resources (memory, CPU time, and so on) with our application, it is essential to tune it to work as efficiently as possible. If the GC is not efficient, then we may face significant pause times that will negatively impact the application run.

In this problem, we will cover the main tuning options available for the serial GC, parallel GC, G1 GC, and ZGC.

How to tune

Before attempting to tune the GC, ensure that it is really causing trouble. By inspecting and correlating the charts and logs, you can identify such troubles and decide where you should act (what parameters should be tuned). Check out the usage of the heap memory and how objects fill up the Eden, Survivor, and Tenured spaces.
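As a complement to charts and logs, the usage of the individual heap spaces can be inspected from code. The following is a minimal sketch using the standard `MemoryPoolMXBean` API; the class name `HeapPools` is hypothetical, and the pool names are collector-specific (with G1, for example, they typically include Eden, Survivor, and Old Gen pools).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: report how many bytes are used in each heap memory pool. */
final class HeapPools {

    /** Maps each heap pool name to its currently used bytes. */
    static Map<String, Long> usedBytesByPool() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as Metaspace
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                result.put(pool.getName(), usage.getUsed());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        usedBytesByPool().forEach((name, used) ->
                System.out.println(name + ": " + used / 1024 + " KB used"));
    }
}
```

Sampling these values periodically shows how quickly each space fills up, which helps decide which parameters are worth tuning.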

Typically, a healthy GC produces...

257. Introducing Application Class Data Sharing (AppCDS, or Java’s Startup Booster)

Launching a Java application is a multi-step process. Before executing the bytecode of a class, the JVM has to perform at least the following steps for a given class name:

  1. Look up the class on disk (the JVM has to scan the disk and find the given class name).
  2. Load the class (the JVM opens the file and loads its content).
  3. Check the bytecode (the JVM verifies the integrity of the content).
  4. Pull the bytecode internally (the JVM transfers the code into an internal data structure).

Obviously, these steps are not cost-free. Loading hundreds/thousands of classes adds significant overhead to the launch time and memory footprint. Typically, an application’s JAR remains unchanged for a long time, yet the JVM performs the previous steps and obtains the same result every time we launch the application.

Improving/accelerating the startup performance and even reducing the...
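A simple way to observe the effect of a dynamic CDS archive is to measure how long the JVM needs to reach `main` with and without the archive. The sketch below is only an illustration: the class name `StartupProbe` and the file names `app.jar` and `app.jsa` are hypothetical, while the two `-XX` options are the HotSpot flags for dynamic archives (available since JDK 13).

```java
import java.lang.management.ManagementFactory;

/**
 * Sketch: print the time elapsed between JVM start and main().
 *
 * Assumed usage with a dynamic CDS archive (app.jar and app.jsa are hypothetical names):
 *   1. Trial run that writes the archive when the application exits:
 *        java -XX:ArchiveClassesAtExit=app.jsa -cp app.jar StartupProbe
 *   2. Later runs that reuse the archive:
 *        java -XX:SharedArchiveFile=app.jsa -cp app.jar StartupProbe
 */
final class StartupProbe {

    /** Milliseconds elapsed since the JVM was started. */
    static long millisSinceJvmStart() {
        long start = ManagementFactory.getRuntimeMXBean().getStartTime();
        return Math.max(0, System.currentTimeMillis() - start);
    }

    public static void main(String[] args) {
        System.out.println("Reached main() after " + millisSinceJvmStart() + " ms");
    }
}
```

The difference is most visible for applications that load many classes; for a single small class such as this one, the gain may be within measurement noise.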

Summary

This chapter covered 15 problems about garbage collectors and AppCDS. Even though these problems have been mostly theoretical, they still represent major topics that can boost your application performance at runtime (in the GC case) and at startup (in the AppCDS case).

Join our community on Discord

Join our community’s Discord space for discussions with the author and other readers:

https://discord.gg/8mgytp5DGQ
