Chapter 1: Evolution of Java Virtual Machine
This chapter will walk you through the evolution of Java Virtual Machine (JVM), and how it optimized the interpreter and compiler. We will learn about C1 and C2 compilers and various types of code optimizations that the JVM performs to run Java programs faster.
In this chapter, we will cover the following topics:
- Introduction to GraalVM
- Learning how JVM works
- Understanding the JVM architecture
- Understanding the kind of optimizations JVM performs with Just-In-Time (JIT) compilers
- Learning the pros and cons of the JVM approach
By the end of this chapter, you will have a clear understanding of the JVM architecture. This is critical in understanding the GraalVM architecture and how GraalVM further optimizes and builds on top of JVM best practices.
Introduction to GraalVM
GraalVM is a high-performance VM that provides the runtime for modern cloud-native applications. Cloud-native applications are built on the microservice architecture. This architecture shifts the paradigm toward building many small applications, which challenges the fundamental way we build and run applications. Microservice runtimes therefore have a different set of requirements.
Here are some of the key requirements of a cloud-native application built on the microservice architecture:
- Smaller footprint: Cloud-native applications run on the "pay for what we use" model. This means that the cloud-native runtimes need to have a smaller memory footprint and should run with the optimum CPU cycles. This will help run more workloads with fewer cloud resources.
- Quicker bootstrap: Scalability is one of the most important aspects of container-based microservices architecture. The faster the application's bootup, the faster it can scale the clusters. This is even more important for serverless architectures, where the code is initialized and run and then shut down on request.
- Polyglot and interoperability: Polyglot is the reality; each language has its strengths and will continue to have them. Cloud-native microservices are being built with different languages. It's very important to have an architecture that embraces the polyglot requirements and provides interoperability across languages. As we move to modern architectures, it's important to reuse as much time-tested, business-critical code and logic as possible.
GraalVM provides a solution to all these requirements and provides a common platform to embed and run polyglot cloud-native applications. It is built on JVM and brings in further optimizations. Before understanding how GraalVM works, it's important to understand the internal workings of JVM.
Traditional JVM (before GraalVM) has evolved into the most mature runtime implementation. While it meets some of the previously listed requirements, it was not built for cloud-native applications, and it carries the baggage of monolithic design principles. It is not an ideal runtime for cloud-native applications.
This chapter will walk you through in detail how JVM works and the key components of the JVM architecture.
Learning how JVM works
Java is one of the most successful and widely used languages. Java has been very successful because of its write once, run anywhere design principle. JVM realizes this design principle by sitting between the application code and the machine code and interpreting the application code to machine code.
Traditionally, there are two ways of running application code:
- Compilers: Application code is directly compiled to machine code (in C, C++). Compilers go through a build process of converting the application code to machine code. Compilers generate the most optimized code for a specific target architecture, so the application code has to be compiled separately for each target architecture. In general, compiled code runs faster than interpreted code, and issues with code semantics can be identified at compile time rather than at runtime.
JVM has taken the best of both interpreters and compilers. The following diagram illustrates how JVM runs the Java code using both the interpreter and compiler approaches:
- Java Compiler (javac) compiles the Java application source code to bytecode (intermediate format).
- JVM interprets the bytecode to machine code, instruction by instruction, at runtime. Because the bytecode is translated to the target machine code only at this point, the same application code can run on different target machines without re-programming or re-compiling.
- JVM also has a Just-In-Time (JIT) compiler to further optimize the code at runtime by profiling the code.
In this section, we looked at how Java Compiler and JIT work together to run Java code on JVM at a higher level. In the next section, we will learn about the architecture of JVM.
Understanding the JVM architecture
Over the years, JVM has evolved into the most mature VM runtime. It has a very structured and sophisticated implementation of a runtime. This is one of the reasons why GraalVM is built to utilize all the best features of the JVM and provide further optimizations required for the cloud-native world. To better appreciate the GraalVM architecture and optimizations that it brings on top of the JVM, it's important to understand the JVM architecture.
This section walks you through the JVM architecture in detail. The following diagram shows the high-level architecture of various subsystems in JVM:
The rest of this section will walk you through each of these subsystems in detail.
Class loader subsystem
The class loader subsystem is responsible for locating all the relevant .class files and loading these classes to the memory. The class loader subsystem is also responsible for linking and verifying the semantics of the .class file before the classes are initialized and loaded to memory. The class loader subsystem has three key functionalities: loading, linking, and initializing.
The following diagram shows the various components of the class loader subsystem:
Let's now look at what each of these components does.
In traditional compiler-based languages such as C/C++, the source code is compiled to object code, and then all the dependent object code is linked by a linker before the final executable is built. All this is part of the build process. Once the final executable is built, it is then loaded into the memory by the loader. Java works differently.
Java source code (.java) is compiled by Java Compiler (javac) to bytecode (.class) files. Class loader is one of the key subsystems of the JVM, which is responsible for loading all the dependent classes that are required to run the application. This includes the classes that are written by the application developer, the libraries, and the Java Software Development Kit (SDK) classes.
There are three types of class loaders as part of this system:
- Bootstrap: Bootstrap is the first class loader. It loads rt.jar, which contains all the Java Standard Edition JDK classes, such as java.io. Bootstrap is responsible for loading all the classes that are required to run any Java application. This is a core part of the JVM and is implemented in native code.
- Extensions: Extension class loaders load all the extensions to the JDK found in the ext directory. The extension class loader is a child of the bootstrap class loader and is implemented in Java.
- Application: The application class loader (also referred to as the system class loader) is a child of the extension class loader. The application class loader is responsible for loading the application classes from the application class path (the CLASSPATH environment variable). This is also implemented in Java.
Bootstrap, extension, and application class loaders are responsible for loading all the classes that are required to run the application. If the class loaders do not find the required classes, ClassNotFoundException is thrown.
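The loader hierarchy described above can be inspected from inside a running program. The following is a minimal sketch, not part of the original text: the class name LoaderHierarchy and the class name com.example.DoesNotExist are made up for illustration. Note also that on Java 9 and later, rt.jar no longer exists and the extension class loader has been replaced by the platform class loader, so the loader names printed will differ between JDK versions.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderHierarchy {

    // Walks from the loader that defined the class up through its parents.
    // A null loader stands for the bootstrap loader, which is native code
    // and has no java.lang.ClassLoader object representing it.
    static List<String> chainFor(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap");
        return chain;
    }

    // Returns true when no loader in the chain can find the named class.
    static boolean isMissing(String className) {
        try {
            Class.forName(className);
            return false;
        } catch (ClassNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String -> " + chainFor(String.class));
        System.out.println("LoaderHierarchy  -> " + chainFor(LoaderHierarchy.class));
        // Hypothetical class name, chosen so that the lookup fails.
        System.out.println("missing class?   -> " + isMissing("com.example.DoesNotExist"));
    }
}
```

Core classes such as java.lang.String report only the bootstrap loader, while application classes report a longer chain that ends at bootstrap.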
Class loaders implement the delegation hierarchy algorithm. The following diagram shows how the class loader implements the delegation hierarchy algorithm to load all the required classes:
Let's understand how this algorithm works:
- JVM looks for the class in the method area (this will be discussed in detail later in this section). If it does not find the class, it will ask the application class loader to load the class into memory.
- The application class loader delegates the call to the extension class loader, which in turn delegates to the bootstrap class loader.
- The bootstrap class loader looks for the class in the bootstrap CLASSPATH. If it finds the class, it will load it into memory. If it does not find the class, control is delegated to the extension class loader.
- The extension class loader will try to find the class in the extension CLASSPATH. If it finds the class, it will load it into memory. If it does not find the class, control is delegated to the application class loader.
- The application class loader will try to look for the class in CLASSPATH. If it does not find it, it will raise ClassNotFoundException; otherwise, the class is loaded into the method area, and the JVM will start using it.
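The parent-first delegation described in these steps is the default behavior of java.lang.ClassLoader.loadClass, and it can be observed with a small custom loader. The sketch below is illustrative only: RecordingLoader and DelegationDemo are hypothetical names, and the loader deliberately owns no classes, so its findClass method just records that it was asked and then gives up.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {

    // Hypothetical loader for illustration. The inherited loadClass() asks the
    // parent chain first and calls findClass() only after every parent failed.
    static class RecordingLoader extends ClassLoader {
        final List<String> findClassCalls = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            findClassCalls.add(name);
            throw new ClassNotFoundException(name);
        }
    }

    // Returns how often the child loader itself had to look for the class:
    // 0 means a parent supplied it, 1 means all parents failed first.
    static int childLookups(String className) {
        RecordingLoader loader = new RecordingLoader(DelegationDemo.class.getClassLoader());
        try {
            loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            // Expected when no loader in the chain can find the class.
        }
        return loader.findClassCalls.size();
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String: child lookups = "
                + childLookups("java.lang.String"));
        // Hypothetical class name, chosen so that every loader fails.
        System.out.println("com.example.DoesNotExist: child lookups = "
                + childLookups("com.example.DoesNotExist"));
    }
}
```

Because java.lang.String is found by a parent, the child is never consulted; for the missing class, the child is asked exactly once, as the last step before ClassNotFoundException is raised.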
Once the classes are loaded into the memory (into the method area, discussed further in the Memory subsystem section), the class loader subsystem will perform linking. The linking process consists of the following steps:
- Verification: The loaded classes are verified for their adherence to the semantics of the language. The binary representation of the class that is loaded is parsed into the internal data structure, to ensure that the method runs properly. This might require the class loader to load recursively the hierarchy of inherited classes all the way to java.lang.Object. The verification phase validates and ensures that the methods run without any issues.
- Preparation: Once all the classes are loaded and verified, JVM allocates memory for class variables (static variables) and sets them to their default values. Explicit static initializers (static blocks) are not run at this point; they run later, during the initialization phase.
- Resolution: JVM then resolves symbolic references by locating the classes, interfaces, fields, and methods referenced in the symbol table. The JVM might resolve the symbols up front, when the class is being verified (static resolution), or may defer resolution until a symbol is first used (lazy resolution).
The class loader subsystem raises various exceptions, including the following:
You can refer to the Java specifications for more details: https://docs.oracle.com/en/java/javase.
Once all the classes are loaded and symbols are resolved, the initialization phase starts. During this phase, the classes are initialized (for example, when an instance is first created with new). This includes initializing the static variables, executing static blocks, and invoking reflective methods (java.lang.reflect). This might also result in loading those classes.
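The difference between loading a class and initializing it can be made visible with Class.forName, whose three-argument form accepts an initialize flag. The following is a small sketch with made-up names (InitDemo, Lazy); it assumes nothing else in the program touches the nested class first.

```java
import java.util.ArrayList;
import java.util.List;

class InitDemo {
    static final List<String> EVENTS = new ArrayList<>();

    static class Lazy {
        static int value = 42; // assigned during initialization, not preparation

        static {
            EVENTS.add("Lazy initialized");
        }
    }

    static List<String> run() {
        if (!EVENTS.isEmpty()) {
            // A class is initialized only once per JVM, so replay the first run.
            return EVENTS;
        }
        try {
            // initialize = false: load the class without running its static initializers.
            Class.forName(InitDemo.class.getName() + "$Lazy", false,
                    InitDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        EVENTS.add("Lazy loaded");
        int v = Lazy.value; // first active use triggers initialization
        EVENTS.add("value = " + v);
        return EVENTS;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [Lazy loaded, Lazy initialized, value = 42]
    }
}
```

The static block runs only when the static field is first read, even though the class was already loaded by then.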
Class loaders load all the classes into the memory before the application can run. Most of the time, the class loader has to load the full hierarchy of classes and dependent classes (though there is lazy resolution) to validate the semantics. This is time-consuming and also takes up a lot of memory. It's even slower if the application uses reflection and the reflected classes need to be loaded.
After learning about the class loader subsystem, let's now understand how the memory subsystem works.
The memory subsystem is one of the most critical subsystems of the JVM. The memory subsystem, as the name suggests, is responsible for managing the allocated memory of method variables, heaps, stacks, and registers. The following diagram shows the architecture of the memory subsystem:
The memory subsystem has two areas: JVM level and thread level. Let's discuss each in detail.
JVM-level memory, as the name suggests, is where the objects are stored at the JVM level. This area is shared by all threads, so it is not thread-safe: multiple threads might access the same objects at once. This is why programmers are advised to write thread-safe code (for example, using synchronization) when they update objects in this area. There are two areas of JVM-level memory:
- Method: The method area is where all the class-level data is stored. This includes the class names, hierarchy, methods, variables, and static variables.
- Heap: The heap is where all the objects and the instance variables are stored.
Thread-level memory is where all the thread-local objects are stored. This is accessible/visible to the respective threads, hence it is thread-safe. There are three areas of the thread-level memory:
- Stack: For each method call, a stack frame is created, which stores all the method-level data. The stack frame consists of all the variables/objects that are created within the method scope, operand stack (used to perform intermediate operations), the frame data (which stores all the symbols corresponding to the method), and exception catch block information.
- Registers: PC registers keep track of the instruction execution and point to the current instruction that is being executed. This is maintained for each thread that is executing.
- Native Method Stack: The native method stack is a special type of stack that stores the native method information, which is useful when calling and executing the native methods.
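The distinction between shared JVM-level memory and per-thread memory shows up directly in application code. In the sketch below (class and method names are illustrative, not from the original text), each thread accumulates into a local variable that lives in its own stack frame and needs no locking, while the single update to the shared heap object is synchronized.

```java
class SharedCounterDemo {

    // Instances live on the heap and are visible to every thread holding a reference.
    static class Counter {
        private long count;

        synchronized void add(long delta) {
            count += delta;
        }

        synchronized long get() {
            return count;
        }
    }

    static long countWithThreads(int threads, int incrementsPerThread) {
        Counter shared = new Counter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                long local = 0; // local variable: confined to this thread's stack frame
                for (int i = 0; i < incrementsPerThread; i++) {
                    local++;
                }
                shared.add(local); // one synchronized update to shared heap state
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000)); // prints 40000
    }
}
```

Removing the synchronized keyword from add would make the result depend on thread timing, which is the hazard the text warns about for JVM-level memory.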
Now that the classes are loaded into the memory, let's look at how the JVM execution engine works.
JVM execution engine subsystem
The JVM execution engine is the core of the JVM, where all the execution happens. This is where the bytecodes are interpreted and executed. The JVM execution engine uses the memory subsystem to store and retrieve the objects. There are three key components of the JVM execution engine, as shown:
We will talk about each component in detail in the following sections.
As mentioned earlier in this chapter, bytecode (.class) is the input to the JVM. The JVM bytecode interpreter picks each instruction from the .class file, converts it to machine code, and executes it. The obvious disadvantage of interpreters is that they are not optimized. The instructions are executed in sequence, and even if the same method is called several times, the interpreter goes through each instruction, interprets it, and then executes it.
The JIT compiler saves the day by profiling the code that is being executed by the interpreter, identifying areas where the code can be optimized, and compiling them to target machine code so that they can be executed faster. A combination of interpreted bytecode and compiled code snippets provides the optimum way to execute the class files.
The following diagram illustrates the detailed workings of JVM, along with the various types of JIT compilers that the JVM uses to optimize the code:
Let's understand the workings shown in the previous diagram:
- The JVM interpreter steps through each bytecode instruction and translates it to machine code, using the bytecode to machine code mapping.
- JVM profiles the code continuously, using a counter to count the number of times a piece of code is executed. If the counter reaches a threshold, it uses the JIT compiler to compile that code for optimization and stores it in the code cache.
- JVM then checks whether that compilation unit (block) is already compiled. If JVM finds a compiled code in the code cache, it will use the compiled code for faster execution.
- JVM uses two types of compilers, the C1 compiler and the C2 compiler, to compile the code.
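The interplay of invocation counter, threshold, and code cache in these steps can be mimicked with a toy model. The sketch below is only an illustration of the idea: ToyHotnessTracker is a hypothetical class, the threshold of 3 is arbitrary, and a real JVM uses far more elaborate counters and compiles in background threads.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ToyHotnessTracker {
    private final int compileThreshold;
    private final Map<String, Integer> invocationCounts = new HashMap<>();
    private final Set<String> codeCache = new HashSet<>();

    ToyHotnessTracker(int compileThreshold) {
        this.compileThreshold = compileThreshold;
    }

    // Returns how this call would be executed in the toy model.
    String invoke(String method) {
        if (codeCache.contains(method)) {
            return "compiled"; // found in the code cache: skip the interpreter
        }
        int count = invocationCounts.merge(method, 1, Integer::sum);
        if (count >= compileThreshold) {
            codeCache.add(method); // stand-in for JIT compilation
        }
        return "interpreted";
    }

    public static void main(String[] args) {
        ToyHotnessTracker tracker = new ToyHotnessTracker(3);
        for (int call = 1; call <= 5; call++) {
            System.out.println("call " + call + ": " + tracker.invoke("sum"));
        }
    }
}
```

With a threshold of 3, the first three calls are interpreted and every later call is served from the code cache.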
As illustrated in Figure 1.7, the JIT compiler brings in optimizations by profiling the code that is running and, over a period of time, it identifies the code that can be compiled. The JVM runs the compiled snippets of code instead of interpreting the code. It is a hybrid method of running interpreted code and compiled code.
JVM introduced two types of compilers, C1 (client) and C2 (server), and the recent versions of JVM use the best of both for optimizing and compiling the code at runtime. Let's understand these types better:
- C1 compiler: A performance counter was introduced, which counted the number of times a particular method/snippet of code is executed. Once a method/code snippet is used a particular number of times (threshold), then that particular code snippet is compiled, optimized, and cached by the C1 compiler. The next time that code snippet is called, it directly executes the compiled machine instructions from the cache, rather than going through the interpreter. This brought in the first level of optimization.
- C2 compiler: While the code is getting executed, the JVM will perform runtime code profiling and come up with code paths and hotspots, that is, the most frequently executed parts of the code. It then runs the C2 compiler to further optimize the hot code paths.
C1 compiles faster and is good for short-running applications, while C2 compiles more slowly and is heavier, but it produces better-optimized code and is ideal for long-running processes such as daemons and servers, so the code performs better over time.
In Java 6, there is a command-line option to use either C1 or C2 methods (with the command-line arguments -client (for C1) and -server (for C2)). In Java 7, there is a command-line option to use both (-XX:+TieredCompilation). Since Java 8, both C1 and C2 compilers are used for optimization as the default behavior.
There are five tiers/levels of compilation. Compilation logs can be generated to understand which Java method is compiled using which compiler tier/level. The following are the five tiers/levels of compilation:
- Interpreted code (level 0)
- Simple C1 compiled code (level 1)
- Limited C1 compiled code (level 2)
- Full C1 compiled code (level 3)
- C2 compiled code (level 4)
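To see these tiers in a compilation log, a program needs a method that is called often enough to become hot. The following is a minimal sketch (HotLoop and sumOfSquares are illustrative names). Running it on a HotSpot JVM with the -XX:+PrintCompilation flag should list sumOfSquares together with the tier it was compiled at; the exact log format and the tiers reached vary by JDK version and settings.

```java
class HotLoop {

    // A small, frequently called method: a typical JIT compilation candidate.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        long checksum = 0;
        // Call the method many times so that its invocation counters cross the
        // compilation thresholds. The repeat count here is an arbitrary choice.
        for (int round = 0; round < 20_000; round++) {
            checksum += sumOfSquares(1_000);
        }
        // Printing the result keeps the work from being optimized away entirely.
        System.out.println("checksum = " + checksum);
    }
}
```

For example, the program could be started with java -XX:+PrintCompilation HotLoop after compiling it with javac.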
Let's now look at the various types of code optimizations that the JVM applies during compilation.
The JIT compiler generates the internal representation of the code that is being compiled to understand the semantics and syntax. These internal representations are tree data structures, on which the JIT will then run the code optimization (using multiple compiler threads, whose number can be controlled from the command line with the -XcompilationThreads option on OpenJ9, or with -XX:CICompilerCount on HotSpot).
The following are some of the optimizations that the JIT compilers perform on the code:
- Inlining: One of the most common programming practices in object-oriented programming is to access the member variables through getter and setter methods. Inlining optimization replaces calls to these getter/setter methods with direct access to the variables. The JVM also profiles the code and identifies other small, frequently called methods (known as hot methods) that can be inlined to reduce the number of method calls. The decision is based on the number of times that the method is called and the size of the method. The size threshold used by JVM to decide whether to inline a hot method can be modified using the -XX:FreqInlineSize flag (the default is typically 325 bytes).
- Escape analysis: The JVM profiles the variables to analyze the scope of the usage of the variables. If the variables don't escape the local scope, it then performs local optimization. Lock elision is one such optimization, where the JVM decides whether a synchronization lock is really required for the variable. Synchronization locks are very expensive for the processor. The JVM may also decide to allocate the object on the stack instead of the heap. This has a positive impact on memory usage and garbage collection, as the objects are destroyed as soon as the method returns.
- Deoptimization: Deoptimization is another critical technique. The JVM profiles the code after optimization and may decide to deoptimize the code. Deoptimizations will have a momentary impact on performance. The JIT compiler decides to deoptimize in two cases:
a. Not Entrant Code: This is very prominent in inherited classes or interface implementations. JIT may have optimized assuming a particular class in the hierarchy, but when it learns otherwise over time, it will deoptimize and profile for further optimization of more specific class implementations.
b. Zombie Code: During Not Entrant code analysis, some of the objects get garbage collected, leading to code that may never be called. This code is marked as zombie code and is removed from the code cache.
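The inlining and escape analysis optimizations described above apply to code shaped like the following sketch. The names (InlineEscapeDemo, Point, sumCoordinates, joinDigits) are invented for illustration, and whether the JIT actually performs each optimization depends on the JVM, its version, and its flags; on HotSpot, diagnostic options such as -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining or -XX:-DoEscapeAnalysis can be used to investigate.

```java
class InlineEscapeDemo {

    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Tiny getters like these are typical inlining candidates.
        int getX() {
            return x;
        }

        int getY() {
            return y;
        }
    }

    // Each Point is used only inside one loop iteration and never escapes the
    // method, so the JIT may avoid allocating it on the heap at all.
    static long sumCoordinates(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            total += p.getX() + p.getY();
        }
        return total;
    }

    // StringBuffer methods are synchronized, but this buffer is visible to one
    // thread only, so its locks are candidates for lock elision.
    static String joinDigits(int n) {
        StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < n; i++) {
            buffer.append(i);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        long total = 0;
        // Repeat the work so the methods become hot; the count is arbitrary.
        for (int round = 0; round < 50_000; round++) {
            total += sumCoordinates(1_000);
        }
        System.out.println("total = " + total);
        System.out.println(joinDigits(4)); // prints 0123
    }
}
```

The results are the same with or without these optimizations; only the execution speed and allocation behavior change.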
Apart from this, the JIT compiler performs other optimizations, such as control flow optimization, which includes rearranging code paths to improve efficiency and native code generation to the target machine code for faster execution.
JIT compiler optimizations are performed over a period of time, and they are good for long-running processes. We will be going into a detailed explanation on JIT compilation in Chapter 2, JIT, Hotspot, and GraalVM.
Java ahead-of-time compilation
The ahead-of-time compilation option was introduced in Java 9 with jaotc, where Java application code can be directly compiled to generate final machine code. The code is compiled to a target architecture, so it is not portable.
Java supports running both Java bytecode and AOT compiled code together on the x86 architecture. The following diagram illustrates how it works. This is the most optimized code that Java can generate:
The bytecode will go through the approach that was explained previously (C1, C2).
jaotc compiles the most used Java code (such as libraries) into machine code ahead of time, and this is directly loaded into the code cache. The Java bytecode goes through the usual interpreter and uses the code from the code cache, if available. This removes much of the load on the JVM to compile the code at runtime. Typically, the most frequently used libraries can be AOT compiled for faster responses.
One of the sophisticated features of Java is its built-in memory management. In languages such as C/C++, the programmer is expected to allocate and de-allocate the memory. In Java, JVM takes care of cleaning up the unreferenced objects and reclaims the memory. The garbage collector is a daemon thread that performs the cleanup automatically; the programmer can also request a collection explicitly.
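A weak reference offers a simple way to watch the garbage collector reclaim an unreferenced object. The sketch below uses the standard System.gc() call, which is only a request that the JVM is free to ignore; the class and method names are illustrative. For that reason, only the first check is guaranteed, and the second is what usually, but not always, happens.

```java
import java.lang.ref.WeakReference;

class GcRequestDemo {

    // An object that is still strongly referenced must survive a collection.
    static boolean survivesWhileReferenced() {
        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        System.gc(); // a request, not a command
        return weak.get() == payload; // payload is still strongly reachable here
    }

    // An object with no strong references is eligible for collection.
    static boolean clearedAfterRelease() {
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();
        return weak.get() == null; // usually true, but not guaranteed
    }

    public static void main(String[] args) {
        System.out.println("survives while referenced: " + survivesWhileReferenced());
        System.out.println("cleared after release:     " + clearedAfterRelease());
    }
}
```

Application code should not rely on explicit collection requests for correctness; they are best treated as a hint.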
Java allows programmers to access native libraries. Native libraries are typically those libraries that are built (using languages such as C/C++) and used for a specific target architecture. Java Native Interface (JNI) provides an abstraction layer and interface specification for implementing the bridge to access the native libraries. Each JVM implements JNI for the specific target system. Programmers can also use JNI to call the native methods. The following diagram illustrates the components of the native subsystem:
The native subsystem provides the implementation to access and manage the native libraries.
JVM has evolved and has one of the most sophisticated implementations of a language VM runtime.
In this chapter, we started by learning what GraalVM is, followed by understanding how JVM works and its architecture, along with its various subsystems and components. Later on, we also learned how JVM combines the best of interpreters and the compiler approach to run Java code on various target architectures, along with how a code is compiled just-in-time with C1 and C2 compilers. Lastly, we learned about various types of code optimizations that the JVM performs.
This chapter provided a good understanding of the architecture of JVM, which will help us understand how the GraalVM architecture works and how it is built on top of JVM.
The next chapter will cover the details of how JIT compilers work and help you understand how Graal JIT builds on top of JVM JIT.
- Why is Java code compiled to bytecode and later interpreted and compiled at runtime?
- How does JVM load the appropriate class files and link them?
- What are the various types of memory areas in JVM?
- What is the difference between the C1 compiler and the C2 compiler?
- What is a code cache in JVM?
- What are the various types of code optimizations that are performed just in time?
- Introduction to JVM Languages, by Vincent van der Leun, Packt Publishing (https://www.packtpub.com/product/introduction-to-jvm-languages/9781787127944)
- Java Documentation and Specification, by Oracle (https://docs.oracle.com/en/java/)