Chapter 3: GraalVM Architecture

In Chapter 1, Evolution of Java Virtual Machine, we took a detailed look at the JVM architecture. In Chapter 2, JIT, HotSpot, and GraalJIT, we went into more detail on how JVM JIT compilers work. We also looked at how the JVM has evolved into the highly optimized HotSpot VM, with its C1 and C2 JIT compilers.

While the C2 compiler is very sophisticated, it has become a very complex piece of code. GraalVM provides the Graal compiler, which builds on all the best practices of the C2 compiler but is written entirely from the ground up in Java. Hence, Graal JIT is more object-oriented and has modern, manageable code, supported by modern integrated development environments, tools, and utilities to monitor, tune, and manage it. GraalVM is much more than just the Graal JIT compiler: it brings a larger ecosystem of tools, runtimes, and APIs that lets multiple languages (polyglot applications) run on the VM, leveraging the mature and hardened JIT compilation provided by Graal.

In this chapter, we will focus on the GraalVM architecture and how its various components come together to provide an advanced, fast polyglot runtime for the cloud. We will also explore cloud-native architectural patterns, and why GraalVM is well suited to being a platform for the cloud.

Before we get into the details of the GraalVM architecture, we will begin by learning the requirements of a modern technical architecture. Later in the chapter, as we go through each of the GraalVM architectural components, we will address these requirements.

In this chapter, we will cover the following topics:

  • Reviewing modern architectural requirements
  • Learning what the GraalVM architecture is
  • Reviewing the GraalVM editions
  • Understanding the GraalVM architecture
  • An overview of the GraalVM microservices architecture
  • An overview of various microservices frameworks that can build code for GraalVM
  • Understanding how GraalVM addresses various non-functional aspects

By the end of this chapter, you will have a very clear understanding of the GraalVM architecture and how various components come together to provide a comprehensive VM runtime for polyglot applications.

Reviewing modern architectural requirements

Before we dig deeper into the GraalVM architecture, let's first understand the shortcomings of the JVM and why we need a new architecture and approach. Older versions of the JVM were optimized for traditional architectures: long-running applications in a data center, providing high throughput and stability (for example, monolithic web application servers and large client-side applications). Some microservices are also long-running, and for those, Graal JIT still provides an optimal solution. As we move to cloud-native, however, the whole architecture paradigm has shifted to componentized, modularized, distributed, and asynchronous architectures tuned to run efficiently under high scalability and availability requirements.

Let's break this down into more specific requirements for the modern cloud-native architectures.

Smaller footprint

The applications are composed of granular, modular components (microservices) for high scalability. Hence, it is important to build the applications with a smaller footprint, so that they don't consume too much RAM and CPU. As we move to cloud-native deployments, this is even more important, as cloud billing is pay-per-use. The smaller the footprint, the more we can run with fewer resources on the cloud. This has a direct impact on Total Cost of Ownership (TCO), one of the key business KPIs.

A smaller footprint also helps us to make changes and deploy them rapidly and continuously. This is very important in the agile world, where the systems are built to embrace change. As businesses change rapidly, applications are also required to embrace changes rapidly to support business decisions. In traditional monolith architectures, even a small change requires an overall build, test, and deployment. In modern architectures, we need flexibility to roll out changes in the functionality in a modular way, without bringing the production systems down.

We have new engineering practices such as A/B testing, where we perform the testing of these functional modules (microservices) in parallel with the older version, to decide whether the new version is good enough to roll out. We perform canary deployments (rolling updates), where the application components are updated, without stopping the production systems. We will cover these architectural requirements in more detail in the DevOps – continuous integration and delivery section later in this chapter.

Quicker bootstrap

Scalability is one of the most important requirements. Modern applications are built to scale up and down rapidly based on the load. The load has increased exponentially and modern-day applications are required to handle any load gracefully. With a smaller footprint, it's also expected that these application components (microservices) boot up quickly to start handling the load. As we move toward more serverless architectures, the application components are expected to handle bootup and shutdown on request. This requires a very rapid bootup strategy.

A quicker bootstrap and smaller footprint also pose the challenge of building application components with an embeddable VM. The container-based approach requires these application components to be immutable.

Polyglot and interoperability

Polyglot is a reality: each language has its own strengths and will continue to, so we need to embrace this fact. If you look at the core logic of interpreters and compilers, they are all broadly the same: they all try to achieve similar levels of optimization and generate the fastest-running machine code with the smallest footprint. What we need is an optimum platform that can run applications written in different languages, and also allow interoperability between them.

With these architecture requirement lists in mind, let's now understand how GraalVM works and how it addresses these requirements.

Learning what the GraalVM architecture is

GraalVM provides the Graal JIT compiler, an implementation of JVMCI (which we covered in the previous chapter) that is written entirely in Java; it uses the C2 compiler's optimization techniques as a baseline and builds on top of them, making Graal JIT more sophisticated than the C2 compiler. GraalVM is a drop-in replacement for the JDK, which means that all the applications that currently run on the JDK should run on GraalVM without any application code changes.

While GraalVM is built on Java, it not only supports Java, but also enables Polyglot development with JavaScript, Python, R, Ruby, C, and C++. It provides an extensible framework called Truffle that allows any language to be built and run on the platform.

GraalVM also provides AOT compilation to build native images with static linking. GraalVM comes with the following list of runtimes, libraries, and tools/utilities (this list is for GraalVM version 20.3.0; the latest list of components can be found at https://www.graalvm.org/docs/introduction/).
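As a quick illustration of how these components are managed in practice, the GraalVM updater (gu) utility that ships with the distribution can list and install optional components. The session below is a sketch; the component names are examples and availability varies between GraalVM versions:

```shell
# List the components installed in this GraalVM distribution
gu list

# Install optional components (names are examples; availability varies by version)
gu install native-image
gu install python
```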

First, let's have a look at the core components in the following table:

Next, let's go through the list of additional tools and utilities in the following table:

Now that we are aware of the components in GraalVM, we will go through the various editions of GraalVM that are available, and the differences between these editions.

Reviewing the GraalVM editions (Community and Enterprise)

GraalVM is available as Community and Enterprise Editions:

  • Community Edition: GraalVM Community Edition (CE) is an open source edition built as an OpenJDK distribution. Most of the components of GraalVM are licensed under GPL 2 with the Classpath Exception. For more details on licensing, please refer to https://github.com/oracle/graal#license. GraalVM CE is based on OpenJDK 1.8.272 and OpenJDK 11.0.9. GraalVM CE is community supported; it can be deployed in production, but it does not come with support services from Oracle. Oracle also provides a readily downloadable Docker image for testing and evaluation (refer to https://www.graalvm.org/docs/getting-started/container-images/ for further details).
  • Enterprise Edition: GraalVM Enterprise Edition (EE) is a licensed version under the GraalVM OTN license agreement. It is free for evaluation and for building non-production applications. GraalVM EE provides additional performance (~20% faster than CE, and dynamic languages such as JavaScript, Python, R, and Ruby are ~2x faster), a smaller footprint (~2x smaller than CE), security (native code memory protection), and scalability for running production enterprise applications. EE comes with additional debugging tools, such as Ideal Graph Visualizer, which helps not only in debugging performance issues, but also in fine-tuning applications for the best performance on GraalVM. GraalVM EE comes with support services; for Oracle Cloud customers, GraalVM EE support is available as part of the subscription. GraalVM EE also has a managed mode, which does better heap management, avoiding page faults and crashes. GraalVM EE is also available to clients who already have a Java SE subscription.

Now that we know the various available editions of GraalVM, and what runtimes, tools, and frameworks come with it, let's get into the details of the GraalVM architecture.

Understanding the GraalVM architecture

In this section, we will look at the various architectural components of GraalVM. We will look at how various runtimes, tools, and frameworks come together to provide the most advanced VM and runtime. The following diagram shows the high-level architecture of GraalVM:

Figure 3.1 – Graal VM architecture


Let's go through each of these components in detail.

JVM (HotSpot)

JVM HotSpot is the regular Java HotSpot VM. The C2 compiler, which is part of the HotSpot VM, is replaced with the Graal JIT compiler implementation. The Graal JIT compiler is an implementation of Java Virtual Machine Compiler Interface (JVMCI) and plugs into the Java VM. We covered the architecture of JVM HotSpot in the previous chapters. Please refer to them for a more in-depth understanding of how JVM HotSpot works and the various architectural components of JVM.

Java Virtual Machine Compiler Interface (JVMCI)

JVMCI was introduced in Java 9. This allowed compilers to be written as plugins that JVM can call for dynamic compilation. It provides an API and a protocol to build compilers with custom implementations and optimizations.

The word compiler in this context means a just-in-time compiler. We went into a lot of detail on JIT compilers in the previous chapters. GraalVM uses JVMCI to get access to the JVM objects, interact with JVM, and install the machine code into the code cache.

The Graal JIT implementation comes in two modes:

  • libgraal: libgraal is an AOT compiled binary that is loaded by HotSpot VM as a native binary. This is the default mode and the recommended way to run GraalVM with HotSpot VM. In this mode, libgraal uses its own memory space and does not use the HotSpot heap. This mode of Graal JIT has quick bootup and improved performance.
  • jargraal: In this mode, Graal JIT is loaded like any other Java class, and hence it goes through a warm-up phase and runs with an interpreter until the hot methods are identified and optimized. This mode can be invoked by passing the -XX:-UseJVMCINativeLibrary flag on the command line.

In OpenJDK 9, 10, and 11, we use the -XX:+UnlockExperimentalVMOptions, -XX:+EnableJVMCI, and -XX:+UseJVMCICompiler flags to run the Graal compiler instead of the C2 compiler. GraalVM, by default, uses the Graal JIT compiler. It is always advisable to use GraalVM distributions, as these have the latest changes; OpenJDK gets the changes merged at a slower rate.
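For reference, these flags can be combined on the command line as follows (a sketch; MyApp is a placeholder main class):

```shell
# On stock OpenJDK 9/10/11: unlock JVMCI and use the Graal compiler instead of C2
java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler MyApp

# On GraalVM: fall back from the default libgraal mode to jargraal
java -XX:-UseJVMCINativeLibrary MyApp
```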

In the next chapter, we will be going into detail on how Graal JIT is better than the C2 JIT, using a sample code. We will be using the debugging tools and utilities that come with Graal to demonstrate the optimizations that Graal JIT performs at runtime.

Graal compiler and tooling

The Graal compiler is built on JVMCI and provides a better JIT compiler implementation than C2 (which we covered in both the previous chapters), with further optimizations. The Graal compiler also provides an AOT (Graal AOT) compilation option to build native images that can run standalone with embedded VMs.

Graal JIT compiler

We looked at the JVM architecture in Chapter 1, Evolution of Java Virtual Machine. For reference, here is the high-level architecture overview of JVM:

Figure 3.2 – JVM architecture with a C2 compiler


As you can see, the C1 and C2 compilers implement the JIT compilation as part of the JVM execution engine. We went into a lot of detail on how C1 and C2 optimize/deoptimize the code based on the compilation threshold.

GraalVM replaces the JIT compiler in JVM and incorporates further optimization. The following diagram shows the high-level architecture of GraalVM:

Figure 3.3 – VM architecture with Graal compiler


One of the differences between the JVM JIT compiler and Graal JIT is that Graal JIT is built to optimize an intermediate code representation: the abstract syntax tree (AST) and Graal graphs (the Graal intermediate representation). While compiling, Java represents the code as an AST, an intermediate representation.

Any language expressions and instructions can be converted and represented as ASTs; this helps in abstracting the language-specific syntax and semantics from the logic of optimizing the code. This approach makes GraalVM capable of optimizing and running code written in any language, as long as the code can be converted into an AST. We will be doing a deep dive into Graal graphs and ASTs in Chapter 4, Graal Just-In-Time Compiler.
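To make this concrete, here is a minimal, hypothetical plain-Java sketch (the class names are invented for this example) of how an expression such as (a + b) * c can be represented and evaluated as an AST, the way an interpreter would before JIT compilation kicks in:

```java
// Hypothetical sketch: an expression represented as an AST, where parent
// nodes are operators and leaf nodes are operands, evaluated by tree-walking.
public class AstDemo {
    interface Node { int eval(); }

    static final class Const implements Node {
        private final int value;
        Const(int value) { this.value = value; }
        public int eval() { return value; }
    }

    static final class Add implements Node {
        private final Node left, right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        public int eval() { return left.eval() + right.eval(); }
    }

    static final class Mul implements Node {
        private final Node left, right;
        Mul(Node left, Node right) { this.left = left; this.right = right; }
        public int eval() { return left.eval() * right.eval(); }
    }

    // Builds the AST for (a + b) * c and evaluates it by walking the tree.
    public static int evalExpr(int a, int b, int c) {
        Node root = new Mul(new Add(new Const(a), new Const(b)), new Const(c));
        return root.eval();
    }

    public static void main(String[] args) {
        System.out.println(evalExpr(1, 2, 3)); // (1 + 2) * 3 = 9
    }
}
```

Because the optimizer only ever sees trees like this, the same machinery applies to code in any language that can be parsed into an AST.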

The four key components of the Graal JIT compiler are as follows:

  • Profiler: As the name suggests, it profiles the running code and generates the information that is used by the code optimizer to take decisions or make assumptions regarding optimization.
  • Intermediate Code Generator: This generates the intermediate code representation, which is the input for the code optimizer.
  • Code Optimizer: This uses the data that is collected by the profiler and optimizes the intermediate code.
  • Target Code Generator: The optimized code is then converted to the target machine code.

The following diagram shows how Graal JIT works at a very high level:

Figure 3.4 – Graal JIT compilation – a high-level flowchart


Let's understand this flowchart better:

  • The JVM language (Java, Kotlin, Groovy, and so on) code runs on Graal JIT natively, and Graal JIT optimizes the code.
  • The non-JVM languages (such as JavaScript and Ruby) implement a language parser and interpreter using the Truffle API. The language interpreters convert the code into AST representation. Graal runs the JIT compilation on the intermediate representation. This helps in leveraging all the advanced optimization techniques implemented by Graal to non-JVM languages.
  • The native LLVM-based languages (such as C/C++, Swift, and Objective-C) follow a slightly different route to convert to the intermediate representation. Graal Sulong is used to create the intermediate representation that is used by Graal. We will be talking about Truffle and Sulong later in this chapter.

Graal JIT optimization strategies

Graal JIT's optimization strategies are built from the ground up, based on the best practices of the C2 JIT compiler, and go beyond them to provide more advanced optimizations. Here are some of the optimization strategies that the Graal JIT compiler performs:

We will be covering these optimization strategies in detail, with the help of sample code and examples, in the next chapter.

Truffle

The Truffle framework is an open source library for building language interpreters and tools/utilities (such as integrated development environments, debuggers, and profilers). The Truffle API is used to build language interpreters that can run on GraalVM, leveraging the optimizations provided by GraalVM.

The Graal and Truffle frameworks consist of the following APIs that enable Polyglot:

  • Language Implementation Framework: This framework is used by the language implementers. It also comes with a reference implementation of a language called SimpleLanguage. We will be going through this in detail in Chapter 9, Graal Polyglot – LLVM, Ruby, and WASM.
  • Polyglot API: This set of APIs aids interaction between code written in different languages (guest languages) with Java (the host language). For example, a Java (host) program can embed R (guest) language code to perform some machine learning/AI logic. The Polyglot API provides the framework that will help the language programmers to manage the objects between the guest and the host.
  • Instrumentation: The Truffle Instrumentation API provides the framework for utilities/tool builders to build integrated development/debugging environments, tools, and utilities. The tools and utilities that are built using the Truffle Instrumentation API can work with any language that is implemented with Truffle. This provides a consistent developer experience across various languages and leverages the sophisticated debugging/diagnostic capabilities of JVM.

Figure 3.5 shows the high-level architecture of how Truffle acts as an intermediate layer between GraalVM and other language interpreters. The individual language interpreters are implemented using the Truffle API. Truffle also provides an interoperability API, for calling methods and passing data between methods across various language implementations:

Figure 3.5 – Truffle architecture


As represented in the previous diagram, Java applications run directly on GraalVM, with the Graal compiler replacing the C2 JIT compiler. Other language programs run on top of the Truffle Language Implementation framework; the respective language interpreters are implemented using the Truffle API. Truffle combines the code with the interpreter to produce the machine code, using partial evaluation.

AST is the intermediate representation. An AST provides an optimum way to represent the syntactic structure of the language, where typically, a parent node is an operator and the child nodes represent the operands or further operators (based on cardinality). The following diagram shows a rough representation of an AST:

Figure 3.6 – AST for simple expression


In this diagram, a, b, and c can be any variables (for loosely typed languages). The interpreter starts by assuming generic types; based on the profiling of various executions, it then assumes specific types and optimizes the code using partial evaluation.

A language interpreter written with Truffle initially runs as an interpreter; Graal JIT then kicks in and starts identifying optimizations in the code.

The optimizations are based on speculation, and eventually, if the speculation is proven to be incorrect at runtime, the JIT will re-optimize and recompile (as shown in the previous diagram). Re-optimization and recompiling is an expensive task.

Partial evaluation creates an intermediate representation of the language from the code and the data. As it learns and identifies new data types, it deoptimizes to the AST interpreter, applies optimizations, rewrites the nodes, and recompiles. After a certain point, it will have the optimum representation. The following diagram explains how Truffle and Graal optimize the intermediate representation:

Figure 3.7 – AST optimization by Graal JIT


Let's understand this diagram better:

  • The expression is reduced to an AST, in which the leaf nodes are operands. In this example, we have taken a very simple expression to understand how partial evaluation works. In a language such as JavaScript, which is not strongly typed, a, b, and c can be of any data type (sometimes referred to as generics). Evaluating a generic in an expression is a costly operation.
  • Based on the profiling, Graal JIT speculates and assumes a specific data type (in this example, as an integer), optimizes the code to evaluate the expression for integers, and compiles the code.
  • In this example, it is using an inlining optimization strategy. The Graal JIT compiler has various other optimization strategies that are applied based on the use case.
  • When, during runtime, the compiler identifies a control flow where one of the operands is not really an integer, it deoptimizes and rewrites the AST with the new data type and optimizes the code.
  • Following a few iterations of running this optimization/deoptimization, the compiler will eventually generate the most optimum code.

The key difference here is that Graal is working on the AST and generating optimized code, and it does not matter what language is used to write the source code as long as the code is represented as AST.
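The optimize/deoptimize cycle described above can be mimicked in plain Java. The following hypothetical sketch (it does not use the real Truffle API, and the class names are invented) shows an add node that speculates on Integer operands and rewrites itself to a generic node when the speculation fails:

```java
// Hypothetical sketch of speculative specialization with node rewriting:
// an "add" node first assumes Integer operands; when that speculation fails,
// it deoptimizes by rewriting itself to a generic version and re-executing.
public class SpeculationDemo {
    interface AddNode { Object execute(Object a, Object b); }

    // Fast path: compiled under the assumption that both operands are Integers.
    static final class IntAddNode implements AddNode {
        public Object execute(Object a, Object b) {
            if (a instanceof Integer && b instanceof Integer) {
                return (Integer) a + (Integer) b;          // specialized, cheap path
            }
            throw new IllegalStateException("deoptimize"); // speculation failed
        }
    }

    // Generic path used after deoptimization (falls back to string concatenation).
    static final class GenericAddNode implements AddNode {
        public Object execute(Object a, Object b) {
            if (a instanceof Integer && b instanceof Integer) {
                return (Integer) a + (Integer) b;
            }
            return String.valueOf(a) + String.valueOf(b);
        }
    }

    private AddNode node = new IntAddNode();
    public int rewrites = 0;

    public Object add(Object a, Object b) {
        try {
            return node.execute(a, b);
        } catch (IllegalStateException deopt) {
            node = new GenericAddNode();   // rewrite the AST node, then retry
            rewrites++;
            return node.execute(a, b);
        }
    }

    public static void main(String[] args) {
        SpeculationDemo d = new SpeculationDemo();
        System.out.println(d.add(2, 3));   // fast path: 5
        System.out.println(d.add("a", 1)); // triggers deopt + rewrite: a1
    }
}
```

A real Truffle interpreter expresses the same idea through specializations and compiler-level deoptimization rather than exceptions, but the shape of the cycle is the same.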

The following diagram shows a high-level flow of how different languages run on GraalVM, with Truffle acting as an intermediate layer, to execute any programming language code on GraalVM:

Figure 3.8 – Truffle and Graal compilation flowchart


This diagram illustrates a simpler representation of how Truffle acts as a layer between non-JVM languages and GraalVM. The code can also be built directly as a native image with SubstrateVM.

The Truffle API is used along with a custom annotation processor to generate interpreter source code, which is then compiled. Java code does not need this intermediate representation; it can be compiled directly to run on GraalVM. We will discuss Truffle interpreters and how to write a custom interpreter in Chapter 9, Graal Polyglot – LLVM, Ruby, and WASM. We will cover the Truffle Polyglot API in Chapter 6, Truffle – An Overview.

Truffle also provides a framework called the Truffle Instrument API for building tools. Instruments provide fine-grained VM-level runtime events, which can be used to build profiling, tracing, analyzing, and debugging tools. The best part is that the language interpreters built with Truffle can use the ecosystem of Truffle instruments (for example, VisualVM, Chrome Debugger, and the GraalVM Visual Studio Code Extension).

Truffle provides the Polyglot Interoperability Protocol. This protocol defines the messages that each language needs to implement, and it supports the passing of data between polyglot applications.

Sulong – LLVM

LLVM is an open source project that is a collection of modular, reusable compilers and toolchains. Many language compilers (C, C++, Fortran, Rust, Swift, and so on) are built on LLVM, which provides the intermediate representation (also known as LLVM IR).

The Sulong pipeline looks different from those of the other language implementations running on Truffle. The following diagram shows how C/C++ code gets compiled:

Figure 3.9 – LLVM compilation flowchart


This diagram shows how application code written in C is compiled and run on GraalVM. Application code in native languages such as C/C++ is compiled with Clang into an intermediate representation. This LLVM intermediate representation runs on the LLVM intermediate representation interpreter, which is built on the Truffle API. Graal JIT will further optimize the code at runtime.
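As a sketch of this pipeline, assuming a GraalVM installation with the LLVM toolchain and runtime components installed (hello.c is a placeholder source file), the commands look like this:

```shell
# Compile C to LLVM bitcode (the intermediate representation) with clang
clang -c -O1 -emit-llvm hello.c -o hello.bc

# Run the bitcode on GraalVM's LLVM runtime (Sulong), which interprets the
# LLVM IR on Truffle and lets Graal JIT optimize it at runtime
lli hello.bc
```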

SubstrateVM (Graal AOT and native image)

Applications on Graal can be deployed on GraalVM or SubstrateVM. SubstrateVM is an embeddable VM that gets packaged into the native image during AOT compilation.

Graal AOT compilation is a very powerful way to create native binaries for a particular targeted OS/architecture. For cloud-native workloads and serverless, this is a very powerful option for achieving a smaller footprint, faster startups, and, more importantly, embeddable runtimes (providing immutability).

Rapid, componentized, modular deployment (containers) also poses management and versioning challenges. This is typically called Configuration Drift, which is one of the major issues that we face in managing a large number of containers in high-availability environments. Typically, container infrastructure is built by one team and, over time, managed by different teams. There are always situations where we are forced to change the configuration of the VMs/containers/OS in a particular environment in ways that may never be traced. This causes a gap between production and the DR/HA environment.

Immutable infrastructure (images) helps us do better version control of the infrastructure. It also gives us more confidence in testing, as the underlying infrastructure on which our application containers are running is immutable, and we are certain about the test results. To build immutable components, we require an embeddable VM (with a small footprint). SubstrateVM provides that embeddable VM.

In AOT compilation, the code is compiled directly to machine code and executed. There is no runtime profiling or optimization/deoptimization. The Graal AOT compiler (also referred to as the native image compiler) performs static analysis and static initialization on the code and produces executable code with an embedded VM. The optimization performed by AOT is based on the reachability of the code. The following diagram shows the compilation process:

Figure 3.10 – Graal AOT compilation


This diagram shows how Graal AOT compiles native images and embeds SubstrateVM as part of the native image. One of the disadvantages of AOT compilation is that the VM cannot optimize the code based on runtime profiling, as JIT does. To address this issue, we use a profile-guided optimization strategy to capture the runtime metrics of the application, and use that profiled data to optimize the native image by recompiling.

Profile Guided Optimization (PGO)

GraalVM uses Profile Guided Optimization (PGO) to optimize native images based on runtime profiling data. This feature is available in the Enterprise Edition only. The following diagram shows how a PGO pipeline works:

Figure 3.11 – Graal AOT compilation with PGO


Let's understand this workflow better:

  • When the code is compiled with native-image, we use the --pgo-instrument flag. This tells the compiler to inject instrumentation code into the compiled code.
  • When we start running this instrumented native image, the profiler starts collecting the runtime data and creating the profile files (.iprof).
  • Once we have run the native image with various workloads (all possible workloads – to capture as much instrumentation as possible), we can then recompile the native image with the --pgo flag (native-image --pgo=profile.iprof), providing the profile files as input. The Graal native image compiler creates the optimum native image.
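The three steps above can be sketched as the following command sequence (app.jar and the profile filename are placeholders; PGO requires the Enterprise Edition):

```shell
# Step 1: build an instrumented native image
native-image --pgo-instrument -jar app.jar

# Step 2: run the instrumented image under representative workloads;
# it writes a profile file (.iprof) on exit
./app

# Step 3: rebuild, feeding the collected profile back into the compiler
native-image --pgo=default.iprof -jar app.jar
```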

We will be building a native image with profile-guided optimization in the next chapter with the help of real examples and also understand how memory management works in native images.

GraalVM is a great runtime for the modern microservices architecture. In the next section, we will go through the various features of GraalVM that help to build a microservices application.

An overview of GraalVM microservices architecture

GraalVM is ideal for a microservices architecture. One of the most important requirements for many microservices architectures is a smaller footprint and faster startup, and GraalVM is an ideal runtime for running polyglot workloads in the cloud. Cloud-native frameworks that can build applications to run optimally on GraalVM are already available on the market, such as Quarkus, Micronaut, Helidon, and Spring. These frameworks are found to perform almost 50x faster when running as native images. We will go into detail about how GraalVM is the right runtime and platform for microservices in Chapter 10, Microservices Architecture with GraalVM.

Understanding how GraalVM addresses various non-functional aspects

In this section, we will go through the typical non-functional requirements of a microservices cloud-native architecture, and how GraalVM addresses these requirements.

Performance and scalability

Performance and scalability are among the most important non-functional requirements of a microservices cloud-native architecture. Microservices are automatically scaled out and back in by orchestrators such as Kubernetes. This requires the microservices to be built on a runtime that starts up quickly and runs fast, consuming minimal cloud resources. GraalVM AOT compilation helps to build native images that perform on a par with native languages such as C/C++.

To understand how AOT compiled code (native image) is faster than JIT compiled code, let's look at the steps that JIT and AOT follow at runtime:

Figure 3.12 – Graal JIT versus the AOT flowchart


This diagram shows the high-level steps for JIT and AOT. JIT optimizes the code over a period of time by profiling the code at runtime. There are performance overheads, as there is additional profiling, optimizing, and deoptimizing that is done by the JVM at runtime.

It is observed, based on the Apache Bench benchmark, that while Graal JIT throughput and performance are lower than AOT's at the beginning, as the number of requests increases, Graal JIT optimizes the code and performs better than Graal AOT after around 14,000 requests per second.

It is also observed that Graal AOT starts up around 50 times faster than Graal JIT and has a 5x smaller memory footprint.

Graal AOT with PGO provides consistent throughput that is sometimes better than Graal JIT's. However, for long-running tasks, Graal JIT might achieve better peak throughput. So, for consistent performance with good throughput, Graal AOT with PGO is the best choice.

Please refer to the benchmark study published at https://www.infoq.com/presentations/graalvm-performance/ and https://www.graalvm.org/why-graalvm/.

There are further benchmark studies that are published with academic collaborators at https://renaissance.dev.

Here's what we can conclude:

  • GraalVM Native Image (AOT) is best for faster startups and applications that require a smaller footprint, such as serverless applications and container microservices.
  • GraalVM JIT is best for peak throughputs. Throughput is a very important aspect for long-running processes, where scalability is critical. This could be high-volume web application servers such as e-commerce servers and stock market applications.
  • A combination of garbage collection configuration and JIT will help in achieving reduced latency. Latency is very important for the responsiveness of applications. When we are running at high throughput, it's possible that, on occasion, garbage collection slows down the response.

There is no hard and fast rule here: the choice between JIT and AOT, along with the various other possible configurations, depends on the use case. We will explore various compiler and native image configurations in the next chapter.

Security

GraalVM security is built on JVM security, which is based on the sandbox model. Let's have a very quick review of how the sandbox model works:

Figure 3.13 – JVM security model


In the Java 2 security architecture, all the class files are verified by the bytecode verifier (please refer to both the previous chapters for more details on class loaders). The bytecode verifier checks whether the class files are valid, looking for overflows, underflows, illegal data type conversions, invalid method calls, bad references to classes, and so on.

Once the bytecode is verified, the dependent classes are loaded by the class loader. Please refer to Chapter 1, Evolution of Java Virtual Machine, to understand how the class loader subsystem works. Class loaders work with Security Manager and access control to enforce security rules that are defined in the policy files. Java code that is downloaded over the network is checked for a signature (represented as java.security.CodeSource, including the public key).

Security Manager (java.lang.SecurityManager) is the most important component for handling authorization. Security Manager has various checks in place to ensure that authorization is enforced. The access controller (java.security.AccessController) class is another critical class that helps control access to system resources.

Keystore is a password-protected store that holds all the private keys and certificates. Each entry in the store can also be password-protected.
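The keystore API is part of the standard JDK, so we can illustrate this directly. The following sketch creates an in-memory PKCS12 keystore protected by a store password and adds a secret key entry protected by its own entry password (the alias and passwords here are made-up example values):

```java
import java.security.KeyStore;
import java.util.Collections;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class KeyStoreDemo {
    public static void main(String[] args) throws Exception {
        // Create an empty PKCS12 keystore in memory, protected by a store password
        KeyStore ks = KeyStore.getInstance("PKCS12");
        ks.load(null, "storePass".toCharArray());

        // Add a secret key entry protected by its own (different) password
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        ks.setEntry("my-secret",
                new KeyStore.SecretKeyEntry(key),
                new KeyStore.PasswordProtection("entryPass".toCharArray()));

        System.out.println("aliases: " + Collections.list(ks.aliases()));
        // prints: aliases: [my-secret]
    }
}
```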

Java Security is extendable, with custom security implementations called Security Providers.
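We can see which Security Providers are installed in a given JVM with a short, standard-JDK sketch; on a typical OpenJDK-based runtime the list starts with the SUN provider:

```java
import java.security.Provider;
import java.security.Security;

public class ListProviders {
    public static void main(String[] args) {
        // Enumerate the security providers installed in this JVM,
        // in preference order
        for (Provider p : Security.getProviders()) {
            System.out.println(p.getName() + " " + p.getVersionStr());
        }
    }
}
```

Custom providers registered via `Security.addProvider` (or the `java.security` configuration file) would appear in this same list.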

GraalVM builds on the Java security model and abstracts it to enforce security at the intermediate representation level. Note that GraalVM does not recommend relying on Security Manager for running untrusted code.

The GraalVM security model uses the Truffle language implementation framework API for JVM host applications to create an execution context, which is passed to the guest application (application code written in different languages). The following diagram shows the high-level architecture of how GraalVM allows the guest and host applications to interoperate and determine how access is controlled:

Figure 3.14 – Graal security model

The execution context (org.graalvm.polyglot.Context) defines the access control for the guest applications. Based on the access control defined in the execution context, the guest applications get access to the system's resources. GraalVM provides a Polyglot API to create these access controls, with an execution context, setting the privileges to access various functions, such as File I/O, Threading, and Native Access. The guest only has the access that the host grants. A watchdog thread can be used to time-bound the context: the watchdog closes the context after a given time, freeing up resources and restricting access based on time.

The following code demonstrates how the execution context can be set:

Context context = Context.newBuilder().allowIO(true).build();

Context context = Context.newBuilder().fileSystem(fs).build();

Context context = Context.newBuilder().allowCreateThread(true).build();

Context context = Context.newBuilder().allowNativeAccess(true).build();
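The time-bounding watchdog mentioned earlier can be sketched as follows. This is a hypothetical example: it assumes a GraalVM runtime with the JavaScript language installed, and uses a standard scheduled executor as the watchdog thread:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.graalvm.polyglot.Context;

public class WatchdogDemo {
    public static void main(String[] args) {
        // A locked-down context: no I/O for the guest code
        Context context = Context.newBuilder("js").allowIO(false).build();

        // Watchdog: close the context after a time budget,
        // cancelling any guest code still executing
        ScheduledExecutorService watchdog =
                Executors.newSingleThreadScheduledExecutor();
        watchdog.schedule(() -> context.close(true), 5, TimeUnit.SECONDS);

        try {
            System.out.println(context.eval("js", "6 * 7")); // prints 42
        } finally {
            watchdog.shutdownNow();
        }
    }
}
```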

GraalVM also offers an API to exchange objects between host and guest applications:

  • Guest to Host Data Exchange: The guest application can call the host methods and may pass the data. However, this is controlled based on method access modifiers and the host access policy (ALL, NONE, or EXPLICIT – for example, the @HostAccess.Export annotation).
  • Host to Guest Data Exchange: The objects passed from the host to the guest need to be handled by guest languages. The data is passed through the context, for example:

    Value a = Context.create().eval("js", "21 + 21");

Value a holds the result returned by the JavaScript guest to the host application, with a value of 42 (after evaluation).
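Expanding the one-liner above into a complete sketch (again assuming a GraalVM runtime with the JavaScript language installed), the host evaluates a guest expression and reads the result back as a Java int:

```java
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

public class DataExchange {
    public static void main(String[] args) {
        // try-with-resources closes the context and frees guest resources
        try (Context context = Context.create()) {
            // The host asks the JavaScript guest to evaluate an expression
            Value a = context.eval("js", "21 + 21");
            // The polyglot Value is converted back to a host (Java) int
            System.out.println(a.asInt()); // prints 42
        }
    }
}
```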

We will be covering this in detail in Chapter 6, Truffle – An Overview, with the help of a real example.

GraalVM EE also provides a managed execution mode for LLVM intermediate representation code to handle any memory violations and faults. Please refer to https://docs.oracle.com/en/graalvm/enterprise/19/guide/security/security-guide.html for more details.

DevOps – continuous integration and delivery

DevOps automation is one of the core requirements of any modern, cloud-native architecture. GraalVM integrates very well into the DevOps pipeline. The following diagram illustrates a typical GitOps pipeline with representative software (GraalVM integrates with any DevOps software stack):

Figure 3.15 – GitOps with GraalVM

Let's understand this diagram better.

The Continuous Integration (CI) pipeline gets triggered by a typical pull request from the Git repository with changes in the application code and infrastructure code. Tools such as GitHub Actions, Argo CD, or CircleCI can be used to orchestrate a CI pipeline. A typical CI pipeline consists of the following:

  • Build: In the build phase, the release tagged code is pulled from the appropriate branch from the Git repository. The code is verified (any static code analysis) and built. For cloud-native, typically, the code is built as native images, using the Graal AOT compiler.
  • Test: The code is tested with unit testing scripts and further verified for any security vulnerabilities.
  • Package: Once the code passes all the tests, the code is typically packaged into the cloud-native target runtime (using a Docker image or VM or any other binary format). The target could be a serverless container or Docker container or a VM.
  • Store: The final binaries are stored on binary stores or repositories, such as Docker Hub or Red Hat Quay (if it's a Docker image).

The Continuous Deployment pipeline can either be triggered based on a release plan or can be manually triggered (depending on the release plan and strategy). Continuous Deployment typically has the following phases:

  • Deployment for Validation: The final binary is deployed to an environment where it can now be tested end to end. Various strategies can be followed:

    a. Traditionally: We have an Integration Test Environment and a User Acceptance Test Environment (or Pre-Production Environment) for various levels of validation and testing.

    b. Blue/Green Deployment: There are two parallel environments (called Blue and Green). One of them, let's assume Blue, will be in production. The Green environment can be used to test and validate the new release. Once we are sure that the new release is working fine, we use the router to switch production traffic to the Green environment and use the Blue environment for testing future releases. This provides a highly available way to deploy applications.

    c. Canary Deployments and Rolling Updates: Canary deployment is a more recent approach that uses the same environment for both production and validation. This is a great way to test our code and compare the new release with the current release (A/B testing). Canary deployments rely on an API management layer, which can redirect traffic to specific endpoints based on various parameters (for example, test users, or users from a specific department, can access the new version while end users are still using the old version). The application can be deployed on a specific number of servers/nodes (by percentage or count). As we get more confident with the new release, we can perform rolling updates by increasing the number of nodes where the new release runs, opening it up to a wider circle of users. This also gives us the flexibility to perform a phased rollout of new releases (by region, user demography, or any other parameter).

  • Testing: There are various levels of testing that are performed, both functional and non-functional. Most of this is performed with automation, and that is also choreographed by the Continuous Delivery pipeline.
  • Production Deployment: Once it's all tested, the final application is deployed to the production environment. Once again, this deployment may use one of the Traditional or Blue/Green or Canary strategies.

GraalVM provides a very flexible way to deploy the application standalone, in containers, on the cloud, in VMs, and even inside the Oracle database. There are very sophisticated microservices frameworks, such as Quarkus, Micronaut, and the Fn Project, that provide native support for GraalVM and integrate very well with modern GitOps tools.

Summary

In this chapter, we explored the GraalVM architecture. Graal JIT is the new implementation of the JIT compiler, which replaces the C2 compiler, and brings in a lot more optimizations. Graal JIT is implemented completely in Java. Truffle provides the interpreter implementation framework and Polyglot framework to get other non-JVM languages into GraalVM.

This chapter provided a good understanding of the various runtimes, frameworks, tools, Graal updater, and utilities that are shipped with GraalVM. We also looked at the two available editions of GraalVM and what the key differences are between these two editions. We went through all the various components of the GraalVM architecture. We also explored some of the non-functional aspects of the architecture, including security model, performance, and DevOps. This is very important if you want to understand how GraalVM can be used to build cloud-native microservices and high-performing applications across various languages.

In the next chapter, we will dig deeper into how Graal JIT works, how we can use the various tools that come with Graal to understand the internal workings of Graal JIT, and how we can use these tools to debug and fine-tune our code.

Questions

  1. What are the various editions of GraalVM?
  2. What is JVMCI?
  3. What is Graal JIT?
  4. What is Graal AOT? How does PGO help AOT compilation?
  5. What is Truffle? How does it help to run multiple language codes on GraalVM?
  6. What is SubstrateVM?
  7. What is Guest Access Context?
  8. Why is GraalVM the ideal runtime for cloud-native microservices?

Further reading
