As an LLVM developer, building LLVM from source is usually the first thing you need to do. Given the scale of LLVM nowadays, this task can take hours to finish. Even worse, rebuilding the project to reflect changes might also take a long time and hinder your productivity. Therefore, it's crucial to know how to use the right tools and how to find the best build configurations for your project, for the sake of saving resources, especially your precious time.
In this chapter, we are going to cover the following topics:
- Cutting down building resources with better tooling
- Saving building resources by tweaking CMake arguments
- Learning how to use GN, an alternative LLVM build system, and its pros and cons
At the time of writing this book, LLVM only has a few software requirements:
- A C/C++ compiler that supports C++14
- One of the build systems supported by CMake, such as GNU Make or Ninja
- Python (2.7 is fine too, but I strongly recommend using 3.x)
The exact versions of these items change from time to time. Check out https://llvm.org/docs/GettingStarted.html#software for more details.
This chapter assumes you have built an LLVM before. If that's not the case, perform the following steps:
- Grab a copy of the LLVM source tree from GitHub:
$ git clone https://github.com/llvm/llvm-project
- Usually, the default branch should build without errors. If you want to use release versions that are more stable, such as release version 10.x, use the following command:
$ git clone -b release/10.x https://github.com/llvm/llvm-project
- Finally, you should create a build folder where you're going to invoke the CMake command. All the building artifacts will also be placed inside this folder. This can be done using the following command:
$ mkdir .my_build
$ cd .my_build
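Putting the steps above together, a typical first-time setup looks like this. The build folder name `.my_build` is just this chapter's convention; any name works, and the full clone plus build can take hours:

```shell
# Fetch the monorepo (large download) and set up an out-of-tree build folder.
git clone https://github.com/llvm/llvm-project
cd llvm-project
mkdir .my_build
cd .my_build

# Configure against the llvm subdirectory, then build everything.
cmake ../llvm
make all
```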
Cutting down building resources with better tooling
As we mentioned at the beginning of this chapter, if you build LLVM with the default (CMake) configurations, by invoking CMake and building the project in the following way, there is a high chance that the whole process will take hours to finish:
$ cmake ../llvm
$ make all
This can be avoided simply by using better tools and changing some settings. In this section, we will cover some guidelines to help you choose the right tools and configurations that can both speed up your building time and reduce your memory footprint.
Replacing GNU Make with Ninja
The first improvement we can do is using the Ninja build tool (https://ninja-build.org) rather than GNU Make, which is the default build system generated by CMake on major Linux/Unix platforms.
Here are the steps you can use to set up Ninja on your system:
- On Ubuntu, for example, you can install Ninja by using this command:
$ sudo apt install ninja-build
Ninja is also available in most Linux distributions.
- Then, when you're invoking CMake for your LLVM build, add an extra argument:
$ cmake -G "Ninja" ../llvm
- Finally, use the following build command instead:
$ ninja all
Ninja runs significantly faster than GNU Make on large code bases such as LLVM. One of the secrets behind Ninja's blazing speed is that while most build scripts, such as Makefiles, are designed to be written manually, Ninja's build script,
build.ninja, is more similar to assembly code: it is not meant to be edited by developers but generated by higher-level build systems such as CMake. The fact that Ninja uses an assembly-like build script allows it to do many optimizations under the hood and get rid of many redundancies, such as slow parsing, when invoking a build. Ninja also has a good reputation for generating accurate dependencies among build targets.
Ninja makes clever decisions in terms of its degree of parallelization; that is, how many jobs you want to execute in parallel. So, usually, you don't need to worry about this. If you want to explicitly assign the number of worker threads, the same command-line option used by GNU Make still works here:
$ ninja -j8 all
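If you do want to pick a job count yourself, a common rule of thumb is to leave a core free so your machine stays responsive while building. This is a sketch of my own, not something Ninja prescribes:

```shell
# Derive a job count from the number of available cores, keeping one free.
CORES=$(nproc)
if [ "$CORES" -gt 1 ]; then
  JOBS=$((CORES - 1))
else
  JOBS=1
fi

# Print the build command we would run.
echo "ninja -j${JOBS} all"
```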
Let's now see how you can avoid using the BFD linker.
Avoiding the use of the BFD linker
The second improvement we can make is using a linker other than the BFD linker, which is the default linker in most Linux systems. The BFD linker, despite being the most mature linker on Unix/Linux systems, is not optimized for speed or memory consumption. This creates a performance bottleneck, especially for large projects such as LLVM, because, unlike the compiling phase, it's pretty hard for the linking phase to do file-level parallelization. Not to mention that the BFD linker's peak memory consumption when building LLVM usually reaches about 20 GB, a burden on computers with small amounts of memory. Fortunately, there are at least two linkers in the wild that provide both good single-thread performance and low memory consumption: the GNU gold linker and LLVM's own linker, LLD.
The gold linker was originally developed by Google and donated to GNU's
binutils. You should have it sitting in the
binutils package by default in modern Linux distributions. LLD is one of LLVM's subprojects with even faster linking speed and an experimental parallel linking technique. Some of the Linux distributions (newer Ubuntu versions, for example) already have LLD in their package repository. You can also download the prebuilt version from LLVM's official website.
To use the gold linker or LLD to build your LLVM source tree, add an extra CMake argument with the name of the linker you want to use.
For the gold linker, use the following command:
$ cmake -G "Ninja" -DLLVM_USE_LINKER=gold ../llvm
Similarly, for LLD, use the following command:
$ cmake -G "Ninja" -DLLVM_USE_LINKER=lld ../llvm
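As a convenience, you can probe which of these linkers is actually installed before configuring. The fallback logic below is my own sketch, not part of LLVM's build system: it prefers LLD, then gold, and otherwise leaves the default BFD linker in place:

```shell
# Probe for faster linkers; fall back to the system default (BFD) if none found.
LINKER=""
if command -v ld.lld >/dev/null 2>&1; then
  LINKER="lld"
elif command -v ld.gold >/dev/null 2>&1; then
  LINKER="gold"
fi

# Build the configure command accordingly and print it.
if [ -n "$LINKER" ]; then
  CMD="cmake -G Ninja -DLLVM_USE_LINKER=$LINKER ../llvm"
else
  CMD="cmake -G Ninja ../llvm"
fi
echo "$CMD"
```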
Limiting the number of parallel threads for linking
Limiting the number of parallel threads for linking is another way to reduce (peak) memory consumption. You can achieve this by assigning the
LLVM_PARALLEL_LINK_JOBS=<N> CMake variable, where
N is the desired number of working threads.
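For instance, you could derive a link-job count from the machine's RAM. The one-job-per-16 GB figure below is a rough assumption based on the ~20 GB peak mentioned earlier, not an official LLVM recommendation:

```shell
# Estimate total RAM in GB (Linux; fall back to a safe default elsewhere).
if [ -r /proc/meminfo ]; then
  MEM_GB=$(awk '/MemTotal/ {print int($2 / 1048576)}' /proc/meminfo)
else
  MEM_GB=16
fi

# Allow roughly one link job per 16 GB of RAM, but never fewer than one.
LINK_JOBS=$((MEM_GB / 16))
if [ "$LINK_JOBS" -lt 1 ]; then
  LINK_JOBS=1
fi
echo "cmake -G Ninja -DLLVM_PARALLEL_LINK_JOBS=${LINK_JOBS} ../llvm"
```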
With that, we've learned that by simply using different tools, the building time could be reduced significantly. In the next section, we're going to improve this building speed by tweaking LLVM's CMake arguments.
Tweaking CMake arguments
Before we start, you should have a build folder that has been configured by CMake. Most of the following subsections will modify a file in the build folder, namely
CMakeCache.txt.
Choosing the right build type
- Release: This is the default build type if you didn't specify any. It will adopt the highest optimization level (usually -O3) and eliminate most of the debug information. Usually, this build type will make the building speed slightly slower.
- Debug: This build type will compile without any optimization applied (that is, -O0). It preserves all the debug information. Note that this will generate a huge number of artifacts and usually take up ~20 GB of space, so please be sure you have enough storage space when using this build type. This will usually make the building speed slightly faster since no optimization is being performed.
- RelWithDebInfo: This build type applies as much compiler optimization as possible (usually -O2) and preserves all the debug information. This is an option balanced between space consumption, runtime speed, and debuggability.
You can choose one of them using the
CMAKE_BUILD_TYPE CMake variable. For example, to use the
RelWithDebInfo type, you can use the following command:
$ cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo …
It is recommended to use
RelWithDebInfo first (if you're going to debug LLVM later). Modern compilers have gone a long way to improve the debug information's quality in optimized program binaries. So, always give it a try first to avoid unnecessary storage waste; you can always go back to the
Debug type if things don't work out.
In addition to configuring build types,
LLVM_ENABLE_ASSERTIONS is another CMake (Boolean) argument that controls whether assertions (that is, the
assert(bool predicate) macro, which terminates the program if the predicate argument is not true) are enabled. By default, this flag is true only if the build type is
Debug, but you can always turn it on manually to enforce stricter checks, even in other build types.
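To see concretely what this flag controls, here is a small local experiment (assuming gcc is installed). Non-assertion builds achieve their effect by defining NDEBUG, which compiles assert() away:

```shell
workdir=$(mktemp -d)
cat > "$workdir/demo.c" <<'EOF'
#include <assert.h>
int main(void) {
    assert(0 && "this predicate is false");
    return 0;
}
EOF

# With assertions enabled, this binary aborts at the failing assert().
gcc "$workdir/demo.c" -o "$workdir/with_asserts"
# With NDEBUG defined, assert() expands to nothing and the program succeeds.
gcc -DNDEBUG "$workdir/demo.c" -o "$workdir/without_asserts"

"$workdir/without_asserts" && echo "NDEBUG build ran to completion"
```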
Avoiding building all targets
The number of LLVM's supported targets (hardware) has grown rapidly in the past few years. At the time of writing this book, there are nearly 20 officially supported targets. Each of them deals with non-trivial tasks such as native code generation, so it takes a significant amount of time to build. However, the chances that you're going to be working on all of these targets at the same time are low. Thus, you can select a subset of targets to build using the
LLVM_TARGETS_TO_BUILD CMake argument. For example, to build the X86 target only, we can use the following command:
$ cmake -DLLVM_TARGETS_TO_BUILD="X86" …
You can also specify multiple targets using a semicolon-separated list, as follows:
$ cmake -DLLVM_TARGETS_TO_BUILD="X86;AArch64;AMDGPU" …
Surround the list of targets with double quotes!
In some shells, such as Bash, a semicolon acts as a command separator. So, the rest of the CMake command will be cut off if you don't surround the list of targets with double quotes.
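You can see what CMake receives with a quick experiment. The following sketch splits the target list at each semicolon, which is exactly where an unquoted shell command line would be cut:

```shell
# CMake sees a single ;-separated string; an unquoted shell would split here.
TARGETS="X86;AArch64;AMDGPU"
echo "$TARGETS" | tr ';' '\n'
```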
Building as shared libraries
One of the most iconic features of LLVM is its modular design: each component (optimization algorithms, code generation, and utility libraries, to name a few) is put into its own library, and developers can link individual ones depending on their usage. By default, each component is built as a static library (
*.a in Unix/Linux and
*.lib in Windows). However, static libraries have the following drawbacks:
- Linking against static libraries usually takes more time than linking against dynamic libraries (
*.so in Unix/Linux and *.dll in Windows).
- If multiple executables link against the same set of libraries, like many of the LLVM tools do, the total size of these executables will be significantly larger when you adopt the static library approach compared to its dynamic library counterpart. This is because each of the executables has a copy of those libraries.
- When you're debugging LLVM programs with debuggers (GDB, for example), they usually spend quite some time loading the statically linked executables at the very beginning, hindering the debugging experience.
Thus, it's recommended to build every LLVM component as a dynamic library during the development phase by using the
BUILD_SHARED_LIBS CMake argument:
$ cmake -DBUILD_SHARED_LIBS=ON …
This will save you a significant amount of storage space and speed up the building process.
Splitting the debug info
When you're building a program in debug mode – adding the
-g flag when using GCC or Clang, for example – by default, the generated binary contains a section that stores debug information. This information is essential for using a debugger (for example, GDB) to debug that program. LLVM is a large and complex project, so when you're building it in debug mode – using the
CMAKE_BUILD_TYPE=Debug variable – the compiled libraries and executables come with a huge amount of debug information that takes up a lot of disk space. This causes the following problems:
- Due to the design of C/C++, several duplicates of the same debug information might be embedded in different object files (for example, the debug information for a header file might be embedded in every library that includes it), which wastes lots of disk space.
- The linker needs to load object files AND their associated debug information into memory during the linking stage, meaning that memory pressure will increase if the object file contains a non-trivial amount of debug information.
To solve these problems, LLVM's build system allows us to split debug information into files separate from the original object files. By detaching debug information from object files, the debug info of the same source file is condensed into one place, thus avoiding unnecessary duplicates and saving lots of disk space. In addition, since debug info is no longer part of the object files, the linker doesn't need to load it into memory and thus saves lots of memory resources. Last but not least, this feature can also improve our incremental building speed – that is, rebuilding the project after a (small) code change – since we only need to update the modified debug information in a single place.
To use this feature, please use the
LLVM_USE_SPLIT_DWARF CMake variable:
$ cmake -DCMAKE_BUILD_TYPE=Debug -DLLVM_USE_SPLIT_DWARF=ON …
Note that this CMake variable only works for compilers that use the DWARF debug format, including GCC and Clang.
Building an optimized version of llvm-tblgen
TableGen is a Domain-Specific Language (DSL) for describing structural data that will be converted into the corresponding C/C++ code as part of LLVM's building process (we will learn more about this in the chapters to come). The conversion tool is called
llvm-tblgen. In other words, the running time of
llvm-tblgen will affect the building time of LLVM itself. Therefore, if you're not developing the TableGen part, it's always a good idea to build an optimized version of
llvm-tblgen, regardless of the global build type (that is, the
CMAKE_BUILD_TYPE variable), making
llvm-tblgen run faster and shortening the overall building time.
The following CMake command, for example, will create build configurations that build a debug version of everything except the
llvm-tblgen executable, which will be built as an optimized version:
$ cmake -DLLVM_OPTIMIZED_TABLEGEN=ON -DCMAKE_BUILD_TYPE=Debug …
Lastly, you'll see how you can use Clang and the new PassManager.
Using the new PassManager and Clang
Clang is LLVM's official C-family frontend (including C, C++, and Objective-C). It uses LLVM's libraries to generate machine code, which is organized by one of the most important subsystems in LLVM – PassManager. PassManager puts together all the tasks (that is, the Passes) required for optimization and code generation.
In Chapter 9, Working with PassManager and AnalysisManager, we will introduce LLVM's new PassManager, which was built from the ground up to replace the existing one at some point in the future. The new PassManager has a faster runtime speed compared to the legacy PassManager, which indirectly brings better runtime performance to Clang. Therefore, the idea here is pretty simple: if we build LLVM's source tree using Clang, with the new PassManager enabled, the compilation will be faster. Most mainstream Linux distribution package repositories already contain Clang. It's recommended to use Clang 6.0 or later if you want a more stable PassManager implementation. Use the
LLVM_USE_NEWPM CMake variable to build LLVM with the new PassManager, as follows:
$ env CC=`which clang` CXX=`which clang++` \
  cmake -DLLVM_USE_NEWPM=ON …
LLVM is a huge project that takes a lot of time to build. The previous two sections introduced some useful tricks and tips for improving its building speed. In the next section, we're going to introduce an alternative build system to build LLVM. It has some advantages over the default CMake build system, which means it will be more suitable in some scenarios.
Using GN for a faster turnaround time
CMake is portable and flexible, and it has been battle-tested by many industrial projects. However, it has some serious issues when it comes to reconfigurations. As we saw in the previous sections, you can modify some of the CMake arguments once build files have been generated by editing the
CMakeCache.txt file in the build folder. When you invoke the
build command again, CMake will reconfigure the build files. If you edit the
CMakeLists.txt files in your source folders, the same reconfiguration will also kick in. There are primarily two drawbacks of CMake's reconfiguration process:
- In some systems, the CMake configuration process is pretty slow. Even reconfiguration, which theoretically only runs part of the process, can still take a long time.
- Sometimes, CMake will fail to resolve the dependencies among different variables and build targets, so your changes will not be reflected. In the worst case, it will just fail silently, and it can take you a long time to dig out the problem.
Generate Ninja, better known as GN, is a build file generator used by many of Google's projects, such as Chromium. GN generates Ninja files from its own description language. It has a good reputation for fast configuration times and reliable argument management. LLVM has included GN support as an alternative (and experimental) building method since late 2018 (around version 8.0.0). GN is especially useful if your development makes changes to build files, or if you want to try out different build options in a short period.
- LLVM's GN support is sitting in the
llvm/utils/gn folder. After switching to that folder, run the following
get.py script to download GN's executable locally:
$ cd llvm/utils/gn
$ ./get.py
Using a specific version of GN
If you want to use a custom GN executable instead of the one fetched by
get.py, simply put your version into the system's
PATH. If you are wondering what other GN versions are available, check out GN's official site (https://gn.googlesource.com/gn).
- Then, use the
gn.py script in the same folder to generate build files (the local
gn.py is just a wrapper around the real
gn that sets up the essential environment):
$ ./gn.py gen out/x64.release
out/x64.release is the name of the build folder. Usually, GN users name the build folder in the
<architecture>.<build type>.<other features> format.
- Finally, you can switch into the build folder and launch Ninja:
$ cd out/x64.release
$ ninja <build target>
- Alternatively, you can use Ninja's -C option:
$ ninja -C out/x64.release <build target>
You have probably noticed that the initial build file generation process is super fast. Now, if you want to change some of the build arguments, edit the
args.gn file under the build folder (
out/x64.release/args.gn, in this case); for example, to change the build type to
debug and change the targets to build (that is, the counterpart of the
LLVM_TARGETS_TO_BUILD CMake argument). It is recommended to use the following command to launch an editor to edit
args.gn:
$ ./gn.py args out/x64.release
In the editor of
args.gn, input the following contents:
# Inside args.gn
is_debug = true
llvm_targets_to_build = ["X86", "AArch64"]
Once you've saved and exited the editor, GN will do some syntax checking and regenerate the build files (of course, you can edit
args.gn without using the
gn command and the build files won't be regenerated until you invoke the
ninja command). This regeneration/reconfiguration will also be fast. Most importantly, it behaves faithfully: thanks to GN's language design, relationships between different build arguments can be analyzed easily and with little ambiguity.
The list of GN's build arguments can be found by running this command:
$ ./gn.py args --list out/x64.release
Unfortunately, at the time of writing this book, there are still plenty of CMake arguments that haven't been ported to GN. GN is not a replacement for LLVM's existing CMake build system, but it is an alternative. Nevertheless, GN is still a decent building method if you want a fast turnaround time in your developments that involve many build configuration changes.
LLVM is a useful framework when it comes to building tools for code optimization and code generation. However, the size and complexity of its code base induces a non-trivial amount of build time. This chapter provided some tips for speeding up the build time of the LLVM source tree, including using different building tools, choosing the right CMake arguments, and even adopting a build system other than CMake. These skills cut down on unnecessary resource wasting and improve your productivity when developing with LLVM.
In the next chapter, we will dig into LLVM's CMake-based building infrastructure and show you how to use build system features and guidelines that are crucial in many different development environments.
- You can check out the complete list of CMake variables that are used by LLVM at https://llvm.org/docs/CMake.html#frequently-used-CMake-variables.
- You can learn more about GN at https://gn.googlesource.com/gn. The quick start guide at https://gn.googlesource.com/gn/+/master/docs/quick_start.md is also very helpful.