To learn how to work with LLVM, it is best to begin by compiling LLVM from the source. LLVM is an umbrella project, and its GitHub repository contains the sources for all the projects that belong to LLVM. Each LLVM project is in a top-level directory of the repository. Besides cloning the repository, you must also install all the tools that the build system requires.
In this chapter, you will learn about the following topics:
- Getting the prerequisites ready, which will show you how to set up your build system.
- Building with CMake, which will cover how to compile and install the LLVM core libraries and Clang with CMake and Ninja.
- Customizing the build process, which will talk about the various ways you can influence the build process.
Getting the prerequisites ready
To work with LLVM, your development system must run a common operating system such as Linux, FreeBSD, macOS, or Windows. Building LLVM and Clang with debug symbols enabled can easily need tens of gigabytes of disk space, so make sure that your system has plenty of disk space available; in this scenario, you should have 30 GB of free space.
The required disk space depends heavily on the chosen build options. For example, building only the LLVM core libraries in release mode, while targeting only one platform, requires about 2 GB of free disk space, which is the bare minimum needed. To reduce compile times, a fast CPU (such as a quadcore CPU with 2.5 GHz clock speed) and a fast SSD would also be helpful.
It is even possible to build LLVM on a small device such as a Raspberry Pi – it just takes a lot of time to do so. I developed the examples in this book on a laptop with an Intel quadcore CPU running at 2.7 GHz clock speed, with 40 GB RAM and 2.5 TB SSD disk space. This system is well-suited for the development task at hand.
Your development system must have some prerequisite software installed. Let's review the minimal required versions of these software packages.
Linux distributions often contain more recent versions that can be used. The version numbers are suitable for LLVM 12. Later versions of LLVM may require more recent versions of the packages mentioned here.
To check out the source from GitHub, you need git (https://git-scm.com/). There is no requirement for a specific version. The GitHub help pages recommend using at least version 1.7.10.
The LLVM project uses CMake (https://cmake.org/) as the build file generator. At least version 3.13.4 is required. CMake can generate build files for various build systems. In this book, Ninja (https://ninja-build.org/) is used because it is fast and available on all platforms. The latest version, 1.9.0, is recommended.
Obviously, you also need a C/C++ compiler. The LLVM projects are written in modern C++, based on the C++14 standard. A conforming compiler and standard library are required. The following compilers are known to work with LLVM 12:
- gcc 5.1.0 or later
- Clang 3.5 or later
- Apple Clang 6.0 or later
- Visual Studio 2017 or later
Please be aware that, as the LLVM project develops further, the compiler requirements are likely to change. At the time of writing, there are discussions about moving to C++17 and dropping Visual Studio 2017 support. In general, you should use the latest compiler version available for your system.
Python (https://python.org/) is used to generate the build files and to run the test suite. It should be at least version 3.6.
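On a Unix-like system, a short script can check these minimum versions for you. The following is a minimal sketch and is not part of LLVM. It assumes a POSIX shell, a sort command that supports -V (version sort, as in GNU coreutils), and Python 3 installed as python3. The helper name version_ge is hypothetical. Version sort matters here because a plain string comparison would rank 3.10 below 3.6.

```shell
#!/bin/sh
# Sketch: check that the prerequisites are on the search path and that
# CMake and Python meet the minimum versions for LLVM 12.
# Assumes a sort that supports -V (version sort).

# Succeeds (exit status 0) if version $1 >= version $2.
version_ge() {
    # If $2 sorts first (or both are equal), then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

for tool in git cmake ninja python3; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

if command -v cmake >/dev/null 2>&1; then
    # The first line looks like "cmake version 3.16.3"; take the third field.
    cmake_ver=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$cmake_ver" 3.13.4; then
        echo "CMake $cmake_ver is new enough"
    else
        echo "CMake $cmake_ver is too old (need >= 3.13.4)"
    fi
fi

if command -v python3 >/dev/null 2>&1; then
    # "python3 --version" prints something like "Python 3.8.10".
    py_ver=$(python3 --version 2>&1 | awk '{print $2}')
    if version_ge "$py_ver" 3.6; then
        echo "Python $py_ver is new enough"
    else
        echo "Python $py_ver is too old (need >= 3.6)"
    fi
fi
```

The script checks git and ninja only for presence. The list above sets no strict minimum for git, and Ninja 1.9.0 is only a recommendation.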
Although not covered in this book, there may be reasons why you need to use Make instead of Ninja. In this case, you need to use GNU Make (https://www.gnu.org/software/make/) version 3.79 or later. The usage of both build tools is very similar. For the scenarios described here, it is sufficient to replace ninja in each command with make.
To install the prerequisite software, the easiest thing to do is use the package manager from your operating system. In the following sections, the commands you must enter to install the software for the most popular operating systems are shown.
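Ubuntu
Ubuntu uses the APT package manager. Most of the basic utilities are already installed. To install all the packages at once, type the following:
$ sudo apt install -y gcc g++ git cmake ninja-build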
Fedora and RedHat
The package manager for Fedora 33 and RedHat Enterprise Linux 8.3 is called DNF. Like Ubuntu, most of the basic utilities are already installed. To install all the packages at once, type the following:
$ sudo dnf install -y gcc gcc-c++ git cmake ninja-build
FreeBSD
On FreeBSD 12 or later, you must use the PKG package manager. FreeBSD differs from Linux-based systems in that Clang is the preferred compiler. To install all the packages at once, type the following:
$ sudo pkg install -y clang git cmake ninja
OS X
For development on OS X, it is best to install Xcode from the App Store. While the Xcode IDE is not used in this book, it comes with the required C/C++ compilers and supporting utilities. To install the other tools, you can use the Homebrew package manager (https://brew.sh/). To install all the packages at once, type the following:
$ brew install git cmake ninja
Windows
Like OS X, Windows does not come with a package manager. The easiest way to install all the software is to use the Chocolatey (https://chocolatey.org/) package manager. To install all the packages at once, type the following:
$ choco install visualstudio2019buildtools cmake ninja git gzip bzip2 gnuwin32-coreutils.install
Please note that this only installs the build tools from Visual Studio 2019. If you would like to get the Community Edition (which includes the IDE), then you must install the visualstudio2019community package instead of visualstudio2019buildtools. Part of the Visual Studio 2019 installation is the x64 Native Tools Command Prompt for VS 2019. When you use this command prompt, the compiler is automatically added to the search path.
The LLVM project uses Git for version control. If you have not used Git before, then you should do some basic configuration of Git before continuing; that is, set a username and an email address. Both pieces of information are used when you commit changes. In the following commands, replace Jane with your name and [email protected] with your email:
$ git config --global user.email "[email protected]"
$ git config --global user.name "Jane"
By default, Git uses the vi editor for commit messages. If you would prefer using another editor, then you can change the configuration in a similar way. To use the nano editor, type the following:
$ git config --global core.editor nano
For more information about git, please see the Git Version Control Cookbook - Second Edition by Packt Publishing (https://www.packtpub.com/product/git-version-control-cookbook/9781782168454).
Building with CMake
With the build tools ready, you can now check out all the LLVM projects from GitHub. The command for doing this is essentially the same on all platforms. However, on Windows, it is recommended to turn off auto-translation for line endings.
Let's review this process in three parts: cloning the repository, creating a build directory, and generating the build system files.
Cloning the repository
Type the following command to clone the repository:
$ git clone https://github.com/llvm/llvm-project.git
On Windows, you must add an option that disables the auto-translation of line endings. Here, type the following:
$ git clone --config core.autocrlf=false https://github.com/llvm/llvm-project.git
The git command clones the latest source code from GitHub into a local directory named llvm-project. Now, change the current directory to the new llvm-project directory with the following command:
$ cd llvm-project
Inside the directory are all the LLVM projects, each in its own directory. Most notably, the LLVM core libraries are in the llvm subdirectory. The LLVM project uses branches for subsequent release development ("release/12.x") and tags ("llvmorg-12.0.0") to mark a certain release. With the preceding clone command, you get the current development state. This book uses LLVM 12. To check out the first release of LLVM 12 on a new local branch (the branch name llvm-12 is arbitrary), type the following:
$ git checkout -b llvm-12 llvmorg-12.0.0
With this, you have cloned the whole repository and checked out a tag. This is the most flexible approach.
Git also allows you to clone only a branch or a tag (including its history). With git clone --single-branch --branch llvmorg-12.0.0 https://github.com/llvm/llvm-project.git, you check out the same tag as before, but only the history leading up to this tag is cloned. With the additional --depth=1 option, you prevent the history from being cloned as well. This saves time and space but obviously limits what you can do locally.
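The following sketch puts the three ways of getting the sources side by side. It uses hypothetical helper names (run, clone_llvm) and, by default, only prints the commands. Set DRY_RUN=0 to execute them. It uses git -C (Git 1.8.5 or later) to run the checkout inside llvm-project without changing directories. The local branch name llvm-12 is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: three ways of getting the LLVM 12.0.0 sources.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to run them.
# run and clone_llvm are hypothetical helpers, not part of LLVM.
DRY_RUN=${DRY_RUN:-1}
REPO=https://github.com/llvm/llvm-project.git
TAG=llvmorg-12.0.0

# Print a command and execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

# $1 selects the strategy: full, single, or shallow.
clone_llvm() {
    case "$1" in
        full)    # Whole history, then a local branch at the release tag.
                 run git clone "$REPO" &&
                 run git -C llvm-project checkout -b llvm-12 "$TAG" ;;
        single)  # Only the history leading up to the tag.
                 run git clone --single-branch --branch "$TAG" "$REPO" ;;
        shallow) # Only the tagged snapshot, without history.
                 run git clone --depth=1 --branch "$TAG" "$REPO" ;;
        *)       echo "unknown strategy: $1" >&2
                 return 1 ;;
    esac
}

clone_llvm shallow
```

The shallow variant is the fastest download, while the full clone keeps every option open for later work with Git.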
The next step is to create a build directory.
Creating a build directory
Unlike many other projects, LLVM does not support in-source builds and requires a separate build directory. This can easily be created inside the llvm-project directory. Change into this directory with the following command:
$ cd llvm-project
Then, create a build directory called build for simplicity. Here, the commands for Unix and Windows systems differ. On Unix-like systems, you should use the following command:
$ mkdir build
On Windows, you should use the following command:
$ md build
Then, change into the build directory:
$ cd build
Now, you are ready to create the build system files with the CMake tool inside this directory.
Generating the build system files
From inside the build directory, run CMake to generate Ninja build files for LLVM and Clang:
$ cmake -G Ninja -DLLVM_ENABLE_PROJECTS=clang ../llvm
On Windows, the backslash character, \, is the directory name separator. However, CMake automatically translates the Unix separator, /, into the Windows one, so the same command also works there.
The -G option tells CMake which build system to generate build files for. The most often used options are as follows:
- Ninja: For the Ninja build system
- Unix Makefiles: For GNU Make
- Visual Studio 15 2017 and Visual Studio 16 2019: For Visual Studio and MSBuild
- Xcode: For Xcode projects
The generation process can be influenced by setting various variables with the -D option. Usually, they are prefixed with CMAKE_ (if defined by CMake) or LLVM_ (if defined by LLVM). With the LLVM_ENABLE_PROJECTS=clang variable setting, CMake generates build files for Clang in addition to LLVM. The last part of the command tells CMake where to find the LLVM core library source. More on that in the next section.
Once the build files have been generated, LLVM and Clang can be compiled with the following command:
$ ninja
Depending on the hardware resources, this command takes between 15 minutes (a server with lots of CPU cores, plenty of memory, and fast storage) and several hours (a dual-core Windows notebook with limited memory) to run. By default, Ninja utilizes all available CPU cores. This is good for compilation speed but may prevent other tasks from running. For example, on a Windows-based notebook, it is almost impossible to surf the internet while Ninja is running. Fortunately, you can limit resource usage with the -j option.
Let's assume you have four CPU cores available and that Ninja should only use two (because you have parallel tasks to run). Here, you should use the following command for compilation:
$ ninja -j2
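Instead of hardcoding the job count, you can derive it from the number of cores. The following sketch assumes that about half of the cores is the right amount. It uses nproc from GNU coreutils and falls back to sysctl -n hw.ncpu on macOS and FreeBSD. It only prints the resulting command.

```shell
#!/bin/sh
# Sketch: let Ninja use about half of the available CPU cores.
# The "half of the cores" policy is an assumption; adjust it to your needs.
cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)
jobs=$((cores / 2))
# Never go below one job, even on a single-core machine.
[ "$jobs" -ge 1 ] || jobs=1
echo "ninja -j$jobs"
```

To use the result, run ninja -j"$jobs" inside the build directory instead of printing the command.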
Once compilation is finished, a best practice is to run the test suite to check if everything works as expected:
$ ninja check-all
Again, the runtime of this command varies widely depending on the available hardware resources. The Ninja check-all target runs all test cases. Targets are generated for each directory containing test cases. Using check-llvm instead of check-all runs the LLVM tests but not the Clang tests; check-llvm-codegen only runs the tests in the CodeGen directory from LLVM (that is, the llvm/test/CodeGen directory).
You can also do a quick manual check. One of the LLVM applications you will be using is llc, the LLVM compiler. If you run it with the -version option, it shows its LLVM version, its host CPU, and all its supported architectures:
$ bin/llc -version
If you have trouble getting LLVM compiled, then you should consult the Common Problems section of the Getting Started with the LLVM System documentation (https://llvm.org/docs/GettingStarted.html#common-problems) for solutions to typical problems.
Finally, install the binaries:
$ ninja install
On a Unix-like system, the install directory is /usr/local. On Windows, C:\Program Files\LLVM is used. This can be changed, of course. The next section explains how.
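You can collect the individual steps of this section into one script. The following sketch assumes you start it from the llvm-project directory. Instead of changing into the build directory, it uses cmake -S/-B (available since CMake 3.13) and ninja -C, which is equivalent to the cd-based commands shown above. The run helper is hypothetical and only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: configure, build, test, check llc, and install in one go.
# Start it from the llvm-project directory. DRY_RUN defaults to 1, so the
# commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# Print a command; execute it only when DRY_RUN is 0 and stop on failure.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then
        "$@" || exit 1
    fi
}

build_llvm() {
    # -S/-B replace "cd build" followed by "cmake ../llvm".
    run cmake -S llvm -B build -G Ninja -DLLVM_ENABLE_PROJECTS=clang
    # -C makes Ninja change into the build directory first.
    run ninja -C build
    run ninja -C build check-all
    run build/bin/llc -version
    # Installing into /usr/local usually needs root privileges (e.g. sudo).
    run ninja -C build install
}

build_llvm
```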
Customizing the build process
The CMake system uses a project description in the CMakeLists.txt file. The top-level file is in the llvm directory; that is, llvm/CMakeLists.txt. Other directories also contain CMakeLists.txt files, which are recursively included during the build-file generation.
Based on the information provided in the project description, CMake checks which compilers have been installed, detects libraries and symbols, and creates the build system files, such as build.ninja or Makefile (depending on the chosen generator). It is also possible to define reusable modules, such as a function to detect if LLVM is installed. These scripts are placed in the special cmake directory (llvm/cmake), which is searched automatically during the generation process.
The build process can be customized by defining CMake variables. The -D command-line option is used to set a variable to a value. These variables are used in CMake scripts. Variables defined by CMake itself are almost always prefixed with CMAKE_, and these variables can be used in all projects. Variables defined by LLVM are prefixed with LLVM_, but they can only be used if the project definition includes the use of LLVM.
Variables defined by CMake
Some variables are initialized with the values of environment variables. The most notable are CC and CXX, which define the C and C++ compilers to be used for building. CMake tries to locate a C and a C++ compiler automatically, using the current shell search path. It picks the first compiler that's found. If you have several compilers installed, such as gcc and Clang or different versions of Clang, then this might not be the compiler you want for building LLVM.
Suppose you would like to use clang9 as a C compiler and clang++9 as a C++ compiler. Then, you can invoke CMake in a Unix shell in the following way:
$ CC=clang9 CXX=clang++9 cmake ../llvm
This sets the values of the environment variables for this invocation of cmake only. If necessary, you can specify an absolute path for the compiler executables. CC is the default value of the CMAKE_C_COMPILER CMake variable, while CXX is the default value of the CMAKE_CXX_COMPILER CMake variable. Instead of using the environment variables, you can set the CMake variables directly. This is equivalent to the preceding call:
$ cmake -DCMAKE_C_COMPILER=clang9 -DCMAKE_CXX_COMPILER=clang++9 ../llvm
Other useful variables defined by CMake are as follows:
CMAKE_INSTALL_PREFIX: A path prefix that is prepended to every path during installation. The default is /usr/local on Unix and C:\Program Files\<Project> on Windows. To install LLVM in the /opt/llvm directory, you must specify -DCMAKE_INSTALL_PREFIX=/opt/llvm. The binaries are copied to /opt/llvm/bin, the library files are copied to /opt/llvm/lib, and so on.
CMAKE_BUILD_TYPE: Different types of builds require different settings. For example, a debug build needs options for generating debug symbols and usually links against debug versions of system libraries. In contrast, a release build uses optimization flags and links against production versions of libraries. This variable is only used for build systems that can only handle one build type, such as Ninja or Make. For IDE build systems, all variants are generated, and you must use the mechanism of the IDE to switch between build types. Some possible values are as follows:
- DEBUG: Build with debug symbols
- RELEASE: Build with optimization for speed
- RELWITHDEBINFO: Release build with debug symbols
- MINSIZEREL: Build with optimization for size
The default build type is DEBUG. To generate build files for a release build, you must specify -DCMAKE_BUILD_TYPE=RELEASE.
CMAKE_C_FLAGS and CMAKE_CXX_FLAGS: These are extra flags that are used when compiling C and C++ source files, respectively. The initial values are taken from the CFLAGS and CXXFLAGS environment variables, which can be used as alternatives.
CMAKE_MODULE_PATH: Specifies additional directories that are searched for CMake modules. The specified directories are searched before the default ones. The value is a semicolon-separated list of directories.
PYTHON_EXECUTABLE: If the Python interpreter is not found, or if the wrong one is picked because you have installed multiple versions of it, you can set this variable to the path of the Python binary. This variable only takes effect if the Python module of CMake is included (which is the case for LLVM).
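Several of these variables are typically combined in a single configure call. The following sketch prints such a command for a release build installed into /opt/llvm. The values, including the fallback to clang and clang++ when CC and CXX are not set, are example assumptions. Because it uses the ../llvm source path from earlier, it is meant to be run from the build directory.

```shell
#!/bin/sh
# Sketch: a release configuration built from CMake-defined variables.
# Example choices (assumptions): RELEASE build, /opt/llvm as install prefix,
# clang/clang++ unless CC/CXX are already set in the environment.
# The command is only printed; remove "echo" in configure_cmd to run it.
CC=${CC:-clang}
CXX=${CXX:-clang++}
PREFIX=/opt/llvm

configure_cmd() {
    echo cmake -G Ninja \
        -DCMAKE_BUILD_TYPE=RELEASE \
        -DCMAKE_INSTALL_PREFIX="$PREFIX" \
        -DCMAKE_C_COMPILER="$CC" \
        -DCMAKE_CXX_COMPILER="$CXX" \
        ../llvm
}

configure_cmd
# After "ninja install", the tools end up in $PREFIX/bin and the
# libraries in $PREFIX/lib.
echo "binaries: $PREFIX/bin"
echo "libraries: $PREFIX/lib"
```

For example, if the script is saved as configure.sh, CC=gcc CXX=g++ sh configure.sh prints the same command with gcc and g++ as the compilers.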
CMake provides built-in help for variables. The --help-variable var option prints help for the var variable. For instance, you can type the following to get help for CMAKE_BUILD_TYPE:
$ cmake --help-variable CMAKE_BUILD_TYPE
You can also list all the variables with the following command:
$ cmake --help-variable-list
This list is very long. You may want to pipe the output to more or a similar program.
Variables defined by LLVM
LLVM_TARGETS_TO_BUILD: LLVM supports code generation for different CPU architectures. By default, all these targets are built. Use this variable to specify the list of targets to build, separated by semicolons. The value all can be used as shorthand for all targets. The names are case-sensitive. To only enable the PowerPC and SystemZ targets, you must specify -DLLVM_TARGETS_TO_BUILD="PowerPC;SystemZ". The quotes keep the shell from treating the semicolon as a command separator.
LLVM_ENABLE_PROJECTS: This is a list of the projects you want to build, separated by semicolons. The source for the projects must be at the same level as the llvm directory (side-by-side layout). The value all can be used as shorthand for all the projects. To build Clang and llgo together with LLVM, you must specify -DLLVM_ENABLE_PROJECTS="clang;llgo".
LLVM_ENABLE_ASSERTIONS: If set to ON, then assertion checks are enabled. These checks help find errors and are very useful during development. The default value is ON for a DEBUG build and OFF otherwise. To turn assertion checks on (for example, for a RELEASE build), you must specify -DLLVM_ENABLE_ASSERTIONS=ON.
LLVM_ENABLE_EXPENSIVE_CHECKS: This enables some expensive checks that can really slow down your compilation speed or consume large amounts of memory. The default value is OFF. To turn these checks on, you must specify -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON.
LLVM_APPEND_VC_REV: LLVM tools such as llc display the LLVM version they are based on, besides other information, if the -version command-line option is provided. This version information is based on the LLVM_REVISION C macro. By default, not only the LLVM version but also the Git hash of the latest commit is part of the version information. This is handy if you are following the development of the master branch because it makes it clear which Git commit the tool is based on. If this isn't required, then it can be turned off with -DLLVM_APPEND_VC_REV=OFF.
LLVM_ENABLE_THREADS: LLVM automatically includes thread support if a threading library is detected (usually, the pthreads library). Furthermore, in this case, LLVM assumes that the compiler supports thread-local storage (TLS). If you don't want thread support or your compiler does not support TLS, then you can turn it off with -DLLVM_ENABLE_THREADS=OFF.
LLVM_ENABLE_EH: The LLVM projects do not use C++ exception handling, so they turn exception support off by default. This setting can be incompatible with other libraries your project is linking with. If needed, you can enable exception support by specifying -DLLVM_ENABLE_EH=ON.
LLVM_ENABLE_RTTI: LLVM uses a lightweight, self-built system for runtime type information. Generating C++ RTTI is turned off by default. Like the exception handling support, this may be incompatible with other libraries. To turn on the generation of C++ RTTI, you must specify -DLLVM_ENABLE_RTTI=ON.
LLVM_ENABLE_WARNINGS: Compiling LLVM should generate no warning messages if possible. For this reason, the option to print warning messages is turned on by default. To turn it off, you must specify -DLLVM_ENABLE_WARNINGS=OFF.
LLVM_ENABLE_PEDANTIC: The LLVM source should be C/C++ language standard-conforming; hence, pedantic checking of the source is enabled by default. If possible, compiler-specific extensions are also disabled. To reverse this setting, you must specify -DLLVM_ENABLE_PEDANTIC=OFF.
LLVM_ENABLE_WERROR: If set to ON, then all warnings are treated as errors, and the compilation aborts as soon as a warning is found. This helps to find all the remaining warnings in the source. By default, it is turned off. To turn it on, you must specify -DLLVM_ENABLE_WERROR=ON.
LLVM_OPTIMIZED_TABLEGEN: Usually, the tablegen tool is built with the same options as the other parts of LLVM. However, tablegen is used to generate large parts of the code generator, so in a debug build it runs much slower, which increases the compile time noticeably. If this option is set to ON, then tablegen is compiled with optimization turned on, even for a debug build, possibly reducing compile time. The default is OFF. To turn this on, you must specify -DLLVM_OPTIMIZED_TABLEGEN=ON.
LLVM_USE_SPLIT_DWARF: If the build compiler is gcc or Clang, then turning on this option will instruct the compiler to generate the DWARF debug information in a separate file. The reduced size of the object files reduces the link time of debug builds significantly. The default is OFF. To turn this on, you must specify -DLLVM_USE_SPLIT_DWARF=ON.
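The following sketch shows how several LLVM variables combine. It assembles a debug configuration that limits code generation to two targets and uses the tablegen and split-DWARF options to reduce build time. The target selection X86;AArch64 is only an assumption for illustration. LLVM_USE_SPLIT_DWARF requires gcc or Clang as the build compiler. The script prints one argument per line, which shows that the quoted, semicolon-separated list reaches CMake as a single argument.

```shell
#!/bin/sh
# Sketch: a faster debug configuration using LLVM-defined variables.
# The chosen values are only an example. List values such as
# LLVM_TARGETS_TO_BUILD contain semicolons, which the shell would otherwise
# treat as the end of a command, so they are quoted.
# LLVM_USE_SPLIT_DWARF assumes gcc or Clang as the build compiler.
set -- cmake -G Ninja \
    -DCMAKE_BUILD_TYPE=DEBUG \
    -DLLVM_ENABLE_PROJECTS=clang \
    -DLLVM_TARGETS_TO_BUILD="X86;AArch64" \
    -DLLVM_OPTIMIZED_TABLEGEN=ON \
    -DLLVM_USE_SPLIT_DWARF=ON \
    ../llvm

# Print one argument per line: the semicolon stays inside a single argument.
# Replace this line with "$@" to run the command from the build directory.
printf '%s\n' "$@"
```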
LLVM defines many more CMake variables. You can find the complete list in the LLVM documentation of CMake (https://releases.llvm.org/12.0.0/docs/CMake.html#llvm-specific-variables). The preceding list only contains the ones you are likely to need.
In this chapter, you prepared your development machine to compile LLVM. You cloned the LLVM GitHub repository and compiled your own versions of LLVM and Clang. The build process can be customized with CMake variables. You also learned about useful variables and how to change them. Equipped with this knowledge, you can tweak LLVM for your needs.
In the next chapter, we will take a closer look at the contents of the LLVM mono repository. You will learn which projects are in it and how the projects are structured. You will then use this information to create your own project using LLVM libraries. Finally, you will learn how to compile LLVM for a different CPU architecture.