Search icon
Subscription
0
Cart icon
Close icon
You have no products in your basket yet
Save more on your purchases!
Savings automatically calculated. No voucher code required
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
$9.99 | ALL EBOOKS & VIDEOS
Over 7,000 tech titles at $9.99 each with AI-powered learning assistants on new releases
LLVM Essentials
LLVM Essentials

LLVM Essentials: Become familiar with the LLVM infrastructure and start using LLVM libraries to design a compiler

By Mayur Pandey , Suyog Sarda , David Farago
$21.99 $9.99
Book Dec 2015 166 pages 1st Edition
eBook
$21.99 $9.99
Print
$26.99
Subscription
$15.99 Monthly
eBook
$21.99 $9.99
Print
$26.99
Subscription
$15.99 Monthly

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Buy Now
Table of content icon View table of contents Preview book icon Preview Book

LLVM Essentials

Chapter 1. Playing with LLVM

The LLVM Compiler infrastructure project, started in 2000 in University of Illinois, was originally a research project to provide modern, SSA based compilation technique for arbitrary static and dynamic programming languages. Now it has grown to be an umbrella project with many sub projects within it, providing a set of reusable libraries having well defined interfaces.

LLVM is implemented in C++ and the main crux of it is the LLVM core libraries it provides. These libraries provide us with opt tool, the target independent optimizer, and code generation support for various target architectures. There are other tools which make use of core libraries, but our main focus in the book will be related to the three mentioned above. These are built around LLVM Intermediate Representation (LLVM IR), which can almost map all the high-level languages. So basically, to use LLVM's optimizer and code generation technique for code written in a certain programming language, all we need to do is write a frontend for a language that takes the high level language and generates LLVM IR. There are already many frontends available for languages such as C, C++, Go, Python, and so on. We will cover the following topics in this chapter:

  • Modular design and collection of libraries

  • Getting familiar with LLVM IR

  • LLVM Tools and using them at command line

Modular design and collection of libraries


The most important thing about LLVM is that it is designed as a collection of libraries. Let's understand these by taking the example of LLVM optimizer opt. There are many different optimization passes that the optimizer can run. Each of these passes is written as a C++ class derived from the Pass class of LLVM. Each of the written passes can be compiled into a .o file and subsequently they are archived into a .a library. This library will contain all the passes for opt tool. All the passes in this library are loosely coupled, that is they mention explicitly the dependencies on other passes.

When the optimizer is ran, the LLVM PassManager uses the explicitly mentioned dependency information and runs the passes in optimal way. The library based design allows the implementer to choose the order in which passes will execute and also choose which passes are to be executed based on the requirements. Only the passes that are required are linked to the final application, not the entire optimizer.

The following figure demonstrates how each pass can be linked to a specific object file within a specific library. In the following figure, the PassA references LLVMPasses.a for PassA.o, whereas the custom pass refers to a different library MyPasses.a for the MyPass.o object file.

The code generator also makes use of this modular design like the Optimizer, for splitting the code generation into individual passes, namely, instruction selection, register allocation, scheduling, code layout optimization, and assembly emission.

In each of the following phases mentioned there are some common things for almost every target, such as an algorithm for assigning physical registers available to virtual registers even though the set of registers for different targets vary. So, the compiler writer can modify each of the passes mentioned above and create custom target-specific passes. The use of the tablegen tool helps in achieving this using table description .td files for specific architectures. We will discuss how this happens later in the book.

Another capability that arises out of this is the ability to easily pinpoint a bug to a particular pass in the optimizer. A tool name Bugpoint makes use of this capability to automatically reduce the test case and pinpoint the pass that is causing the bug.

Getting familiar with LLVM IR


LLVM Intermediate Representation (IR) is the heart of the LLVM project. In general every compiler produces an intermediate representation on which it runs most of its optimizations. For a compiler targeting multiple-source languages and different architectures the important decision while selecting an IR is that it should neither be of very high-level, as in very closely attached to the source language, nor it should be very low-level, as in close to the target machine instructions. LLVM IR aims to be a universal IR of a kind, by being at a low enough level that high-level ideas may be cleanly mapped to it. Ideally the LLVM IR should have been target-independent, but it is not so because of the inherent target dependence in some of the programming languages itself. For example, when using standard C headers in a Linux system, the header files itself are target dependent, which may specify a particular type to an entity so that it matches the system calls of the particular target architecture.

Most of the LLVM tools revolve around this Intermediate Representation. The frontends of different languages generate this IR from the high-level source language. The optimizer tool of LLVM runs on this generated IR to optimize the code for better performance and the code generator makes use of this IR for target specific code generation. This IR has three equivalent forms:

  • An in-memory compiler IR

  • An on-disk bitcode representation

  • A Human readable form (LLVM Assembly)

Now let's take an example to see how this LLVM IR looks like. We will take a small C code and convert it into LLVM IR using clang and try to understand the details of LLVM IR by mapping it back to the source language.

$ cat add.c
int globvar = 12;

int add(int a) {
return globvar + a;
}

Use the clang frontend with the following options to convert it to LLVM IR:

$ clang -emit-llvm -c -S add.c
$ cat add.ll
; ModuleID = 'add.c'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@globvar = global i32 12, align 4

; Function Attrs: nounwind uwtable
define i32 @add(i32 %a) #0 {
  %1 = alloca i32, align 4
  store i32 %a, i32* %1, align 4
  %2 = load i32, i32* @globvar, align 4
  %3 = load i32, i32* %1, align 4
  %4 = add nsw i32 %2, %3
  ret i32 %4
}

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.ident = !{!0}

Now let's look at the IR generated and see what it is all about. You can see the very first line giving the ModuleID, that it defines the LLVM module for add.c file. An LLVM module is a top–level data structure that has the entire contents of the input LLVM file. It consists of functions, global variables, external function prototypes, and symbol table entries.

The following lines show the target data layout and target triple from which we can know that the target is x86_64 processor with Linux running on it. The datalayout string tells us what is the endianess of machine ('e' meaning little endian), and the name mangling (m : e denotes elf type). Each specification is separated by ''and each following spec gives information about the type and size of that type. For example, i64:64 says 64 bit integer is of 64 bits.

Then we have a global variable globvar. In LLVM IR all globals start with '@' and all local variables start with '%'. There are two main reasons why the variables are prefixed with these symbols. The first one being, the compiler won't have to bother about a name clash with reserved words, the other being that the compiler can come up quickly with a temporary name without having to worry about a conflict with symbol table conflicts. This second property is useful for representing the IR in static single assignment (SSA) from where each variable is assigned only a single time and every use of a variable is preceded by its definition. So, while converting a normal program to SSA form, we create a new temporary name for every redefinition of a variable and limit the range of earlier definition till this redefinition.

LLVM views global variables as pointers, so an explicit dereference of the global variable using load instruction is required. Similarly, to store a value, an explicit store instruction is required.

Local variables have two categories:

  • Register allocated local variables: These are the temporaries and allocated virtual registers. The virtual registers are allocated physical registers during the code generation phase which we will see in a later chapter of the book. They are created by using a new symbol for the variable like:

    %1 = some value
    
  • Stack allocated local variables: These are created by allocating variables on the stack frame of a currently executing function, using the alloca instruction. The alloca instruction gives a pointer to the allocated type and explicit use of load and store instructions is required to access and store the value.

    %2 = alloca i32
    

Now let's see how the add function is represented in LLVM IR. define i32 @add(i32 %a) is very similar to how functions are declared in C. It specifies the function returns integer type i32 and takes an integer argument. Also, the function name is preceded by '@', meaning it has global visibility.

Within the function is actual processing for functionality. Some important things to note here are that LLVM uses a three-address instruction, that is a data processing instruction, which has two source operands and places the result in a separate destination operand (%4 = add i32 %2, %3). Also the code is in SSA form, that is each value in the IR has a single assignment which defines the value. This is useful for a number of optimizations.

The attributes string that follows in the generated IR specifies the function attributes which are very similar to C++ attributes. These attributes are for the function that has been defined. For each function defined there is a set of attributes defined in the LLVM IR.

The code that follows the attributes is for the ident directive that identifies the module and compiler version.

LLVM tools and using them in the command line


Until now, we have understood what LLVM IR (human readable form) is and how it can be used to represent a high-level language. Now, we will take a look at some of the tools that LLVM provides so that we can play around with this IR converting to other formats and back again to the original form. Let's take a look at these tools one by one along with examples.

  • llvm-as: This is the LLVM assembler that takes LLVM IR in assembly form (human readable) and converts it to bitcode format. Use the preceding add.ll as an example to convert it into bitcode. To know more about the LLVM Bitcode file format refer to http://llvm.org/docs/BitCodeFormat.html

    $ llvm-as add.ll –o add.bc
    

    To view the content of this bitcode file, a tool such as hexdump can be used.

    $ hexdump –c add.bc
    
  • llvm-dis: This is the LLVM disassembler. It takes a bitcode file as input and outputs the llvm assembly.

    $ llvm-dis add.bc –o add.ll
    

    If you check add.ll and compare it with the previous version, it will be the same as the previous one.

  • llvm-link: llvm-link links two or more llvm bitcode files and outputs one llvm bitcode file. To view a demo write a main.c file that calls the function in the add.c file.

    $ cat main.c
    #include<stdio.h>
    
    extern int add(int);
    
    int main() {
    int a = add(2);
    printf("%d\n",a);
    return 0;
    }
    

    Convert the C source code to LLVM bitcode format using the following command.

    $ clang -emit-llvm -c main.c
    

    Now link main.bc and add.bc to generate output.bc.

    $ llvm-link main.bc add.bc -o output.bc
    
  • lli: lli directly executes programs in LLVM bitcode format using a just-in-time compiler or interpreter, if one is available for the current architecture. lli is not like a virtual machine and cannot execute IR of different architecture and can only interpret for host architecture. Use the bitcode format file generated by llvm-link as input to lli. It will display the output on the standard output.

    $ lli output.bc
    14
    
  • llc: llc is the static compiler. It compiles LLVM inputs (assembly form/ bitcode form) into assembly language for a specified architecture. In the following example it takes the output.bc file generated by llvm-link and generates the assembly file output.s.

    $ llc output.bc –o output.s
    

    Let's look at the content of the output.s assembly, specifically the two functions of the generated code, which is very similar to what a native assembler would have generated.

    Function main:
      .type  main,@function
    main:                                   # @main
      .cfi_startproc
    # BB#0:
      pushq  %rbp
    .Ltmp0:
      .cfi_def_cfa_offset 16
    .Ltmp1:
      .cfi_offset %rbp, -16
      movq  %rsp, %rbp
    .Ltmp2:
      .cfi_def_cfa_register %rbp
      subq  $16, %rsp
      movl  $0, -4(%rbp)
      movl  $2, %edi
      callq  add
      movl  %eax, %ecx
      movl  %ecx, -8(%rbp)
      movl  $.L.str, %edi
      xorl  %eax, %eax
      movl  %ecx, %esi
      callq  printf
      xorl  %eax, %eax
      addq  $16, %rsp
      popq  %rbp
      retq
    .Lfunc_end0:
    
    
    Function: add
    add:                                    # @add
      .cfi_startproc
    # BB#0:
      pushq  %rbp
    .Ltmp3:
      .cfi_def_cfa_offset 16
    .Ltmp4:
      .cfi_offset %rbp, -16
      movq  %rsp, %rbp
    .Ltmp5:
      .cfi_def_cfa_register %rbp
      movl  %edi, -4(%rbp)
      addl  globvar(%rip), %edi
      movl  %edi, %eax
      popq  %rbp
      retq
    .Lfunc_end1:
  • opt: This is modular LLVM analyzer and optimizer. It takes the input file and runs the optimization or analysis specified on the command line. Whether it runs the analyzer or optimizer depends on the command-line option.

    opt [options] [input file name]

    When the –analyze option is provided it performs various analysis on the input. There is a set of analysis options already provided that can be specified through command line or else one can write down their own analysis pass and provide the library to that analysis pass. Some of the useful analysis passes that can be specified using the following command line arguments are:

    • basicaa: basic alias analysis

    • da: dependence analysis

    • instcount: count the various instruction types.

    • loops: information about loops

    • scalar evolution: analysis of scalar evolution

    When the –analyze option is not passed, the opt tool does the actual optimization work and tries to optimize the code depending upon the command-line options passed. Similarly to the preceding case, you can use some of the optimization passes already present or write your own pass for optimization. Some of the useful optimization passes that can be specified using the following command-line arguments are:

    • constprop: simple constant propagation.

    • dce: dead code elimination pass

    • globalopt: pass for global variable optimization

    • inline: pass for function inlining

    • instcombine: for combining redundant instructions

    • licm: loop invariant code motion

    • tailcallelim: Tail Call elimination

Note

Before going ahead we must note that all the tools mentioned in this chapter are meant for compiler writers. An end user can directly use clang for compilation of C code without converting the C code into intermediate representation

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Summary


In this chapter, we looked into the modular design of LLVM: How it is used in the opt tool of LLVM, and how it is applicable across LLVM core libraries. Then we took a look into LLVM intermediate representation, and how various entities (variables, functions etc.) of a language are mapped to LLVM IR. In the last section, we discussed about some of the important LLVM tools, and how they can be used to transform the LLVM IR from one form to another.

In the next chapter, we will see how we can write a frontend for a language that can output LLVM IR using the LLVM machinery.

Left arrow icon Right arrow icon

Key benefits

  • Learn to use the LLVM libraries to emit intermediate representation (IR) from high-level language
  • Build your own optimization pass for better code generation
  • Understand AST generation and use it in a meaningful way

Description

LLVM is currently the point of interest for many firms, and has a very active open source community. It provides us with a compiler infrastructure that can be used to write a compiler for a language. It provides us with a set of reusable libraries that can be used to optimize code, and a target-independent code generator to generate code for different backends. It also provides us with a lot of other utility tools that can be easily integrated into compiler projects. This book details how you can use the LLVM compiler infrastructure libraries effectively, and will enable you to design your own custom compiler with LLVM in a snap. We start with the basics, where you’ll get to know all about LLVM. We then cover how you can use LLVM library calls to emit intermediate representation (IR) of simple and complex high-level language paradigms. Moving on, we show you how to implement optimizations at different levels, write an optimization pass, generate code that is independent of a target, and then map the code generated to a backend. The book also walks you through CLANG, IR to IR transformations, advanced IR block transformations, and target machines. By the end of this book, you’ll be able to easily utilize the LLVM libraries in your own projects.

What you will learn

[*]Get an introduction to LLVM modular design and LLVM tools [*]Convert frontend code to LLVM IR [*]Implement advanced LLVM IR paradigms [*]Understand the LLVM IR Optimization Pass Manager infrastructure and write an optimization pass [*]Absorb LLVM IR transformations [*]Understand the steps involved in converting LLVM IR to Selection DAG [*]Implement a custom target using the LLVM infrastructure [*]Get a grasp of C’s frontend clang, an AST dump, and static analysis

Product Details

Country selected

Publication date : Dec 21, 2015
Length 166 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781785280801
Category :
Languages :

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Buy Now

Product Details


Publication date : Dec 21, 2015
Length 166 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781785280801
Category :
Languages :

Table of Contents

14 Chapters
LLVM Essentials Chevron down icon Chevron up icon
Credits Chevron down icon Chevron up icon
About the Authors Chevron down icon Chevron up icon
About the Reviewer Chevron down icon Chevron up icon
www.PacktPub.com Chevron down icon Chevron up icon
Preface Chevron down icon Chevron up icon
1. Playing with LLVM Chevron down icon Chevron up icon
2. Building LLVM IR Chevron down icon Chevron up icon
3. Advanced LLVM IR Chevron down icon Chevron up icon
4. Basic IR Transformations Chevron down icon Chevron up icon
5. Advanced IR Block Transformations Chevron down icon Chevron up icon
6. IR to Selection DAG phase Chevron down icon Chevron up icon
7. Generating Code for Target Architecture Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Empty star icon Empty star icon Empty star icon Empty star icon Empty star icon 0
(0 Ratings)
5 star 0%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
Filter icon Filter
Top Reviews

Filter reviews by


No reviews found
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

How do I buy and download an eBook? Chevron down icon Chevron up icon

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website? Chevron down icon Chevron up icon

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)
Where can I access support around an eBook? Chevron down icon Chevron up icon
  • If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support? Chevron down icon Chevron up icon

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks? Chevron down icon Chevron up icon
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook? Chevron down icon Chevron up icon

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.