Clang Compiler Frontend

Product type Book
Published in Mar 2024
Publisher Packt
ISBN-13 9781837630981
Pages 326 pages
Edition 1st Edition
Author: Ivan Murashko

Table of Contents (17 chapters)

  • Preface
  • Part I: Clang Setup and Architecture
  • Chapter 1: Environment Setup
  • Chapter 2: Clang Architecture
  • Chapter 3: Clang AST
  • Chapter 4: Basic Libraries and Tools
  • Part II: Clang Tools
  • Chapter 5: Clang-Tidy Linter Framework
  • Chapter 6: Advanced Code Analysis
  • Chapter 7: Refactoring Tools
  • Chapter 8: IDE Support and Clangd
  • Part III: Appendix
  • Appendix 1: Compilation Database
  • Appendix 2: Build Speed Optimization
  • Bibliography
  • Index
  • Other Books You Might Enjoy

3 Clang AST

The parsing stage of any compiler generates a parse tree, and the Abstract Syntax Tree (AST) is the fundamental structure generated while parsing a given input program. The AST serves as the framework for the Clang frontend and is the primary tool for various Clang utilities, including linters. Clang offers sophisticated tools for searching (or matching) AST nodes. These tools are implemented using a Domain-Specific Language (DSL), and understanding how this DSL is implemented is crucial to using it effectively.

We will start with the basic data structures and the class hierarchy that Clang uses to construct the AST. Additionally, we will explore the methods used for AST traversal and highlight some helper classes that facilitate node matching during this traversal. We will cover the following topics:

  • Basic blocks used to construct the AST

  • How the AST can be traversed

  • The recursive visitor as the fundamental AST traversal tool

  • AST matchers and...

3.1 Technical requirements

The source code for this chapter is located in the chapter3 folder of the book’s GitHub repository: https://github.com/PacktPublishing/Clang-Compiler-Frontend-Packt/tree/main/chapter3.

3.2 AST

The AST is usually depicted as a tree, with its leaf nodes corresponding to various objects, such as function declarations and loop bodies. Typically, the AST represents the result of syntax analysis, i.e., parsing. Clang’s AST nodes were designed to be immutable. This design requires that the Clang AST stores results from semantic analysis, meaning the Clang AST represents the outcomes of both syntax and semantic analyses.

Important note

Although Clang calls this structure an AST, it’s worth noting that the Clang AST is not a true tree. The presence of backward edges makes “graph” a more appropriate term for describing Clang’s AST.

A typical tree structure implemented in C++ has all nodes derived from a single base class. Clang uses a different approach: it splits the different C++ constructs into separate groups, each with its own base class:

  • Statements: clang::Stmt is the base class for all statements. That includes ordinary statements such as if...

3.3 AST traversal

The compiler must traverse the AST to generate IR code, so the design of the AST should prioritize easy tree traversal. A standard approach in many systems is to have a common base class for all AST nodes. This class typically provides a method to retrieve the node’s children, allowing for tree traversal using popular algorithms such as Breadth-First Search (BFS) [19]. Clang, however, takes a different approach: its AST nodes don’t share a common ancestor. This raises the question: how is tree traversal organized in Clang?

Clang employs three unique techniques:

  • The Curiously Recurring Template Pattern (CRTP) for visitor class definition

  • Ad hoc methods tailored specifically for different nodes

  • Macros, which can be perceived as the connecting layer between the ad hoc methods and CRTP

We will explore these techniques through...

3.4 Recursive AST visitor

Recursive AST visitors address the limitations observed with specialized visitors. We will create the same program, which searches for and prints function declarations along with their parameters, but we’ll use a recursive visitor this time.

The CMakeLists.txt for the recursive visitor test tool is similar to the previous one. Only the project name (Lines 2 and 15-17 in Figure 3.20) and the source filename (Line 14 in Figure 3.20) were changed:

cmake_minimum_required(VERSION 3.16)
project("recursivevisitor")

if ( NOT DEFINED ENV{LLVM_HOME})
  message(FATAL_ERROR "$LLVM_HOME is not defined")
else()
  message(STATUS "$LLVM_HOME found: $ENV{LLVM_HOME}")
  set(LLVM_HOME $ENV{LLVM_HOME} CACHE PATH "Root of LLVM installation")
  set(LLVM_LIB...

3.5 AST matchers

AST matchers [16] provide another approach for locating specific AST nodes. They can be particularly useful in linters when searching for improper pattern usage or in refactoring tools when identifying AST nodes for modification.

We will create a simple program to test AST matchers. The program will identify a function definition with the name max. We will use a slightly modified CMakeLists.txt file from the previous examples to include the libraries required to support AST matchers:

cmake_minimum_required(VERSION 3.16)
project("matchvisitor")

if ( NOT DEFINED ENV{LLVM_HOME})
  message(FATAL_ERROR "$LLVM_HOME is not defined")
else()
  message(STATUS "$LLVM_HOME found: $ENV{LLVM_HOME}")
  set(LLVM_HOME $ENV{LLVM_HOME} CACHE PATH "Root of LLVM installation")
  set...

3.6 Explore Clang AST with clang-query

AST matchers are incredibly useful, and there’s a utility that makes it easy to try out various matchers and analyze the AST of your source code. This utility is the clang-query tool. You can build and install it using the following command:

$ ninja install-clang-query

Figure 3.29: The clang-query installation

You can run the tool as follows:

$ <...>/llvm-project/install/bin/clang-query minmax.cpp

Figure 3.30: Running clang-query on a test file

We can use the match command as follows:

clang-query> match functionDecl(decl().bind("match-id"), matchesName("max"))
Match #1:
minmax.cpp:1:1: note: "match-id" binds here
int max(int a, int b) {
^~~~~~~~~~~~~~~~~~~~~~~
minmax.cpp:1:1: note: "root" binds here
int max(int a, int b) {
^~~~~~~~~~...

3.7 Processing AST in the case of errors

One of the most interesting aspects of Clang pertains to error processing. Error processing encompasses error detection, the display of corresponding error messages, and potential error recovery. The latter is particularly intriguing in terms of the Clang AST. Error recovery occurs when Clang doesn’t halt upon encountering a compilation error but continues to compile in order to detect additional issues.

Such behavior is beneficial for various reasons. The most evident one is user convenience. When programmers compile a program, they typically prefer to be informed about as many errors as possible in a single compilation run. If the compiler were to stop at the first error, the programmer would have to correct that error, recompile, then address the subsequent error, and recompile again, and so forth. This iterative process can be tedious and frustrating, especially with larger code bases or intricate errors. While this behavior is particularly...

3.8 Summary

We explored the Clang AST, a major instrument for creating various Clang tools. We learned about the architectural design principles chosen for the implementation of the Clang AST and investigated different methods for AST traversal. We delved into specialized traversal techniques, such as those for C/C++ declarations, and also looked into more universal techniques that employ recursive visitors and Clang AST matchers. Our exploration concluded with the clang-query tool and how it can be used for Clang AST exploration. Specifically, we used it to understand how Clang processes compilation errors.

The next chapter will discuss the basic libraries used in Clang and LLVM development. We will explore the LLVM code style and foundational Clang/LLVM classes, such as SourceManager and SourceLocation. We will also cover the TableGen library, which is used for code generation, and the LLVM Integration Test (LIT) framework.

3.9 Further reading
