
You're reading from Clang Compiler Frontend

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781837630981
Edition: 1st Edition
Author: Ivan Murashko

Ivan V. Murashko is a C++ software engineer. He received his PhD from Peter the Great St. Petersburg Polytechnic University and has over 20 years of C++ programming experience; since 2020, he has worked with LLVM compilers. His areas of interest include the Clang compiler frontend and Clang tools (clang-tidy, clangd).

6

Advanced Code Analysis

Clang-Tidy checks, as discussed in the previous chapter, rely on the advanced matching provided by the AST. However, this approach might not be sufficient for detecting more complex problems, such as lifetime issues (that is, when an object or resource is accessed or referenced after it has been deallocated or has gone out of scope, potentially leading to unpredictable behavior or crashes). In this chapter, we will introduce advanced code analysis tools based on the Control Flow Graph (CFG). The Clang Static Analyzer is an excellent example of such a tool, and Clang-Tidy also integrates some aspects of CFGs. We will begin with typical usage examples and then delve into the implementation details. The chapter will conclude with a custom check that employs advanced techniques and extends the concept of class complexity to method implementations. We will define cyclomatic complexity and demonstrate how to calculate it using the CFG library provided by Clang. In this chapter...

6.1 Technical requirements

The source code for this chapter is located in the chapter6 folder of the book’s GitHub repository: https://github.com/PacktPublishing/Clang-Compiler-Frontend-Packt/tree/main/chapter6.

6.2 Static analysis

Static analysis is a crucial technique in software development that involves inspecting the code without actually running the program. This method focuses on analyzing either the source code or its compiled version to detect a variety of issues, such as errors, vulnerabilities, and deviations from coding standards. Unlike dynamic analysis, which requires the execution of the program, static analysis allows for examining the code in a non-runtime environment.

More generally, static analysis aims to check a specific property of a computer program based on its meaning; that is, it can be considered a part of semantic analysis (see Figure 2.6, Parser). For instance, if 𝒞 is the set of all C/C++ programs and 𝒫 is a property of such a program, then the goal of static analysis is to check the property for a specific program P ∈ 𝒞, that is, to answer the question of whether 𝒫(P) is true or false.

Our Clang-Tidy check from the previous...

6.3 CFG

A CFG is a fundamental data structure in compiler design and static program analysis, representing all paths that might be traversed through a program during execution.

A CFG consists of the following key components:

  • Nodes: Correspond to basic blocks, that is, straight-line sequences of operations with one entry point and one exit point

  • Edges: Represent the flow of control from one block to another, including both conditional and unconditional branches

  • Start and end nodes: Every CFG has a unique entry node and one or more exit nodes
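To make these components concrete, here is a minimal sketch of a CFG stored as plain successor lists. The `Cfg` struct and the `unreachableBlocks()` function are hypothetical illustrations, not Clang's `clang::CFG` API; the function follows edges from the entry node and reports the blocks that no path reaches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CFG for illustration (not Clang's clang::CFG class):
// blocks are numbered 0..n-1 and edges are stored as successor lists.
struct Cfg {
  std::vector<std::vector<std::size_t>> succs;  // succs[b]: targets of b's outgoing edges
  std::size_t entry = 0;                        // the unique entry node
  std::vector<std::size_t> exits;               // one or more exit nodes
};

// Returns the blocks that no path from the entry node can reach, for example
// a statement placed after an unconditional return.
std::vector<std::size_t> unreachableBlocks(const Cfg &g) {
  const std::size_t n = g.succs.size();
  assert(g.entry < n);
  std::vector<bool> seen(n, false);
  std::vector<std::size_t> work{g.entry};  // worklist for an iterative DFS
  seen[g.entry] = true;
  while (!work.empty()) {
    std::size_t b = work.back();
    work.pop_back();
    for (std::size_t s : g.succs[b]) {
      assert(s < n);  // every edge must point at an existing block
      if (!seen[s]) {
        seen[s] = true;
        work.push_back(s);
      }
    }
  }
  std::vector<std::size_t> dead;
  for (std::size_t b = 0; b < n; ++b)
    if (!seen[b]) dead.push_back(b);
  return dead;
}
```

For example, with the blocks ENTRY, `return x;`, a statement after that return, and EXIT, only the third block is reported. Storing successor lists alone keeps the sketch small; predecessor lists can be derived from them when an analysis needs to walk edges backward.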

As an example of a CFG, consider the function to calculate the maximum of two integer numbers that we used as an example before; see Figure 2.5:

int max(int a, int b) {
  if (a > b)
    return a;
  return b;
}

Figure 6.1: CFG example C++ code: max.cpp

The corresponding CFG can be represented as follows:

Figure 6.2: CFG example for max.cpp
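As a textual stand-in for the figure, the sketch below builds the five blocks for max() by hand. It assumes the numbering seen in Clang's CFG dumps, where the ENTRY block gets the highest number (B4), EXIT is B0, and a condition lists its true successor before its false one. The labels, the `Block` struct, and the `makeMaxCfg()`, `predecessors()`, and `dump()` helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hand-built sketch of the five-block CFG for max() from Figure 6.1.
// Index i stands for block Bi; the numbering is assumed to follow Clang's
// CFG dumps (ENTRY gets the highest number, EXIT is B0). Labels are illustrative.
struct Block {
  std::string label;
  std::vector<std::size_t> succs;
};

std::vector<Block> makeMaxCfg() {
  return {
      {"EXIT", {}},                      // B0: no successors
      {"return b;", {0}},                // B1: false branch of the if
      {"return a;", {0}},                // B2: true branch of the if
      {"a > b (if condition)", {2, 1}},  // B3: true successor first, then false
      {"ENTRY", {3}},                    // B4: unique entry block
  };
}

// Derives predecessor lists from the successor lists.
std::vector<std::vector<std::size_t>> predecessors(const std::vector<Block> &cfg) {
  std::vector<std::vector<std::size_t>> preds(cfg.size());
  for (std::size_t b = 0; b < cfg.size(); ++b)
    for (std::size_t s : cfg[b].succs) {
      assert(s < cfg.size());
      preds[s].push_back(b);
    }
  return preds;
}

// Renders one "Bi [label] -> Bj ..." line per block, from the highest number down.
std::string dump(const std::vector<Block> &cfg) {
  std::string out;
  for (std::size_t b = cfg.size(); b-- > 0;) {
    out += "B" + std::to_string(b) + " [" + cfg[b].label + "] ->";
    for (std::size_t s : cfg[b].succs) out += " B" + std::to_string(s);
    out += "\n";
  }
  return out;
}
```

Calling `dump(makeMaxCfg())` produces one line per block, starting with `B4 [ENTRY] -> B3` and ending with `B0 [EXIT] ->`, which makes the two paths from the condition to the single exit easy to see.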


6.4 Custom CFG check

We are going to use the knowledge gained in Section 5.4, Custom Clang-Tidy check, to create a custom CFG check. As mentioned previously, the check will use Clang’s CFG to calculate cyclomatic complexity. The check should issue a warning if the calculated complexity exceeds a threshold. This threshold will be set as a configuration parameter, allowing us to change it during our tests. Let’s start with the creation of the project skeleton.
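The excerpt does not show the chapter's own definition of cyclomatic complexity, so the following sketch assumes McCabe's standard formula M = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components (1 for a single function). The names `cyclomaticComplexity()` and `exceedsThreshold()` are hypothetical; the threshold comparison only mirrors the idea of a configurable limit and is not the check's actual option handling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Successor lists for one function's CFG: succs[i] holds the blocks block i can jump to.
using SuccessorLists = std::vector<std::vector<std::size_t>>;

// McCabe's cyclomatic complexity M = E - N + 2P for a single connected graph (P = 1).
// Assumption: the graph includes a single EXIT block that all returns flow into.
int cyclomaticComplexity(const SuccessorLists &succs) {
  const int nodes = static_cast<int>(succs.size());
  int edges = 0;
  for (const auto &targets : succs) {
    for (std::size_t to : targets) {
      assert(to < succs.size());  // every edge must point at an existing block
      ++edges;
    }
  }
  return edges - nodes + 2;
}

// Hypothetical helper mirroring a configurable threshold: warn only when the
// complexity is strictly greater than the limit.
bool exceedsThreshold(int complexity, int threshold) {
  return complexity > threshold;
}
```

For the CFG of max() (ENTRY, condition, two returns, and EXIT, connected by five edges), the formula gives 5 - 5 + 2 = 2: one decision point plus one. Counting the synthetic EXIT block matters here, because it joins both `return` statements into a single sink, which the formula assumes.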

6.4.1 Creating the project skeleton

We will use cyclomaticcomplexity as the name for our check, and our project skeleton can be created as follows:

$ ./clang-tools-extra/clang-tidy/add_new_check.py misc cyclomaticcomplexity

Figure 6.3: Creating a skeleton for the misc-cyclomaticcomplexity check

As a result of the run, we will get a number of modified and new files. The most important ones for us are the following two files located in the clang-tools-extra/clang-tidy/misc/ folder:

  • misc...

6.5 CFG on Clang

A CFG is the basic data structure for advanced static analysis using Clang tools. Clang constructs the CFG for a function from its AST, identifying basic blocks and control flow edges. Clang’s CFG construction handles various C/C++ constructs, including loops, conditional statements, switch cases, and complex constructs such as setjmp/longjmp and C++ exceptions. Let’s consider the process using our example from Figure 6.1.
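Loops are what make a CFG cyclic: the end of the loop body jumps back to the loop condition. The sketch below finds such back edges with a depth-first search, reporting an edge u -> v when v is still on the current DFS path. The names `Succs`, `findBackEdgesFrom()`, and `backEdges()` are hypothetical, and the block layouts used with it are hand-built, so they may not match the exact blocks Clang creates.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Succs = std::vector<std::vector<std::size_t>>;
using Edge = std::pair<std::size_t, std::size_t>;

// Colors: 0 = not visited, 1 = on the current DFS path, 2 = finished.
// An edge into a block colored 1 closes a cycle, i.e. it is a back edge.
void findBackEdgesFrom(const Succs &g, std::size_t u, std::vector<int> &color,
                       std::vector<Edge> &back) {
  color[u] = 1;
  for (std::size_t v : g[u]) {
    assert(v < g.size());
    if (color[v] == 1)
      back.emplace_back(u, v);
    else if (color[v] == 0)
      findBackEdgesFrom(g, v, color, back);
  }
  color[u] = 2;
}

// Returns all back edges reachable from the entry block.
std::vector<Edge> backEdges(const Succs &g, std::size_t entry) {
  assert(entry < g.size());
  std::vector<int> color(g.size(), 0);
  std::vector<Edge> back;
  findBackEdgesFrom(g, entry, color, back);
  return back;
}
```

For a hand-built layout of `int s = 0; while (n > 0) { s += n; --n; } return s;` with the blocks ENTRY, `s = 0`, `n > 0`, loop body, `return s`, and EXIT, the only back edge reported runs from the loop body to the condition, while the acyclic CFG of max() has none.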

6.5.1 CFG construction by example

Our example from Figure 6.1 has five nodes, as shown in Figure 6.2. Let’s run a debugger to investigate the process, as follows:

$ lldb <...>/llvm-project/install/bin/clang-tidy -- \
  -checks="-*,misc-cyclomaticcomplexity" ...

6.6 Brief description of Clang analysis tools

As mentioned earlier, the CFG is foundational for other analysis tools in Clang, several of which have been created atop the CFG. These tools also employ advanced mathematics to analyze various cases. The most notable tools are as follows [32]:

  • LivenessAnalysis: Determines whether a computed value will be used before being overwritten, producing liveness sets for each statement and CFGBlock

  • UninitializedVariables: Identifies the use of uninitialized variables through multiple passes, including initial categorization of statements and subsequent calculation of variable usages

  • Thread Safety Analysis: Analyzes annotated functions and variables to ensure thread safety
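The idea behind detecting uninitialized uses can be illustrated with a forward dataflow analysis over the CFG: a variable counts as initialized at the start of a block only if it is initialized along every incoming path. This is a simplified textbook formulation, not Clang's UninitializedVariables implementation, which distinguishes more cases. The `InitBlock` struct, the bitmask encoding of variables, and the `findUninitializedReads()` name are assumptions, and the sketch also assumes that within a block all reads happen before any writes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block summary; variables are numbered 0..31 and sets are bitmasks.
// Assumption: inside a block, every read happens before any write.
struct InitBlock {
  std::uint32_t reads = 0;
  std::uint32_t writes = 0;
  std::vector<std::size_t> succs;
};

// Returns, per block, the variables that may be read before initialization.
std::vector<std::uint32_t> findUninitializedReads(const std::vector<InitBlock> &blocks,
                                                  std::size_t entry) {
  const std::size_t n = blocks.size();
  assert(entry < n);
  std::vector<std::vector<std::size_t>> preds(n);
  for (std::size_t b = 0; b < n; ++b)
    for (std::size_t s : blocks[b].succs) {
      assert(s < n);
      preds[s].push_back(b);
    }

  // "Definitely initialized" sets start optimistic (all bits) and only shrink.
  const std::uint32_t all = ~std::uint32_t{0};
  std::vector<std::uint32_t> in(n, all), out(n, all);
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 0; b < n; ++b) {
      std::uint32_t newIn = (b == entry) ? 0u : all;
      for (std::size_t p : preds[b]) newIn &= out[p];  // must hold on every path
      const std::uint32_t newOut = newIn | blocks[b].writes;
      if (newIn != in[b] || newOut != out[b]) {
        in[b] = newIn;
        out[b] = newOut;
        changed = true;
      }
    }
  }
  std::vector<std::uint32_t> bad(n);
  for (std::size_t b = 0; b < n; ++b) bad[b] = blocks[b].reads & ~in[b];
  return bad;
}
```

For `int x; if (c) x = 1; return x;`, the block holding `return x;` is reported, because the path that skips the assignment reaches it with `x` uninitialized; if both branches assign `x`, nothing is reported. Since each set can only lose bits, the iteration is guaranteed to terminate even when the CFG contains loops.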

LivenessAnalysis in Clang is essential for optimizing code by determining whether a value computed at one point will be used before being overwritten. It produces liveness sets for each statement and CFGBlock, indicating potential future use of variables or expressions. This backward...
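The backward nature of liveness can be shown with the classic equations live_in(B) = use(B) ∪ (live_out(B) \ def(B)) and live_out(B) = ∪ live_in(S) over the successors S of B, iterated until nothing changes. This is a minimal sketch of that textbook scheme, not Clang's actual implementation; `LiveBlock`, `LivenessResult`, `computeLiveness()`, and the 32-variable bitmask encoding are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block summary; variables are numbered 0..31 and sets are bitmasks.
struct LiveBlock {
  std::uint32_t use = 0;  // variables read before any write in the block
  std::uint32_t def = 0;  // variables written in the block
  std::vector<std::size_t> succs;
};

struct LivenessResult {
  std::vector<std::uint32_t> liveIn, liveOut;
};

// Backward "may" analysis iterated to a fixed point.
LivenessResult computeLiveness(const std::vector<LiveBlock> &blocks) {
  const std::size_t n = blocks.size();
  LivenessResult r{std::vector<std::uint32_t>(n, 0u), std::vector<std::uint32_t>(n, 0u)};
  bool changed = true;
  while (changed) {
    changed = false;
    // Visiting blocks in reverse order often converges faster for a backward problem.
    for (std::size_t i = n; i-- > 0;) {
      std::uint32_t out = 0;
      for (std::size_t s : blocks[i].succs) {
        assert(s < n);
        out |= r.liveIn[s];  // live after B if live at the start of any successor
      }
      const std::uint32_t in = blocks[i].use | (out & ~blocks[i].def);
      if (in != r.liveIn[i] || out != r.liveOut[i]) {
        r.liveIn[i] = in;
        r.liveOut[i] = out;
        changed = true;
      }
    }
  }
  return r;
}
```

For the straight-line sequence `x = ...; y = x + 2; use(y);` split into three blocks, `x` is live only between the first and second blocks and `y` only between the second and third, so a value assigned to either variable after its last use would be a dead store.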

6.7 Knowing the limitations of analysis

It’s worth mentioning some limitations of the analysis that can be conducted with Clang’s AST and CFG. The most notable ones are mentioned here [2]:

  • Limitations of Clang’s AST: Clang’s AST is unsuitable for data flow analysis and control flow reasoning, leading to inaccurate results and inefficient analysis due to the loss of vital language information. Soundness is also a consideration: for certain analyses, such as liveness analysis, a result that is precise enough can be more valuable than one that is always conservative.

  • Issues with Clang’s CFG: While Clang’s CFG aims to bridge the gap between the AST and LLVM IR, it has known problems, limited interprocedural capabilities, and inadequate test coverage.

One example mentioned in [2] relates to C++ coroutines, a new feature introduced in C++20. Some aspects of this functionality are implemented outside the Clang frontend...

6.8 Summary

In this chapter, we investigated Clang’s CFG, a powerful data structure that represents all the paths a program might take during execution. We created a simple Clang-Tidy check using a CFG to calculate cyclomatic complexity, a metric useful for estimating code complexity. Additionally, we explored the details of CFG creation and the formation of its basic internal structures. We discussed some tools developed with CFGs, which are useful for detecting lifetime issues, thread safety problems, and uninitialized variables. We also briefly described the limitations of CFGs and how other tools can address these limitations.

The next chapter will cover refactoring tools. These tools can perform complex code modifications using the AST provided by the Clang compiler.

6.9 Further reading
