Lexical analysis
As the example in the previous section showed, a programming language consists of many elements, such as keywords, identifiers, numbers, and operators. The task of the lexical analyzer is to take the textual input and turn it into a sequence of tokens. The calc language consists of the tokens with, :, +, -, *, /, (, and ), plus two token classes described by regular expressions: ([a-zA-Z])+ (an identifier) and ([0-9])+ (a number). We assign a unique number to each kind of token to make tokens easier to handle.
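To make the idea of a token sequence concrete, here is a minimal, standalone sketch of such a tokenizer. It uses only the C++ standard library rather than the LLVM classes that the hand-written lexer builds on. The names TokenKind, Token, and tokenize are illustrative assumptions, not the definitions developed in this chapter, and the EOI and Unknown kinds are additions for end of input and unrecognized characters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds: the enumerator values are the "unique numbers".
// EOI (end of input) and Unknown are assumptions added for this sketch.
enum class TokenKind {
  EOI, Unknown, Ident, Number, KwWith,
  Colon, Plus, Minus, Star, Slash, LParen, RParen
};

struct Token {
  TokenKind kind;
  std::string text; // the characters that make up the token
};

// Character classes matching [a-zA-Z] and [0-9] from the text.
static bool isLetter(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
static bool isDigit(char c) { return c >= '0' && c <= '9'; }
static bool isSpace(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Turns the whole input into a vector of tokens, ending with EOI.
std::vector<Token> tokenize(const std::string &input) {
  std::vector<Token> tokens;
  std::size_t i = 0;
  while (i < input.size()) {
    if (isSpace(input[i])) { // whitespace only separates tokens
      ++i;
      continue;
    }
    std::size_t start = i;
    if (isLetter(input[i])) { // ([a-zA-Z])+ : identifier or keyword
      while (i < input.size() && isLetter(input[i]))
        ++i;
      std::string text = input.substr(start, i - start);
      TokenKind kind = text == "with" ? TokenKind::KwWith : TokenKind::Ident;
      tokens.push_back({kind, text});
    } else if (isDigit(input[i])) { // ([0-9])+ : number
      while (i < input.size() && isDigit(input[i]))
        ++i;
      tokens.push_back({TokenKind::Number, input.substr(start, i - start)});
    } else { // single-character tokens
      TokenKind kind = TokenKind::Unknown;
      switch (input[i]) {
      case ':': kind = TokenKind::Colon; break;
      case '+': kind = TokenKind::Plus; break;
      case '-': kind = TokenKind::Minus; break;
      case '*': kind = TokenKind::Star; break;
      case '/': kind = TokenKind::Slash; break;
      case '(': kind = TokenKind::LParen; break;
      case ')': kind = TokenKind::RParen; break;
      default: break; // anything else stays Unknown
      }
      ++i;
      tokens.push_back({kind, input.substr(start, 1)});
    }
  }
  tokens.push_back({TokenKind::EOI, ""});
  return tokens;
}
```

Calling tokenize("with a: a * 3") yields the kinds KwWith, Ident, Colon, Ident, Star, Number, and EOI. Note that this sketch produces all tokens up front for simplicity; a lexer can equally hand out one token at a time on request.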
A hand-written lexer
The implementation of a lexical analyzer is often called a lexer. Let’s create a header file called Lexer.h and start with the definition of Token. The file begins with the usual header guard and the inclusion of the required headers:
#ifndef LEXER_H
#define LEXER_H
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/MemoryBuffer.h"
The llvm::MemoryBuffer class provides read-only access to a block of memory, filled with the...