Structuring the lexer
As we know from the previous chapter, we need a Token class and a Lexer class. Additionally, a TokenKind enumeration is required to give each token class a unique number. A single all-in-one header and implementation file does not scale, so let’s split the pieces up. TokenKind can be used universally and is placed in the Basic component. The Token and Lexer classes belong to the Lexer component, but each gets its own header and implementation file.
There are three different classes of tokens: keywords, punctuators, and tokens that represent sets of many values. Examples are the CONST keyword, the ; delimiter, and the ident token, respectively; the ident token stands for any identifier in the source. Each token needs a member name for the enumeration. Keywords and punctuators also have natural display names that can be used in messages, whereas a token such as ident has no single fixed spelling.
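To make the three token classes concrete, here is a minimal sketch of what such an enumeration and its name lookups could look like. All names here (the tok namespace, the member names, getTokenName, getDisplayName) are assumptions made for illustration, and integer_literal is a hypothetical second value-carrying token that the text does not mention. The chapter's actual definitions may differ. The sketch assumes C++17 for std::string_view.

```cpp
#include <cassert>
#include <string_view>

// Illustrative sketch only: every name below is hypothetical.
namespace tok {

enum TokenKind : unsigned short {
  unknown,          // anything the lexer cannot classify
  ident,            // value-carrying token: any identifier
  integer_literal,  // value-carrying token (assumed for illustration)
  semi,             // punctuator ;
  kw_CONST,         // keyword CONST
  NUM_TOKENS        // sentinel: number of kinds, not a real token
};

// Enumeration member name; every kind has one.
inline std::string_view getTokenName(TokenKind Kind) {
  static constexpr std::string_view Names[NUM_TOKENS] = {
      "unknown", "ident", "integer_literal", "semi", "kw_CONST"};
  assert(Kind < NUM_TOKENS && "not a real token kind");
  return Names[Kind];
}

// Display name for messages. Only punctuators and keywords have a fixed
// spelling; all other kinds yield an empty view.
inline std::string_view getDisplayName(TokenKind Kind) {
  switch (Kind) {
  case semi:
    return ";";
  case kw_CONST:
    return "CONST";
  default:
    return {};
  }
}

} // namespace tok
```

With this layout, a diagnostic can print getDisplayName(tok::semi) to show the user the literal ; character, and fall back to getTokenName for kinds such as ident that have no fixed spelling.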
As in many programming languages, the keywords are a subset of the identifiers. To classify a token as a keyword, we...