Reader small image

You're reading from  Soar with Haskell

Product typeBook
Published inDec 2023
Reading LevelBeginner
PublisherPackt
ISBN-139781805128458
Edition1st Edition
Languages
Right arrow
Author (1)
Tom Schrijvers
Tom Schrijvers
author image
Tom Schrijvers

Tom Schrijvers is a professor of computer science at KU Leuven in Belgium since 2014, and previously from 2011 until 2014 at Ghent University in Belgium. He has over 20 years of research experience in programming languages and has co-authored more than 100 scientific papers. Much of his research focuses on functional programming and on the Haskell programming language in particular: he has made many contributions to the language, its ecosystem and applications, and chaired academic events like the Haskell Symposium. At the same time, he has more than a decade of teaching experience (including functional programming with Haskell) and received several teaching awards.
Read more about Tom Schrijvers

Right arrow

Parser Combinators

Parsing is the act of turning a usually human-readable structured text into a data structure that can be easily processed in software. The best-known application of parsers is on compilers. A compiler frontend takes the text written by programmers, the source code, and turns it into an abstract syntax tree, which is a more convenient data structure to work with in the next stages of the compiler. Besides parsing source code, many other structured text formats are parsed, such as JSON, YMAL, and XML, which are used for all kinds of data entry, data exchange, and configuration.

To make parsing possible, the text cannot take on an arbitrary form. It needs to be structured in a particular way, and follow certain rules. Formally, the expected text structure is sometimes codified in a grammar, which serves as a non-executable specification for the parser. This formal background, together with the development of many sophisticated and intricate parsing approaches for...

Parsing

Before we dive into parser combinators, let’s briefly review what parsing is, and consider some alternative options for obtaining Haskell parsers.

What is parsing?

The basic idea of parsing is to take a string representation of some data and turn it into the corresponding value of a structured data type. For example, consider the "1 + 2" string. With a parser, we could turn this into a value, Plus (Lit 1) (Lit 2), of the Expr algebraic data type:

data Expr = Lit Int | Plus Expr Expr

Typically, the textual value is convenient for humans, but further programmatic processing is much easier on the structured value. For example, it is much easier to write an evaluation function that takes Expr than a string.

Because the string can easily be ill-formed – for example, "1 +" or "1 ++ 2" – parsing is an operation that may fail. We model this in Haskell with the result type, Maybe Expr. Thus, the basic parsing interface...

Parser combinators

Parser combinators are a compositional approach for defining parsers in the style of the embedded DSLs, which we covered in the previous chapter. In this section, we will cover basic combinators and the essence of their implementation.

A parser for sums

A parser is a kind of data structure that is assembled from combinators. The (abstract) type of the parser is Parser a; this type denotes a parser that processes a string and produces a result of the a type. For example, to parse sum expressions, we want a parser, exprP, of the Parser Expr type.

Once we have a parser, we can apply it to an input string using the following interpretation function:

parse :: Parser a -> String -> Maybe a

Because the input string may not be in the expected format, the parser can fail; this is modeled with the Maybe a result type.

We expect the following behavior:

*Main> parse exprP "1+2"
Just (Plus (Lit 1) (Lit 2))
*Main> parse exprP "1+...

The Parsec library

In practice, you will want to use an off-the-shelf parser combinator library. In this chapter, we’ll study Parsec, which is one of the older and more established libraries. On Hackage, you can find various new libraries with more bells and whistles, but Parsec will do nicely as a starting point.

Different types of parsers

To provide additional flexibility and expressive power, Parsec’s parser type features three additional type parameters beyond the a parameter for the result type:

ParsecT s u m a

Let’s discuss the three additional parameters from right to left:

  • The monad parameter, m, signals that ParsecT s u is a monad transformer; it can be layered on top of a monad, m. For basic uses, we don’t need any underlying monad and can default to using the trivial Identity monad. For that, Parsec provides a type synonym:
    type Parsec s u = ParsecT s u Identity
  • The u parameter is for a user-defined state. Indeed, some parsers...

Parsing challenges – expressions revisited

In this section, we’ll revisit the parser for simple arithmetic expressions, now using Parsec. We explore several variations and extensions, consolidate earlier lessons, and handle new challenges.

As our starting point, consider again the simple type for arithmetic expressions:

data Expr = Lit Int | Plus Expr Expr

We could write a very native parser for it that follows the structure of the data type definition:

exprP = literalP <|> sumP
literalP = Lit <$> number
sumP = do x <- exprP
          plus
          y <- exprP
          pure (Plus x y)

When used naively, the parser does not consume all the input:

*Main> parse exprP "1+2"
Right (Lit 1)

To amend this, we have to check for the end of the file:

*Main> parse (exprP <...

Summary

In this chapter, we learned how parser combinators make it easy to turn strings into structured data. We covered how to write parsers and also looked under the hood of a minimal parser combinator implementation to get a basic understanding of how the approach works. Then, we moved on to the industrial-strength Parsec library for parser combinators. We studied its character consumption behavior and its support for error messages. Finally, we explored how to satisfy several requirements and avoid common pitfalls when writing parsers.

Chapter 15, Lenses, presents an elegant, purely functional approach to a mundane but ubiquitous programming task: data access in nested data types. First, we’ll identify the disadvantages of Haskell’s built-in support for record access and then present the concept of lenses as a much more convenient alternative for both reading and updating fields. We’ll not only show that lenses compose trivially to reach deep into data structures...

Questions

Answer the following questions to test your knowledge of this chapter:

  1. What is parsing?
  2. What are parser combinators?
  3. What are the advantages and disadvantages of parser combinators?
  4. What are the particular features of Parsec?

Further reading

To learn more about the topics that were covered in this chapter, take a look at the following resource:

Answers

Here are the answers to this chapter’s questions:

  1. Parsing is the act of turning a, usually human-readable, structured text into a data structure that can be easily processed in software.
  2. Parser combinators are an embedded DSL for parsing. They define parsers in a compositional way, assembling primitive parsers into larger ones.
  3. Parser combinators have several advantages:
    • They are more convenient to write than hand-rolling your own parser
    • As an embedded DSL, they are easily integrated into a code base and do not hamper the development process
    • They do not require learning a new language and have a low threshold for entry
    • Monad parsers are very flexible and expressive; the parsing behavior can be determined dynamically
  4. Parser combinators also have several disadvantages compared to parser generators:
    • Their performance is usually not as good as that of parser generators and can be pathologically bad if we aren’t careful (for example, in the case of left...
lock icon
The rest of the chapter is locked
You have been reading a chapter from
Soar with Haskell
Published in: Dec 2023Publisher: PacktISBN-13: 9781805128458
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €14.99/month. Cancel anytime

Author (1)

author image
Tom Schrijvers

Tom Schrijvers is a professor of computer science at KU Leuven in Belgium since 2014, and previously from 2011 until 2014 at Ghent University in Belgium. He has over 20 years of research experience in programming languages and has co-authored more than 100 scientific papers. Much of his research focuses on functional programming and on the Haskell programming language in particular: he has made many contributions to the language, its ecosystem and applications, and chaired academic events like the Haskell Symposium. At the same time, he has more than a decade of teaching experience (including functional programming with Haskell) and received several teaching awards.
Read more about Tom Schrijvers