The first version of Haskell was standardized in 1990. After a series of intermediate standards, the minimal, stable, and portable version of the language was published as "The Haskell 98 Report" in February 1999. This successful standard was revised in 2003 and published as "Haskell 98 Language and Libraries: The Revised Report". This is the most supported version of the language and it is implemented in many compilers and interpreters of Haskell. The latest specification, Haskell 2010, adds Foreign Function Interface (FFI) for binding to other programming languages, fixes some syntax issues, and introduces several pluggable language extensions. Throughout this book, we will use Haskell 2010.
First, a purely functional programming language means that, in general, functions in Haskell don't have side effects. There is a special type for impure functions, or functions with side effects.
Then, Haskell has a strong, static type system with an automatic and robust type inference. This, in practice, means that you do not usually need to specify types of functions and also the type checker does not allow passing incompatible types. In strongly typed languages, types are considered to be a specification, due to the Curry-Howard correspondence, the direct relationship between programs and mathematical proofs. Under this great simplification, the theorem states that if a value of the type exists (or is inhabited), then the corresponding mathematical proof is correct. Or jokingly saying, if a program compiles, then there is 99 percent probability that it works according to specification. Though the question if the types conform, the specification in natural language is still open; Haskell won't help you with it.
The glorious Glasgow Haskell Compilation System, or simply Glasgow Haskell Compiler (GHC), is the most widely used Haskell compiler. It is the current de facto standard. The compiler is packaged into the Haskell platform that follows Python's principle, "Batteries included". The platform is updated twice a year with new compilers and libraries. It usually includes a compiler, an interactive Read-Evaluate-Print Loop (REPL) interpreter, Haskell 98/2010 libraries (so-called Prelude) that includes most of the common definitions and functions, and a set of commonly used libraries. If you are on Windows or Mac OS X, it is strongly recommended to use prepackaged installers of the Haskell platform at http://www.haskell.org/platform/.
Many Linux distributions include the Haskell platform in their repositories. Though it becomes less maintained, as more and more developers tend to stay on the bleeding edge, it is better to just install the core. For instance, on Debian-based systems, you can get started by running the following command:
sudo apt-get install ghc cabal-install cabal update
Even though Windows support is quite impressive, Linux/Mac OS X installation usually has a broader support because they include the free and complete C/C++ compiler tool chain, GCC. Although MinGW32 for Windows tries to provide such toolkit, it usually lacks a few GNU libraries to be completely compatible with the Linux environment. The missing 64-bit support on Windows might be crucial for data intensive applications. So for advanced development, you require a Unix-based machine.
Install the Command Line Tools for Xcode package from http://developer.apple.com to get GNU Compiler Chain (GCC). It is required for Haskell compiler to work.
Download a 64-bit package of Haskell platform from http://haskell.org.
Double-click on the package and the installer appears. It warns you that the Haskell platform requires GCC. Click on Continue, as shown in the following screenshot:
The installer asks for elevated credentials and you need to enter a correct password:
The installation is now complete!
After installing it, the following tools will be available:
But also you'll need a text editor. If you prefer a full-pledged IDE, then Leksah (http://leksah.org) might be a good choice. Otherwise, both Emacs and VIM have excellent support of Haskell including, but not limited to, syntax highlighting, code formatting, integrating with GHCi and type and tag browsers; even though, examples in this chapter require only a working interpreter.
To start with development, first we should be familiar with a few basic features of Haskell. We really need to know about laziness, datatypes, pattern matching, type classes, and basic notion of monads to start with Haskell.
Haskell is a language with lazy evaluation. From a programmer's point of view that means that the value is evaluated if and only if it is really needed. Imperative languages usually have a strict evaluation, that is, function arguments are evaluated before function application. To see the difference, let's take a look at a simple expression in Haskell:
let x = 2 + 3
In a strict or eager language, the
2 + 3 expression would be immediately evaluated to 5, whereas in Haskell, only a promise to do this evaluation will be created and the value will be evaluated only when it is needed. In other words, this statement just introduces definition of
x which might be used afterwards, unlike in strict language where it is an operator that assigns the computed value, 5 to a memory cell named
x. Also, this strategy allows sharing of evaluations, because laziness assumes that computations can be executed whenever it is needed and therefore, the result can be memorized. It might reduce the running time by exponential factor over strict evaluation.
Laziness also allows to manipulate with infinite data structures. For instance, we can construct an infinite list of natural numbers as follows:
let ns = [1..]
And moreover, we can manipulate it as if it is a normal list, even though some caution is needed, as you can get an infinite loop. We can take the first five elements of this infinite list by means of the built-in function,
take as follows:
take 5 ns
By running this example in GHCi you will get
The notion of function is one of the core ideas in functional languages and Haskell is not an exception at all. The definition of a function includes a body of function and an optional type declaration. For instance, the
take function is defined in
Prelude as follows:
take :: Int -> [a] -> [a] take = ...
Here, the type declaration says that the function takes an integer as the argument and a list of objects of the
a type, and returns a new list of the same type. Also Haskell allows partial application of a function. For example, you can construct a function that takes first five elements of the list as follows:
take5 :: [a] -> [a] take5 = take 5
Also functions are themselves objects, that is, you may pass a function as an argument to another function.
Prelude defines map function as a function of a function:
map :: (a -> b) -> [a] -> [b]
map takes a function and applies it to each element of the list. Thus functions are first-class citizens in the language and it is possible to manipulate with them as if they were normal objects.
Datatype is the core of a strongly-typed language as Haskell. The distinctive feature of Haskell datatypes is that they all are immutable; that is, after an object is constructed it cannot be changed. It might be weird for the first sight, but in the long run, it has several positive consequences. First, it enables computation parallelization. Second, all data is referentially transparent; that is, there is no difference between the reference to an object and the object itself. These two properties allow the compiler to reason about code optimization at a higher level than what the C/C++ compiler can.
Now, let's turn to our domain model to show how datatypes are defined in Haskell. Consider data model in a quote-driven market. The market maker posts bid and ask quotes. We can express these facts in
The first declaration just creates a synonym for an existing datatype and type checker won't prevent you from using
TimeStamp. Also, you can use a value of the
TimeStamptype with any function that expects to work with an
The second declaration creates a new type for prices and you are not allowed to use
Price. The compiler will raise an error in such cases and thus it will enforce the correct usage.
The last declaration introduces Algebraic Data Type (ADT) and says that the
Quotetype might be constructed either by the
AskQuotedata constructor, or by
BidQuotewith timestamp and price as its parameters. The
Quoteitself is called as a type constructor.
Type constructor might be parameterized by type variables. For example, the
Either types quite oftenly used standard types, and are defined in the
Quotes2.hs file. Here,
b are type variables; that is, any type can be placed instead of them there. The
Maybe type is often used to represent calculation that might have ended without results. If the logarithm is defined only on the positive half of the real line, we can define a type-safe version of the logarithm function in
Log.hs. This version will always remind you to handle the
NaN operation in your code. What is more, the compiler will make an exhaustive check to see if all paths are covered, and it will warn you if not. Please, also note that we used a new syntactic construction, a guard, to define conditions of function application. You can use any expression evaluating to
Bool in guards. Also, the compiler checks to see if all the right-hand definitions have the same type, even if the type is not declared explicitly. It ensures that the function type is unambiguous.
Either type is commonly used for functions that can result either with an error, or with a value, and the
Right constructor used to hold "right" values in
In fact, data constructors are normal functions in Haskell that return or construct a new object. If you query GHCi about type of the
Right constructor, it will print out the following types:
Prelude> :t Left Left :: a -> Either a b Prelude> :t Right Right :: b -> Either a b
This fact allows us to pass a data constructor as the function argument, create a partially "constructed" datatype, and everything else that is possible to do with common functions in Haskell.
Type classes in Haskell are not classes as in object-oriented languages. It is more similar to the interfaces with optional implementation. You can find them in other languages as traits, mixins, and so on but unlike in them, this feature in Haskell enables ad-hoc polymorphism; that is, a function can be applied to arguments of different types. It is also known as function overloading, or operator overloading. A polymorphic function can specify different implementation for different types.
By principle, the type class consists of function declarations over some objects. The
Eq type class is a standard type class that specifies how to compare two objects of the same type as given in
Eq is the name of the type class,
a is a type variable and
/= are the operations defined in the type class. This definition means that some type,
a is of
Eq class if it has defined the
/= operation. Moreover, the definition provides default implementation of the
/= operation. And if you decide to implement this class for a datatype, then you need to provide single operation implementation. For example, we might make
Price as an instance of the
Eq type class as given in
We have just defined a quite naïve implementation of the
Eq type class. With such definition, we can start to use functions that work over any type that implements the
Eq type class. For instance,
Prelude has the following functions for searching in lists that rely on the implementation of
elem :: (Eq a) => a -> [a] -> Bool notElem :: (Eq a) => a -> [a] -> Bool lookup :: (Eq a) => a -> [(a,b)] -> Maybe b
(Eq a) => requires the
a type to have the
Eq type class implemented. The first function checks if the element is in the list. The second one is a negation of an element. And the last one looks up for a key in the list of tuples. But it is important to note that type classes help to write very generic functions that work across many types just by putting a minimal required constraint on them. In object-oriented languages, it is usually implemented by interfaces, though Haskell allows you to extend library-defined datatype by implementing any type class.
There are a large number of useful type classes defined in
Prelude. For example,
Eq is used to define equality between two objects,
Ord is used to specify total ordering,
Read are used to introduce a string representation of the object, and the
Enum type class is used to describe enumerations, that is, datatypes with null constructors. It might be quite boring to implement this quite trivial but useful type classes for each datatype, so Haskell supports automatic derivation of most of the standard type classes. This is given in
Now, objects of the
Price type can be converted from/to string by means of read/show functions and compared with themselves using equality operators. We will not cover all the details of the derivation. Those who want to know all the details can look them up in the Haskell language report.
Pattern matching is a core feature that makes Haskell so concise and readable. In simple words, you should specify a pattern to which some input should conform and then provide implementation. Consider the classic example of factorial as given in
We can see that the
factorial function accepts objects that can be compared for equality, specified by the
Eq type class, and are of type class
Num, or numbers. Then, if the input is zero, the output is one. And after that for all the other inputs, the function is defined recursively, which means by itself. Recursion is an important construct in Haskell, because the language supports declarative description of computations; that is, you need to specify what should be computed but not how it should be done. That is why there is no for and while loops; you should use recursion instead of them. The compiler has optimization techniques for recursion, and it doesn't introduce a penalty that is usual for imperative languages in the form of stack overflows. Although imperative languages usually have a tail recursion optimization built into compilers, GHC supports a generalized tail recursion that covers a lot of cases. So it is usually better to pay more attention to readability and correctness of the recursion code than to achieve pure tail recursion.
The order of pattern matching is quite important; here we will explicitly state that the first rule should be used before the second. If you swap these two declarations, the compiler will display the following warning:
Warning: Pattern match(es) are overlapped In an equation for `factorial': factorial 0 = ...
The compiler warns you that the last pattern has been already matched by the previous declaration as given in
Also the datatype structure needs to be matched. Here, we will provide a "special" treatment of zero prices and only then a standard implementation,
It is also permitted to omit some parts that are not important for pattern matching, as it is done with
TimeStamp, and it is used in nested matching for deconstructing inner datatypes.
The term monad comes from the category theory, which is a branch of mathematics that formalizes mathematics itself, and its concepts as a collection of objects and arrows. But from the point of view of a Haskell programmer, it is better to think of a monad as an abstract datatype that represents computation and supports common semantics. Monad defines how to chain computations of this type together. Thus, it allows a programmer to build computation pipelines to process data in steps. You can find monadic characteristics in many programming languages. Sometimes they are named as "programmable semicolons" by analogy from imperative languages, where semicolons are used to chain individual statements. The practical side of monads is that they allow to hide state. In fact, they are required only for I/O code but usage of monads simplifies a code that passes the state around. Thus, we don't need to write functions that takes input and state and returns output and, probably, the new state.
A monad contains a type constructor and two operations,
return and bind (
return operation injects a plain value into the monad. The bind operation does the reverse process, that is, it extracts the value and passes it to the next function in the pipeline. But, it is worth mentioning that monads don't need to provide a way to get values outside them. The real-world definition of monad is extended by the fail operation and some shortcuts of the bind operation.
Prelude lists and
Maybe are also monads. We can use the
Maybe type as a primitive type of checked exception; that is, at any point if the computation fails, then the rest of computation should be skipped and return
Nothing, otherwise, if all steps are successful, then the final result should be
Just x for some value
Consider a simple example of summation of two
Maybe values. Let's try to write an explicit version of such function:
Here we define an explicit type of function; this function takes two numbers of type
Num class wrapped into the
Maybe monad and returns their sum in case of success. As you can see, it is a quite lengthy and ugly definition of such sum. We have to duplicate the same case logic twice for each argument.
Let's take a look at how
Maybe implements a monad type class in the
If the value is
Just x for some
x, bind is defined as
f applied to x. If the value is
Nothing, then nothing can be passed on to
f, and hence
Nothing is returned. This means that once a function in a chain of monadic
Maybe operations returns
Nothing, the whole chain will return
Just data constructor is a return operation. So let's rewrite summation using bind and return operations:
It becomes more cryptic, but it is definitely a shorter version of the same function. To avoid such secret messages, Haskell has a
do syntactic sugar for monads:
Being a pure functional language, Haskell requires a marker of impure functions and IO monad is that marker. The
Main function is an entry point for any Haskell program. It cannot be a pure function because it changes state of the "World", at least by creating a new process in OS. Let us take a look at its type:
IO () type signifies that there is a computation that performs an I/O operation and returns an empty result. There is really only one way to perform I/O in Haskell: Use it in the
main procedure of your program. GHCi allows direct execution of any I/O code. It does so only because all the execution is already carried inside the IO monad of GHCi. Also it is not possible to execute an I/O action from an arbitrary function, unless that function is in the IO monad and called from
main, directly or indirectly. Since IO is a monad that doesn't provide a way to extract a "real-world" component, once you get into the IO monad, you will be stuck with it. A pure function can be and will be always called from the I/O monad as program without inputs and outputs doesn't make any sense. But a pure function cannot call impure code. The do notation is used in the
main function, just as it is used for any other monad.
I/O functions for standard input/output device are always useful:
You can use
putStrLn to output a string to the standard output and
getLine to read a line from the standard input.
In this chapter, we learned a bit of Haskell history, made a quick overview of language basics, and figured out how to install the Haskell platform. In the next chapter, we will continue getting into more advanced Haskell features, such as template Haskell and quasiquotes, along with building a tick database from CSV sources.