Haskell Financial Data Modeling and Predictive Analytics

By Pavel Ryzhov
    Advance your knowledge in tech with a Packt subscription

  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies

About this book

Haskell is one of the three most influential functional programming languages available today along with Lisp and Standard ML. When used for financial analysis, you can achieve a much-improved level of prediction and clear problem descriptions.

Haskell Financial Data Modeling and Predictive Analytics is a hands-on guide that employs a mix of theory and practice. Starting with the basics of Haskell, this book walks you through the mathematics involved and how this is implemented in Haskell.

The book starts with an introduction to the Haskell platform and the Glasgow Haskell Compiler (GHC). You will then learn about the basics of high frequency financial data mathematics as well as how to implement these mathematical algorithms in Haskell.

You will also learn about the most popular Haskell libraries and frameworks like Attoparsec, QuickCheck, and HMatrix. You will also become familiar with database access using Yesod’s Persistence library, allowing you to keep your data organized. The book then moves on to discuss the mathematics of counting processes and autoregressive conditional duration models, which are quite common modeling tools for high frequency tick data. At the end of the book, you will also learn about the volatility prediction technique.

With Haskell Financial Data Modeling and Predictive Analytics, you will learn everything you need to know about financial data modeling and predictive analytics using functional programming in Haskell.

Publication date:
October 2013
Publisher
Packt
Pages
112
ISBN
9781782169437

 

Chapter 1. Getting Started with the Haskell Platform

The first version of Haskell was standardized in 1990. After a series of intermediate standards, the minimal, stable, and portable version of the language was published as "The Haskell 98 Report" in February 1999. This successful standard was revised in 2003 and published as "Haskell 98 Language and Libraries: The Revised Report". This is the most supported version of the language and it is implemented in many compilers and interpreters of Haskell. The latest specification, Haskell 2010, adds Foreign Function Interface (FFI) for binding to other programming languages, fixes some syntax issues, and introduces several pluggable language extensions. Throughout this book, we will use Haskell 2010.

So what is Haskell? It is a fast, type-safe, purely functional programming language with a powerful type inference. Having said that, let us try to understand what it gives us.

First, a purely functional programming language means that, in general, functions in Haskell don't have side effects. There is a special type for impure functions, or functions with side effects.

Then, Haskell has a strong, static type system with an automatic and robust type inference. This, in practice, means that you do not usually need to specify types of functions and also the type checker does not allow passing incompatible types. In strongly typed languages, types are considered to be a specification, due to the Curry-Howard correspondence, the direct relationship between programs and mathematical proofs. Under this great simplification, the theorem states that if a value of the type exists (or is inhabited), then the corresponding mathematical proof is correct. Or jokingly saying, if a program compiles, then there is 99 percent probability that it works according to specification. Though the question if the types conform, the specification in natural language is still open; Haskell won't help you with it.

 

The Haskell platform


The glorious Glasgow Haskell Compilation System, or simply Glasgow Haskell Compiler (GHC), is the most widely used Haskell compiler. It is the current de facto standard. The compiler is packaged into the Haskell platform that follows Python's principle, "Batteries included". The platform is updated twice a year with new compilers and libraries. It usually includes a compiler, an interactive Read-Evaluate-Print Loop (REPL) interpreter, Haskell 98/2010 libraries (so-called Prelude) that includes most of the common definitions and functions, and a set of commonly used libraries. If you are on Windows or Mac OS X, it is strongly recommended to use prepackaged installers of the Haskell platform at http://www.haskell.org/platform/.

Many Linux distributions include the Haskell platform in their repositories. Though it becomes less maintained, as more and more developers tend to stay on the bleeding edge, it is better to just install the core. For instance, on Debian-based systems, you can get started by running the following command:

sudo apt-get install ghc cabal-install
cabal update

Even though Windows support is quite impressive, Linux/Mac OS X installation usually has a broader support because they include the free and complete C/C++ compiler tool chain, GCC. Although MinGW32 for Windows tries to provide such toolkit, it usually lacks a few GNU libraries to be completely compatible with the Linux environment. The missing 64-bit support on Windows might be crucial for data intensive applications. So for advanced development, you require a Unix-based machine.

For example, installation on Mac OS X 10.8 is quite straightforward and can be done as follows:

  1. Install the Command Line Tools for Xcode package from http://developer.apple.com to get GNU Compiler Chain (GCC). It is required for Haskell compiler to work.

  2. Download a 64-bit package of Haskell platform from http://haskell.org.

  3. Double-click on the package and the installer appears. It warns you that the Haskell platform requires GCC. Click on Continue, as shown in the following screenshot:

  4. The platform can be installed only for all users, so just click on Continue:

  5. Now, you can modify the location of installation. However, it is recommended to leave it to default:

  6. The installer asks for elevated credentials and you need to enter a correct password:

  7. The platform writes a lot of files and sets up the environment:

  8. The installation is now complete!

  9. Now, you can try to run the Haskell interpreter by running GHCi. And if you can see something similar to the following screenshot, this installation was successful:

After installing it, the following tools will be available:

  • The GHC compiler

  • An interactive GHCi interpreter

  • The packaging tool Cabal

But also you'll need a text editor. If you prefer a full-pledged IDE, then Leksah (http://leksah.org) might be a good choice. Otherwise, both Emacs and VIM have excellent support of Haskell including, but not limited to, syntax highlighting, code formatting, integrating with GHCi and type and tag browsers; even though, examples in this chapter require only a working interpreter.

 

Quick tour of Haskell


To start with development, first we should be familiar with a few basic features of Haskell. We really need to know about laziness, datatypes, pattern matching, type classes, and basic notion of monads to start with Haskell.

Laziness

Haskell is a language with lazy evaluation. From a programmer's point of view that means that the value is evaluated if and only if it is really needed. Imperative languages usually have a strict evaluation, that is, function arguments are evaluated before function application. To see the difference, let's take a look at a simple expression in Haskell:

let x = 2 + 3

In a strict or eager language, the 2 + 3 expression would be immediately evaluated to 5, whereas in Haskell, only a promise to do this evaluation will be created and the value will be evaluated only when it is needed. In other words, this statement just introduces definition of x which might be used afterwards, unlike in strict language where it is an operator that assigns the computed value, 5 to a memory cell named x. Also, this strategy allows sharing of evaluations, because laziness assumes that computations can be executed whenever it is needed and therefore, the result can be memorized. It might reduce the running time by exponential factor over strict evaluation.

Laziness also allows to manipulate with infinite data structures. For instance, we can construct an infinite list of natural numbers as follows:

let ns = [1..]

And moreover, we can manipulate it as if it is a normal list, even though some caution is needed, as you can get an infinite loop. We can take the first five elements of this infinite list by means of the built-in function, take as follows:

take 5 ns

By running this example in GHCi you will get [1,2,3,4,5].

Functions as first-class citizens

The notion of function is one of the core ideas in functional languages and Haskell is not an exception at all. The definition of a function includes a body of function and an optional type declaration. For instance, the take function is defined in Prelude as follows:

take :: Int -> [a] -> [a] 
take = ...

Here, the type declaration says that the function takes an integer as the argument and a list of objects of the a type, and returns a new list of the same type. Also Haskell allows partial application of a function. For example, you can construct a function that takes first five elements of the list as follows:

take5 :: [a] -> [a]
take5 = take 5

Also functions are themselves objects, that is, you may pass a function as an argument to another function. Prelude defines map function as a function of a function:

map :: (a -> b) -> [a] -> [b]

map takes a function and applies it to each element of the list. Thus functions are first-class citizens in the language and it is possible to manipulate with them as if they were normal objects.

Datatypes

Datatype is the core of a strongly-typed language as Haskell. The distinctive feature of Haskell datatypes is that they all are immutable; that is, after an object is constructed it cannot be changed. It might be weird for the first sight, but in the long run, it has several positive consequences. First, it enables computation parallelization. Second, all data is referentially transparent; that is, there is no difference between the reference to an object and the object itself. These two properties allow the compiler to reason about code optimization at a higher level than what the C/C++ compiler can.

Now, let's turn to our domain model to show how datatypes are defined in Haskell. Consider data model in a quote-driven market. The market maker posts bid and ask quotes. We can express these facts in Quote.hs.

The following are the three common ways to define datatypes in Haskell:

  • The first declaration just creates a synonym for an existing datatype and type checker won't prevent you from using Integer instead of TimeStamp. Also, you can use a value of the TimeStamp type with any function that expects to work with an Integer datatype.

  • The second declaration creates a new type for prices and you are not allowed to use Double instead of Price. The compiler will raise an error in such cases and thus it will enforce the correct usage.

  • The last declaration introduces Algebraic Data Type (ADT) and says that the Quote type might be constructed either by the AskQuote data constructor, or by BidQuote with timestamp and price as its parameters. The Quote itself is called as a type constructor.

Type constructor might be parameterized by type variables. For example, the Maybe and Either types quite oftenly used standard types, and are defined in the Quotes2.hs file. Here, a and b are type variables; that is, any type can be placed instead of them there. The Maybe type is often used to represent calculation that might have ended without results. If the logarithm is defined only on the positive half of the real line, we can define a type-safe version of the logarithm function in Log.hs. This version will always remind you to handle the NaN operation in your code. What is more, the compiler will make an exhaustive check to see if all paths are covered, and it will warn you if not. Please, also note that we used a new syntactic construction, a guard, to define conditions of function application. You can use any expression evaluating to Bool in guards. Also, the compiler checks to see if all the right-hand definitions have the same type, even if the type is not declared explicitly. It ensures that the function type is unambiguous.

The Either type is commonly used for functions that can result either with an error, or with a value, and the Right constructor used to hold "right" values in LogEither.hs.

In fact, data constructors are normal functions in Haskell that return or construct a new object. If you query GHCi about type of the Left or Right constructor, it will print out the following types:

Prelude> :t Left
Left :: a -> Either a b
Prelude> :t Right
Right :: b -> Either a b

This fact allows us to pass a data constructor as the function argument, create a partially "constructed" datatype, and everything else that is possible to do with common functions in Haskell.

Type classes

Type classes in Haskell are not classes as in object-oriented languages. It is more similar to the interfaces with optional implementation. You can find them in other languages as traits, mixins, and so on but unlike in them, this feature in Haskell enables ad-hoc polymorphism; that is, a function can be applied to arguments of different types. It is also known as function overloading, or operator overloading. A polymorphic function can specify different implementation for different types.

By principle, the type class consists of function declarations over some objects. The Eq type class is a standard type class that specifies how to compare two objects of the same type as given in Eq.hs.

Here, Eq is the name of the type class, a is a type variable and == and /= are the operations defined in the type class. This definition means that some type, a is of Eq class if it has defined the == and /= operation. Moreover, the definition provides default implementation of the /= operation. And if you decide to implement this class for a datatype, then you need to provide single operation implementation. For example, we might make Price as an instance of the Eq type class as given in EqPrice.hs.

We have just defined a quite naïve implementation of the Eq type class. With such definition, we can start to use functions that work over any type that implements the Eq type class. For instance, Prelude has the following functions for searching in lists that rely on the implementation of Eq:

elem :: (Eq a) => a -> [a] -> Bool
notElem :: (Eq a) => a -> [a] -> Bool
lookup :: (Eq a) => a -> [(a,b)] -> Maybe b

The constraint (Eq a) => requires the a type to have the Eq type class implemented. The first function checks if the element is in the list. The second one is a negation of an element. And the last one looks up for a key in the list of tuples. But it is important to note that type classes help to write very generic functions that work across many types just by putting a minimal required constraint on them. In object-oriented languages, it is usually implemented by interfaces, though Haskell allows you to extend library-defined datatype by implementing any type class.

There are a large number of useful type classes defined in Prelude. For example, Eq is used to define equality between two objects, Ord is used to specify total ordering, Show and Read are used to introduce a string representation of the object, and the Enum type class is used to describe enumerations, that is, datatypes with null constructors. It might be quite boring to implement this quite trivial but useful type classes for each datatype, so Haskell supports automatic derivation of most of the standard type classes. This is given in Price.hs.

Now, objects of the Price type can be converted from/to string by means of read/show functions and compared with themselves using equality operators. We will not cover all the details of the derivation. Those who want to know all the details can look them up in the Haskell language report.

Pattern matching

Pattern matching is a core feature that makes Haskell so concise and readable. In simple words, you should specify a pattern to which some input should conform and then provide implementation. Consider the classic example of factorial as given in Pattern1.hs.

We can see that the factorial function accepts objects that can be compared for equality, specified by the Eq type class, and are of type class Num, or numbers. Then, if the input is zero, the output is one. And after that for all the other inputs, the function is defined recursively, which means by itself. Recursion is an important construct in Haskell, because the language supports declarative description of computations; that is, you need to specify what should be computed but not how it should be done. That is why there is no for and while loops; you should use recursion instead of them. The compiler has optimization techniques for recursion, and it doesn't introduce a penalty that is usual for imperative languages in the form of stack overflows. Although imperative languages usually have a tail recursion optimization built into compilers, GHC supports a generalized tail recursion that covers a lot of cases. So it is usually better to pay more attention to readability and correctness of the recursion code than to achieve pure tail recursion.

The order of pattern matching is quite important; here we will explicitly state that the first rule should be used before the second. If you swap these two declarations, the compiler will display the following warning:

    Warning: Pattern match(es) are overlapped
             In an equation for `factorial': factorial 0 = ...

The compiler warns you that the last pattern has been already matched by the previous declaration as given in Pattern2.hs.

Also the datatype structure needs to be matched. Here, we will provide a "special" treatment of zero prices and only then a standard implementation, Pattern3.hs.

It is also permitted to omit some parts that are not important for pattern matching, as it is done with TimeStamp, and it is used in nested matching for deconstructing inner datatypes.

Monads

The term monad comes from the category theory, which is a branch of mathematics that formalizes mathematics itself, and its concepts as a collection of objects and arrows. But from the point of view of a Haskell programmer, it is better to think of a monad as an abstract datatype that represents computation and supports common semantics. Monad defines how to chain computations of this type together. Thus, it allows a programmer to build computation pipelines to process data in steps. You can find monadic characteristics in many programming languages. Sometimes they are named as "programmable semicolons" by analogy from imperative languages, where semicolons are used to chain individual statements. The practical side of monads is that they allow to hide state. In fact, they are required only for I/O code but usage of monads simplifies a code that passes the state around. Thus, we don't need to write functions that takes input and state and returns output and, probably, the new state.

A monad contains a type constructor and two operations, return and bind (>>=). The return operation injects a plain value into the monad. The bind operation does the reverse process, that is, it extracts the value and passes it to the next function in the pipeline. But, it is worth mentioning that monads don't need to provide a way to get values outside them. The real-world definition of monad is extended by the fail operation and some shortcuts of the bind operation.

MonadPrelude.hs

The Prelude lists and Maybe are also monads. We can use the Maybe type as a primitive type of checked exception; that is, at any point if the computation fails, then the rest of computation should be skipped and return Nothing, otherwise, if all steps are successful, then the final result should be Just x for some value x.

Consider a simple example of summation of two Maybe values. Let's try to write an explicit version of such function:

SumExplicit.hs

Here we define an explicit type of function; this function takes two numbers of type Num class wrapped into the Maybe monad and returns their sum in case of success. As you can see, it is a quite lengthy and ugly definition of such sum. We have to duplicate the same case logic twice for each argument.

Let's take a look at how Maybe implements a monad type class in the Prelude list:

MaybePrelude.hs

If the value is Just x for some x, bind is defined as f applied to x. If the value is Nothing, then nothing can be passed on to f, and hence Nothing is returned. This means that once a function in a chain of monadic Maybe operations returns Nothing, the whole chain will return Nothing. The Just data constructor is a return operation. So let's rewrite summation using bind and return operations:

SumMonad.hs

It becomes more cryptic, but it is definitely a shorter version of the same function. To avoid such secret messages, Haskell has a do syntactic sugar for monads:

SumClear.hs

Now it is significantly better. The rule of thumb is that you should remember that implicit bind operations are inserted between lines.

The IO monad

Being a pure functional language, Haskell requires a marker of impure functions and IO monad is that marker. The Main function is an entry point for any Haskell program. It cannot be a pure function because it changes state of the "World", at least by creating a new process in OS. Let us take a look at its type:

Main.hs

The IO () type signifies that there is a computation that performs an I/O operation and returns an empty result. There is really only one way to perform I/O in Haskell: Use it in the main procedure of your program. GHCi allows direct execution of any I/O code. It does so only because all the execution is already carried inside the IO monad of GHCi. Also it is not possible to execute an I/O action from an arbitrary function, unless that function is in the IO monad and called from main, directly or indirectly. Since IO is a monad that doesn't provide a way to extract a "real-world" component, once you get into the IO monad, you will be stuck with it. A pure function can be and will be always called from the I/O monad as program without inputs and outputs doesn't make any sense. But a pure function cannot call impure code. The do notation is used in the main function, just as it is used for any other monad.

I/O functions for standard input/output device are always useful:

StdIO.hs

You can use putStr/putStrLn to output a string to the standard output and getLine to read a line from the standard input.

 

Summary


In this chapter, we learned a bit of Haskell history, made a quick overview of language basics, and figured out how to install the Haskell platform. In the next chapter, we will continue getting into more advanced Haskell features, such as template Haskell and quasiquotes, along with building a tick database from CSV sources.

About the Author

  • Pavel Ryzhov

    Pavel Ryzhov has graduated from the Lomonosov Moscow State University in Russia in the field of mathematical physics, Toda equations and Lie algebras. In the past 10 years, he has worked as a Technical Lead and Senior Software Engineer. In the last three years, Pavel lead a startup company that mainly provided mathematical and web software development in Haskell. Also, he works on port of Quantlib, an HQuantLib project in his spare time.

    Browse publications by this author
Haskell Financial Data Modeling and Predictive Analytics
Unlock this book and the full library for FREE
Start free trial