Haskell Data Analysis Cookbook

  • A practical and concise guide to using Haskell when getting to grips with data analysis
  • Recipes for every stage of data analysis, from collection to visualization
  • In-depth examples demonstrating various tools, solutions and techniques

Book Details

Language : English
Paperback : 334 pages [ 235mm x 191mm ]
Release Date : June 2014
ISBN : 1783286334
ISBN 13 : 9781783286331
Author(s) : Nishant Shukla
Topics and Technologies : All Books, Big Data and Business Intelligence, Cookbooks, Open Source

Table of Contents

Chapter 1: The Hunt for Data
Chapter 2: Integrity and Inspection
Chapter 3: The Science of Words
Chapter 4: Data Hashing
Chapter 5: The Dance with Trees
Chapter 6: Graph Fundamentals
Chapter 7: Statistics and Analysis
Chapter 8: Clustering and Classification
Chapter 9: Parallel and Concurrent Design
Chapter 10: Real-time Data
Chapter 11: Visualizing Data
Chapter 12: Exporting and Presenting
  • Chapter 1: The Hunt for Data
    • Introduction
    • Harnessing data from various sources
    • Accumulating text data from a file path
    • Catching I/O code faults
    • Keeping and representing data from a CSV file
    • Examining a JSON file with the aeson package
    • Reading an XML file using the HXT package
    • Capturing table rows from an HTML page
    • Understanding how to perform HTTP GET requests
    • Learning how to perform HTTP POST requests
    • Traversing online directories for data
    • Using MongoDB queries in Haskell
    • Reading from a remote MongoDB server
    • Exploring data from a SQLite database
  • Chapter 2: Integrity and Inspection
    • Introduction
    • Trimming excess whitespace
    • Ignoring punctuation and specific characters
    • Coping with unexpected or missing input
    • Validating records by matching regular expressions
    • Lexing and parsing an e-mail address
    • Deduplication of nonconflicting data items
    • Deduplication of conflicting data items
    • Implementing a frequency table using Data.List
    • Implementing a frequency table using Data.MultiSet
    • Computing the Manhattan distance
    • Computing the Euclidean distance
    • Comparing scaled data using the Pearson correlation coefficient
    • Comparing sparse data using cosine similarity
  • Chapter 3: The Science of Words
    • Introduction
    • Displaying a number in another base
    • Reading a number from another base
    • Searching for a substring using Data.ByteString
    • Searching a string using the Boyer-Moore-Horspool algorithm
    • Searching a string using the Rabin-Karp algorithm
    • Splitting a string on lines, words, or arbitrary tokens
    • Finding the longest common subsequence
    • Computing a phonetic code
    • Computing the edit distance
    • Computing the Jaro-Winkler distance between two strings
    • Finding strings within one-edit distance
    • Fixing spelling mistakes
  • Chapter 4: Data Hashing
    • Introduction
    • Hashing a primitive data type
    • Hashing a custom data type
    • Running popular cryptographic hash functions
    • Running a cryptographic checksum on a file
    • Performing fast comparisons between data types
    • Using a high-performance hash table
    • Using Google's CityHash hash functions for strings
    • Computing a Geohash for location coordinates
    • Using a bloom filter to remove unique items
    • Running MurmurHash, a simple but speedy hashing algorithm
    • Measuring image similarity with perceptual hashes
  • Chapter 5: The Dance with Trees
    • Introduction
    • Defining a binary tree data type
    • Defining a rose tree (multiway tree) data type
    • Traversing a tree depth-first
    • Traversing a tree breadth-first
    • Implementing a Foldable instance for a tree
    • Calculating the height of a tree
    • Implementing a binary search tree data structure
    • Verifying the order property of a binary search tree
    • Using a self-balancing tree
    • Implementing a min-heap data structure
    • Encoding a string using a Huffman tree
    • Decoding a Huffman code
  • Chapter 6: Graph Fundamentals
    • Introduction
    • Representing a graph from a list of edges
    • Representing a graph from an adjacency list
    • Conducting a topological sort on a graph
    • Traversing a graph depth-first
    • Traversing a graph breadth-first
    • Visualizing a graph using Graphviz
    • Using Directed Acyclic Word Graphs
    • Working with hexagonal and square grid networks
    • Finding maximal cliques in a graph
    • Determining whether any two graphs are isomorphic
  • Chapter 7: Statistics and Analysis
    • Introduction
    • Calculating a moving average
    • Calculating a moving median
    • Approximating a linear regression
    • Approximating a quadratic regression
    • Obtaining the covariance matrix from samples
    • Finding all unique pairings in a list
    • Using the Pearson correlation coefficient
    • Evaluating a Bayesian network
    • Creating a data structure for playing cards
    • Using a Markov chain to generate text
    • Creating n-grams from a list
    • Creating a neural network perceptron
  • Chapter 8: Clustering and Classification
    • Introduction
    • Implementing the k-means clustering algorithm
    • Implementing hierarchical clustering
    • Using a hierarchical clustering library
    • Finding the number of clusters
    • Clustering words by their lexemes
    • Classifying the parts of speech of words
    • Identifying key words in a corpus of text
    • Training a parts-of-speech tagger
    • Implementing a decision tree classifier
    • Implementing a k-Nearest Neighbors classifier
    • Visualizing points using Graphics.EasyPlot
  • Chapter 9: Parallel and Concurrent Design
    • Introduction
    • Using the Haskell Runtime System options
    • Evaluating a procedure in parallel
    • Controlling parallel algorithms in sequence
    • Forking I/O actions for concurrency
    • Communicating with a forked I/O action
    • Killing forked threads
    • Parallelizing pure functions using the Par monad
    • Mapping over a list in parallel
    • Accessing tuple elements in parallel
    • Implementing MapReduce to count word frequencies
    • Manipulating images in parallel using Repa
    • Benchmarking runtime performance in Haskell
    • Using the criterion package to measure performance
    • Benchmarking runtime performance in the terminal
  • Chapter 10: Real-time Data
    • Introduction
    • Streaming Twitter for real-time sentiment analysis
    • Reading IRC chat room messages
    • Responding to IRC messages
    • Polling a web server for latest updates
    • Detecting real-time file directory changes
    • Communicating in real time through sockets
    • Detecting faces and eyes through a camera stream
    • Streaming camera frames for template matching
  • Chapter 11: Visualizing Data
    • Introduction
    • Plotting a line chart using Google's Chart API
    • Plotting a pie chart using Google's Chart API
    • Plotting bar graphs using Google's Chart API
    • Displaying a line graph using gnuplot
    • Displaying a scatter plot of two-dimensional points
    • Interacting with points in a three-dimensional space
    • Visualizing a graph network
    • Customizing the looks of a graph network diagram
    • Rendering a bar graph in JavaScript using D3.js
    • Rendering a scatter plot in JavaScript using D3.js
    • Diagramming a path from a list of vectors
  • Chapter 12: Exporting and Presenting
    • Introduction
    • Exporting data to a CSV file
    • Exporting data as JSON
    • Using SQLite to store data
    • Saving data to a MongoDB database
    • Presenting results in an HTML web page
    • Creating a LaTeX table to display results
    • Personalizing messages using a text template
    • Exporting matrix values to a file

Nishant Shukla

Nishant Shukla is a computer scientist with a passion for mathematics. Throughout the years, he has worked for a handful of start-ups and large corporations including WillowTree Apps, Microsoft, Facebook, and Foursquare.

Stepping into the world of Haskell was at first his excuse for better understanding Category Theory, but eventually he found himself immersed in the language. His semester-long introductory Haskell course in the engineering school at the University of Virginia (http://shuklan.com/haskell) has been accessed by individuals from over 154 countries around the world, gathering over 45,000 unique visitors.

Besides Haskell, he is a proponent of a decentralized Internet and open source software. His academic research in the fields of Machine Learning, Neural Networks, and Computer Vision aims to supply a fundamental contribution to the world of computing.


Errata

1 submitted: last submission 02 Jul 2014

Type: Code | Page: 17

age [a,b] = toInt a must be age [a,b] = toInt b
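
The correction above can be illustrated with a small sketch. The surrounding code from page 17 is not reproduced on this page, so everything apart from the corrected line is an assumption: records are assumed to be two-field lists of the form [name, age], toInt is a hypothetical helper defined here with read, and the fallback case is added only to keep the function total.

```haskell
-- Hypothetical helper: the book's own toInt is not shown on this page.
toInt :: String -> Int
toInt = read

-- Assumed record layout: [name, age].
-- Per the erratum, the age comes from the second field (b), not the first (a).
age :: [String] -> Int
age [_a, b] = toInt b
age _       = 0  -- assumption: fallback for rows that do not have two fields

main :: IO ()
main = print (age ["Alice", "42"])  -- prints 42
```

With the uncorrected line (toInt a), the same call would try to read "Alice" as a number and fail at runtime, which is presumably why the erratum was filed.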


What you will learn from this book

  • Obtain and analyze raw data from various sources including text files, CSV files, databases, and websites
  • Implement practical tree and graph algorithms on various datasets
  • Apply statistical methods such as moving average and linear regression to understand patterns
  • Fiddle with parallel and concurrent code to speed up and simplify time-consuming algorithms
  • Find clusters in data using some of the most popular machine learning algorithms
  • Manage results by visualizing or exporting data
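
As a rough illustration of the statistical methods named in the list above, the following is a minimal sketch of a simple moving average and an ordinary least-squares line fit. It is not taken from the book; the function names movingAverage and linearFit, and the choice to return one average per full window, are assumptions made for this example.

```haskell
import Data.List (tails)

-- | Simple moving average over a sliding window of size n.
-- Produces one average per full window; returns [] if n <= 0
-- or if the input has fewer than n elements.
movingAverage :: Int -> [Double] -> [Double]
movingAverage n xs
  | n <= 0    = []
  | otherwise = [ sum w / fromIntegral n | w <- windows ]
  where
    -- Every prefix of length n taken from successive suffixes of xs.
    windows = takeWhile ((== n) . length) (map (take n) (tails xs))

-- | Ordinary least-squares fit of y = slope * x + intercept.
-- Returns Nothing when the fit is undefined (no points, or all x equal).
linearFit :: [(Double, Double)] -> Maybe (Double, Double)
linearFit pts
  | null pts || denom == 0 = Nothing
  | otherwise              = Just (slope, intercept)
  where
    n         = fromIntegral (length pts)
    sx        = sum (map fst pts)
    sy        = sum (map snd pts)
    sxy       = sum [ x * y | (x, y) <- pts ]
    sxx       = sum [ x * x | (x, _) <- pts ]
    denom     = n * sxx - sx * sx
    slope     = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n

main :: IO ()
main = do
  print (movingAverage 3 [1, 2, 3, 4, 5])      -- prints [2.0,3.0,4.0]
  print (linearFit [(0, 1), (1, 3), (2, 5)])   -- prints Just (2.0,1.0)
```

Returning Maybe from linearFit keeps the degenerate cases explicit instead of dividing by zero; the book's own recipes may make different choices.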

In Detail

This book will take you on a voyage through all the steps involved in data analysis. It provides synergy between Haskell and data modeling, consisting of carefully chosen examples featuring some of the most popular machine learning techniques.

You will begin with how to obtain and clean data from various sources. You will then learn how to use various data structures such as trees and graphs. The meat of data analysis occurs in the topics involving statistical techniques, parallelism, concurrency, and machine learning algorithms, along with various examples of visualizing and exporting results. By the end of the book, you will be empowered with techniques to maximize your potential when using Haskell for data analysis.


Step-by-step recipes filled with practical code samples and engaging examples demonstrate Haskell in practice, then explain the concepts behind the code.

Who this book is for

This book shows functional developers and analysts how to leverage their existing knowledge of Haskell specifically for high-quality data analysis. A good understanding of data sets and functional programming is assumed.
