# Haskell Data Analysis Cookbook

- A practical and concise guide to using Haskell when getting to grips with data analysis
- Recipes for every stage of data analysis, from collection to visualization
- In-depth examples demonstrating various tools, solutions and techniques

### Book Details

**Language:** English

**Paperback:** 334 pages (235mm x 191mm)

**Release Date:** June 2014

**ISBN:** 1783286334

**ISBN 13:** 9781783286331

**Author(s):** Nishant Shukla

**Topics and Technologies:** All Books, Big Data and Business Intelligence, Cookbooks, Open Source

## Table of Contents

Preface

Chapter 1: The Hunt for Data

Chapter 2: Integrity and Inspection

Chapter 3: The Science of Words

Chapter 4: Data Hashing

Chapter 5: The Dance with Trees

Chapter 6: Graph Fundamentals

Chapter 7: Statistics and Analysis

Chapter 8: Clustering and Classification

Chapter 9: Parallel and Concurrent Design

Chapter 10: Real-time Data

Chapter 11: Visualizing Data

Chapter 12: Exporting and Presenting

Index

**Chapter 1: The Hunt for Data**

- Introduction
- Harnessing data from various sources
- Accumulating text data from a file path
- Catching I/O code faults
- Keeping and representing data from a CSV file
- Examining a JSON file with the aeson package
- Reading an XML file using the HXT package
- Capturing table rows from an HTML page
- Understanding how to perform HTTP GET requests
- Learning how to perform HTTP POST requests
- Traversing online directories for data
- Using MongoDB queries in Haskell
- Reading from a remote MongoDB server
- Exploring data from a SQLite database

**Chapter 2: Integrity and Inspection**

- Introduction
- Trimming excess whitespace
- Ignoring punctuation and specific characters
- Coping with unexpected or missing input
- Validating records by matching regular expressions
- Lexing and parsing an e-mail address
- Deduplication of nonconflicting data items
- Deduplication of conflicting data items
- Implementing a frequency table using Data.List
- Implementing a frequency table using Data.MultiSet
- Computing the Manhattan distance
- Computing the Euclidean distance
- Comparing scaled data using the Pearson correlation coefficient
- Comparing sparse data using cosine similarity

**Chapter 3: The Science of Words**

- Introduction
- Displaying a number in another base
- Reading a number from another base
- Searching for a substring using Data.ByteString
- Searching a string using the Boyer-Moore-Horspool algorithm
- Searching a string using the Rabin-Karp algorithm
- Splitting a string on lines, words, or arbitrary tokens
- Finding the longest common subsequence
- Computing a phonetic code
- Computing the edit distance
- Computing the Jaro-Winkler distance between two strings
- Finding strings within one-edit distance
- Fixing spelling mistakes

**Chapter 4: Data Hashing**

- Introduction
- Hashing a primitive data type
- Hashing a custom data type
- Running popular cryptographic hash functions
- Running a cryptographic checksum on a file
- Performing fast comparisons between data types
- Using a high-performance hash table
- Using Google's CityHash hash functions for strings
- Computing a Geohash for location coordinates
- Using a bloom filter to remove unique items
- Running MurmurHash, a simple but speedy hashing algorithm
- Measuring image similarity with perceptual hashes

**Chapter 5: The Dance with Trees**

- Introduction
- Defining a binary tree data type
- Defining a rose tree (multiway tree) data type
- Traversing a tree depth-first
- Traversing a tree breadth-first
- Implementing a Foldable instance for a tree
- Calculating the height of a tree
- Implementing a binary search tree data structure
- Verifying the order property of a binary search tree
- Using a self-balancing tree
- Implementing a min-heap data structure
- Encoding a string using a Huffman tree
- Decoding a Huffman code

**Chapter 6: Graph Fundamentals**

- Introduction
- Representing a graph from a list of edges
- Representing a graph from an adjacency list
- Conducting a topological sort on a graph
- Traversing a graph depth-first
- Traversing a graph breadth-first
- Visualizing a graph using Graphviz
- Using Directed Acyclic Word Graphs
- Working with hexagonal and square grid networks
- Finding maximal cliques in a graph
- Determining whether any two graphs are isomorphic

**Chapter 7: Statistics and Analysis**

- Introduction
- Calculating a moving average
- Calculating a moving median
- Approximating a linear regression
- Approximating a quadratic regression
- Obtaining the covariance matrix from samples
- Finding all unique pairings in a list
- Using the Pearson correlation coefficient
- Evaluating a Bayesian network
- Creating a data structure for playing cards
- Using a Markov chain to generate text
- Creating n-grams from a list
- Creating a neural network perceptron

**Chapter 8: Clustering and Classification**

- Introduction
- Implementing the k-means clustering algorithm
- Implementing hierarchical clustering
- Using a hierarchical clustering library
- Finding the number of clusters
- Clustering words by their lexemes
- Classifying the parts of speech of words
- Identifying key words in a corpus of text
- Training a parts-of-speech tagger
- Implementing a decision tree classifier
- Implementing a k-Nearest Neighbors classifier
- Visualizing points using Graphics.EasyPlot

**Chapter 9: Parallel and Concurrent Design**

- Introduction
- Using the Haskell Runtime System options
- Evaluating a procedure in parallel
- Controlling parallel algorithms in sequence
- Forking I/O actions for concurrency
- Communicating with a forked I/O action
- Killing forked threads
- Parallelizing pure functions using the Par monad
- Mapping over a list in parallel
- Accessing tuple elements in parallel
- Implementing MapReduce to count word frequencies
- Manipulating images in parallel using Repa
- Benchmarking runtime performance in Haskell
- Using the criterion package to measure performance
- Benchmarking runtime performance in the terminal

**Chapter 10: Real-time Data**

- Introduction
- Streaming Twitter for real-time sentiment analysis
- Reading IRC chat room messages
- Responding to IRC messages
- Polling a web server for latest updates
- Detecting real-time file directory changes
- Communicating in real time through sockets
- Detecting faces and eyes through a camera stream
- Streaming camera frames for template matching

**Chapter 11: Visualizing Data**

- Introduction
- Plotting a line chart using Google's Chart API
- Plotting a pie chart using Google's Chart API
- Plotting bar graphs using Google's Chart API
- Displaying a line graph using gnuplot
- Displaying a scatter plot of two-dimensional points
- Interacting with points in a three-dimensional space
- Visualizing a graph network
- Customizing the looks of a graph network diagram
- Rendering a bar graph in JavaScript using D3.js
- Rendering a scatter plot in JavaScript using D3.js
- Diagramming a path from a list of vectors

**Chapter 12: Exporting and Presenting**

- Introduction
- Exporting data to a CSV file
- Exporting data as JSON
- Using SQLite to store data
- Saving data to a MongoDB database
- Presenting results in an HTML web page
- Creating a LaTeX table to display results
- Personalizing messages using a text template
- Exporting matrix values to a file

### Errata

- 1 submitted: last submission 02 Jul 2014
- Type: Code | Page: 17

**age [a,b] = toInt a** must be **age [a,b] = toInt b**
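To see why the correction matters, here is a minimal sketch of the corrected line in use. It assumes, hypothetically, that each record is a two-field list of name and age; the page-17 listing is not reproduced here, so the `Record` alias, the `oldest` helper, and the sample data are illustrative only. Under that assumption, `toInt a` would try to read the name as a number and fail at runtime, while `toInt b` reads the age.

```haskell
import Data.List (foldl1')

-- Hypothetical record layout (an assumption, not taken from the book): [name, age]
type Record = [String]

toInt :: String -> Int
toInt = read

-- Corrected per the erratum: the age is the second field, so read b, not a.
-- The first field is written as _ here only because it is unused.
age :: Record -> Int
age [_, b] = toInt b
age _      = error "age: expected a two-field record"

-- Illustrative helper: pick the record with the largest age.
-- Partial by design: foldl1' fails on an empty list.
oldest :: [Record] -> Record
oldest = foldl1' (\acc r -> if age r > age acc then r else acc)

main :: IO ()
main = print (oldest [["Alice", "23"], ["Bob", "41"], ["Carol", "35"]])
-- prints ["Bob","41"]
```

With the uncorrected `toInt a`, the same call would raise a `Prelude.read: no parse` error when it tries to read "Alice" as an `Int`.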

### What You Will Learn

- Obtain and analyze raw data from various sources including text files, CSV files, databases, and websites
- Implement practical tree and graph algorithms on various datasets
- Apply statistical methods such as moving average and linear regression to understand patterns
- Fiddle with parallel and concurrent code to speed up and simplify time-consuming algorithms
- Find clusters in data using some of the most popular machine learning algorithms
- Manage results by visualizing or exporting data

This book takes you through every step of data analysis. It pairs Haskell with data modeling through carefully chosen examples that feature some of the most popular machine learning techniques.

You will begin with how to obtain and clean data from various sources. You will then learn how to use various data structures such as trees and graphs. The meat of data analysis occurs in the topics involving statistical techniques, parallelism, concurrency, and machine learning algorithms, along with various examples of visualizing and exporting results. By the end of the book, you will be empowered with techniques to maximize your potential when using Haskell for data analysis.

Step-by-step recipes, filled with practical code samples and engaging examples, first demonstrate Haskell in practice and then explain the concepts behind the code.

This book shows functional developers and analysts how to leverage their existing knowledge of Haskell specifically for high-quality data analysis. A good understanding of data sets and functional programming is assumed.