Haskell Data Analysis Cookbook

Explore intuitive data analysis techniques and powerful machine learning methods using over 130 practical recipes

Nishant Shukla

Book Details

ISBN 13: 9781783286331
Paperback: 334 pages

About This Book

  • A practical and concise guide to using Haskell when getting to grips with data analysis
  • Recipes for every stage of data analysis, from collection to visualization
  • In-depth examples demonstrating various tools, solutions and techniques

Who This Book Is For

This book shows functional developers and analysts how to leverage their existing knowledge of Haskell specifically for high-quality data analysis. A good understanding of data sets and functional programming is assumed.

Table of Contents

Chapter 1: The Hunt for Data
Introduction
Harnessing data from various sources
Accumulating text data from a file path
Catching I/O code faults
Keeping and representing data from a CSV file
Examining a JSON file with the aeson package
Reading an XML file using the HXT package
Capturing table rows from an HTML page
Understanding how to perform HTTP GET requests
Learning how to perform HTTP POST requests
Traversing online directories for data
Using MongoDB queries in Haskell
Reading from a remote MongoDB server
Exploring data from a SQLite database
Chapter 2: Integrity and Inspection
Introduction
Trimming excess whitespace
Ignoring punctuation and specific characters
Coping with unexpected or missing input
Validating records by matching regular expressions
Lexing and parsing an e-mail address
Deduplication of nonconflicting data items
Deduplication of conflicting data items
Implementing a frequency table using Data.List
Implementing a frequency table using Data.MultiSet
Computing the Manhattan distance
Computing the Euclidean distance
Comparing scaled data using the Pearson correlation coefficient
Comparing sparse data using cosine similarity
Chapter 3: The Science of Words
Introduction
Displaying a number in another base
Reading a number from another base
Searching for a substring using Data.ByteString
Searching a string using the Boyer-Moore-Horspool algorithm
Searching a string using the Rabin-Karp algorithm
Splitting a string on lines, words, or arbitrary tokens
Finding the longest common subsequence
Computing a phonetic code
Computing the edit distance
Computing the Jaro-Winkler distance between two strings
Finding strings within one-edit distance
Fixing spelling mistakes
Chapter 4: Data Hashing
Introduction
Hashing a primitive data type
Hashing a custom data type
Running popular cryptographic hash functions
Running a cryptographic checksum on a file
Performing fast comparisons between data types
Using a high-performance hash table
Using Google's CityHash hash functions for strings
Computing a Geohash for location coordinates
Using a bloom filter to remove unique items
Running MurmurHash, a simple but speedy hashing algorithm
Measuring image similarity with perceptual hashes
Chapter 5: The Dance with Trees
Introduction
Defining a binary tree data type
Defining a rose tree (multiway tree) data type
Traversing a tree depth-first
Traversing a tree breadth-first
Implementing a Foldable instance for a tree
Calculating the height of a tree
Implementing a binary search tree data structure
Verifying the order property of a binary search tree
Using a self-balancing tree
Implementing a min-heap data structure
Encoding a string using a Huffman tree
Decoding a Huffman code
Chapter 6: Graph Fundamentals
Introduction
Representing a graph from a list of edges
Representing a graph from an adjacency list
Conducting a topological sort on a graph
Traversing a graph depth-first
Traversing a graph breadth-first
Visualizing a graph using Graphviz
Using Directed Acyclic Word Graphs
Working with hexagonal and square grid networks
Finding maximal cliques in a graph
Determining whether any two graphs are isomorphic
Chapter 7: Statistics and Analysis
Introduction
Calculating a moving average
Calculating a moving median
Approximating a linear regression
Approximating a quadratic regression
Obtaining the covariance matrix from samples
Finding all unique pairings in a list
Using the Pearson correlation coefficient
Evaluating a Bayesian network
Creating a data structure for playing cards
Using a Markov chain to generate text
Creating n-grams from a list
Creating a neural network perceptron
Chapter 8: Clustering and Classification
Introduction
Implementing the k-means clustering algorithm
Implementing hierarchical clustering
Using a hierarchical clustering library
Finding the number of clusters
Clustering words by their lexemes
Classifying the parts of speech of words
Identifying key words in a corpus of text
Training a parts-of-speech tagger
Implementing a decision tree classifier
Implementing a k-Nearest Neighbors classifier
Visualizing points using Graphics.EasyPlot
Chapter 9: Parallel and Concurrent Design
Introduction
Using the Haskell Runtime System options
Evaluating a procedure in parallel
Controlling parallel algorithms in sequence
Forking I/O actions for concurrency
Communicating with a forked I/O action
Killing forked threads
Parallelizing pure functions using the Par monad
Mapping over a list in parallel
Accessing tuple elements in parallel
Implementing MapReduce to count word frequencies
Manipulating images in parallel using Repa
Benchmarking runtime performance in Haskell
Using the criterion package to measure performance
Benchmarking runtime performance in the terminal
Chapter 10: Real-time Data
Introduction
Streaming Twitter for real-time sentiment analysis
Reading IRC chat room messages
Responding to IRC messages
Polling a web server for latest updates
Detecting real-time file directory changes
Communicating in real time through sockets
Detecting faces and eyes through a camera stream
Streaming camera frames for template matching
Chapter 11: Visualizing Data
Introduction
Plotting a line chart using Google's Chart API
Plotting a pie chart using Google's Chart API
Plotting bar graphs using Google's Chart API
Displaying a line graph using gnuplot
Displaying a scatter plot of two-dimensional points
Interacting with points in a three-dimensional space
Visualizing a graph network
Customizing the looks of a graph network diagram
Rendering a bar graph in JavaScript using D3.js
Rendering a scatter plot in JavaScript using D3.js
Diagramming a path from a list of vectors
Chapter 12: Exporting and Presenting
Introduction
Exporting data to a CSV file
Exporting data as JSON
Using SQLite to store data
Saving data to a MongoDB database
Presenting results in an HTML web page
Creating a LaTeX table to display results
Personalizing messages using a text template
Exporting matrix values to a file

What You Will Learn

  • Obtain and analyze raw data from various sources including text files, CSV files, databases, and websites
  • Implement practical tree and graph algorithms on various datasets
  • Apply statistical methods such as moving average and linear regression to understand patterns
  • Fiddle with parallel and concurrent code to speed up and simplify time-consuming algorithms
  • Find clusters in data using some of the most popular machine learning algorithms
  • Manage results by visualizing or exporting data

In Detail

This book takes you through all the steps involved in data analysis. It pairs Haskell with data modeling through carefully chosen examples featuring some of the most popular machine learning techniques.

You will begin by learning how to obtain and clean data from various sources. You will then learn how to use data structures such as trees and graphs. The core of the book covers statistical techniques, parallelism, concurrency, and machine learning algorithms, along with examples of visualizing and exporting results. By the end of the book, you will have a set of practical techniques for using Haskell in data analysis.
