Clojure Data Analysis Cookbook

Eric Rochester

Make more of your data using Clojure and this brilliant cookbook full of real-world recipes. From creating revealing graphs to using data analysis libraries, you’ll learn both the basics and advanced techniques.

Book Details

ISBN 13: 9781782162643
Paperback: 342 pages

About This Book

  • Get a handle on the torrent of data the modern Internet has created
  • Recipes for every stage from collection to analysis
  • A practical approach to analyzing data to help you make informed decisions

Who This Book Is For

Prior experience with Clojure and data analysis techniques and workflows will be beneficial, but not essential.

Table of Contents

Chapter 1: Importing Data for Analysis
Creating a new project
Reading CSV data into Incanter datasets
Reading JSON data into Incanter datasets
Reading data from Excel with Incanter
Reading data from JDBC databases
Reading XML data into Incanter datasets
Scraping data from tables in web pages
Scraping textual data from web pages
Reading RDF data
Reading RDF data with SPARQL
Aggregating data from different formats
Chapter 2: Cleaning and Validating Data
Cleaning data with regular expressions
Maintaining consistency with synonym maps
Identifying and removing duplicate data
Normalizing numbers
Rescaling values
Normalizing dates and times
Lazily processing very large data sets
Sampling from very large data sets
Fixing spelling errors
Parsing custom data formats
Validating data with Valip
Chapter 3: Managing Complexity with Concurrent Programming
Managing program complexity with STM
Managing program complexity with agents
Getting better performance with commute
Combining agents and STM
Maintaining consistency with ensure
Introducing safe side effects into the STM
Maintaining data consistency with validators
Tracking processing with watchers
Debugging concurrent programs with watchers
Recovering from errors in agents
Managing input with sized queues
Chapter 4: Improving Performance with Parallel Programming
Parallelizing processing with pmap
Parallelizing processing with Incanter
Partitioning Monte Carlo simulations for better pmap performance
Finding the optimal partition size with simulated annealing
Parallelizing with reducers
Generating online summary statistics with reducers
Harnessing your GPU with OpenCL and Calx
Using type hints
Benchmarking with Criterium
Chapter 5: Distributed Data Processing with Cascalog
Distributed processing with Cascalog and Hadoop
Querying data with Cascalog
Distributing data with Apache HDFS
Parsing CSV files with Cascalog
Complex queries with Cascalog
Aggregating data with Cascalog
Defining new Cascalog operators
Composing Cascalog queries
Handling errors in Cascalog workflows
Transforming data with Cascalog
Executing Cascalog queries in the Cloud with Pallet
Chapter 6: Working with Incanter Datasets
Loading Incanter's sample datasets
Loading Clojure data structures into datasets
Viewing datasets interactively with view
Converting datasets to matrices
Using infix formulas in Incanter
Selecting columns with $
Selecting rows with $
Filtering datasets with $where
Grouping data with $group-by
Saving datasets to CSV and JSON
Projecting from multiple datasets with $join
Chapter 7: Preparing for and Performing Statistical Data Analysis with Incanter
Generating summary statistics with $rollup
Differencing variables to show changes
Scaling variables to simplify variable relationships
Working with time series data with Incanter Zoo
Smoothing variables to decrease noise
Validating sample statistics with bootstrapping
Modeling linear relationships
Modeling non-linear relationships
Modeling multimodal Bayesian distributions
Finding data errors with Benford's law
Chapter 8: Working with Mathematica and R
Setting up Mathematica to talk to Clojuratica for Mac OS X and Linux
Setting up Mathematica to talk to Clojuratica for Windows
Calling Mathematica functions from Clojuratica
Sending matrices to Mathematica from Clojuratica
Evaluating Mathematica scripts from Clojuratica
Creating functions from Mathematica
Processing functions in parallel in Mathematica
Setting up R to talk to Clojure
Calling R functions from Clojure
Passing vectors into R
Evaluating R files from Clojure
Plotting in R from Clojure
Chapter 9: Clustering, Classifying, and Working with Weka
Loading CSV and ARFF files into Weka
Filtering and renaming columns in Weka datasets
Discovering groups of data using K-means clustering
Finding hierarchical clusters in Weka
Clustering with SOMs in Incanter
Classifying data with decision trees
Classifying data with the Naive Bayesian classifier
Classifying data with support vector machines
Finding associations in data with the Apriori algorithm
Chapter 10: Graphing in Incanter
Creating scatter plots with Incanter
Creating bar charts with Incanter
Graphing non-numeric data in bar charts
Creating histograms with Incanter
Creating function plots with Incanter
Adding equations to Incanter charts
Adding lines to scatter charts
Customizing charts with JFreeChart
Saving Incanter graphs to PNG
Using PCA to graph multi-dimensional data
Creating dynamic charts with Incanter
Chapter 11: Creating Charts for the Web
Serving data with Ring and Compojure
Creating HTML with Hiccup
Setting up to use ClojureScript
Creating scatter plots with NVD3
Creating bar charts with NVD3
Creating histograms with NVD3
Visualizing graphs with force-directed layouts
Creating interactive visualizations with D3

What You Will Learn

  • Create beautiful, insightful graphs that you can publish to the Internet
  • Apply powerful clustering and data mining techniques to better understand your data
  • Use powerful data analysis libraries like Incanter, Hadoop, and Weka to get things done quickly
  • Interface with Mathematica and R to use the powerful analysis features they provide
  • Process data concurrently and in parallel for faster performance
  • Transform data to make it more useful and easier to analyze


In Detail

Data is everywhere and it's increasingly important to be able to gain insights that we can act on. Using Clojure for data analysis and collection, this book will show you how to gain fresh insights and perspectives from your data with an essential collection of practical, structured recipes.

The "Clojure Data Analysis Cookbook" presents recipes for every stage of the data analysis process. Whether you are scraping data off a web page, performing data mining, or creating graphs for the web, this book has something for the task at hand.

You'll learn how to acquire data, clean it up, and transform it into useful graphs which can then be analyzed and published to the Internet. Coverage includes advanced topics like processing data concurrently, applying powerful statistical techniques like Bayesian modelling, and even data mining algorithms such as K-means clustering, neural networks, and association rules.

