Clojure Data Analysis Cookbook - Second Edition

Dive into data analysis with Clojure through over 100 practical recipes for every stage of the analysis and collection process

Eric Rochester


Book Details

ISBN 13: 9781784390297
Paperback: 372 pages

Book Description

As data invades more and more of life and business, the need to analyze it effectively has never been greater. With Clojure and this book, you'll soon be getting to grips with every aspect of data analysis. You'll start with practical recipes that show you how to load and clean your data, then get concise instructions to perform all the essential analysis tasks from basic statistics to sophisticated machine learning and data clustering algorithms. Get a more intuitive handle on your data through hands-on visualization techniques that allow you to provide interesting, informative, and compelling reports, and use Clojure to publish your findings to the Web.

Table of Contents

Chapter 1: Importing Data for Analysis
Introduction
Creating a new project
Reading CSV data into Incanter datasets
Reading JSON data into Incanter datasets
Reading data from Excel with Incanter
Reading data from JDBC databases
Reading XML data into Incanter datasets
Scraping data from tables in web pages
Scraping textual data from web pages
Reading RDF data
Querying RDF data with SPARQL
Aggregating data from different formats
Chapter 2: Cleaning and Validating Data
Introduction
Cleaning data with regular expressions
Maintaining consistency with synonym maps
Identifying and removing duplicate data
Regularizing numbers
Calculating relative values
Parsing dates and times
Lazily processing very large data sets
Sampling from very large data sets
Fixing spelling errors
Parsing custom data formats
Validating data with Valip
Chapter 3: Managing Complexity with Concurrent Programming
Introduction
Managing program complexity with STM
Managing program complexity with agents
Getting better performance with commute
Combining agents and STM
Maintaining consistency with ensure
Introducing safe side effects into the STM
Maintaining data consistency with validators
Monitoring processing with watchers
Debugging concurrent programs with watchers
Recovering from errors in agents
Managing large inputs with sized queues
Chapter 4: Improving Performance with Parallel Programming
Introduction
Parallelizing processing with pmap
Parallelizing processing with Incanter
Partitioning Monte Carlo simulations for better pmap performance
Finding the optimal partition size with simulated annealing
Combining function calls with reducers
Parallelizing with reducers
Generating online summary statistics for data streams with reducers
Using type hints
Benchmarking with Criterium
Chapter 5: Distributed Data Processing with Cascalog
Introduction
Initializing Cascalog and Hadoop for distributed processing
Querying data with Cascalog
Distributing data with Apache HDFS
Parsing CSV files with Cascalog
Executing complex queries with Cascalog
Aggregating data with Cascalog
Defining new Cascalog operators
Composing Cascalog queries
Transforming data with Cascalog
Chapter 6: Working with Incanter Datasets
Introduction
Loading Incanter's sample datasets
Loading Clojure data structures into datasets
Viewing datasets interactively with view
Converting datasets to matrices
Using infix formulas in Incanter
Selecting columns with $
Selecting rows with $
Filtering datasets with $where
Grouping data with $group-by
Saving datasets to CSV and JSON
Projecting from multiple datasets with $join
Chapter 7: Statistical Data Analysis with Incanter
Introduction
Generating summary statistics with $rollup
Working with changes in values
Scaling variables to simplify variable relationships
Working with time series data with Incanter Zoo
Smoothing variables to decrease variation
Validating sample statistics with bootstrapping
Modeling linear relationships
Modeling non-linear relationships
Modeling multinomial Bayesian distributions
Finding data errors with Benford's law
Chapter 8: Working with Mathematica and R
Introduction
Setting up Mathematica to talk to Clojuratica for Mac OS X and Linux
Setting up Mathematica to talk to Clojuratica for Windows
Calling Mathematica functions from Clojuratica
Sending matrixes to Mathematica from Clojuratica
Evaluating Mathematica scripts from Clojuratica
Creating functions from Mathematica
Setting up R to talk to Clojure
Calling R functions from Clojure
Passing vectors into R
Evaluating R files from Clojure
Plotting in R from Clojure
Chapter 9: Clustering, Classifying, and Working with Weka
Introduction
Loading CSV and ARFF files into Weka
Filtering, renaming, and deleting columns in Weka datasets
Discovering groups of data using K-Means clustering
Finding hierarchical clusters in Weka
Clustering with SOMs in Incanter
Classifying data with decision trees
Classifying data with the Naive Bayesian classifier
Classifying data with support vector machines
Finding associations in data with the Apriori algorithm
Chapter 10: Working with Unstructured and Textual Data
Introduction
Tokenizing text
Finding sentences
Focusing on content words with stoplists
Getting document frequencies
Scaling document frequencies by document size
Scaling document frequencies with TF-IDF
Finding people, places, and things with Named Entity Recognition
Mapping documents to a sparse vector space representation
Performing topic modeling with MALLET
Performing naïve Bayesian classification with MALLET
Chapter 11: Graphing in Incanter
Introduction
Creating scatter plots with Incanter
Graphing non-numeric data in bar charts
Creating histograms with Incanter
Creating function plots with Incanter
Adding equations to Incanter charts
Adding lines to scatter charts
Customizing charts with JFreeChart
Customizing chart colors and styles
Saving Incanter graphs to PNG
Using PCA to graph multi-dimensional data
Creating dynamic charts with Incanter
Chapter 12: Creating Charts for the Web
Introduction
Serving data with Ring and Compojure
Creating HTML with Hiccup
Setting up to use ClojureScript
Creating scatter plots with NVD3
Creating bar charts with NVD3
Creating histograms with NVD3
Creating time series charts with D3
Visualizing graphs with force-directed layouts
Creating interactive visualizations with D3

What You Will Learn

  • Read data from a variety of data formats
  • Transform data to make it more useful and easier to analyze
  • Process data concurrently and in parallel for faster performance
  • Harness multiple computers to analyze big data
  • Use powerful data analysis libraries such as Incanter, Hadoop, and Weka to get things done quickly
  • Apply powerful clustering and data mining techniques to better understand your data
