Clojure Data Analysis Cookbook - Second Edition

Dive into data analysis with Clojure through over 100 practical recipes for every stage of the analysis and collection process

Eric Rochester

Book Details

ISBN-13: 9781784390297
Paperback: 372 pages

Book Description

As data invades more and more of life and business, the need to analyze it effectively has never been greater. With Clojure and this book, you'll soon be getting to grips with every aspect of data analysis. You'll start with practical recipes that show you how to load and clean your data, then get concise instructions to perform all the essential analysis tasks from basic statistics to sophisticated machine learning and data clustering algorithms. Get a more intuitive handle on your data through hands-on visualization techniques that allow you to provide interesting, informative, and compelling reports, and use Clojure to publish your findings to the Web.

Table of Contents

Chapter 1: Importing Data for Analysis
Introduction
Creating a new project
Reading CSV data into Incanter datasets
Reading JSON data into Incanter datasets
Reading data from Excel with Incanter
Reading data from JDBC databases
Reading XML data into Incanter datasets
Scraping data from tables in web pages
Scraping textual data from web pages
Reading RDF data
Querying RDF data with SPARQL
Aggregating data from different formats
Chapter 2: Cleaning and Validating Data
Introduction
Cleaning data with regular expressions
Maintaining consistency with synonym maps
Identifying and removing duplicate data
Regularizing numbers
Calculating relative values
Parsing dates and times
Lazily processing very large data sets
Sampling from very large data sets
Fixing spelling errors
Parsing custom data formats
Validating data with Valip
Chapter 3: Managing Complexity with Concurrent Programming
Introduction
Managing program complexity with STM
Managing program complexity with agents
Getting better performance with commute
Combining agents and STM
Maintaining consistency with ensure
Introducing safe side effects into the STM
Maintaining data consistency with validators
Monitoring processing with watchers
Debugging concurrent programs with watchers
Recovering from errors in agents
Managing large inputs with sized queues
Chapter 4: Improving Performance with Parallel Programming
Introduction
Parallelizing processing with pmap
Parallelizing processing with Incanter
Partitioning Monte Carlo simulations for better pmap performance
Finding the optimal partition size with simulated annealing
Combining function calls with reducers
Parallelizing with reducers
Generating online summary statistics for data streams with reducers
Using type hints
Benchmarking with Criterium
Chapter 5: Distributed Data Processing with Cascalog
Introduction
Initializing Cascalog and Hadoop for distributed processing
Querying data with Cascalog
Distributing data with Apache HDFS
Parsing CSV files with Cascalog
Executing complex queries with Cascalog
Aggregating data with Cascalog
Defining new Cascalog operators
Composing Cascalog queries
Transforming data with Cascalog
Chapter 6: Working with Incanter Datasets
Introduction
Loading Incanter's sample datasets
Loading Clojure data structures into datasets
Viewing datasets interactively with view
Converting datasets to matrices
Using infix formulas in Incanter
Selecting columns with $
Selecting rows with $
Filtering datasets with $where
Grouping data with $group-by
Saving datasets to CSV and JSON
Projecting from multiple datasets with $join
Chapter 7: Statistical Data Analysis with Incanter
Introduction
Generating summary statistics with $rollup
Working with changes in values
Scaling variables to simplify variable relationships
Working with time series data with Incanter Zoo
Smoothing variables to decrease variation
Validating sample statistics with bootstrapping
Modeling linear relationships
Modeling non-linear relationships
Modeling multinomial Bayesian distributions
Finding data errors with Benford's law
Chapter 8: Working with Mathematica and R
Introduction
Setting up Mathematica to talk to Clojuratica for Mac OS X and Linux
Setting up Mathematica to talk to Clojuratica for Windows
Calling Mathematica functions from Clojuratica
Sending matrixes to Mathematica from Clojuratica
Evaluating Mathematica scripts from Clojuratica
Creating functions from Mathematica
Setting up R to talk to Clojure
Calling R functions from Clojure
Passing vectors into R
Evaluating R files from Clojure
Plotting in R from Clojure
Chapter 9: Clustering, Classifying, and Working with Weka
Introduction
Loading CSV and ARFF files into Weka
Filtering, renaming, and deleting columns in Weka datasets
Discovering groups of data using K-Means clustering
Finding hierarchical clusters in Weka
Clustering with SOMs in Incanter
Classifying data with decision trees
Classifying data with the Naive Bayesian classifier
Classifying data with support vector machines
Finding associations in data with the Apriori algorithm
Chapter 10: Working with Unstructured and Textual Data
Introduction
Tokenizing text
Finding sentences
Focusing on content words with stoplists
Getting document frequencies
Scaling document frequencies by document size
Scaling document frequencies with TF-IDF
Finding people, places, and things with Named Entity Recognition
Mapping documents to a sparse vector space representation
Performing topic modeling with MALLET
Performing naïve Bayesian classification with MALLET
Chapter 11: Graphing in Incanter
Introduction
Creating scatter plots with Incanter
Graphing non-numeric data in bar charts
Creating histograms with Incanter
Creating function plots with Incanter
Adding equations to Incanter charts
Adding lines to scatter charts
Customizing charts with JFreeChart
Customizing chart colors and styles
Saving Incanter graphs to PNG
Using PCA to graph multi-dimensional data
Creating dynamic charts with Incanter
Chapter 12: Creating Charts for the Web
Introduction
Serving data with Ring and Compojure
Creating HTML with Hiccup
Setting up to use ClojureScript
Creating scatter plots with NVD3
Creating bar charts with NVD3
Creating histograms with NVD3
Creating time series charts with D3
Visualizing graphs with force-directed layouts
Creating interactive visualizations with D3

What You Will Learn

  • Read data from a variety of data formats
  • Transform data to make it more useful and easier to analyze
  • Process data concurrently and in parallel for faster performance
  • Harness multiple computers to analyze big data
  • Use powerful data analysis libraries such as Incanter, Hadoop, and Weka to get things done quickly
  • Apply powerful clustering and data mining techniques to better understand your data
