Clojure Data Analysis Cookbook


eBook: $32.99
Formats: PDF, PacktLib, ePub and Mobi formats
$28.04
save 15%!
Print + free eBook + free PacktLib access to the book: $87.98    Print cover: $54.99
$54.99
save 37%!
Free Shipping!
UK, US, Europe and selected countries in Asia.
  • Get a handle on the torrent of data the modern Internet has created
  • Recipes for every stage from collection to analysis
  • A practical approach to analyzing data to help you make informed decisions

Book Details

Language : English
Paperback : 342 pages [ 235mm x 191mm ]
Release Date : March 2013
ISBN : 178216264X
ISBN 13 : 9781782162643
Author(s) : Eric Rochester
Topics and Technologies : All Books, Big Data and Business Intelligence, Data, Cookbooks, Open Source

Table of Contents

Preface
Chapter 1: Importing Data for Analysis
Chapter 2: Cleaning and Validating Data
Chapter 3: Managing Complexity with Concurrent Programming
Chapter 4: Improving Performance with Parallel Programming
Chapter 5: Distributed Data Processing with Cascalog
Chapter 6: Working with Incanter Datasets
Chapter 7: Preparing for and Performing Statistical Data Analysis with Incanter
Chapter 8: Working with Mathematica and R
Chapter 9: Clustering, Classifying, and Working with Weka
Chapter 10: Graphing in Incanter
Chapter 11: Creating Charts for the Web
Index
  • Chapter 1: Importing Data for Analysis
    • Introduction
    • Creating a new project
    • Reading CSV data into Incanter datasets
    • Reading JSON data into Incanter datasets
    • Reading data from Excel with Incanter
    • Reading data from JDBC databases
    • Reading XML data into Incanter datasets
    • Scraping data from tables in web pages
    • Scraping textual data from web pages
    • Reading RDF data
    • Reading RDF data with SPARQL
    • Aggregating data from different formats
  • Chapter 2: Cleaning and Validating Data
    • Introduction
    • Cleaning data with regular expressions
    • Maintaining consistency with synonym maps
    • Identifying and removing duplicate data
    • Normalizing numbers
    • Rescaling values
    • Normalizing dates and times
    • Lazily processing very large data sets
    • Sampling from very large data sets
    • Fixing spelling errors
    • Parsing custom data formats
    • Validating data with Valip
  • Chapter 3: Managing Complexity with Concurrent Programming
    • Introduction
    • Managing program complexity with STM
    • Managing program complexity with agents
    • Getting better performance with commute
    • Combining agents and STM
    • Maintaining consistency with ensure
    • Introducing safe side effects into the STM
    • Maintaining data consistency with validators
    • Tracking processing with watchers
    • Debugging concurrent programs with watchers
    • Recovering from errors in agents
    • Managing input with sized queues
  • Chapter 4: Improving Performance with Parallel Programming
    • Introduction
    • Parallelizing processing with pmap
    • Parallelizing processing with Incanter
    • Partitioning Monte Carlo simulations for better pmap performance
    • Finding the optimal partition size with simulated annealing
    • Parallelizing with reducers
    • Generating online summary statistics with reducers
    • Harnessing your GPU with OpenCL and Calx
    • Using type hints
    • Benchmarking with Criterium
  • Chapter 5: Distributed Data Processing with Cascalog
    • Introduction
    • Distributed processing with Cascalog and Hadoop
    • Querying data with Cascalog
    • Distributing data with Apache HDFS
    • Parsing CSV files with Cascalog
    • Complex queries with Cascalog
    • Aggregating data with Cascalog
    • Defining new Cascalog operators
    • Composing Cascalog queries
    • Handling errors in Cascalog workflows
    • Transforming data with Cascalog
    • Executing Cascalog queries in the Cloud with Pallet
  • Chapter 6: Working with Incanter Datasets
    • Introduction
    • Loading Incanter's sample datasets
    • Loading Clojure data structures into datasets
    • Viewing datasets interactively with view
    • Converting datasets to matrices
    • Using infix formulas in Incanter
    • Selecting columns with $
    • Selecting rows with $
    • Filtering datasets with $where
    • Grouping data with $group-by
    • Saving datasets to CSV and JSON
    • Projecting from multiple datasets with $join
  • Chapter 7: Preparing for and Performing Statistical Data Analysis with Incanter
    • Introduction
    • Generating summary statistics with $rollup
    • Differencing variables to show changes
    • Scaling variables to simplify variable relationships
    • Working with time series data with Incanter Zoo
    • Smoothing variables to decrease noise
    • Validating sample statistics with bootstrapping
    • Modeling linear relationships
    • Modeling non-linear relationships
    • Modeling multimodal Bayesian distributions
    • Finding data errors with Benford's law
  • Chapter 8: Working with Mathematica and R
    • Introduction
    • Setting up Mathematica to talk to Clojuratica for Mac OS X and Linux
    • Setting up Mathematica to talk to Clojuratica for Windows
    • Calling Mathematica functions from Clojuratica
    • Sending matrices to Mathematica from Clojuratica
    • Evaluating Mathematica scripts from Clojuratica
    • Creating functions from Mathematica
    • Processing functions in parallel in Mathematica
    • Setting up R to talk to Clojure
    • Calling R functions from Clojure
    • Passing vectors into R
    • Evaluating R files from Clojure
    • Plotting in R from Clojure
  • Chapter 9: Clustering, Classifying, and Working with Weka
    • Introduction
    • Loading CSV and ARFF files into Weka
    • Filtering and renaming columns in Weka datasets
    • Discovering groups of data using K-means clustering
    • Finding hierarchical clusters in Weka
    • Clustering with SOMs in Incanter
    • Classifying data with decision trees
    • Classifying data with the Naive Bayesian classifier
    • Classifying data with support vector machines
    • Finding associations in data with the Apriori algorithm
  • Chapter 10: Graphing in Incanter
    • Introduction
    • Creating scatter plots with Incanter
    • Creating bar charts with Incanter
    • Graphing non-numeric data in bar charts
    • Creating histograms with Incanter
    • Creating function plots with Incanter
    • Adding equations to Incanter charts
    • Adding lines to scatter charts
    • Customizing charts with JFreeChart
    • Saving Incanter graphs to PNG
    • Using PCA to graph multi-dimensional data
    • Creating dynamic charts with Incanter
  • Chapter 11: Creating Charts for the Web
    • Introduction
    • Serving data with Ring and Compojure
    • Creating HTML with Hiccup
    • Setting up to use ClojureScript
    • Creating scatter plots with NVD3
    • Creating bar charts with NVD3
    • Creating histograms with NVD3
    • Visualizing graphs with force-directed layouts
    • Creating interactive visualizations with D3

                        Eric Rochester

                        Eric Rochester enjoys reading, writing, and spending time with his wife and kids. When he’s not doing those things, he programs in a variety of languages and platforms. Lately, he has been exploring functional programming languages, including Clojure and Haskell. He's also the author of the Clojure Data Analysis Cookbook. He works at the Scholars’ Lab in the library at the University of Virginia, helping humanities professors and graduate students realize their digitally informed research agendas.





                        Code Downloads

                        Download the code and support files for this book.


                        Submit Errata

                        Please let us know if you have found any errors that are not already listed here by completing our errata submission form. Our editors will check them and add them to this list. Thank you.


                        Errata

                        - 15 submitted: last submission 07 Nov 2013

                         

                        Errata Type: Code | Chapter 5

                        If you get a series of cascading.flow.FlowException exceptions when you run this, there's a good chance that the JVM needs a larger heap. You can increase the heap size by adding this line to the project.clj file:

                         :jvm-opts ["-Xmx1g"] 
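As a minimal sketch of where that line goes, a Leiningen project.clj might look like the following. The project name, version string, and dependency version are placeholders chosen for illustration, not taken from the book; only the :jvm-opts entry comes from the erratum.

```clojure
;; project.clj (sketch): the name, version, and dependency below are
;; hypothetical placeholders; keep whatever your own project already lists.
(defproject cascalog-sketch "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.4.0"]]
  ;; Raise the maximum JVM heap to 1 GB, as the erratum suggests.
  :jvm-opts ["-Xmx1g"])
```

If 1 GB is still not enough for your data, the same option accepts larger values such as "-Xmx2g".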

                         

                        Errata Type: Code | Page no: 12

                         

                        user=> (to-dataset (read-json (slurp "data/small-sample.json")))

                        should be

                        user=> (to-dataset (read-str (slurp "data/small-sample.json")))

                         

                        Errata type: Code | Page no: 11

                        Loading the libraries in the REPL with

                        (use 'incanter.core 'clojure.data.json)

                        might give a warning and an exception as follows:

                        WARNING: read already refers to: #'clojure.core/read in namespace: user, being replaced by: #'clojure.data.json/read
                        IllegalStateException pprint already refers to: #'clojure.pprint/pprint in namespace: user  clojure.lang.Namespace.warnOrFailOnReplace (Namespace.java:88)

                        For this example to work, import only the read-json function from clojure.data.json using the following:

                        (use '[clojure.data.json :only (read-json)] 'incanter.core)
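Another way to avoid the name clash is to require clojure.data.json under an alias instead of pulling its names into the user namespace. The sketch below uses read-str, as the page 12 erratum recommends, and assumes a clojure.data.json release that provides read-str (0.2.0 or later) is on the classpath. The JSON string is made-up sample data, and the :key-fn keyword option is an addition here so that keys come back as keywords.

```clojure
;; Assumes org.clojure/data.json (0.2.0 or later) is on the classpath.
(require '[clojure.data.json :as json])

;; Made-up sample data standing in for data/small-sample.json.
(def sample-json
  "[{\"name\": \"Ann\", \"age\": 30}, {\"name\": \"Bob\", \"age\": 25}]")

;; :key-fn keyword turns the JSON object keys into Clojure keywords.
(def rows (json/read-str sample-json :key-fn keyword))

(println rows)
```

With Incanter loaded, passing rows to to-dataset should then build the dataset as in the recipe.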

                         

                        Errata Type: Technical | Page no: 185

                        In the How to do it... section of the Differencing variables to show changes recipe, the function replace-empty in the second bullet needs to check that it passes only strings to empty?.

                        In the book it currently reads:

                        (defn replace-empty [x] (if (empty? x) 0 x))

                        It should be replaced by:

                        (defn replace-empty [x] (if (and (string? x) (empty? x)) 0 x))
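The reason for the extra check is that empty? tries to treat its argument as a sequence, which fails for numbers. The following sketch puts both versions side by side; replace-empty-original and throws? are names introduced only for this illustration.

```clojure
;; The version printed in the book: empty? is called on every value.
(defn replace-empty-original [x]
  (if (empty? x) 0 x))

;; The corrected version from the erratum: only strings reach empty?.
(defn replace-empty [x]
  (if (and (string? x) (empty? x)) 0 x))

;; Hypothetical helper for this sketch: reports whether (f x) throws
;; an IllegalArgumentException.
(defn throws? [f x]
  (try
    (f x)
    false
    (catch IllegalArgumentException _ true)))

(println (replace-empty ""))     ; an empty string becomes 0
(println (replace-empty "3.5"))  ; a non-empty string is returned unchanged
(println (replace-empty 42))     ; a number is returned unchanged
(println (throws? replace-empty-original 42)) ; the original fails on numbers
```

One side effect worth knowing: the original version turns nil into 0, because (empty? nil) is true, while the corrected version returns nil unchanged.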

                        Errata type: Typo | Page no 28

                        In the How to do it... section of the Reading RDF data recipe in step 5:

                        user=> (load-data t-store (File. "data/currencies.xml") q)

                        Should be 

                        user=> (load-data tstore (File. "data/currencies.xml") q)

                        Errata Type: Typo | Page no 187

                        In the Scaling variables to simplify variable relationships recipe in the require a space is missing:

                        (require '[incanter.core :asi]

                        Should be 

                        (require '[incanter.core :as i]

                        Errata Type: Code | Page no: 190

                        The snippet for require in the book is:

                        (require
                          '[incanter.core :as i]
                          '[incanter.zoo :as zoo]
                          '[clj-time.format :as tf])

                        It should instead be:

                        (require
                          '[incanter.core :as i]
                          'incanter.io
                          '[incanter.zoo :as zoo]
                          '[clj-time.format :as tf])

                         

                        Errata Type: Code | Page no: 190

                        In the def data form, the code given in the book is:

                        (def data
                          (i/with-data
                            (i/col-names
                              (incanter.io/read-dataset data-file)
                              [:date-str :open :high :low :close :volume])
                            (->>
                              (i/$map parse-date :date-str)
                              (i/dataset [:date])
                              (i/conj-cols i/$data))))

                        It should instead be:

                        (def data
                          (i/with-data
                            (i/col-names
                              (incanter.io/read-dataset data-file :header true)
                              [:date-str :open :high :low :close :volume])
                            (->>
                              (i/$map parse-date :date-str)
                              (i/dataset [:date])
                              (i/conj-cols i/$data))))

                        Errata Type: Code | Page no: 198

                        (def family-data
                          (incanter.io/read-dataset "data/all_160_in_51.P35.csv"
                                                    :header true))

                        Should be:

                        (def family-data
                          (incanter.io/read-dataset data-file
                                                    :header true))

                        Errata type: Code | Page no:204

                        The require in the book is as follows:

                        (require
                          '[incanter.core :as i]
                          'incanter.io
                          '[incanter.bayes :as b]
                          '[incanter.stats :as s])

                        It should instead be:

                        (require
                          '[incanter.core :as i]
                          'incanter.io
                          '[incanter.charts :as c]
                          '[incanter.bayes :as b]
                          '[incanter.stats :as s])

                        Page 206

                        The output of the graph will be slightly different for you, since it's based upon a random sample of the data.

                        Errata type: Code | Page no: 208

                        The code in the book is:

                        (require
                          '[incanter.core :as i]
                          'incanter.io
                          '[incanter.stats :as s])

                        It should instead be:

                        (require
                          '[incanter.core :as i]
                          'incanter.io
                          '[incanter.charts :as c]
                          '[incanter.stats :as s])

                        Errata type: Code | Page no: 238

                        In Step 1, line number 3 reads:

                        (let [attrs (map inc (map attr-n remove-attrs))

                        It should be:

                        (let [attrs (map inc (map #(attr-n dataset %) remove-attrs))
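The correction matters because attr-n needs the dataset as well as the attribute name, so it cannot be mapped directly over the names. The sketch below shows the difference using a hypothetical stand-in for attr-n that looks up a column index in a plain map; the book's version works on Weka instances, which are not reproduced here.

```clojure
;; Hypothetical stand-in for the book's attr-n: returns the zero-based
;; position of attr-name among the dataset's columns.
(defn attr-n [dataset attr-name]
  (.indexOf ^java.util.List (:columns dataset) attr-name))

;; Made-up data for this sketch.
(def dataset {:columns [:id :name :age]})
(def remove-attrs [:name :age])

;; The corrected form: the anonymous function supplies the dataset,
;; so each call to attr-n receives both of its arguments.
(def attrs (map inc (map #(attr-n dataset %) remove-attrs)))

(println attrs)

;; The original form passes only one argument to attr-n, which fails
;; with an arity error once the lazy sequence is realized.
(def original-fails?
  (try
    (doall (map attr-n remove-attrs))
    false
    (catch clojure.lang.ArityException _ true)))

(println original-fails?)
```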

                        Errata type: Code | Page no: 239

                        Lines 2 and 7 of the code in Step 2 each have an extra pair of square brackets [] that needs to be removed.

                        Errata type: Code | Page no: 42

                        The last line of code,
                        \space.
                        should be part of the previous comment line:
                        # Separator. Probably one of \(, \), \-,
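A likely reason this matters: in a regular expression written in comments mode (the (?x) flag), a comment runs from # to the end of its line, so a continuation on the next line without its own # is parsed as pattern text. The sketch below is an illustrative phone-number pattern written in that style, not the book's exact code; phone-regex and the sample input are assumptions.

```clojure
;; Illustrative pattern in comments mode; every comment stays on one line.
(def phone-regex
  #"(?x)
    (\d{3})   # area code
    \D{0,2}   # separator, such as a parenthesis, hyphen, or space
    (\d{3})   # prefix
    \D?       # separator
    (\d{4})   # line number
  ")

;; re-find returns the whole match followed by the three captured groups.
(def parts (rest (re-find phone-regex "(434) 555-1234")))

(println parts)
```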

                        Sample chapters

                        You can view our sample chapters and prefaces of this title on PacktLib or download sample chapters in PDF format.


                        What you will learn from this book

                        • Create beautiful, insightful graphs that you can publish to the Internet
                        • Apply powerful clustering and data mining techniques to better understand your data
                        • Use powerful data analysis libraries like Incanter, Hadoop, and Weka to get things done quickly
                        • Interface with Mathematica and R to use the powerful analysis features they provide
                        • Process data concurrently and in parallel for faster performance
                        • Transform data to make it more useful and easier to analyze

                         

                        In Detail

                        Data is everywhere and it's increasingly important to be able to gain insights that we can act on. Using Clojure for data analysis and collection, this book will show you how to gain fresh insights and perspectives from your data with an essential collection of practical, structured recipes.

                        "The Clojure Data Analysis Cookbook" presents recipes for every stage of the data analysis process. Whether scraping data off a web page, performing data mining, or creating graphs for the web, this book has something for the task at hand.

                        You'll learn how to acquire data, clean it up, and transform it into useful graphs which can then be analyzed and published to the Internet. Coverage includes advanced topics like processing data concurrently, applying powerful statistical techniques like Bayesian modelling, and even data mining algorithms such as K-means clustering, neural networks, and association rules.

                        Approach

                        Full of practical tips, the "Clojure Data Analysis Cookbook" will help you fully utilize your data through a series of step-by-step, real world recipes covering every aspect of data analysis.

                        Who this book is for

                        Prior experience with Clojure and data analysis techniques and workflows will be beneficial, but not essential.
