Mastering Clojure Data Analysis


eBook: $35.99, now $30.59 (save 15%)
Formats: PDF, PacktLib, ePub, and Mobi
Print + free eBook + free PacktLib access to the book: $95.98, now $59.99 (save 37%); print cover price: $59.99
Free shipping to the UK, US, Europe, and selected countries in Asia.
  • Explore the concept of data analysis using established scientific methods combined with the powerful Clojure language
  • Master Naïve Bayesian Classification, Benford's Law, and much more in Clojure
  • Learn with the help of examples drawn from exciting, real-world data

Book Details

Language : English
Paperback : 340 pages [ 235mm x 191mm ]
Release Date : May 2014
ISBN : 1783284137
ISBN 13 : 9781783284139
Author(s) : Eric Rochester
Topics and Technologies : All Books, Open Source


Table of Contents

Preface
Chapter 1: Network Analysis – The Six Degrees of Kevin Bacon
Chapter 2: GIS Analysis – Mapping Climate Change
Chapter 3: Topic Modeling – Changing Concerns in the State of the Union Addresses
Chapter 4: Classifying UFO Sightings
Chapter 5: Benford's Law – Detecting Natural Progressions of Numbers
Chapter 6: Sentiment Analysis – Categorizing Hotel Reviews
Chapter 7: Null Hypothesis Tests – Analyzing Crime Data
Chapter 8: A/B Testing – Statistical Experiments for the Web
Chapter 9: Analyzing Social Data Participation
Chapter 10: Modeling Stock Data
Index
  • Chapter 1: Network Analysis – The Six Degrees of Kevin Bacon
    • Analyzing social networks
    • Getting the data
    • Understanding graphs
    • Implementing the graphs
      • Loading the data
    • Measuring social network graphs
      • Density
      • Degrees
      • Paths
      • Average path length
      • Network diameter
      • Clustering coefficient
      • Centrality
      • Degrees of separation
    • Visualizing the graph
      • Setting up ClojureScript
      • A force-directed layout
      • A hive plot
      • A pie chart
    • Summary
  • Chapter 2: GIS Analysis – Mapping Climate Change
    • Understanding GIS
    • Mapping the climate change
      • Downloading and extracting the data
        • Downloading the files
        • Extracting the files
      • Transforming the data – filtering
      • Rolling averages
        • Reading the data
      • Interpolating sample points and generating heat maps using inverse distance weighting (IDW)
    • Working with map projections
      • Finding a base map
    • Working with ArcGIS
    • Summary
  • Chapter 4: Classifying UFO Sightings
    • Getting the data
    • Extracting the data
    • Dealing with messy data
    • Visualizing UFO data
    • Description
    • Topic modeling descriptions
    • Hoaxes
      • Preparing the data
        • Reading the data into a sequence of data records
        • Splitting the NUFORC comments
        • Categorizing the documents based on the comments
        • Partitioning the documents into directories based on the categories
        • Dividing them into training and test sets
      • Classifying the data
        • Coding the classifier interface
        • Running the classifier and examining the results
    • Summary
  • Chapter 6: Sentiment Analysis – Categorizing Hotel Reviews
    • Understanding sentiment analysis
    • Getting hotel review data
    • Exploring the data
    • Preparing the data
      • Tokenizing
      • Creating feature vectors
      • Creating feature vector functions and POS tagging
    • Cross-validating the results
    • Calculating error rates
    • Using the Weka machine learning library
      • Connecting Weka and cross-validation
      • Understanding maximum entropy classifiers
      • Understanding naive Bayesian classifiers
    • Running the experiment
    • Examining the results
      • Combining the error rates
    • Improving the results
    • Summary
  • Chapter 7: Null Hypothesis Tests – Analyzing Crime Data
    • Introducing confirmatory data analysis
    • Understanding null hypothesis testing
      • Understanding the process
        • Formulating an initial hypothesis
        • Stating the null and alternative hypotheses
        • Determining appropriate tests
        • Selecting the significance level
        • Determining the critical region
        • Calculating the test statistic and its probability
        • Deciding whether to reject the null hypothesis or not
      • Flipping coins
        • Formulating an initial hypothesis
        • Stating the null and alternative hypotheses
        • Identifying the statistical assumptions in the sample
        • Determining appropriate tests
    • Understanding burglary rates
      • Getting the data
      • Parsing the Excel files
      • Pulling out raw data
        • Growing a data tree
        • Cutting down the data tree
        • Putting it all together
        • Transforming the data
        • Joining the data sources
        • Pivoting the data
        • Filtering the missing data
        • Putting it all together
    • Exploring the data
      • Generating summary statistics
        • Summarizing UNODC crime data
        • Summarizing World Bank land area and GNI data
      • Generating more charts and graphs
    • Conducting the experiment
      • Formulating an initial hypothesis
      • Stating the null and alternative hypotheses
      • Identifying the statistical assumptions in the sample
      • Determining appropriate tests
        • Understanding Spearman's rank correlation coefficient
      • Selecting the significance level
      • Determining the critical region
      • Calculating the test statistic and its probability
      • Deciding whether to reject the null hypothesis or not
    • Interpreting the results
    • Summary
  • Chapter 8: A/B Testing – Statistical Experiments for the Web
    • Defining A/B testing
    • Conducting an A/B test
      • Planning the experiment
      • Framing the statistics
      • Building the experiment
        • Looking at options to build the site
      • Implementing A/B testing on the server
        • Understanding the scaffolded site
      • Building the test site
      • Implementing A/B testing
      • Viewing the results
        • Looking at A/B testing as a user
      • Analyzing the results
        • Understanding the t-test
      • Testing the results
    • Summary
  • Chapter 9: Analyzing Social Data Participation
    • Setting up the project
      • Understanding the analyses
      • Understanding social network data
      • Understanding knowledge-based social networks
      • Introducing the 80/20 rule
        • Getting the data
        • Looking at the amount of data
        • Defining and loading the data
        • Counting frequencies
        • Sorting and ranking
        • Finding the patterns of participation
      • Matching the 80/20 rule
      • Looking for the 20 percent of questioners
      • Looking for the 20 percent of respondents
      • Combining ranks
        • Looking at those who only post questions
        • Looking at those who only post answers
        • Looking at those who post both questions and answers
      • Finding the up-voted answers
      • Processing the answers
        • Predicting the accepted answer
      • Setting up
        • Creating the InstanceList object
      • Training sets and Test sets
        • Training
        • Testing
      • Evaluating the outcome
    • Summary
  • Chapter 10: Modeling Stock Data
    • Learning about financial data analysis
    • Setting up the basics
      • Setting up the library
      • Getting the data
    • Getting prepared with data
      • Working with news articles
      • Working with stock data
    • Analyzing the text
      • Analyzing vocabulary
      • Stop lists
      • Hapax and Dis Legomena
      • TF-IDF
    • Inspecting the stock prices
    • Merging text and stock features
    • Analyzing both text and stock features together with neural nets
      • Understanding neural nets
      • Setting up the neural net
      • Training the neural net
      • Running the neural net
      • Validating the neural net
      • Finding the best parameters
    • Predicting the future
      • Loading stock prices
      • Loading news articles
      • Creating training and test sets
      • Finding the best parameters for the neural network
      • Training and validating the neural network
      • Running the network on new data
    • Taking it with a grain of salt
      • Related to this project
      • Related to machine learning and market modeling in general
    • Summary

Eric Rochester

Eric Rochester enjoys reading, writing, and spending time with his wife and kids. When he's not doing these things, he likes to work on programs in a variety of languages and platforms. Currently, he is exploring functional programming languages, including Clojure and Haskell. He is also the author of Clojure Data Analysis Cookbook (Packt Publishing). He works at the Scholars' Lab library at the University of Virginia, helping humanities professors and graduate students realize their digitally informed research agendas.




Errata

- 1 submitted: last submission 23 Jun 2014

Type: Grammar  |  Page no: 1

"companies no long seem to want" should be "companies no longer seem to want"

Type: None  |  Page no: 4 and 11

[clojure.set :as set] is required twice.



What you will learn from this book

  • Use geospatial data to learn about geographical patterns in data
  • Use sentiment analysis to determine people's opinions from online reviews
  • Frame and implement statistical experiments
  • Use A/B testing to determine the best UI to keep users engaged
  • Work with time series data
  • Learn how to use parallelization and concurrency to work with large datasets
  • Use topic modeling to find the subjects discussed in a group of documents
  • Use network analysis to learn about online social networks

In Detail

Clojure is a Lisp dialect built on top of the Java Virtual Machine. As data reaches into more and more parts of our lives, we continually need better tools to deal with it effectively. Clojure's data tools help you organize and work with that data.

Mastering Clojure Data Analysis teaches you how to analyze and visualize complex datasets. With this book, you'll learn how to perform data analysis using established scientific methods and the modern, powerful Clojure programming language, with the help of examples drawn from real-world data. This will help you get to grips with advanced topics such as network analysis and the characteristics of social networks, topic modeling to get a handle on unstructured textual data, and GIS analysis to apply geospatial techniques to your data analysis problems.

With this guide, you'll learn how to leverage the power and flexibility of Clojure to dig into your data and access the insights it hides.

Approach

This book takes a practical, example-oriented approach that aims to help you learn how to use Clojure for data analysis quickly and efficiently.

Who this book is for

This book is for those who have experience with Clojure and need to use it to perform data analysis. It will also benefit readers with basic experience in data analysis and statistics.
