Practical Data Analysis

For small businesses, analyzing the information contained in their data using open source technology could be game-changing. All you need is some basic programming and mathematical skills to do just that.

Hector Cuesta

Book Details

ISBN 13: 9781783280995
Paperback: 360 pages

Book Description

Plenty of small businesses face large amounts of data but lack the internal skills to support quantitative analysis. Understanding how to harness the power of data analysis using the latest open source technology can help them provide better customer service, visualize customer needs, or gain fresh insights into the performance of previous products. Practical Data Analysis is ideal for home and small business users who want to slice and dice the data they have on hand with minimum hassle.

Practical Data Analysis is a hands-on guide to understanding the nature of your data and turning it into insight. It introduces machine learning techniques, social network analysis, and econometrics to help your clients get insights from the pool of data they have at hand. It also covers data preparation and processing for several kinds of data, such as text, images, graphs, documents, and time series.

Practical Data Analysis presents a detailed exploration of current work in data analysis through self-contained projects. First, you will explore the basics of data preparation and transformation with OpenRefine. Then you will get started with exploratory data analysis using the D3.js visualization framework. You will also be introduced to machine learning techniques such as classification, regression, and clustering through practical projects such as spam classification, predicting gold prices, and finding clusters in your Facebook friends’ network. You will learn how to solve problems in text classification, simulation, time series forecasting, social media, and MapReduce through detailed projects. Finally, you will work with large amounts of Twitter data, using MapReduce to perform a sentiment analysis implemented in Python and MongoDB.

Practical Data Analysis combines carefully selected algorithms with data scrubbing techniques that enable you to turn your data into insight.

Table of Contents

Chapter 1: Getting Started
Computer science
Artificial intelligence (AI)
Machine Learning (ML)
Statistics
Mathematics
Knowledge domain
Data, information, and knowledge
The nature of data
The data analysis process
Quantitative versus qualitative data analysis
Importance of data visualization
What about big data?
Summary
Chapter 2: Working with Data
Datasource
Data scrubbing
Data formats
Getting started with OpenRefine
Summary
Chapter 3: Data Visualization
Data-Driven Documents (D3)
Getting started with D3.js
Interaction and animation
Summary
Chapter 4: Text Classification
Learning and classification
Bayesian classification
E-mail subject line tester
The algorithm
Classifier accuracy
Summary
Chapter 5: Similarity-based Image Retrieval
Image similarity search
Dynamic time warping (DTW)
Processing the image dataset
Implementing DTW
Analyzing the results
Summary
Chapter 6: Simulation of Stock Prices
Financial time series
Random walk simulation
Monte Carlo methods
Generating random numbers
Implementation in D3.js
Summary
Chapter 7: Predicting Gold Prices
Working with the time series data
Smoothing the time series
The data – historical gold prices
Nonlinear regression
Summary
Chapter 8: Working with Support Vector Machines
Understanding the multivariate dataset
Dimensionality reduction
Getting started with support vector machine
Summary
Chapter 9: Modeling Infectious Disease with Cellular Automata
Introduction to epidemiology
The epidemic models
Modeling with cellular automata
Simulation of the SIRS model in CA with D3.js
Summary
Chapter 10: Working with Social Graphs
Structure of a graph
Social Networks Analysis
Acquiring my Facebook graph
Representing graphs with Gephi
Statistical analysis
Degree distribution
Transforming GDF to JSON
Graph visualization with D3.js
Summary
Chapter 11: Sentiment Analysis of Twitter Data
The anatomy of Twitter data
Using OAuth to access Twitter API
Getting started with Twython
Sentiment classification
Getting started with Natural Language Toolkit (NLTK)
Summary
Chapter 12: Data Processing and Aggregation with MongoDB
Getting started with MongoDB
Data preparation
Group
The aggregation framework
Summary
Chapter 13: Working with MapReduce
MapReduce overview
Programming model
Using MapReduce with MongoDB
Filtering the input collection
Grouping and aggregation
Word cloud visualization of the most common positive words in tweets
Summary
Chapter 14: Online Data Analysis with IPython and Wakari
Getting started with Wakari
Getting started with IPython Notebook
Introduction to image processing with PIL
Getting started with Pandas
Multiprocessing with IPython
Sharing your Notebook
Summary

What You Will Learn

Work with data to get meaningful results from your data analysis projects
Visualize your data to find trends and correlations
Build your own image similarity search engine
Learn how to forecast numerical values from time series data
Create an interactive visualization for your social media graph
Explore the MapReduce framework in MongoDB
Create interactive simulations with D3.js
