Python Data Analysis

Learn how to apply powerful data analysis techniques with popular open source Python modules

Ivan Idris

eBook: $29.99 (RRP $29.99)
Print + eBook: $49.99 (RRP $49.99)

Book Details

ISBN 13: 9781783553358
Paperback: 348 pages

Book Description

Python is a multi-paradigm programming language well suited to both object-oriented application development and functional design patterns. It has become the language of choice for data scientists working on data analysis, visualization, and machine learning, and it helps you work quickly and productively.

This book will teach novices about data analysis with Python in the broadest sense possible, covering everything from data retrieval, cleaning, manipulation, visualization, and storage to complex analysis and modeling. It draws on many open source Python modules, such as NumPy, SciPy, matplotlib, pandas, IPython, Cython, scikit-learn, and NLTK. In later chapters, the book covers topics such as data visualization, signal processing and time-series analysis, databases, and predictive analytics and machine learning. This book will turn you into an ace data analyst in no time.

Table of Contents

Chapter 1: Getting Started with Python Libraries
Software used in this book
Building NumPy, SciPy, matplotlib, and IPython from source
Installing with setuptools
NumPy arrays
A simple application
Using IPython as a shell
Reading manual pages
IPython notebooks
Where to find help and references
Summary
Chapter 2: NumPy Arrays
The NumPy array object
Creating a multidimensional array
Selecting NumPy array elements
NumPy numerical types
One-dimensional slicing and indexing
Manipulating array shapes
Creating array views and copies
Fancy indexing
Indexing with a list of locations
Indexing NumPy arrays with Booleans
Broadcasting NumPy arrays
Summary
Chapter 3: Statistics and Linear Algebra
NumPy and SciPy modules
Basic descriptive statistics with NumPy
Linear algebra with NumPy
Finding eigenvalues and eigenvectors with NumPy
NumPy random numbers
Creating a NumPy-masked array
Summary
Chapter 4: pandas Primer
Installing and exploring pandas
pandas DataFrames
pandas Series
Querying data in pandas
Statistics with pandas DataFrames
Data aggregation with pandas DataFrames
Concatenating and appending DataFrames
Joining DataFrames
Handling missing values
Dealing with dates
Pivot tables
Remote data access
Summary
Chapter 5: Retrieving, Processing, and Storing Data
Writing CSV files with NumPy and pandas
Comparing the NumPy .npy binary format and pickling pandas DataFrames
Storing data with PyTables
Reading and writing pandas DataFrames to HDF5 stores
Reading and writing to Excel with pandas
Using REST web services and JSON
Reading and writing JSON with pandas
Parsing RSS and Atom feeds
Parsing HTML with Beautiful Soup
Summary
Chapter 6: Data Visualization
matplotlib subpackages
Basic matplotlib plots
Logarithmic plots
Scatter plots
Legends and annotations
Three-dimensional plots
Plotting in pandas
Lag plots
Autocorrelation plots
Plot.ly
Summary
Chapter 7: Signal Processing and Time Series
statsmodels subpackages
Moving averages
Window functions
Defining cointegration
Autocorrelation
Autoregressive models
ARMA models
Generating periodic signals
Fourier analysis
Spectral analysis
Filtering
Summary
Chapter 8: Working with Databases
Lightweight access with sqlite3
Accessing databases from pandas
SQLAlchemy
Pony ORM
Dataset – databases for lazy people
PyMongo and MongoDB
Storing data in Redis
Apache Cassandra
Summary
Chapter 9: Analyzing Textual Data and Social Media
Installing NLTK
Filtering out stopwords, names, and numbers
The bag-of-words model
Analyzing word frequencies
Naive Bayes classification
Sentiment analysis
Creating word clouds
Social network analysis
Summary
Chapter 10: Predictive Analytics and Machine Learning
A tour of scikit-learn
Preprocessing
Classification with logistic regression
Classification with support vector machines
Regression with ElasticNetCV
Support vector regression
Clustering with affinity propagation
Mean Shift
Genetic algorithms
Neural networks
Decision trees
Summary
Chapter 11: Environments Outside the Python Ecosystem and Cloud Computing
Exchanging information with MATLAB/Octave
Installing rpy2
Interfacing with R
Sending NumPy arrays to Java
Integrating SWIG and NumPy
Integrating Boost and Python
Using Fortran code through f2py
Setting up Google App Engine
Running programs on PythonAnywhere
Working with Wakari
Summary
Chapter 12: Performance Tuning, Profiling, and Concurrency
Profiling the code
Installing Cython
Calling C code
Creating a process pool with multiprocessing
Speeding up embarrassingly parallel for loops with Joblib
Comparing Bottleneck to NumPy functions
Performing MapReduce with Jug
Installing MPI for Python
IPython Parallel
Summary

What You Will Learn

  • Install open source Python modules on various platforms
  • Learn the fundamentals of NumPy, including arrays
  • Manipulate data with pandas
  • Retrieve, process, store, and visualize data
  • Understand signal processing and time-series data analysis
  • Work with relational and NoSQL databases
  • Discover more about data modeling and machine learning
  • Get to grips with interoperability and cloud computing
