Python Data Analysis - Second Edition

Learn how to apply powerful data analysis techniques with popular open source Python modules

Armando Fandango

Book Details

ISBN 13: 9781787127487
Paperback: 330 pages

Book Description

Data analysis techniques generate useful insights from small and large volumes of data. Python, with its strong set of libraries, has become a popular platform to conduct various data analysis and predictive modeling tasks.

With this book, you will learn how to process and manipulate data with Python for complex analysis and modeling. You will learn data manipulations such as aggregating, concatenating, appending, cleaning, and handling missing values with NumPy and Pandas. The book covers how to store and retrieve data from various data sources such as SQL and NoSQL databases, CSV files, and HDF5. You will also learn how to visualize data using visualization libraries, along with advanced topics such as signal processing, time series, textual data analysis, machine learning, and social media analysis.

The book covers a plethora of Python modules, such as matplotlib, statsmodels, scikit-learn, and NLTK. It also covers using Python with external environments such as R, Fortran, C/C++, and Boost libraries.

Table of Contents

Chapter 1: Getting Started with Python Libraries
  • Installing Python 3
  • Using IPython as a shell
  • Reading manual pages
  • Jupyter Notebook
  • NumPy arrays
  • A simple application
  • Where to find help and references
  • Listing modules inside the Python libraries
  • Visualizing data using Matplotlib
  • Summary
Chapter 2: NumPy Arrays
  • The NumPy array object
  • Creating a multidimensional array
  • Selecting NumPy array elements
  • NumPy numerical types
  • One-dimensional slicing and indexing
  • Manipulating array shapes
  • Creating array views and copies
  • Fancy indexing
  • Indexing with a list of locations
  • Indexing NumPy arrays with Booleans
  • Broadcasting NumPy arrays
  • Summary
  • References
Chapter 3: The Pandas Primer
  • Installing and exploring Pandas
  • The Pandas DataFrames
  • The Pandas Series
  • Querying data in Pandas
  • Statistics with Pandas DataFrames
  • Data aggregation with Pandas DataFrames
  • Concatenating and appending DataFrames
  • Joining DataFrames
  • Handling missing values
  • Dealing with dates
  • Pivot tables
  • Summary
  • References
Chapter 4: Statistics and Linear Algebra
  • Basic descriptive statistics with NumPy
  • Linear algebra with NumPy
  • Finding eigenvalues and eigenvectors with NumPy
  • NumPy random numbers
  • Creating a NumPy masked array
  • Summary
Chapter 5: Retrieving, Processing, and Storing Data
  • Writing CSV files with NumPy and Pandas
  • The binary .npy and pickle formats
  • Storing data with PyTables
  • Reading and writing Pandas DataFrames to HDF5 stores
  • Reading and writing to Excel with Pandas
  • Using REST web services and JSON
  • Reading and writing JSON with Pandas
  • Parsing RSS and Atom feeds
  • Parsing HTML with Beautiful Soup
  • Summary
  • Reference
Chapter 6: Data Visualization
  • The matplotlib subpackages
  • Basic matplotlib plots
  • Logarithmic plots
  • Scatter plots
  • Legends and annotations
  • Three-dimensional plots
  • Plotting in Pandas
  • Lag plots
  • Autocorrelation plots
  • Plot.ly
  • Summary
Chapter 7: Signal Processing and Time Series
  • The statsmodels modules
  • Moving averages
  • Window functions
  • Defining cointegration
  • Autocorrelation
  • Autoregressive models
  • ARMA models
  • Generating periodic signals
  • Fourier analysis
  • Spectral analysis
  • Filtering
  • Summary
Chapter 8: Working with Databases
  • Lightweight access with sqlite3
  • Accessing databases from Pandas
  • SQLAlchemy
  • Pony ORM
  • Dataset - databases for lazy people
  • PyMongo and MongoDB
  • Storing data in Redis
  • Storing data in memcache
  • Apache Cassandra
  • Summary
Chapter 9: Analyzing Textual Data and Social Media
  • Installing NLTK
  • About NLTK
  • Filtering out stopwords, names, and numbers
  • The bag-of-words model
  • Analyzing word frequencies
  • Naive Bayes classification
  • Sentiment analysis
  • Creating word clouds
  • Social network analysis
  • Summary
Chapter 10: Predictive Analytics and Machine Learning
  • Preprocessing
  • Classification with logistic regression
  • Classification with support vector machines
  • Regression with ElasticNetCV
  • Support vector regression
  • Clustering with affinity propagation
  • Mean shift
  • Genetic algorithms
  • Neural networks
  • Decision trees
  • Summary
Chapter 11: Environments Outside the Python Ecosystem and Cloud Computing
  • Exchanging information with Matlab/Octave
  • Installing rpy2 package
  • Interfacing with R
  • Sending NumPy arrays to Java
  • Integrating SWIG and NumPy
  • Integrating Boost and Python
  • Using Fortran code through f2py
  • PythonAnywhere Cloud
  • Summary
Chapter 12: Performance Tuning, Profiling, and Concurrency
  • Profiling the code
  • Installing Cython
  • Calling C code
  • Creating a process pool with multiprocessing
  • Speeding up embarrassingly parallel for loops with Joblib
  • Comparing Bottleneck to NumPy functions
  • Performing MapReduce with Jug
  • Installing MPI for Python
  • IPython Parallel
  • Summary

What You Will Learn

  • Install open source Python modules such as NumPy, SciPy, Pandas, statsmodels, scikit-learn, Theano, Keras, and TensorFlow on various platforms
  • Prepare and clean your data, and use it for exploratory analysis
  • Manipulate your data with Pandas
  • Retrieve and store your data from RDBMS, NoSQL, distributed filesystems such as HDFS, and HDF5 files
  • Visualize your data with open source libraries such as matplotlib, bokeh, and plotly
  • Learn about various machine learning methods such as supervised, unsupervised, probabilistic, and Bayesian
  • Understand signal processing and time series data analysis
  • Get to grips with graph processing and social network analysis
