Mastering pandas

Master the features and capabilities of pandas, a data analysis toolkit for Python

Femi Anthony


Book Details

ISBN 13: 9781783981960
Paperback: 364 pages

Book Description

Python is a groundbreaking language, known for its simplicity and succinctness: it lets the user achieve a great deal with a few lines of code, especially compared to other programming languages. The pandas library brings these features of Python into the data analysis realm by providing expressiveness, simplicity, and powerful capabilities for the task of data analysis. By mastering pandas, users will be able to perform complex data analysis in a short period of time and illustrate their findings using the rich visualization capabilities of related tools such as IPython and matplotlib.

This book is an in-depth guide to the use of pandas for data analysis, for both the seasoned data analysis practitioner and the novice user. It provides a basic introduction to the pandas framework and takes you through the installation of the library and the IPython interactive environment. Thereafter, you will learn basic as well as advanced features, such as MultiIndexing, modifying data structures, and sampling data, which provide powerful capabilities for data analysis.

Table of Contents

Chapter 1: Introduction to pandas and Data Analysis
Motivation for data analysis
How Python and pandas fit into the data analytics mix
What is pandas?
Benefits of using pandas
Summary
Chapter 2: Installation of pandas and the Supporting Software
Selecting a version of Python to use
Python installation
Installation of Python and pandas from a third-party vendor
Continuum Analytics Anaconda
Other numeric or analytics-focused Python distributions
Downloading and installing pandas
IPython installation
Summary
Chapter 3: The pandas Data Structures
NumPy ndarrays
Data structures in pandas
Summary
Chapter 4: Operations in pandas, Part I – Indexing and Selecting
Basic indexing
Label, integer, and mixed indexing
Boolean indexing
Summary
Chapter 5: Operations in pandas, Part II – Grouping, Merging, and Reshaping of Data
Grouping of data
Merging and joining
Pivots and reshaping data
Summary
Chapter 6: Missing Data, Time Series, and Plotting Using Matplotlib
Handling missing data
Handling time series
A summary of Time Series-related objects
Summary
Chapter 7: A Tour of Statistics – The Classical Approach
Descriptive statistics versus inferential statistics
Measures of central tendency and variability
Hypothesis testing – the null and alternative hypotheses
Summary
Chapter 8: A Brief Tour of Bayesian Statistics
Introduction to Bayesian statistics
Mathematical framework for Bayesian statistics
Probability distributions
Bayesian statistics versus Frequentist statistics
Conducting Bayesian statistical analysis
Monte Carlo estimation of the likelihood function and PyMC
References
Summary
Chapter 9: The pandas Library Architecture
Introduction to pandas' file hierarchy
Description of pandas' modules and files
Improving performance using Python extensions
Summary
Chapter 10: R and pandas Compared
R data types
Slicing and selection
Arithmetic operations on columns
Aggregation and GroupBy
Comparing matching operators in R and pandas
Logical subsetting
Split-apply-combine
Reshaping using melt
Factors/categorical data
Summary
Chapter 11: Brief Tour of Machine Learning
Role of pandas in machine learning
Installation of scikit-learn
Introduction to machine learning
Application of machine learning – Kaggle Titanic competition
Data analysis and preprocessing using pandas
A naïve approach to Titanic problem
The scikit-learn ML/classifier interface
Supervised learning algorithms
Unsupervised learning algorithms
Summary

What You Will Learn

  • Download, install, and set up Python, pandas, and related tools to perform data analysis for different operating environments
  • Practice using IPython as an interactive environment for doing data analysis using pandas
  • Master the core features of pandas used in data analysis
  • Get to grips with the more advanced features of pandas
  • Understand the basics of using matplotlib to plot data analysis results
  • Analyze real-world datasets using pandas
  • Acquire knowledge of using pandas for basic statistical analysis

