Learning Pandas: Get to grips with pandas - a versatile and high-performance Python library for data manipulation, analysis, and discovery

Product type Paperback

Published in Apr 2015

Publisher Packt

ISBN-13 9781783985128

Length 504 pages

Edition 1st Edition

Languages

Python

Tools

Pandas

Concepts

Data Analysis

Author (1):

Michael Heydt


Table of Contents (14) Chapters

Preface

1. A Tour of pandas

2. Installing pandas

3. NumPy for pandas

4. The pandas Series Object

5. The pandas DataFrame Object

6. Accessing Data

7. Tidying Up Your Data

8. Combining and Reshaping Data

9. Grouping and Aggregating Data

10. Time-series Data

11. Visualization

12. Applications to Finance

Index

pandas and why it is important

pandas is a library of high-level data structures and tools that help a Python programmer perform powerful data manipulations and discover information in data simply and quickly.

Simple and effective data analysis requires the ability to index, retrieve, tidy, reshape, combine, slice, and perform various analyses on both single- and multidimensional data, including heterogeneously typed data that is automatically aligned along index labels. To enable these capabilities, pandas provides the following features (and many more not explicitly mentioned here):

High-performance array and table structures for representation of homogeneous and heterogeneous data sets: the Series and DataFrame objects
Flexible reshaping of data structures, with the ability to insert and delete both rows and columns of tabular data
Hierarchical indexing of data along multiple axes (both rows and columns), allowing multiple labels per data item
Labeling of series and tabular data to facilitate indexing and automatic alignment of data
Ability to easily identify and fix missing data in both floating-point and non-floating-point formats
Powerful grouping capabilities and the ability to perform split-apply-combine operations on series and tabular data
Simple conversion from ragged and differently indexed data of both NumPy and Python data structures to pandas objects
Smart label-based slicing and subsetting of data sets, together with intuitive and flexible merging and joining of data with SQL-like constructs
Extensive I/O facilities to load and save data from multiple formats including CSV, Excel, relational and non-relational databases, HDF5 format, and JSON
Explicit support for time series, including date range generation, moving window statistics, time shifting, lagging, and so on
Built-in support to retrieve and automatically parse data from various web-based data sources such as Yahoo!, Google Finance, the World Bank, and several others
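
A few of the features listed above can be sketched in a short session. This is a minimal illustration rather than an example from the book; the data values and the variable names (`s1`, `s2`, `df`, and so on) are made up for the purpose.

```python
import pandas as pd

# Two Series with partially overlapping index labels (made-up data).
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Arithmetic aligns on index labels; a label present in only one
# Series yields NaN (missing data) in the result.
total = s1 + s2

# Identify and fill the missing values.
n_missing = int(total.isna().sum())
filled = total.fillna(0.0)

# A DataFrame holds heterogeneous columns (strings and floats here).
df = pd.DataFrame({
    "team": ["x", "y", "x", "y"],
    "score": [1.0, 2.0, 3.0, 4.0],
})

# Insert a column, then delete it again.
df["double"] = df["score"] * 2
df = df.drop(columns=["double"])

# Split-apply-combine: mean score per team.
means = df.groupby("team")["score"].mean()

# Label-based slicing includes both endpoints.
middle = s1.loc["a":"b"]

print(total)
print(means)
```

Note that the label-based slice `s1.loc["a":"b"]` returns both `a` and `b`, unlike positional slicing in plain Python, which excludes the end position.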

For those desiring to get into data analysis and the emerging field of data science, pandas offers an excellent means for a Python programmer (or just an enthusiast) to learn data manipulation. For those just learning or coming from a statistical language like R, pandas can offer an excellent introduction to Python as a programming language.

pandas itself is not a data science toolkit. It does provide some statistical methods as a matter of convenience, but to draw conclusions from data, it leans upon other packages in the Python ecosystem, such as SciPy, NumPy, and scikit-learn, and upon graphics libraries such as matplotlib for data visualization. This is actually a strength of pandas compared with languages such as R, as pandas applications are able to leverage an extensive network of robust Python frameworks already built and tested elsewhere.
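
As a rough illustration of that division of labor, the sketch below tidies data in pandas and then hands it to NumPy for a least-squares line fit. The data and the names (`df`, `clean`, `slope`, `intercept`) are invented for this example, and `np.polyfit` is only one of many routines that could play this role.

```python
import numpy as np
import pandas as pd

# Made-up data lying exactly on y = 2x + 1, with one missing y value.
df = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 3.0, 4.0],
    "y": [1.0, 3.0, 5.0, None, 9.0],
})

# pandas does the tidying: drop rows that contain missing values.
clean = df.dropna()

# NumPy does the numerical work: fit a degree-1 polynomial.
slope, intercept = np.polyfit(clean["x"].to_numpy(), clean["y"].to_numpy(), 1)

print(round(slope, 6), round(intercept, 6))
```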

In this book, we will look at how to use pandas for data manipulation, with a specific focus on gathering, cleaning, and manipulating various forms of data. Detailed coverage of data science, finance, econometrics, social network analysis, Python, and IPython is left to other references. You can refer to some other excellent books on these topics already available at https://www.packtpub.com/.
