Learning Pandas

You're reading from Learning Pandas: Get to grips with pandas - a versatile and high-performance Python library for data manipulation, analysis, and discovery

Product type Paperback
Published in Apr 2015
Publisher Packt
ISBN-13 9781783985128
Length 504 pages
Edition 1st Edition
Author: Michael Heydt
Table of Contents (14 chapters)

Preface
1. A Tour of pandas
2. Installing pandas
3. NumPy for pandas
4. The pandas Series Object
5. The pandas DataFrame Object
6. Accessing Data
7. Tidying Up Your Data
8. Combining and Reshaping Data
9. Grouping and Aggregating Data
10. Time-series Data
11. Visualization
12. Applications to Finance
Index

pandas and why it is important

pandas is a library of high-level data structures and tools that help a Python programmer perform powerful data manipulations and discover information in that data simply and quickly.

Simple and effective data analysis requires the ability to index, retrieve, tidy, reshape, combine, slice, and perform various analyses on both single and multidimensional data, including heterogeneously typed data that is automatically aligned along index labels. To enable these capabilities, pandas provides the following features (and many more not explicitly mentioned here):

  • High-performance array and table structures for representing homogeneous and heterogeneous data sets: the Series and DataFrame objects
  • Flexible reshaping of data structures, with the ability to insert and delete both rows and columns of tabular data
  • Hierarchical indexing of data along multiple axes (both rows and columns), allowing multiple labels per data item
  • Labeling of series and tabular data to facilitate indexing and automatic alignment of data
  • Ability to easily identify and fix missing data, in both floating-point and non-floating-point formats
  • Powerful grouping capabilities and functionality to perform split-apply-combine operations on series and tabular data
  • Simple conversion from ragged and differently indexed data of both NumPy and Python data structures to pandas objects
  • Smart label-based slicing and subsetting of data sets, plus intuitive and flexible merging and joining of data with SQL-like constructs
  • Extensive I/O facilities to load and save data from multiple formats including CSV, Excel, relational and non-relational databases, HDF5 format, and JSON
  • Explicit support for time series, including date range generation, moving window statistics, time shifting, lagging, and so on
  • Built-in support to retrieve and automatically parse data from various web-based data sources such as Yahoo!, Google Finance, the World Bank, and several others
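
A few of the features listed above can be sketched in a handful of lines. This is a minimal illustration rather than an example from the book: the data, the column names (`region`, `amount`, `manager`), and the variable names are all made up for the purpose, and it assumes a reasonably recent pandas installation.

```python
import pandas as pd

# Two Series with partially overlapping index labels (made-up data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Arithmetic aligns on index labels; "x" has no partner in b,
# so the result for "x" is missing (NaN).
total = a + b

# Identify and fix the missing value by filling it with 0.0.
filled = total.fillna(0.0)

# Split-apply-combine: split rows by region, sum each group,
# and combine the results into a new Series.
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "amount": [100, 200, 50, 25],
})
by_region = sales.groupby("region")["amount"].sum()

# SQL-like left join of two tables on a shared column.
managers = pd.DataFrame({
    "region": ["east", "west"],
    "manager": ["Ann", "Bob"],
})
joined = sales.merge(managers, on="region", how="left")
```

Here `total` holds a missing value for label `x` because only `a` contains it, and `joined` keeps the four rows of `sales` with a `manager` column attached to each.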

For those desiring to get into data analysis and the emerging field of data science, pandas offers an excellent means for a Python programmer (or just an enthusiast) to learn data manipulation. For those just learning or coming from a statistical language like R, pandas can offer an excellent introduction to Python as a programming language.

pandas itself is not a data science toolkit. It provides some statistical methods as a convenience, but to draw conclusions from data it relies on other packages in the Python ecosystem, such as SciPy, NumPy, and scikit-learn, and on graphics libraries such as matplotlib and ggvis for data visualization. This is actually a strength of pandas compared with languages such as R: pandas applications can leverage an extensive network of robust Python frameworks already built and tested elsewhere.
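
As a rough sketch of this division of labor, the snippet below computes a convenience statistic with pandas and then hands the underlying arrays to NumPy for a least-squares line fit. The data and the column names (`height`, `weight`) are invented for illustration, and `to_numpy()` assumes pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

# Made-up data in which weight is exactly twice height.
df = pd.DataFrame({
    "height": [1.0, 2.0, 3.0, 4.0],
    "weight": [2.0, 4.0, 6.0, 8.0],
})

# A convenience statistic provided by pandas itself.
means = df.mean()

# Hand the column data to NumPy for a degree-1 least-squares fit.
# For deg=1, np.polyfit returns the coefficients as (slope, intercept).
slope, intercept = np.polyfit(
    df["height"].to_numpy(), df["weight"].to_numpy(), deg=1
)
```

pandas handles the labeled storage and the simple summary, while the fitting itself is done by NumPy on plain arrays.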

In this book, we will look at how to use pandas for data manipulation, with a specific focus on gathering, cleaning, and manipulating various forms of data. Detailed specifics of data science, finance, econometrics, social network analysis, Python, and IPython are left as reference. You can refer to some other excellent books on these topics already available at https://www.packtpub.com/.
