Mastering pandas. - Second Edition


Product type: Book
Published: Oct 2019
ISBN-13: 9781789343236
Pages: 674
Edition: 2nd Edition
Author: Ashish Kumar

Table of Contents (21 chapters)

Preface
Section 1: Overview of Data Analysis and pandas
  • Introduction to pandas and Data Analysis
  • Installation of pandas and Supporting Software
Section 2: Data Structures and I/O in pandas
  • Using NumPy and Data Structures with pandas
  • I/Os of Different Data Formats with pandas
Section 3: Mastering Different Data Operations in pandas
  • Indexing and Selecting in pandas
  • Grouping, Merging, and Reshaping Data in pandas
  • Special Data Operations in pandas
  • Time Series and Plotting Using Matplotlib
Section 4: Going a Step Beyond with pandas
  • Making Powerful Reports In Jupyter Using pandas
  • A Tour of Statistics with pandas and NumPy
  • A Brief Tour of Bayesian Statistics and Maximum Likelihood Estimates
  • Data Case Studies Using pandas
  • The pandas Library Architecture
  • pandas Compared with Other Tools
  • A Brief Tour of Machine Learning
Other Books You May Enjoy

The pandas Library Architecture

In this chapter, we examine the various modules and libraries that are available to pandas users. The chapter is intended as a short guide to help users navigate the modules and libraries that pandas provides. It gives a breakdown of how the library code is organized, along with a brief description of the various modules. It will be most valuable to users who are interested in the inner workings of pandas, as well as to those who wish to contribute to the code base. We will also briefly demonstrate how you can improve performance using Python extensions. The topics discussed are as follows:

  • Introduction to the pandas library hierarchy
  • Description of pandas modules and files
  • Improving performance using Python extensions

Understanding the pandas file hierarchy

Generally, pandas is installed as a Python module in the standard location for third-party Python modules. The following table shows the standard installation location on Unix/macOS and on Windows:

Platform      Standard installation location        Example
Unix/macOS    prefix/lib/pythonX.Y/site-packages    /usr/local/lib/python2.7/site-packages
Windows       prefix\Lib\site-packages              C:\Python27\Lib\site-packages
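The table shows the general pattern. To see the directory that applies to the interpreter you are actually running, you can use the standard library's `sysconfig` module, which reports it directly. This sketch assumes a reasonably recent Python 3. Inside a virtual environment or conda environment, the reported prefix belongs to that environment. Some Linux distributions, such as Debian and Ubuntu, name the system directory `dist-packages` rather than `site-packages`.

```python
import sys
import sysconfig

# "purelib" is the install directory for pure-Python third-party packages
# under the running interpreter's default install scheme.
purelib = sysconfig.get_paths()["purelib"]

print("prefix:       ", sys.prefix)
print("site-packages:", purelib)
```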

If Python was installed with Anaconda, the pandas module can be found in the Anaconda directory, within a similar file path: Anaconda3\pkgs\pandas-0.23.4-py37h830ac7b_0\Lib\site-packages\pandas.
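Rather than guessing which of these locations applies, you can ask the import system where a package would be loaded from. The helper below, `package_dir`, is a hypothetical name introduced for illustration. It relies only on `importlib.util.find_spec`, which, for a top-level name, locates the package without executing its `__init__.py`.

```python
import importlib.util
from pathlib import Path


def package_dir(name: str) -> Path:
    """Return the directory an installed top-level package would be imported from."""
    spec = importlib.util.find_spec(name)
    # Packages have submodule search locations; plain modules and built-ins do not.
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"no package named {name!r}")
    # For a regular package, the first search location is its own directory.
    return Path(next(iter(spec.submodule_search_locations)))


if __name__ == "__main__":
    try:
        print(package_dir("pandas"))
    except ModuleNotFoundError:
        print("pandas is not installed in this interpreter")
```

The same call works whether pandas came from pip, a system package manager, or Anaconda, because it reports the copy that `import pandas` would actually use.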

Now that we have seen where pandas is installed as a third-party module, we can look at its file hierarchy. There are eight types of file in the installed pandas library....
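One way to see the kinds of files that make up an installed package is to walk its directory and tally the file extensions. This is a minimal sketch built around a hypothetical helper, `file_types`. The counts depend on several factors:

  • the pandas version
  • the platform (compiled extensions end in `.so` on Unix/macOS and `.pyd` on Windows)
  • whether `__pycache__` bytecode has been written

The sketch is therefore not meant to reproduce the book's categorization.

```python
import importlib.util
from collections import Counter
from pathlib import Path


def file_types(package: str) -> Counter:
    """Count the files under an installed package's directory, keyed by extension."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"no package named {package!r}")
    root = Path(next(iter(spec.submodule_search_locations)))

    counts = Counter()
    for path in root.rglob("*"):  # recurse into every subpackage and data folder
        if path.is_file():
            # Path.suffix keeps only the last extension, so a compiled module such as
            # "x.cpython-311-x86_64-linux-gnu.so" is counted under ".so"
            counts[path.suffix or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    try:
        for ext, n in file_types("pandas").most_common():
            print(f"{ext:16} {n}")
    except ModuleNotFoundError:
        print("pandas is not installed in this interpreter")
```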

Improving performance using Python extensions

A common complaint among Python and pandas users is that the ease of use and expressiveness of the language and the module come with a significant downside: performance. This is especially true for numeric computing.

According to programming benchmarks, Python is often slower than compiled languages such as C/C++ for many algorithms and data structure operations; binary-tree operations are one example. In one simulation experiment, Python 3 ran 104 times slower than the fastest C++ implementation of an n-body simulation calculation.
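The benchmark figures above come from the author's sources and are not reproduced here. This small sketch only illustrates where much of the overhead comes from. It computes the same sum of squares in two ways:

  • with a loop whose body runs as Python bytecode
  • by handing the per-element work to built-ins that CPython implements in C (`map`, `operator.mul`, and `sum`)

Exact timings vary by machine and Python version.

```python
import operator
import timeit


def sum_squares_loop(n: int) -> int:
    """Sum of i*i for i in range(n), with the loop body executed as bytecode."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n: int) -> int:
    """Same result, but the iteration and multiplication happen inside C code."""
    r = range(n)
    return sum(map(operator.mul, r, r))


if __name__ == "__main__":
    n = 100_000
    assert sum_squares_loop(n) == sum_squares_builtin(n)
    for f in (sum_squares_loop, sum_squares_builtin):
        seconds = timeit.timeit(lambda: f(n), number=20)
        print(f"{f.__name__:22} {seconds:.3f} s")
```

The C-backed version is typically faster because the interpreter no longer dispatches bytecode for every element. Writing hot code in a compiled language and exposing it to Python takes the same idea further.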

So, how can we solve this legitimate yet vexing problem? We can mitigate Python's slowness while keeping what we like about it: clarity and productivity. This can be done by writing the parts of our code that are performance-sensitive, for example,...
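A full extension module written in C, C++, or Cython needs a compiler and a build step. A lighter, build-free way to illustrate the underlying idea (calling compiled C code from Python) is the standard library's `ctypes` module. It can load a shared C library and call a function from it. This sketch assumes a Unix-like system, such as Linux or macOS, where `ctypes.util.find_library("m")` can locate the C math library. Note that `ctypes` is a foreign-function interface, not an extension module. Its per-call overhead means it only pays off when each C call does substantial work.

```python
import ctypes
import ctypes.util
import math

# Assumption: a Unix-like system where the C math library ("libm") can be found.
libm_name = ctypes.util.find_library("m")
if libm_name is None:
    raise OSError("could not locate the C math library on this system")
libm = ctypes.CDLL(libm_name)

# Declare the C prototype `double cos(double)` so ctypes converts
# the argument and the return value correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

x = 1.0
print(libm.cos(x), math.cos(x))  # the C call and math.cos should agree
```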

Summary

To summarize, in this chapter we took a tour of the pandas library hierarchy to illustrate the internals of the library. This understanding will be useful for building custom modules from pandas code, or for improving the functionality of pandas as an open source contributor. We also touched on the benefits of using a Python extension module to speed up performance-sensitive code.

In the next chapter, we will see how pandas compares to other data analysis tools in terms of various analysis operations.
