You're reading from Mastering pandas. - Second Edition

Product type: Book
Published in: Oct 2019
Reading level: Intermediate
ISBN-13: 9781789343236
Edition: 2nd Edition
Languages: Python
Tools: Pandas
Concepts: Scientific Computing
Author: Ashish Kumar

Installation of pandas and Supporting Software

Before we can start using pandas for data analysis, we need to make sure that the software is installed and the environment is in proper working order. This chapter deals with the installation of Python (if necessary), the pandas library, and all necessary dependencies on the Windows, macOS, and Linux platforms. The topics we address include, among other things, selecting a version of Python, installing Python, and installing pandas.

The steps outlined in the following section should work for the most part, but your mileage may vary depending upon the setup. On different operating system versions, the scripts may not always work perfectly, and the third-party software packages already in the system may sometimes conflict with the instructions provided.

The following topics will be covered in this chapter:

Selecting a...

Selecting a version of Python to use

A classic debate among Python developers is which is better: Python 2.7.x or Python 3.x. Until about a year ago, Python 2.7.x topped the charts because it was regarded as the stable version. More than 70% of projects used Python 2.7 in 2016; by 2017 that share had fallen to 63%. The shift followed the announcement that Python 2.7 would no longer be maintained after January 1, 2020, meaning that there would be no more bug fixes or new releases. Some libraries released after this announcement are compatible only with Python 3.x, and several businesses have started migrating to Python 3.x. Hence, as of 2018, Python 3.x is the preferred version.

For further information, please see https://wiki.python.org/moin/Python2orPython3.
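A script that relies on Python 3 behavior can check the interpreter version at startup. The following is a minimal sketch, not something the chapter prescribes; the helper name `require_python3` and its `minimum` parameter are hypothetical and chosen only for illustration.

```python
import sys


def require_python3(minimum=(3, 0), version_info=None):
    """Return True if the interpreter is at least `minimum`; raise otherwise.

    `require_python3` and its parameters are hypothetical names for this sketch.
    `version_info` exists only so the check can be exercised with a fake version.
    """
    current = tuple((version_info or sys.version_info)[:2])
    if current < tuple(minimum):
        raise RuntimeError(
            "Python %d.%d or newer is required, found %d.%d"
            % (tuple(minimum) + current)
        )
    return True


if __name__ == "__main__":
    require_python3()
    print("Running on Python", sys.version.split()[0])
```

Calling the check once near the top of an entry-point script fails early with a clear message instead of a confusing syntax or import error later on.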

The main differences between Python 2.x and 3 include better Unicode...

Standalone Python installation

Here, we detail the standalone installation of Python on multiple platforms: Linux, Windows, and macOS. Standalone means just the IDLE IDE, the interpreter, and some basic packages. Another option is to install a distribution, which is a richer version that comes with many utilities pre-installed.

Linux

If you're using Linux, Python will most probably come pre-installed. If you're not sure, type the following at the terminal prompt:

       which python

Python is likely to be found in one of the following folders on Linux, depending on your distribution and particular installation:

/usr/bin/python
/bin/python
/usr/local/bin/python
/opt/local/bin/python
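The same lookup can be done from Python itself with the standard library. This is a small sketch under the assumption that the interpreter may be exposed as either `python` or `python3`; the function name `locate_interpreters` is hypothetical.

```python
import shutil
import sys


def locate_interpreters(names=("python", "python3")):
    """Map each command name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_interpreters().items():
        print(name, "->", path or "not found on PATH")
    # The interpreter running this script, and its version.
    print("current interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
```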

You...

Installation of Python and pandas using Anaconda

After a standalone installation of Python, each library will have to be separately installed. It is a bit of a hassle to ensure version compatibility between newly installed libraries and the associated dependencies. This is where a third-party distribution like Anaconda comes in handy. Anaconda is the most widely used distribution for Python/R, designed for developing scalable data science solutions.

What is Anaconda?

Anaconda is an open source Python/R distribution, developed to seamlessly manage packages, dependencies, and environments. It is compatible with Windows, Linux, and macOS and requires 3 GB of disk space. It needs this space to download and install quite a collection...

Dependency packages for pandas

Please note that if you are using the Anaconda distribution, you don't need to install pandas separately and hence don't need to worry about installing the dependencies. It is still good to know which dependency packages pandas uses under the hood, to better understand how it works.

At the time of writing, the latest stable version of pandas is 0.23.4. The various dependencies, along with the associated download locations, are as follows:

Package	Required	Description	Download location
`NumPy` 1.9.0 or higher	Required	NumPy library for numerical operations	http://www.numpy.org/
`python-dateutil` 2.5.0	Required	Date manipulation and utility library	http://labix.org/
`pytz`	Required	Time zone support	http://sourceforge.net/
`setuptools` 24.2.0	Required	Packaging Python projects...
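To see which of these dependencies are present in the current environment, their installed versions can be queried with the standard library. This is a sketch that assumes Python 3.8 or newer (for `importlib.metadata`); the function name `installed_versions` is hypothetical, and the default package names are the distribution names from the rows shown above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("numpy", "python-dateutil", "pytz", "setuptools")):
    """Return {package name: installed version string, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, installed in installed_versions().items():
        print(name, "->", installed or "not installed")
```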

Review of items installed with Anaconda

Anaconda installs more than 200 packages and several IDEs. Some of the widely used packages that get installed are NumPy, pandas, SciPy, scikit-learn, matplotlib, seaborn, beautifulsoup4, nltk, and dask.

Packages that are not installed along with Anaconda can be installed manually through conda, Anaconda's package manager. Package upgrades can also be done through conda. Conda fetches packages from the Anaconda repository, which is huge and has more than 1,400 packages. The following commands install and update packages through conda:

To install, use conda install pandas
To update, use conda update pandas
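When these commands need to be issued from a script, one option is to build the argument list in Python and hand it to `subprocess`. The sketch below is an illustration only: `conda_command` and `run_conda` are hypothetical helper names, and the `--yes` flag is added on the assumption that the command should run without an interactive confirmation prompt.

```python
import shutil
import subprocess


def conda_command(action, package):
    """Build the argument list for `conda install` or `conda update`."""
    if action not in ("install", "update"):
        raise ValueError("action must be 'install' or 'update'")
    # --yes skips conda's confirmation prompt (an assumption of this sketch).
    return ["conda", action, "--yes", package]


def run_conda(action, package):
    """Run the command if conda is on PATH; return its exit code, else None."""
    if shutil.which("conda") is None:
        return None
    return subprocess.run(conda_command(action, package), check=False).returncode


if __name__ == "__main__":
    # Only print the commands here; call run_conda() to actually execute them.
    print(" ".join(conda_command("install", "pandas")))
    print(" ".join(conda_command("update", "pandas")))
```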

The following IDEs are installed with Anaconda:

JupyterLab
Jupyter Notebook
QTConsole
Spyder

The IDEs can be launched either through conda or through Anaconda Navigator.

Anaconda Navigator is a GUI that lets...

Cross tooling – combining pandas awesomeness with R, Julia, H2O.ai, and Azure ML Studio

pandas can be regarded as a "wonder tool" for tasks such as data manipulation, data cleaning, and handling time series data. It is fast and efficient, and it is powerful enough to handle small to intermediate datasets. The best part is that the use of pandas is not restricted to Python. There are methods that allow the strengths of pandas to be used from other frameworks, such as R, Julia, Azure ML Studio, and H2O.ai. Using the benefits of a superior framework from within another tool is called cross-tooling, and it is applied frequently. One of the main reasons it exists is that it is almost impossible for one tool to have every functionality. Suppose one task has two sub-tasks: sub-task 1 can be done well in R while the sub-task...

Command line tricks for pandas

The command line is an important part of a pandas user's arsenal. It can serve as an efficient and fast, if tedious, complement to pandas. Many data operations, such as breaking a huge file into multiple chunks or cleaning a data file of unsupported characters, can be performed on the command line before feeding the data to pandas.

The head function of pandas is extremely useful for quickly assessing the data. The command-line head utility makes this even more convenient:

# Get the first 10 rows
$ head myData.csv

# Get the first 5 rows
$ head -n 5 myData.csv

# Get 100 bytes of data
$ head -c 100 myData.csv
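For comparison, the same peeks at a file can be sketched in plain Python without loading the whole file. The function names `head_lines` and `head_bytes` are hypothetical, and the sketch assumes a UTF-8 encoded text file.

```python
from itertools import islice


def head_lines(path, n=10):
    """Return the first n lines of a text file, like `head -n`."""
    with open(path, "r", encoding="utf-8") as handle:
        # islice stops after n lines, so the rest of the file is never read.
        return list(islice(handle, n))


def head_bytes(path, count):
    """Return the first `count` bytes of a file, like `head -c`."""
    with open(path, "rb") as handle:
        return handle.read(count)
```

Within pandas itself, `pd.read_csv("myData.csv", nrows=5)` similarly limits how many data rows are parsed.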

The translate (tr) function packs within it the ability to replace characters. The following command converts all uppercase characters in a text file to lowercase characters:

$ cat upper.txt | tr "[:upper...

Options and settings for pandas

pandas lets users modify some display and formatting options.

The get_option() and set_option() commands let the user view the current setting and change it:

import pandas as pd

# View the current setting
pd.get_option("display.max_rows")
Output: 60

# Change the setting
pd.set_option("display.max_rows", 120)
pd.get_option("display.max_rows")
Output: 120

# Restore the default
pd.reset_option("display.max_rows")
pd.get_option("display.max_rows")
Output: 60

The preceding commands set and reset the number of rows that are displayed when a DataFrame is printed. Some other useful display options are the following:

max_columns: Set the number of columns to be displayed.
chop_threshold: Float values below the limit set here will be displayed as zeros.
colheader_justify: Set the justification for the column header.
date_dayfirst: Setting to 'True' prints day first...
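Options such as these can also be changed temporarily, so that the previous values come back automatically. The sketch below uses `pd.option_context` with the full option names `display.max_columns` and `display.chop_threshold`; the sample DataFrame and the chosen values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"tiny": [1e-10, 0.5, 2.0], "count": [10, 20, 30]})

before = pd.get_option("display.max_columns")

# Inside the block, the two options take the values given here.
with pd.option_context("display.max_columns", 5, "display.chop_threshold", 1e-6):
    inside = pd.get_option("display.max_columns")
    print(df)

# On leaving the block, the earlier setting is restored.
after = pd.get_option("display.max_columns")
print(inside, before == after)
```

A context manager like this is handy in notebooks, where a global `set_option` call would otherwise affect every later cell.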

Summary

Before we delve into the awesomeness of pandas, it is mission critical that we install Python and pandas correctly, choose the right IDEs, and set the right options. In this chapter, we discussed these and more. Here is a summary of key takeaways from the chapter:

Python 3.x is available, but many users still prefer to use version 2.7 as it is more stable and scientific-computation friendly.
Support and bug fixing for version 2.7 have now been stopped.
Translating code from one version to the other is a breeze. Both versions can also be used side by side with the virtualenv package, which comes pre-installed with Anaconda.
Anaconda is a popular Python distribution that comes with 700+ libraries/packages and several popular IDEs, such as Jupyter and Spyder.
Python code is callable from, and usable in, other tools, such as R, Azure ML Studio, H2O.ai, and Julia.
Some of the day...

Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience in implementing and deploying data science and machine learning solutions for challenging industry problems, in both hands-on and leadership roles. Natural Language Processing, IoT analytics, R Shiny product development, and ensemble ML methods are among his core areas of expertise. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the next hip beach around and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.