
You're reading from IPython Notebook Essentials

Product type: Book

Published in: Nov 2014

ISBN-13: 9781783988341

Edition: 1st Edition

Tools: IPython

Concepts: Scientific Computing

Author (1)

Luiz Felipe Martins

Chapter 4. Handling Data with pandas

In this chapter, we will introduce pandas, a powerful and versatile Python library that provides tools for data handling and analysis. We will consider the two main pandas structures for storing data, the Series and DataFrame objects, in detail. You will learn how to create these structures and how to access and insert data into them. We also cover the important topic of slicing, that is, how to access portions of data using the different indexing methods provided by pandas. Next, we'll discuss the computational and graphics tools offered by pandas, and finish the chapter by demonstrating how to work with a realistic dataset.

pandas is an extensive package for data-oriented manipulation, and it is beyond the scope of this book to realistically cover all aspects of the package. We will cover only some of the most useful data structures and functionalities. In particular, we will not cover the Panel data structure and multi-indexes. However, we will provide...

The Series class

A Series object represents a one-dimensional, indexed series of data. It can be thought of as a dictionary, with one main difference: the index labels of a Series object are ordered. The following example constructs a Series object and displays it:

from numpy import float64
from pandas import Series

grades1 = Series([76, 82, 78, 100],
                 index=['Alex', 'Robert', 'Minnie', 'Alice'],
                 name='Assignment 1', dtype=float64)
grades1

This produces the following output:

Alex       76
Robert     82
Minnie     78
Alice     100
Name: Assignment 1, dtype: float64

Notice the format of the constructor call:

Series(<data>, index=<indexes>, name=<name>, dtype=<type>)

Both data and indexes are usually lists or NumPy arrays, but can be any Python iterable. The lists must have the same length. The name variable is a string that describes the data in the series. The type variable is a NumPy data type. The indexes and the name variables are optional (if indexes are omitted, they are set to integers...
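As a minimal sketch of the optional arguments described above, the following block builds one Series with explicit labels and one without. The names `labeled` and `unlabeled` are illustrative and do not come from the book, and the explicit imports assume a plain Python session rather than a notebook with a preloaded namespace.

```python
import numpy as np
from pandas import Series

# Explicit index labels and a name, following the constructor format above.
labeled = Series([76, 82, 78, 100],
                 index=['Alex', 'Robert', 'Minnie', 'Alice'],
                 name='Assignment 1', dtype=np.float64)

# Index and name omitted: pandas falls back to integer labels 0, 1, 2, ...
unlabeled = Series(np.array([76, 82, 78, 100]))

print(labeled['Minnie'])        # label-based access, like a dictionary: 78.0
print(list(unlabeled.index))    # default integer labels: [0, 1, 2, 3]
print(unlabeled.name)           # no name was given: None
```

The data for `unlabeled` is passed as a NumPy array only to show that lists and arrays are interchangeable here.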

The DataFrame class

The DataFrame class is used to represent two-dimensional data. To illustrate its use, let's create a DataFrame object containing student data as follows:

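from numpy import nan
from pandas import DataFrame

# Each inner list is one row: a name followed by three scores
grades = DataFrame(
    [['Alice',  80., 92., 84.],
     ['Bob',    78., nan, 86.],
     ['Samaly', 75., 78., 88.]],
    index=[17005, 17035, 17028],
    columns=['Name', 'Test 1', 'Test 2', 'Final']
)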

This code demonstrates one of the most straightforward ways to construct a DataFrame object. The data can be specified as any two-dimensional Python data structure, such as a list of lists (as shown in the example) or a NumPy array. The index option sets the row labels, which here are integers representing student IDs. Likewise, the columns option sets the column names. Both the index and columns arguments can be given as any one-dimensional Python structure, such as a list, a NumPy array, or a Series object.
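To illustrate the other input types mentioned above, here is a minimal sketch that builds a DataFrame from a two-dimensional NumPy array, with the row labels supplied as a NumPy array and the column names as a list. The names `scores`, `student_ids`, and `scores_frame` are assumptions made for this example; only the numeric columns are used, so that the array holds a single data type.

```python
import numpy as np
from pandas import DataFrame

# Numeric test scores as a two-dimensional NumPy array (one row per student).
scores = np.array([[80., 92., 84.],
                   [78., np.nan, 86.],
                   [75., 78., 88.]])

# Row labels given as a NumPy array, column labels as a plain list.
student_ids = np.array([17005, 17035, 17028])
scores_frame = DataFrame(scores,
                         index=student_ids,
                         columns=['Test 1', 'Test 2', 'Final'])

print(scores_frame.shape)                 # (3, 3)
print(list(scores_frame.columns))         # ['Test 1', 'Test 2', 'Final']
print(scores_frame.loc[17035, 'Final'])   # 86.0
```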

To display the DataFrame object, run the following statement in a cell:

grades...

Computational and graphics tools

The objects of pandas have a rich set of built-in computational tools. To illustrate some of this functionality, we will use the random data stored in the dframe object defined in the previous section. If you discarded that object, here is how to construct it again:

from numpy.random import normal
from pandas import DataFrame

means = [0, 0, 1, 1, -1, -1, -2, -2]
sdevs = [1, 2, 1, 2,  1,  2,  1,  2]
random_data = {}
nrows = 30
# One column of nrows normally distributed samples per (mean, sdev) pair
for mean, sdev in zip(means, sdevs):
    label = 'Mean={}, sd={}'.format(mean, sdev)
    random_data[label] = normal(mean, sdev, nrows)
row_labels = ['Row {}'.format(i) for i in range(nrows)]
dframe = DataFrame(random_data, index=row_labels)

Let's explore some of these built-in computational tools.
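As a quick preview, the following sketch calls three of these methods on a smaller frame built the same way. The frame name `demo_frame`, the choice of two columns, and the fixed seed are assumptions made here so that the example is short and repeatable; they are not part of the book's code.

```python
import numpy as np
from pandas import DataFrame

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily for repeatability
nrows = 30
demo_frame = DataFrame({
    'Mean=0, sd=1': rng.normal(0, 1, nrows),
    'Mean=1, sd=2': rng.normal(1, 2, nrows),
})

column_means = demo_frame.mean()   # one mean per column, returned as a Series
column_sdevs = demo_frame.std()    # sample standard deviation per column
summary = demo_frame.describe()    # count, mean, std, min, quartiles, and max

print(column_means)
print(column_sdevs)
print(summary)
```

Each method works column by column, so the results are indexed by the column labels of the original frame.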

To get a list of the methods available for the object, start typing the following command in a cell:
```
dframe.
```
Then, press the Tab key. The completion popup allows us to select a method by double clicking on it. For example, double click on mean. The cell text changes to the following...

An example with a realistic dataset

In this section, we will work with a realistic dataset of moderate size. We will use the World Development Indicators dataset, which is provided free of charge by the World Bank. It is not too large or complex to experiment with.

In any real application, we will need to read data from some source, reformat it to our purposes, and save the reformatted data back to some storage system. pandas offers facilities for data retrieval and storage in multiple formats:

Comma-separated values (CSV) in text files
Excel
JSON
SQL
HTML
Stata
Clipboard data in text format
Python-pickled data

The list of formats supported by pandas keeps growing with each new update to the library. Please refer to http://pandas.pydata.org/pandas-docs/stable/io.html for a current list.

Treating all formats supported by pandas is not possible in a book with the current scope. We will restrict examples to CSV files, which is a simple text format that is widely...
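Since the examples are restricted to CSV, here is a minimal sketch of a CSV round trip using the pandas `to_csv` method and `read_csv` function. The small table is made up for illustration and is not taken from the World Bank dataset, and an in-memory text buffer stands in for a file on disk.

```python
from io import StringIO
from pandas import DataFrame, read_csv

# A small made-up table; it is not taken from the World Bank dataset.
original = DataFrame({'Country': ['A', 'B', 'C'],
                      'Value': [1.5, 2.5, 3.5]})

# Write the table as CSV text in memory; index=False leaves out the row labels.
buffer = StringIO()
original.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back into a new DataFrame.
restored = read_csv(StringIO(csv_text))
print(restored)
```

Passing a file path instead of the buffer to `to_csv` and `read_csv` would store the data on disk in the same format.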

Summary

In this chapter, we covered the objects of pandas, Series and DataFrame, which are specialized containers for data-oriented computations. We discussed how to create, access, and modify these objects, including advanced indexing and slicing operations. We also considered the computational and graphical capabilities offered by pandas. We then discussed how these capabilities can be leveraged to work with a realistic dataset.

In the next chapter, we will learn how to use SciPy to solve advanced mathematical problems of modeling, science, and engineering.


Author (1)

Luiz Felipe Martins

Luiz Felipe Martins holds a PhD in applied mathematics from Brown University and has worked as a researcher and educator for more than 20 years. His research is mainly in the field of applied probability. He has been involved in developing code for the open source homework system, WeBWorK, where he wrote a library for the visualization of systems of differential equations. He was supported by an NSF grant for this project. Currently, he is an Associate Professor in the Department of Mathematics at Cleveland State University, Cleveland, Ohio, where he has developed several courses in applied mathematics and scientific computing. His current duties include coordinating all first-year calculus sessions.