You're reading from Scientific Computing with Python - Second Edition

Product typeBook

Published inJul 2021

Reading LevelIntermediate

PublisherPackt

ISBN-139781838822323

Edition2nd Edition

Languages

Python

Tools

NumPy Pandas

Concepts

Scientific Computing

Authors (3):

Claus Führer

Jan Erik Solem

Olivier Verdier

View More author details

Series and Dataframes - Working with Pandas

In this chapter, we give a brief introduction to pandas—the central tool in Python for data analysis and data handling. You will learn how to work with various time series in Python, the concept of dataframes, and how to access and visualize data. You will also find examples that demonstrate how smoothly pandas interacts with the other central modules in this book, namely NumPy and Matplotlib.

But please note, this chapter can, within the scope of this book, only serve as an appetizer. Its purpose is to equip you with the basic concepts. The full range of visualization, data analysis, and data conversion tools in pandas is impressive.

pandas offers many ways of importing data. Some of them will be presented together with guiding examples throughout this chapter.

The following topics will be covered in this chapter:

A guiding...

10. 1 A guiding example: Solar cells

To describe pandas in the best way, we need data. Thus, in this chapter, we will use production data from solar cell panels on the roof of a private house in the south of Sweden.

In the file solarWatts.dat there is data about the electricity production in watts per minute. A semicolon is used as a data separator and the first line in the file is a header line, explaining the content of the data columns:

Date;Watt
           :
2019-10-20 08:22:00 ; 44.0
2019-10-20 08:23:00 ; 61.0
2019-10-20 08:24:00 ; 42.0
           :

In another file, price.dat, we find the hourly electricity production price in Swedish crowns. The file is otherwise organized as before:

Date;SEK
2019-10-20 01:00 ; 0.32
2019-10-20 02:00 ; 0.28
2019-10-20 03:00 ; 0.29
       :

Finally, in a third file, rates.dat, we find the daily conversion rates from Swedish crowns to Euros (€):

Date;Euro_SEK
2019-10-21 ; 10.7311
2019-10-22 ; 10.7303
2019-10-23 ; 10.7385
       :

We want to...

10.2 NumPy arrays and pandas dataframes

Let's start by just looking at an example of a NumPy array:

A=array( [[ 1., 2., 3.],
          [4., 5., 6.]])

It is displayed as:

[[1. 2. 3.]
 [4. 5. 6.]]

And its elements are accessed by using indexes generated simply by counting rows and columns, for example, A[0,1].

This matrix can be converted to the pandas datatype DataFrame by keeping the same data and order but representing and accessing it in a different way:

import pandas as pd
A=array( [[ 1., 2., 3.],
          [ 4., 5., 6.]] )
AF = pd.DataFrame(A)

This DataFrame object, which we will explain in more detail in this chapter, is displayed as


      0   1  2
 0   1.0 2.0 3.0
 1   4.0 5.0 6.0

We see that a pandas dataframe has extra labels for the rows and columns called index and columns. These are the metadata of a dataframe.

Here, they coincide with NumPy's indexing, but that is not always so. The index and columns metadata allows the pandas dataframe to...

10.2.1 Indexing rules

Similar to the way dictionaries use keys to address a value, pandas dataframes use row labels—the dataframe index—and column labels to access individual values:

AF.loc['R1', 'C2']      # this returns 2.0

Or to generate a subframe:

AF.loc[['R1','R2'],['C1','C2']]

Resulting in:

   C1   C2
R1  1  2.0
R2  4  5.0

You can also address a complete row by using the index label:

AF.loc['R1']

This returns a pandas Series object:

C1    1.0
C2    2.0
C3    3.0
Name: R1, dtype: float64

If loc or iloc are called with list arguments or slices, the result is a dataframe.

In that way, an individual dataframe element can also be addressed as follows:

AF.loc['R1'].loc['C1']    # returns 1.0

An entire column is addressed directly as follows:

AF['C1']

This again returns a pandas Series object:

R1    1.0
R2    4.0
Name: C1, dtype: float64

Alternatively...

10.3 Creating and modifying dataframes

We return now to the solar cell data and explain how to create a dataframe from a data file. The format of the file with the given data is CSV. Each line in the file contains one data record with a comma or another character string as a data separator. Here, we used a semicolon as a separator because in many countries the comma is used instead of a decimal separator.

10.3.1 Creating a dataframe from imported data

We want to organize the dataframe in such a way that the dates are used as the index of the dataframe. To be better prepared for operating with dates, we also want that the data import process automatically converts date strings to a pandas Timestamp object. Finally, you might have noted that the way the date is written in the data files is in the ISO-format YY-MM-DD format and not in the American MM-DD-YY or the European DD-MM-YY format. We can put on our wishlist that pandas automatically recognizes the date format and performs the correct conversion:

solarWatts = pd.read_csv("solarWatts.dat", 
                         sep=';',
                         index_col='Date',
                         parse_dates=[0], infer_datetime_format=True)

The pandas command read_csv is the central tool. It has many more parameters than we used here and carefully studying their functionalities saves a lot of programming...

10.3.2 Setting the index

The default index of a dataframe is the row numbers. These are generated automatically when a dataframe is created and no index is specified. Here is an example.

We create a dataframe from a list of lists:

towns=[['Stockholm', 'Sweden', 188,975904],
       ['Malmö', 'Sweden', 322, 316588],
       ['Oslo', 'Norway', 481, 693491],
       ['Bergen', 'Norway', 464, 28392]]
town=pd.DataFrame(towns, columns=['City','Country','area','population'])

This produces a dataframe with rows labeled by their row numbers:

        City Country  area  population
0  Stockholm  Sweden   188      975904
1      Malmö  Sweden   322      316588
2       Oslo  Norway   481      693491
3     Bergen  Norway   464      28392

We can change this by choosing a column to be the index. The column can be duplicated—one serving as the index and the other belonging to the...

10.3.3 Deleting entries

Entries in a dataframe are deleted by the method drop.

Again, we use the dataframe from the previous section:

town=pd.DataFrame(towns, columns=['City','Country','area','population'])
town.set_index('City', inplace=True)

An entire row is deleted by:

town.drop('Bergen', axis=0)

The parameter axis specifies here that we look for a row. Deleting a row requires the column label and the correct parameter axis:

town.drop('area', axis=1)

10.3.4 Merging dataframes

From the three datafiles we provided for this chapter we used the first one, solarwatts.dat, to set up a dataframe solarWatts; see Section 10.3.1, Creating a dataframe from imported data. In a similar way, we can create dataframes price and rates from the other two files.

We show now how to merge these three dataframes into one and to treat rows with missing data in the resulting dataframe.

First, we merge solarWatts with price. For this, we use the pandas command merge:

solar_all=pd.merge(solarWatts, price, how='outer', sort=True, on='Date')
solar_all=pd.merge(solar_all, rates, how='outer', sort=True, on='Date')

It sets the column Date, which exists in both dataframes as the index of the new frame. The parameter how defines how to set up the new index column. By specifying outer we decided to choose the union of both index columns. Finally, we want to sort the index.

As solarWatts has data for every minute and...

10.3.5 Missing data in a dataframe

We saw in the last section that missing data is often indicated by NaN. The way missing data is indicated depends on the datatype of the column. Missing timestamps are indicated by the pandas object NaT, while missing data with another non-numeric datatype is indicated by None.

The dataframe method isnull returns a Boolean dataframe with the entry True at all places with missing data.

We will study various methods for treating missing data before returning to the solar cell data example.

Let's demonstrate these methods on a small dataframe:

frame = pd.DataFrame(array([[1., -5.,  3., NaN], 
                            [3.,  4., NaN, 17.], 
                            [6.,  8., 11.,  7.]]), 
                     columns=['a','b','c','d'])

This dataframe is displayed as:

     a    b    c     d
0  1.0 -5.0  3.0   NaN
1  3.0  4.0  NaN  17.0
2  6.0  8.0 11.0   7.0

Dataframes with missing data can be handled in different...

10.4 Working with dataframes

So far, we have seen how to create and modify a dataframe. Now, we turn to data interpretation. We will look at examples of visualization, show how to make simple calculations, and see how to group data. These are all stepping stones into the world of pandas. The strength of this module is in its large range of statistical tools. We leave the presentation of these tools to textbooks on practical statistics, while we focus here on the basic principles of pandas programming. We do not aim for completeness. Again, let's serve an appetizer.

10.4.1 Plotting from dataframes

To demonstrate the plotting functionality, we plot the energy price changes on May 16, 2020. For this, we construct a subframe of the data from that day:

solar_all.loc['2020-05-16'].plot(y='SEK')

You can see that we indexed here with the full day. This is a short form of slicing:

solar_all.loc['2020-05-16 00:00':'2020-05-16 23:59']

The resulting plot, Figure 10.1, shows the hourly variation of electricity prices in Swedish crowns on a typical day of the year.

Figure 10.1: Plotting one column of a dataframe; the hourly price in Swedish crowns (SEK) per kWh on May 16, 2020

pandas' plot command is built upon plot from the module matplotlib.pyplot, which we met in Chapter 6, Plotting.

It accepts the same parameters, for example, linestyle or marker.

The data for the x axis is taken from the dataframe index if not otherwise specified. Alternatively, you can plot one dataframe column versus another.

Line...

10.4.2 Calculations within dataframes

We can do simple calculations on dataframe columns by applying functions on every element of the column, that is, elementwise application of functions. These functions can be built-in Python functions, NumPy functions, or user-defined functions, such as lambda functions (see Section 7.7, Anonymous functions).

The simplest way is to operate on the columns directly. In the following example, we convert watts into kilowatts and Swedish crowns (SEK) into Euros by using the conversion rate, which was the actual rate on the day of the measurement:

solar_converted=pd.DataFrame()
solar_converted['kW']=solar_all['Watt']/1000
solar_converted['Euro']=solar_all['SEK']/solar_all['Euro_SEK']

Tacitly, we also adjusted the column labels to the converted units.

The command solar_converted.loc['2020-07-01 7:00':'2020-07-01 7:04'] then returns the converted data for July...

10.4.3 Grouping data

The ability to group data is one of the essential features for pandas' dataframes. In the solar cell example, you saw that we had a data frequency of one measurement per minute. What if you want to report on an hourly or daily basis instead? We just form groups and aggregate the data in a prescribed way.

The following example forms a new dataframe with the two columns labeled Watt and SEK reporting the peak solar cell power per day and the average price in SEK:

solar_day=solar_all.groupby(solar_all.index.date).agg({'Watt':'max', 
                                                       'SEK':'mean'})

Again, we can visualize the results by using the dataframe method plot:

solar_day.index=pd.to_datetime(solar_day.index,format='%Y-%m-%d')
ax=solar_day.loc['2020-06-01':'2020-06-30'].plot.bar('Watt')

Note, we created an axis object, ax, in order to change the tick labels on the ...

10.5 Summary

In this chapter, you got a brief introduction to pandas and saw how the concept of NumPy arrays is extended to dataframes. Instead of a never-complete explanation of the possibilities of dataframes, we guided you using an example of solar cell energy data through the first steps with pandas: setting up a dataframe from a file, merging frames, grouping data, and making calculations.

The rest of the chapter is locked

You have been reading a chapter from

Scientific Computing with Python - Second Edition

Published in: Jul 2021Publisher: PacktISBN-13: 9781838822323

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Authors (3)

Claus Führer

Claus Führer is a professor of scientific computations at Lund University, Sweden. He has an extensive teaching record that includes intensive programming courses in numerical analysis and engineering mathematics across various levels in many different countries and teaching environments. Claus also develops numerical software in research collaboration with industry and received Lund University's Faculty of Engineering Best Teacher Award in 2016.
Read more about Claus Führer

Jan Erik Solem

Other recommended products

Related to this chapter

Become a Python Data Analyst

Become a Python Data Analyst book introduces you to the mainstream libraries of Python’s Data Science stack. With proven examples and real-world datasets, this book teaches how to effectively perform data manipulation, visualize and analyze data patterns and brings you to the ladder of advanced topics like Predictive Analytics.

BookAug 2018178 pages

Functional Programming in Python

Functional Programming in Python introduces you to the various aspects that are either directly or indirectly related to functional programming. With this book, you will explore these concepts and implement the various techniques in your applications.

BookNov 2019109 pages

Hands-On Data Analysis with NumPy and Pandas

In this book, you will explore two important Python packages used by Data Analysts, NumPy & pandas. You will dive into different concepts such as reading, sorting, grouping of data, and also learn how to work with different data formats for your data analysis projects.

BookJun 2018168 pages5

SciPy Recipes

The SciPy stack is a popular Python ecosystem used for mathematical and scientific computing tasks. Learn how you can put to use the various functionalities offered by the SciPy stack in the most efficient way possible. With the help of this book, you will solve real-world problems in linear algebra, numerical analysis, visualization, and more.

BookDec 2017386 pages

Python Fundamentals

Python Fundamentals takes you from zero experience to a complete understanding of key concepts, edge cases, and using Python for real-world application development. You'll move progressively from the basics to working with larger complex applications. After completing this book, you'll have the skills you need to dive into an existing application or start your own project.

BookOct 2018324 pages

Python Parallel Programming Cookbook

Python Parallel Programming Cookbook, Second Edition, covers recipes that will help you how to build multithreaded, multiprocess and asynchronous applications in Python. The book will help you build applications for the GPU using CUDA and PyOPENCL and implement effective debugging and testing techniques.

BookSep 2019370 pages

Dancing with Python

Millions of software developers use Python, and it is a powerful foundation for classical and quantum computing. Dancing with Python teaches you how to create elegant and efficient code using Pythonic techniques. Its integrated introduction to quantum computing development helps you extend your skills to the next major computing technology.

BookAug 2021744 pages

IPython Interactive Computing and Visualization Cookbook

IPython Interactive Computing and Visualization Cookbook, Second Edition shows you how to analyze and visualize data in the Jupyter Notebook. It will help you become an expert in high-performance computing and visualization for data analysis and scientific modeling.

BookJan 2018548 pages

Applying Math with Python

Python has a number of powerful packages to help anyone tackle complex mathematical problems in a simple and efficient way. This practical guide explains how to model real-world problems as mathematical objects in Python and how to perform computations, and interpret results. It explores Python lang to solve a variety of math and statistics problems.

BookJul 2020358 pages

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages

You're reading from Scientific Computing with Python - Second Edition

10. 1 A guiding example: Solar cells

10.2 NumPy arrays and pandas dataframes

10.2.1 Indexing rules

10.3 Creating and modifying dataframes

10.3.1 Creating a dataframe from imported data

10.3.2 Setting the index

10.3.3 Deleting entries

10.3.4 Merging dataframes

10.3.5 Missing data in a dataframe

10.4 Working with dataframes

10.4.1 Plotting from dataframes

10.4.2 Calculations within dataframes

10.4.3 Grouping data

10.5 Summary

Unlock this book and the full library FREE for 7 days

Authors (3)

Become a Python Data Analyst

Functional Programming in Python

Functional Programming in Python introduces you to the various aspects that are either directly or indirectly related to functional programming. With this book, you will explore these concepts and implement the various techniques in your applications.

Hands-On Data Analysis with NumPy and Pandas

In this book, you will explore two important Python packages used by Data Analysts, NumPy &amp; pandas. You will dive into different concepts such as reading, sorting, grouping of data, and also learn how to work with different data formats for your data analysis projects.

SciPy Recipes

Python Fundamentals

Python Parallel Programming Cookbook

Dancing with Python

IPython Interactive Computing and Visualization Cookbook

IPython Interactive Computing and Visualization Cookbook, Second Edition shows you how to analyze and visualize data in the Jupyter Notebook. It will help you become an expert in high-performance computing and visualization for data analysis and scientific modeling.

Applying Math with Python

Et al.

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

Mastering Tableau 2023

Building AI Applications with ChatGPT APIs

Building AI Applications with ChatGPT APIs

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

Modern Data Architecture on AWS

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

TinyML Cookbook

In this book, you will explore two important Python packages used by Data Analysts, NumPy & pandas. You will dive into different concepts such as reading, sorting, grouping of data, and also learn how to work with different data formats for your data analysis projects.