You're reading from Mastering pandas - Second Edition

Product type: Book
Published in: Oct 2019
Reading level: Intermediate
ISBN-13: 9781789343236
Edition: 2nd Edition
Author: Ashish Kumar

Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience in implementing and deploying data science and machine learning solutions for challenging industry problems in both hands-on and leadership roles. Natural Language Processing, IoT analytics, R Shiny product development, and ensemble ML methods are his core areas of expertise. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the nearest hip beach and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.

Time Series and Plotting Using Matplotlib

Time series data is generated by a variety of processes, including Internet of Things (IoT) sensors, machine/server logs, and monthly sales data from a Customer Relationship Management (CRM) system. Common characteristics of time series data are that the data points are generated at a fixed frequency and that the data often has an inherent trend and seasonality.

In this chapter, we will take a tour of some topics that are necessary to develop expertise in using pandas. Knowledge of these topics is very useful for the preparation of data as input to programs for data analysis, prediction, or visualization.

The topics that we'll discuss in this chapter are as follows:

  • Handling time series data and dates
  • Manipulation of time series data—rolling, resampling, shifting, lagging, and time element separation
  • Formatting...
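As a rough preview of the manipulations listed above, here is a minimal sketch, assuming a small Series built from three closing prices in the IBM sample data shown later in this chapter. The window size of 2 and the lag of 1 are arbitrary choices for illustration:

```python
import pandas as pd

# Closing prices for the first three trade dates of the IBM sample data
prices = pd.Series(
    [445, 448, 450],
    index=pd.to_datetime(["1959-06-29", "1959-06-30", "1959-07-01"]),
    name="closingPrice",
)

# Rolling: mean over a 2-observation window (the first value has no full window, so it is NaN)
rolling_mean = prices.rolling(window=2).mean()

# Shifting/lagging: move each value one row later, giving the previous day's price
lagged = prices.shift(1)

# Day-over-day change computed from the lagged series
change = prices - lagged

# Time element separation: pull calendar parts out of the DatetimeIndex
parts = pd.DataFrame(
    {"year": prices.index.year, "month": prices.index.month, "day": prices.index.day},
    index=prices.index,
)
print(rolling_mean, change, parts, sep="\n\n")
```

Because `shift` keeps the index fixed and moves the values, subtracting the lagged series aligns each price with the one before it automatically.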

Handling time series data

In this section, we show you how to handle time series data. Handling involves reading, creating, resampling, and reindexing timestamp data. These tasks need to be performed on timestamp data to make it usable. We will start by showing you how to create time series data using data read in from a CSV file.
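Creating and reindexing can be sketched in a few lines. This is a minimal example under stated assumptions: the three values are borrowed from the IBM sample, and the extended calendar from 1959-06-28 to 1959-07-02 is chosen purely for illustration:

```python
import pandas as pd

# Create a time series on a fixed daily frequency
idx = pd.date_range(start="1959-06-29", periods=3, freq="D")
prices = pd.Series([445, 448, 450], index=idx)

# Reindex onto a longer daily calendar; dates with no data become NaN
full_idx = pd.date_range(start="1959-06-28", end="1959-07-02", freq="D")
reindexed = prices.reindex(full_idx)

# Alternatively, carry the last known value forward into the new dates;
# 1959-06-28 stays NaN because no earlier value exists
filled = prices.reindex(full_idx, method="ffill")
print(reindexed, filled, sep="\n\n")
```

Plain `reindex` makes gaps explicit as NaN, while `method="ffill"` is one way to fill them when carrying the last observation forward is acceptable for the data.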

Reading in time series data

In this section, we demonstrate the various ways to read in time series data, starting with the simple read_csv function:

    In [7]: ibmData=pd.read_csv('ibm-common-stock-closing-prices-1959_1960.csv')
      ibmData.head()
    Out[7]:   TradeDate  closingPrice
    0   1959-06-29   445
    1   1959-06-30   448
    2   1959-07-01   450
    3   1959-07-02  ...
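Read this way, TradeDate is an ordinary column of strings. A minimal sketch of turning it into a DatetimeIndex at load time follows, assuming the file is comma-separated with the TradeDate and closingPrice columns shown; the first three rows are inlined so the example runs without the CSV file:

```python
import io
import pandas as pd

# The first three rows shown above, inlined in place of the CSV file
csv_text = """TradeDate,closingPrice
1959-06-29,445
1959-06-30,448
1959-07-01,450
"""

# parse_dates converts TradeDate to datetime64; index_col makes it the index
ibmTS = pd.read_csv(io.StringIO(csv_text), parse_dates=["TradeDate"],
                    index_col="TradeDate")

# A DatetimeIndex supports subsetting with partial date strings
june = ibmTS.loc["1959-06"]
print(ibmTS.index.dtype)
print(june)
```

With the dates as the index, `.loc` accepts a month string such as `"1959-06"` and returns only the rows that fall in that month.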

Plotting using matplotlib

This section provides a brief introduction to plotting in pandas using matplotlib. The matplotlib API is imported using the standard convention, as shown in the following command:

    In [1]: import matplotlib.pyplot as plt

Series and DataFrame have a plot method, which is simply a wrapper around plt.plot. Here, we will examine how to make a simple plot of sine and cosine functions. Suppose we wish to plot the following functions over the interval -pi to pi:

  • f(x) = cos(x) + sin(x)
  • g(x) = cos(x) - sin(x)

We first build the interval and evaluate both functions over it:

    In [51]: import numpy as np
             import pandas as pd
    In [52]: X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
    
    In [54]: # g(x) = cos(x) - sin(x), matching the definition above
             f, g = np.cos(X) + np.sin(X), np.cos(X) - np.sin(X)
    In [61]: f_ser=pd.Series(f)
             g_ser=pd.Series(g)
    
    
    In [31]: plotDF=pd.concat([f_ser,g_ser],axis=1)
             plotDF.index=X...
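A minimal, self-contained sketch of drawing these two curves with DataFrame.plot follows. It assumes the non-interactive Agg backend (so it runs without a display) and uses column labels chosen here for the legend; concatenating unnamed Series as above would give columns labeled 0 and 1 instead:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
f, g = np.cos(X) + np.sin(X), np.cos(X) - np.sin(X)

# One column per function, indexed by x so the x-axis spans -pi to pi
plotDF = pd.DataFrame({"f(x)": f, "g(x)": g}, index=X)

# DataFrame.plot draws one line per column on a shared matplotlib Axes
ax = plotDF.plot(title="f(x) = cos(x) + sin(x), g(x) = cos(x) - sin(x)")
ax.set_xlabel("x")
```

In an interactive session, call `plt.show()` to display the figure, or `plt.savefig()` with a file name of your choice to write it to disk.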

Summary

In this chapter, we discussed time series data and the steps you can take to process and manipulate it. A date column can be assigned as the index of a Series or DataFrame and then used to subset it by date. Time series data can be resampled to either decrease its frequency (downsampling) or increase it (upsampling). For example, data generated every millisecond can be downsampled to keep only one value per second, or the 1,000 millisecond values in each second can be averaged. Similarly, data generated every minute can be upsampled to one value per second by forward filling or backfilling, that is, by giving every second in a minute the value of the previous or the next minute's reading.
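Both resampling directions can be sketched with the resample method. The readings below are hypothetical (a simple counter sampled every millisecond, and two made-up per-minute values), chosen so the results are easy to check by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings: one value per millisecond for 2 seconds
ms_idx = pd.date_range("2019-01-01 00:00:00", periods=2000, freq="ms")
readings = pd.Series(np.arange(2000, dtype=float), index=ms_idx)

# Downsample: keep only the first reading of each second ...
first_per_sec = readings.resample("s").first()
# ... or average the 1,000 readings that fall in each second
mean_per_sec = readings.resample("s").mean()

# Hypothetical per-minute data
min_idx = pd.date_range("2019-01-01 00:00", periods=2, freq="min")
per_minute = pd.Series([10.0, 20.0], index=min_idx)

# Upsample to seconds: forward fill repeats the previous minute's value,
# backfill uses the next minute's value
ffilled = per_minute.resample("s").ffill()
bfilled = per_minute.resample("s").bfill()
print(mean_per_sec)
print(ffilled.iloc[58:61])
```

Upsampling the two minute readings produces 61 one-second rows (00:00:00 through 00:01:00); the choice between `ffill` and `bfill` only affects the 59 seconds strictly between the two readings.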

String-to-datetime conversion can be done with the strptime method of the datetime module's datetime class (strftime performs the reverse, datetime-to-string, conversion), and each type of date entry (for example, 22nd July, 7/22/2019, and...
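A minimal sketch of both directions with the standard library, assuming English month names (the default C locale). strptime has no directive for ordinal suffixes such as "nd", so the second input is written as "22 July 2019" here:

```python
from datetime import datetime

# strptime parses a string using an explicit format
d1 = datetime.strptime("7/22/2019", "%m/%d/%Y")
d2 = datetime.strptime("22 July 2019", "%d %B %Y")  # %B: full month name (locale-dependent)

# strftime goes the other way: datetime -> formatted string
iso = d1.strftime("%Y-%m-%d")
print(d1 == d2, iso)
```

Each input layout needs its own format string, which is why differently formatted date entries must be handled separately.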
