You're reading from The Pandas Workshop

Product type: Book
Published in: Jun 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800208933
Edition: 1st Edition
Authors (4):
Blaine Bateman

Blaine Bateman has more than 35 years of experience working with various industries, from government R&D to startups to $1B public companies. His experience focuses on analytics, including machine learning and forecasting. His hands-on abilities include Python and R coding, Keras/TensorFlow, and AWS and Azure machine learning services. As a machine learning consultant, he has developed and deployed ML models in industry.

Saikat Basak

Saikat Basak is a data scientist and a passionate programmer. Having worked with multiple industry leaders, he has a good understanding of problem areas that can potentially be solved using data. Apart from being a data guy, he is also a science geek and loves to explore new ideas in the frontiers of science and technology.

Thomas V. Joseph

Thomas V. Joseph is a data science practitioner, researcher, trainer, mentor, and writer with more than 19 years of experience. He has extensive experience in solving business problems using machine learning toolsets across multiple industry segments.

William So

William So is a Data Scientist with both a strong academic background and extensive professional experience. He is currently the Head of Data Science at Douugh and also a Lecturer for the Master of Data Science and Innovation at the University of Technology Sydney. During his career, he has covered the end-to-end spectrum of data analytics, from ML to Business Intelligence, helping stakeholders derive valuable insights and achieve amazing results that benefit the business. William is a co-author of "The Applied Artificial Intelligence Workshop", published by Packt.


Chapter 13: Exploring Time Series

In this chapter, you will learn how to use time data in the index to enable advanced capabilities such as resampling to different time intervals, interpolating, and modeling as a function of time. In Chapter 11, Data Modeling – Regression Modeling, you learned how to use multiple regression as a powerful data modeling approach, and by the end of this chapter, you will be able to use regression with time series as well.

You will cover the following topics as you work through this chapter:

  • The time series as an index
  • Resampling, grouping, and aggregation by time
  • Activity 13.01 – Creating a time series model

The time series as an index

In many of the examples so far, we have had a DataFrame column containing date or datetime information, and we have manipulated that column directly. When we want to perform operations on time-stamped data, however, it is often simpler and more natural to have a time-based index. In general, you can think of a time series as a data structure with a time-based index and one or more columns of data. Let's explore what we can do with such a time series.
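As a minimal sketch of the idea (the small DataFrame and its `date` and `value` columns are made up for illustration), converting a text date column with `pd.to_datetime()` and moving it into the index with `.set_index()` gives a `DatetimeIndex`, which then supports selection by partial date strings and date-based slicing:

```python
import pandas as pd

# Hypothetical data: a 'date' column stored as text plus one value column
df = pd.DataFrame({
    'date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-02-01'],
    'value': [10, 20, 30, 40],
})

# Convert the text to datetime64 values and move them into the index
ts = df.assign(date=pd.to_datetime(df['date'])).set_index('date')
print(type(ts.index).__name__)  # DatetimeIndex

# Partial-string indexing: select every row in January 2022
january = ts.loc['2022-01']
print(january['value'].tolist())  # [10, 20, 30]

# Label-based slicing with date strings includes both endpoints
print(ts.loc['2022-01-02':'2022-01-03', 'value'].tolist())  # [20, 30]

# Date components are available directly from the index
print(ts.index.month.tolist())  # [1, 1, 1, 2]
```

Because the index is time-aware, `.loc['2022-01']` selects the whole month without any explicit date comparison.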

Time series periods/frequencies

We've seen the use of the pandas pd.date_range() function to generate a sequence of dates. The function is intuitive; we simply provide the start and end, plus an optional frequency (freq) argument. The freq argument is the key to much of the convenience pandas provides. It can take many values, which we've summarized here.

Figure 13.1 – The possible values and meanings of the freq argument for date_range...
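The figure's contents are not reproduced here, but a few standard pandas frequency aliases can be sketched as follows. The dates are arbitrary examples. In pandas 2.2 and later, several upper-case aliases (such as `'H'` for hours and `'T'` for minutes) are deprecated in favor of `'h'` and `'min'`, so this sketch sticks to aliases that work across versions:

```python
import pandas as pd

# Daily frequency: one timestamp per calendar day (the end date is inclusive)
daily = pd.date_range('2022-01-01', '2022-01-07', freq='D')
print(len(daily))  # 7

# Weekly frequency: 'W' is anchored on Sundays ('W-SUN') by default
weekly = pd.date_range('2022-01-01', '2022-01-31', freq='W')
print(weekly.day.tolist())  # [2, 9, 16, 23, 30]

# Month-start frequency
month_starts = pd.date_range('2022-01-01', '2022-12-31', freq='MS')
print(len(month_starts))  # 12

# Business days skip Saturdays and Sundays (2022-01-03 is a Monday)
business = pd.date_range('2022-01-03', '2022-01-09', freq='B')
print(len(business))  # 5

# Multiples are allowed; here a start plus a number of periods at 15-minute spacing
quarter_hours = pd.date_range('2022-01-01', periods=4, freq='15min')
print(quarter_hours.strftime('%H:%M').tolist())  # ['00:00', '00:15', '00:30', '00:45']
```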

Resampling, grouping, and aggregation by time

We have now covered many of the components of time series and the convenience pandas offers for working with time-stamped data. As we mentioned in the last section, most of the time you will think of a time series as a time-based index plus one or more columns of data. Let's take that structure as a starting point and introduce some of the more advanced capabilities in pandas.

Using the resample method

Suppose you are given a sensor dataset of 6,000 readings sampled at 10 Hz, that is, 10 times per second. We can simulate such a series, starting as we did in the last section by creating a sequence of timestamps. Using an end time of 00:09:59.9 and a frequency of 100 ms (milliseconds) generates the correct number of points (6,000) at the correct interval (10 per second = 100 ms):

sensor_times = ((pd.date_range('00:00:00', '00:09:59.9', freq = '100ms'))...
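Because the code line above is cut off, the following is only a sketch of a similar setup, not the book's code. It assumes a fixed date (2022-01-01) rather than bare times, uses a simple ramp 0, 1, 2, ... as stand-in readings, and then uses `.resample()` to downsample to 1-second and 1-minute bins and to upsample to a 50 ms grid whose gaps are filled by linear interpolation:

```python
import numpy as np
import pandas as pd

# Assumption: a fixed date keeps the example reproducible, and the readings
# are a simple ramp rather than real sensor data
sensor_times = pd.date_range('2022-01-01 00:00:00',
                             '2022-01-01 00:09:59.9', freq='100ms')
sensor = pd.Series(np.arange(len(sensor_times), dtype=float), index=sensor_times)
print(len(sensor))  # 6000

# Downsample: average the ten readings that fall in each 1-second bin
per_second = sensor.resample('1s').mean()
print(len(per_second))     # 600
print(per_second.iloc[0])  # 4.5 (mean of readings 0..9)

# Downsample further to 1-minute bins, using a different aggregation
per_minute_max = sensor.resample('1min').max()
print(len(per_minute_max))  # 10

# Upsample: a 50 ms grid leaves NaN between readings; fill them linearly
upsampled = sensor.resample('50ms').asfreq().interpolate()
print(upsampled.iloc[1])  # 0.5
```

Downsampling needs an aggregation (`mean`, `max`, and so on) to combine several readings per bin, while upsampling needs a rule such as `asfreq()` plus `interpolate()` to decide what goes in the new, empty slots.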

Activity 13.01 – Creating a time series model

In this activity, you are a data analyst for a bike-share startup, and you are given a dataset of hourly unit rentals. Your task is to create a very simple model that predicts rentals one week in advance. Here, you will use linear regression from scikit-learn, which you saw in Chapter 11, Data Modeling – Regression Modeling. Follow these steps:

  1. For this activity, you will need the pandas library, the matplotlib.pyplot library, and the LinearRegression class from sklearn.linear_model. Load them in the first cell of the notebook:
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
  2. Read in the bike_share.csv data from the Datasets directory, and list the first few rows:

Figure 13.25 – The bike_share.csv data

  3. You need to create a datetime index. Construct a new datetime-valued column as a combination of the date and the hour and...
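The remaining steps are cut off here, so the following is a hedged sketch of the general approach this chapter describes (a datetime index built from text data, then linear regression on a lagged series), not the book's solution. Because the columns of bike_share.csv are not shown, it generates synthetic hourly data with hypothetical `date`, `hour`, and `rentals` columns. A lag of 168 rows equals one week of hourly data, which assumes no hours are missing:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for bike_share.csv: four weeks of hourly data with a
# text 'date' column, an integer 'hour' column, and a 'rentals' count
times = pd.date_range('2022-01-03', periods=24 * 7 * 4, freq='h')
hour_of_week = np.arange(len(times)) % 168
week_number = np.arange(len(times)) // 168
df = pd.DataFrame({
    'date': times.strftime('%Y-%m-%d'),
    'hour': times.hour,
    # Daily cycle plus a small upward trend of 2 rentals per week
    'rentals': 50 + 30 * np.sin(2 * np.pi * hour_of_week / 24) + 2 * week_number,
})

# Combine the text date and the hour into one datetime, then index by it
df['datetime'] = pd.to_datetime(df['date']) + pd.to_timedelta(df['hour'], unit='h')
rentals = df.set_index('datetime')['rentals']

# shift(168) moves values 168 rows (one week) forward, so each row
# holds the rentals observed exactly one week earlier
model_df = pd.DataFrame({
    'rentals': rentals,
    'lag_168': rentals.shift(168),
}).dropna()  # the first week has no lagged value

model = LinearRegression()
model.fit(model_df[['lag_168']], model_df['rentals'])
print(model.coef_[0], model.intercept_)  # approximately 1.0 and 2.0 for this synthetic data

# Forecast the following week from the most recent 168 observed hours
next_week = model.predict(rentals.iloc[-168:].to_frame('lag_168'))
print(len(next_week))  # 168
```

Using the value from exactly one week earlier as the only feature means every input is already known one week before the hour being predicted, which matches the one-week-ahead goal; a real model would likely add more features.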

Summary

In this chapter, starting with the enhancements pandas provides for Timestamp data types, you saw that we can use time-aware methods to interpolate missing values or to resample a time series to a higher or lower frequency (period). You built a linear regression model on a lagged series of data, which was made possible by constructing a datetime index from text data. You are now prepared to analyze both tabular data and ordered time series data, and to carry out many transformations to find information hidden in complex data, which we will do in the next chapter.
