Reader small image

You're reading from  Machine Learning for Time-Series with Python

Product typeBook
Published inOct 2021
PublisherPackt
ISBN-139781801819626
Edition1st Edition
Right arrow
Author (1)
Ben Auffarth
Ben Auffarth
author image
Ben Auffarth

Ben Auffarth is a full-stack data scientist with more than 15 years of work experience. With a background and Ph.D. in computational and cognitive neuroscience, he has designed and conducted wet lab experiments on cell cultures, analyzed experiments with terabytes of data, run brain models on IBM supercomputers with up to 64k cores, built production systems processing hundreds and thousands of transactions per day, and trained language models on a large corpus of text documents. He co-founded and is the former president of Data Science Speakers, London.
Read more about Ben Auffarth

Right arrow

Working with time-series in Python

Python has a lot of libraries and packages for time-series, such as datetime, time, calendar, dateutil, and pytz, which can be highly confusing for beginners. At the same time, there are many different data types like date, time, datetime, tzinfo, timedelta, relativedelta, and more.

When it comes to using them, the devil is in the details. Just to name one example: many of these types are insensitive to the timezone. You should feel reassured, however, knowing that to get started, familiarity with a small subset of these libraries and data types is enough.

Requirements

In this chapter, we'll use several libraries, which we can quickly install from the terminal (or similarly from Anaconda Navigator):

pip install -U dython scipy numpy pandas seaborn scikit-learn

We'll execute the commands from the Python (or IPython) terminal, but equally we could execute them from a Jupyter notebook (or a different environment).

It's a good start if we at least know datetime and pandas, two very prominent libraries, which we'll cover in the following two sections. We'll create basic objects and do simple manipulations on them.

Datetime

The date and datetime data types are not primitive types in Python the way that numbers (float and int), string, list, dictionary, tuple, or file are. To work with date and datetime objects, we have to import datetime, a library that is part of the Python Standard Library, and the libraries that come by default with CPython and other main Python distributions.

datetime comes with objects such as date, datetime, time, and timedelta, among others. The difference between datetime and date objects is that the datetime object includes time information in addition to a date.

To get a date, we can do this:

from datetime import date

To get today's date:

today = date.today()

To get some other date:

other_date = date(2021, 3, 24)

If we want a datetime object (a timestamp) instead, we can do this as well:

from datetime import datetime
now = datetime.now()

This will get the current timestamp. We can create a datetime for a specific date and time as well:

some_date = datetime(2021, 5, 18, 15, 39, 0)
some_date.isoformat()

We can get a string output in isoformat:

'2021-05-18T15:39:00'

isoformat, short for the ISO 8601 format, is an international standard for representing dates and times.

We can also work with time differences using timedelta:

from datetime import timedelta 
year = timedelta(days=365)

These timedelta objects can be added to other objects for calculations. We can do calculations with a timedelta object, for example:

year * 10

This should give us the following output:

datetime.timedelta(days=3650) 

The datetime library can parse string inputs to date and datetime types and output these objects as string:

from datetime import date
some_date = date.fromisoformat('2021-03-24')

Or:

some_date = datetime.date(2021, 3, 24)

We can format the output with string format options, for example like this:

some_date.strftime('%A %d. %B %Y')

This would give us:

'Wednesday 24. March 2021'

Similarly, we can read in a date or datetime object from a string, and we can use the same format options:

from datetime import datetime
dt = datetime.strptime('24/03/21 15:48', '%d/%m/%y %H:%M')

You can find a complete list of formatting options that you can use both for parsing strings and printing datetime objects here: https://strftime.org/.

A few important ones are listed in this table:

Format string

Meaning

%Y

Year as 4 digits

%y

Year as 2 digits

%m

Month as a number

%d

Day

%H

Hour as 2 digits

%M

Minute as 2 digits

Figure 2.2: Format strings for dates

It's useful to remember these strings with formatting options. For example, the format string for a US date separated by slashes would look like this:

'%d/%m/%Y'

pandas

We introduced the pandas library in the previous chapter. pandas is one of the most important libraries in the Python ecosystem for data science, used for data manipulation and analysis. Initially released in 2008, it has been a major driver of Python's success.

pandas comes with significant time-series functionality such as date range generation, frequency conversion, moving window statistics, date shifting, and lagging.

Let's go through some of these basics. We can create a time-series as follows:

import pandas as pd
pd.date_range(start='2021-03-24', end='2021-09-01')

This gives us a DateTimeIndex like this:

DatetimeIndex(['2021-03-24', '2021-03-25', '2021-03-26', '2021-03-27',
               '2021-03-28', '2021-03-29', '2021-03-30', '2021-03-31',
               '2021-04-01', '2021-04-02',
               ...
               '2021-08-23', '2021-08-24', '2021-08-25', '2021-08-26',
               '2021-08-27', '2021-08-28', '2021-08-29', '2021-08-30',
               '2021-08-31', '2021-09-01'],
              dtype='datetime64[ns]', length=162, freq='D') 

We can also create a time-series as follows:

pd.Series(pd.date_range("2021", freq="D", periods=3))

This would give us a time-series like this:

0   2021-01-01
1   2021-01-02
2   2021-01-03
dtype: datetime64[ns]

As you can see, this type is called a DatetimeIndex. This means we can use this data type for indexing a dataset.

One of the most important functionalities is parsing to date or datetime objects from either string or separate columns:

import pandas as pd
df = pd.DataFrame({'year': [2021, 2022],
    'month': [3, 4],
    'day': [24, 25]}
)
ts1 = pd.to_datetime(df)
ts2 = pd.to_datetime('20210324', format='%Y%m%d')

We've created two time-series.

You can take a rolling window for calculations like this:

s = pd.Series([1, 2, 3, 4, 5])
s.rolling(3).sum()

Can you guess the result of this? If not, why don't you put this into your Python interpreter?

A time-series would usually be an index with a time object and one or more columns with numeric or other types, such as this:

import numpy as np 
rng = pd.date_range('2021-03-24', '2021-09-01', freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

We can have a look at our time-series:

2021-03-24   -2.332713
2021-03-25    0.177074
2021-03-26   -2.136295
2021-03-27    2.992240
2021-03-28   -0.457537
                 ...
2021-08-28   -0.705022
2021-08-29    1.089697
2021-08-30    0.384947
2021-08-31    1.003391
2021-09-01   -1.021058
Freq: D, Length: 162, dtype: float64

We can index these time-series datasets like any other pandas Series or DataFrame. ts[:2].index would give us:

DatetimeIndex(['2021-03-24', '2021-03-25'], dtype='datetime64[ns]', freq='D')

Interestingly, we can index directly with strings or datetime objects. For example, ts['2021-03-28':'2021-03-30'] gives us:

2021-03-28   -0.457537
2021-03-29   -1.089423
2021-03-30   -0.708091
Freq: D, dtype: float64

You can shift or lag the values in a time-series back and forward in time using the shift method. This changes the alignment of the data:

ts.shift(1)[:5]

We can also change the resolution of time-series objects, for example like this:

ts.asfreq('M')

Please note the difference between datetime and pd.DateTimeIndex. Even though they encode the same kind of information, they are different types and they might not always play well with each other. Therefore, I'd recommend to always explicitly convert types when doing comparisons.

In the next section, let's go through a basic example of importing a time-series dataset, getting summary statistics, and plotting some variables.

Previous PageNext Page
You have been reading a chapter from
Machine Learning for Time-Series with Python
Published in: Oct 2021Publisher: PacktISBN-13: 9781801819626
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Ben Auffarth

Ben Auffarth is a full-stack data scientist with more than 15 years of work experience. With a background and Ph.D. in computational and cognitive neuroscience, he has designed and conducted wet lab experiments on cell cultures, analyzed experiments with terabytes of data, run brain models on IBM supercomputers with up to 64k cores, built production systems processing hundreds and thousands of transactions per day, and trained language models on a large corpus of text documents. He co-founded and is the former president of Data Science Speakers, London.
Read more about Ben Auffarth