Python Algorithmic Trading Cookbook

By Pushpak Dagade

About this book

Python is a very popular language used to build and execute algorithmic trading strategies. If you want to find out how you can build a solid foundation in algorithmic trading using the language, this cookbook is here to help.

Starting by setting up the Python environment for trading and connectivity with brokers, you’ll then learn the important aspects of financial markets. As you progress through this algorithmic trading book, you’ll learn to fetch financial instruments, query and calculate various types of candles and historical data, and finally, compute and plot technical indicators. Next, you’ll discover how to place various types of orders, such as regular, bracket, and cover orders, and understand their state transitions. You’ll also uncover challenges faced while devising and executing powerful algorithmic trading strategies from scratch. Later chapters will take you through backtesting, paper trading, and finally real trading for the algorithmic strategies that you've created from the ground up. You’ll even understand how to automate trading and find the right strategy for making effective decisions that would otherwise be impossible for human traders.

By the end of this book, you’ll be able to use Python for algorithmic trading by implementing Python libraries to conduct key tasks in the algorithmic trading ecosystem.

Publication date:
August 2020
Publisher
Packt
Pages
542
ISBN
9781838989354

 
Handling and Manipulating Date, Time, and Time Series Data

Time series data is ubiquitous in algorithmic trading, so handling, managing, and manipulating it is essential to trading successfully. The recipes in this chapter demonstrate how to work with date, time, and time series data using the Python standard library and pandas, a Python data analysis library.

In our context, time series data is a sequence of equally spaced timestamps, each with multiple data points describing the trading activity in that time frame.

When handling time series data, the first thing you should know is how to read, modify, and create Python objects that understand date and time. The Python standard library includes the datetime module, which provides the datetime and timedelta classes. Together, they cover most date and time handling needs. The first seven recipes in this chapter talk about this module. The remainder of this chapter talks about handling time series data using the pandas library, which is a very efficient library for data analysis. The pandas.DataFrame class will be used in our recipes.
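
As a minimal sketch of the definition of time series data given above, the following builds a few equally spaced timestamps with the standard library and checks that the spacing is constant. The 15-minute interval, the start time, and the variable names are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical candle interval and start time, chosen only for illustration.
interval = timedelta(minutes=15)
start = datetime(2019, 11, 13, 9, 0)

# Four equally spaced timestamps: 09:00, 09:15, 09:30, 09:45.
timestamps = [start + i * interval for i in range(4)]

# The gap between every pair of neighbours should equal the interval.
gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
is_equally_spaced = all(gap == interval for gap in gaps)

print(timestamps[-1])      # 2019-11-13 09:45:00
print(is_equally_spaced)   # True
```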

The following is a list of the recipes in this chapter: 

  • Creating datetime objects
  • Creating timedelta objects
  • Operations on datetime objects
  • Modifying datetime objects
  • Converting a datetime object to a string
  • Creating a datetime object from a string
  • The datetime object and time zones
  • Creating a pandas.DataFrame object
  • DataFrame manipulation—renaming, rearranging, reversing, and slicing
  • DataFrame manipulation—applying, sorting, iterating, and concatenating
  • Converting a DataFrame into other formats
  • Creating a DataFrame from other formats
 

Technical requirements

You will need the following to successfully execute the recipes in this chapter:

  • Python 3.7+
  • Python package:
  • pandas ($ pip install pandas)

For all the recipes in this chapter, you will need the Jupyter notebook for this chapter, found at https://github.com/PacktPublishing/Python-Algorithmic-Trading-Cookbook/tree/master/Chapter01.

You can also open a new Jupyter notebook and try the hands-on exercises directly as they are shown in the recipes. Note that the output for some of these recipes might differ for you as they depend on the date, time, and time zone information provided at the time.

 

Creating datetime objects

The datetime module provides a datetime class, which can be used to accurately capture information relating to timestamps, dates, times, and time zones. In this recipe, you will create datetime objects in multiple ways and introspect their attributes.

 

How to do it…

Follow these steps to execute this recipe:

  1. Import the necessary module from the Python standard library:
>>> from datetime import datetime
  1. Create a datetime object holding the current timestamp using the now() method and print it:
>>> dt1 = datetime.now()
>>> print(f'Approach #1: {dt1}')

We get the following output. Your output will differ:

Approach #1: 2020-08-12 20:55:39.680195
  1. Print the attributes of dt1 related to date and time:
>>> print(f'Year: {dt1.year}')
>>> print(f'Month: {dt1.month}')
>>> print(f'Day: {dt1.day}')
>>> print(f'Hours: {dt1.hour}')
>>> print(f'Minutes: {dt1.minute}')
>>> print(f'Seconds: {dt1.second}')
>>> print(f'Microseconds: {dt1.microsecond}')
>>> print(f'Timezone: {dt1.tzinfo}')

We get the following output. Your output would differ:

Year: 2020
Month: 8
Day: 12
Hours: 20
Minutes: 55
Seconds: 39
Microseconds: 680195
Timezone: None
  1. Create a datetime object holding the timestamp for 1st January 2021:
>>> dt2 = datetime(year=2021, month=1, day=1)
>>> print(f'Approach #2: {dt2}')

You will get the following output:

Approach #2: 2021-01-01 00:00:00
  1. Print the various attributes of dt2 related to date and time:
>>> print(f'Year: {dt2.year}')
>>> print(f'Month: {dt2.month}')
>>> print(f'Day: {dt2.day}')
>>> print(f'Hours: {dt2.hour}')
>>> print(f'Minutes: {dt2.minute}')
>>> print(f'Seconds: {dt2.second}')
>>> print(f'Microseconds: {dt2.microsecond}')
>>> print(f'Timezone: {dt2.tzinfo}')

You will get the following output:

Year: 2021
Month: 1
Day: 1
Hours: 0
Minutes: 0
Seconds: 0
Microseconds: 0
Timezone: None
 

How it works...

In step 1, you import the datetime class from the datetime module. In step 2, you create and print a datetime object using the now() method and assign it to dt1. This object holds the current timestamp information.

A datetime object has the following attributes related to date, time, and time zone information:

  • year: An integer between 1 and 9999, both inclusive
  • month: An integer between 1 and 12, both inclusive
  • day: An integer between 1 and 31, both inclusive (the upper limit depends on the month and year)
  • hour: An integer between 0 and 23, both inclusive
  • minute: An integer between 0 and 59, both inclusive
  • second: An integer between 0 and 59, both inclusive
  • microsecond: An integer between 0 and 999999, both inclusive
  • tzinfo: None, or an instance of a tzinfo subclass such as timezone (more information in The datetime object and time zones recipe)

In step 3, these attributes are printed for dt1. You can see that they hold the current timestamp information.

In step 4, you create and print another datetime object. This time you create a specific timestamp: midnight on 1st January 2021. You call the constructor directly with year as 2021, month as 1, and day as 1. The other time-related attributes default to 0, and the time zone defaults to None. In step 5, you print the attributes of dt2. You can see that they hold exactly the values you passed to the constructor in step 4.
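
The defaults and attribute ranges described above can be checked with a short sketch. The variable names and the choice of an invalid month are assumptions made for illustration; MINYEAR and MAXYEAR are constants exported by the datetime module.

```python
from datetime import datetime, MINYEAR, MAXYEAR

# Only year, month, and day are required; the rest default to 0 or None.
dt = datetime(2021, 1, 1)
print(dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo)  # 0 0 0 0 None

# The valid range for the year attribute.
print(MINYEAR, MAXYEAR)  # 1 9999

# An out-of-range value is rejected by the constructor.
try:
    datetime(2021, 13, 1)   # month 13 does not exist
    month_was_accepted = True
except ValueError as error:
    month_was_accepted = False
    print(f'Rejected: {error}')
```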

 

There's more

You can use the date() and time() methods of the datetime objects to extract the date and time information, as instances of datetime.date and datetime.time classes respectively:

  1. Use the date() method to extract the date from dt1. Note the type of the return value:
>>> print(f"Date: {dt1.date()}")
>>> print(f"Type: {type(dt1.date())}")

You will get the following output. Your output may differ:

Date: 2020-08-12
Type: <class 'datetime.date'>
  1. Use the time() method to extract the time from dt1. Note the type of the return value:
>>> print(f"Time: {dt1.time()}")
>>> print(f"Type: {type(dt1.time())}")

We get the following output. Your output may differ:

Time: 20:55:39.680195
Type: <class 'datetime.time'>
  1. Use the date() method to extract the date from dt2. Note the type of the return value:
>>> print(f"Date: {dt2.date()}")
>>> print(f"Type: {type(dt2.date())}")

We get the following output:

Date: 2021-01-01
Type: <class 'datetime.date'>
  1. Use the time() method to extract the time from dt2. Note the type of the return value:
>>> print(f"Time: {dt2.time()}")
>>> print(f"Type: {type(dt2.time())}")

We get the following output:

Time: 00:00:00
Type: <class 'datetime.time'>
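
Going the other way is also possible. The sketch below, which uses a fixed timestamp instead of the current one so that the output is reproducible, splits a datetime object with date() and time() and joins the parts again with the datetime.combine() classmethod. The variable names are assumptions.

```python
from datetime import datetime

dt = datetime(2020, 8, 12, 20, 55, 39, 680195)

date_part = dt.date()   # datetime.date(2020, 8, 12)
time_part = dt.time()   # datetime.time(20, 55, 39, 680195)

# datetime.combine() joins a date and a time into a single datetime.
rebuilt = datetime.combine(date_part, time_part)

print(rebuilt)          # 2020-08-12 20:55:39.680195
print(rebuilt == dt)    # True
```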
 

Creating timedelta objects

The datetime module provides a timedelta class, which can be used to represent information related to date and time differences. In this recipe, you will create timedelta objects and perform operations on them.

 

How to do it…

Follow along with these steps to execute this recipe:

  1. Import the necessary module from the Python standard library:
>>> from datetime import timedelta
  1. Create a timedelta object with a duration of 5 days. Assign it to td1 and print it:
>>> td1 = timedelta(days=5)
>>> print(f'Time difference: {td1}')

We get the following output:

Time difference: 5 days, 0:00:00
  1. Create a timedelta object with a duration of 4 days. Assign it to td2 and print it:
>>> td2 = timedelta(days=4)
>>> print(f'Time difference: {td2}')

We get the following output:

Time difference: 4 days, 0:00:00
  1. Add td1 and td2 and print the output:
>>> print(f'Addition: {td1} + {td2} = {td1 + td2}')

We get the following output:

Addition: 5 days, 0:00:00 + 4 days, 0:00:00 = 9 days, 0:00:00
  1. Subtract td2 from td1 and print the output:
>>> print(f'Subtraction: {td1} - {td2} = {td1 - td2}')

We will get the following output:

Subtraction: 5 days, 0:00:00 - 4 days, 0:00:00 = 1 day, 0:00:00
  1. Multiply td1 by a number (a float):
>>> print(f'Multiplication: {td1} * 2.5 = {td1 * 2.5}')

We get the following output:

Multiplication: 5 days, 0:00:00 * 2.5 = 12 days, 12:00:00
 

How it works...

In step 1, you import the timedelta class from the datetime module. In step 2, you create a timedelta object that holds a time difference value of 5 days and assign it to td1. You call the constructor with a single argument, days, set to 5. Similarly, in step 3, you create another timedelta object, which holds a time difference value of 4 days, and assign it to td2.

In the next steps, you perform operations on the timedelta objects. In step 4, you add td1 and td2. This returns another timedelta object which holds a time difference value of 9 days, which is the sum of the time difference values held by td1 and td2. In step 5, you subtract td2 from td1. This returns another timedelta object that holds a time difference value of 1 day, which is the difference of time difference values held by td1 and td2. In step 6, you multiply td1 with 2.5, a float. This again returns a timedelta object that holds a time difference value of twelve and a half days.
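
Besides addition, subtraction, and multiplication, timedelta objects also support division, remainder, and comparison. The following is a small sketch using the same td1 and td2 values as the recipe; the other variable names are assumptions.

```python
from datetime import timedelta

td1 = timedelta(days=5)
td2 = timedelta(days=4)

ratio = td1 / td2           # true division of two timedeltas gives a float
whole = td1 // td2          # floor division of two timedeltas gives an int
remainder = td1 % td2       # the remainder is again a timedelta
half = td1 / 2              # dividing by a number gives a timedelta

print(ratio)       # 1.25
print(whole)       # 1
print(remainder)   # 1 day, 0:00:00
print(half)        # 2 days, 12:00:00
print(td1 > td2)   # True
```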

 

There's more

A timedelta object can be created using one or more optional arguments:

  • weeks: An integer or a float. Default value is 0.
  • days: An integer or a float. Default value is 0.
  • hours: An integer or a float. Default value is 0.
  • minutes: An integer or a float. Default value is 0.
  • seconds: An integer or a float. Default value is 0.
  • milliseconds: An integer or a float. Default value is 0.
  • microseconds: An integer or a float. Default value is 0.

 

In step 2 and step 3, we used just the days argument. You can use the other arguments as well. These arguments are normalized upon creation: only days, seconds, and microseconds are stored internally. This normalization ensures that every time difference value has a unique representation. The following code demonstrates this:

  1. Create a timedelta object with hours as 23, minutes as 59, and seconds as 60. Assign it to td3 and print it. It will be normalized to a timedelta object with days as 1 (and other date and time-related attributes as 0):
>>> td3 = timedelta(hours=23, minutes=59, seconds=60)
>>> print(f'Time difference: {td3}')

We get the following output:

Time difference: 1 day, 0:00:00

The timedelta objects have a convenience method, total_seconds(). This method returns a float which represents the total seconds contained in the duration held by the timedelta object.

  1. Call the total_seconds() method on td3. You get 86400.0 as the output:
>>> print(f'Total seconds in 1 day: {td3.total_seconds()}')

We get the following output:

Total seconds in 1 day: 86400.0
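
To make the normalization more concrete, the following sketch inspects the days, seconds, and microseconds attributes that a timedelta object stores after creation. The input values are arbitrary and chosen only for illustration.

```python
from datetime import timedelta

td = timedelta(hours=25, milliseconds=1500)

# 25 hours -> 1 day + 3600 seconds; 1500 ms -> 1 second + 500000 microseconds.
print(td.days)             # 1
print(td.seconds)          # 3601
print(td.microseconds)     # 500000
print(td)                  # 1 day, 1:00:01.500000
print(td.total_seconds())  # 90001.5

# Different arguments describing the same duration compare equal.
print(timedelta(weeks=1) == timedelta(days=7))  # True
```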
 

Operations on datetime objects

The datetime and timedelta classes support various mathematical operations to get dates in the future or the past. These operations return another datetime object. In this recipe, you will create datetime, date, time, and timedelta objects and perform mathematical operations on them.

 

How to do it…

Follow along with these steps to execute this recipe:

  1. Import the necessary modules from the Python standard library:
>>> from datetime import date, datetime, timedelta
  1. Fetch today's date. Assign it to date_today and print it:
>>> date_today = date.today()
>>> print(f"Today's Date: {date_today}")

We get the following output. Your output may differ:

Today's Date: 2020-08-12
  1. Add 5 days to today's date using a timedelta object. Assign it to date_5days_later and print it:
>>> date_5days_later = date_today + timedelta(days=5)
>>> print(f"Date 5 days later: {date_5days_later}")

We get the following output. Your output may differ:

Date 5 days later: 2020-08-17
  1. Subtract 5 days from today's date using a timedelta object. Assign it to date_5days_ago and print it:
>>> date_5days_ago = date_today - timedelta(days=5)
>>> print(f"Date 5 days ago: {date_5days_ago}")

We get the following output. Your output may differ:

Date 5 days ago: 2020-08-07
  1. Compare date_5days_later with date_5days_ago using the > operator:
>>> date_5days_later > date_5days_ago

We get the following output:

True
  1. Compare date_5days_later with date_5days_ago using the < operator:
>>> date_5days_later < date_5days_ago

We get the following output:

False
  1. Compare date_5days_later, date_today and date_5days_ago together using the > operator:
>>> date_5days_later > date_today > date_5days_ago

We get the following output:

True
  1. Fetch the current timestamp. Assign it to current_timestamp:
>>> current_timestamp = datetime.now()
  1. Fetch the current time. Assign it to time_now and print it:
>>> time_now = current_timestamp.time()
>>> print(f"Time now: {time_now}")

We get the following output. Your output may differ:

Time now: 20:55:45.239177
  1. Add 5 minutes to the current time using a timedelta object. Assign it to time_5minutes_later and print it:
>>> time_5minutes_later = (current_timestamp + 
timedelta(minutes=5)).time()
>>> print(f"Time 5 minutes later: {time_5minutes_later}")

We get the following output. Your output may differ:

Time 5 minutes later: 21:00:45.239177
  1. Subtract 5 minutes from the current time using a timedelta object. Assign it to time_5minutes_ago and print it:
>>> time_5minutes_ago = (current_timestamp - 
timedelta(minutes=5)).time()
>>> print(f"Time 5 minutes ago: {time_5minutes_ago}")

We get the following output. Your output may differ:

Time 5 minutes ago: 20:50:45.239177
  1. Compare time_5minutes_later with time_5minutes_ago using the < operator:
>>> time_5minutes_later < time_5minutes_ago

We get the following output. Your output may differ:

False
  1. Compare time_5minutes_later with time_5minutes_ago using the > operator:
>>> time_5minutes_later > time_5minutes_ago

We get the following output. Your output may differ:

True
  1. Compare time_5minutes_later, time_now and time_5minutes_ago together using the > operator:
>>> time_5minutes_later > time_now > time_5minutes_ago

We get the following output. Your output may differ:

True
 

How it works…

In step 1, you import the date, datetime, and timedelta classes from the datetime module. In step 2, you fetch today's date using the today() classmethod provided by the date class and assign it to a new variable, date_today. (A classmethod allows you to call a method directly on a class without creating an instance.) The returned object is of type datetime.date. In step 3, you create a date 5 days ahead of today by adding a timedelta object holding a duration of 5 days to date_today. You assign this to a new variable, date_5days_later. Similarly, in step 4, you create a date 5 days in the past and assign it to a new variable, date_5days_ago.

In step 5 and step 6, you compare date_5days_later and date_5days_ago using the > and < operators, respectively. The > operator returns True if the first operand holds a date ahead of that held by operand 2. Similarly, the < operator returns True if the second operand holds a date ahead of that held by operand 1. In step 7, you compare together all three date objects created so far. Note the outputs.

Step 8 to step 14 perform the same operations as step 2 to step 7, but this time on datetime.time objects: fetching the current time, fetching a time 5 minutes ahead of the current time, fetching a time 5 minutes before the current time, and comparing all the datetime.time objects created. The timedelta objects cannot be added to datetime.time objects directly to get a time in the past or the future. To overcome this, you can add timedelta objects to datetime objects and then extract the time from them using the time() method. You do this in step 10 and step 11.
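
The workaround described above can be wrapped in a small function. This is only a sketch: shift_time is a hypothetical helper, not part of the datetime module, and the date it attaches internally is arbitrary.

```python
from datetime import date, datetime, time, timedelta

def shift_time(t, delta):
    """Return the time of day `delta` after `t` (hypothetical helper).

    The date part is discarded, so the result wraps around midnight.
    """
    # Attach an arbitrary date so that timedelta arithmetic is allowed.
    anchor = datetime.combine(date(2020, 8, 12), t)
    return (anchor + delta).time()

t = time(20, 55, 45)

try:
    t + timedelta(minutes=5)          # not supported for time objects
    direct_addition_works = True
except TypeError:
    direct_addition_works = False

print(direct_addition_works)                           # False
print(shift_time(t, timedelta(minutes=5)))             # 21:00:45
print(shift_time(time(23, 58), timedelta(minutes=5)))  # 00:03:00
```

Because the result wraps around midnight, comparing shifted time objects close to midnight can give surprising results. Comparing full datetime objects avoids this.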

 

There's more

The operations shown in this recipe on date and time objects can similarly be performed on datetime objects. Besides +, -, < and >, you can also use the following operators on datetime, date, and time objects:

  • >=: Returns True only if the first operand holds a datetime/date/time later than or equal to that of the second operand
  • <=: Returns True only if the first operand holds a datetime/date/time earlier than or equal to that of the second operand
  • ==: Returns True only if the first operand holds a datetime/date/time equal to that of the second operand

This is not an exhaustive list of permissible operators. Refer to the official documentation on datetime module for more information: https://docs.python.org/3.8/library/datetime.html.
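
One more operation that is often useful is subtracting one datetime object from another, which returns a timedelta object. The sketch below uses fixed, arbitrary timestamps so that the output is reproducible; the variable names are assumptions.

```python
from datetime import datetime

start = datetime(2020, 8, 12, 20, 55)
end = datetime(2021, 1, 1)

remaining = end - start   # datetime - datetime gives a timedelta
print(remaining)          # 141 days, 3:05:00
print(remaining.days)     # 141
print(start != end)       # True
```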

 

Modifying datetime objects

Often, you may want to modify existing datetime objects to represent a different date and time. This recipe includes code to demonstrate this.

 

How to do it…

Follow these steps to execute this recipe:

  1. Import the necessary modules from the Python standard library:
>>> from datetime import datetime
  1. Fetch the current timestamp. Assign it to dt1 and print it:
>>> dt1 = datetime.now()
>>> print(dt1)

We get the following output. Your output would differ:

2020-08-12 20:55:46.753899
  1. Create a new datetime object by replacing the year, month, and day attributes of dt1. Assign it to dt2 and print it:
>>> dt2 = dt1.replace(year=2021, month=1, day=1)
>>> print(f'A timestamp from 1st January 2021: {dt2}')

We get the following output. Your output would differ:

A timestamp from 1st January 2021: 2021-01-01 20:55:46.753899
  1. Create a new datetime object by specifying all the attributes directly. Assign it to dt3 and print it:
>>> dt3 = datetime(year=2021, 
month=1,
day=1,
hour=dt1.hour,
minute=dt1.minute,
second=dt1.second,
microsecond=dt1.microsecond,
tzinfo=dt1.tzinfo)
>>> print(f'A timestamp from 1st January 2021: {dt3}')

We get the following output. Your output would differ:

A timestamp from 1st January 2021: 2021-01-01 20:55:46.753899
  1. Compare dt2 and dt3:
>>> dt2 == dt3

We get the following output.

True
 

How it works...

In step 1, you import the datetime class from the datetime module. In step 2, you fetch the current timestamp using the now() method of datetime and assign it to a new variable, dt1. To get a modified timestamp from an existing datetime object, you can use the replace() method. In step 3, you create a new datetime object, dt2, from dt1 by calling the replace() method. You specify the attributes to be modified, which are year, month, and day. The remaining attributes (hour, minute, second, microsecond, and tzinfo) remain as they are. You can confirm this by comparing the outputs of step 2 and step 3. In step 4, you create another datetime object, dt3. This time you call the datetime constructor directly. You pass all the attributes to the constructor such that the timestamp created is the same as dt2. In step 5, you confirm that dt2 and dt3 hold exactly the same timestamp by using the == operator, which returns True.
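
Two properties of replace() are worth seeing in code: it returns a new object and leaves the original unchanged, and it validates the resulting date. The following is a minimal sketch with a fixed timestamp; the variable names are assumptions.

```python
from datetime import datetime

dt1 = datetime(2020, 8, 12, 20, 55, 46)
dt2 = dt1.replace(year=2021, month=1, day=1)

print(dt1)  # 2020-08-12 20:55:46  (unchanged)
print(dt2)  # 2021-01-01 20:55:46

try:
    dt1.replace(month=2, day=30)   # 30th February does not exist
    invalid_date_accepted = True
except ValueError:
    invalid_date_accepted = False

print(invalid_date_accepted)  # False
```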

 

Converting a datetime object to a string

This recipe demonstrates the conversion of the datetime objects into strings which finds application in printing and logging. Also, this is helpful while sending timestamps as JSON data over web APIs.

 

How to do it…

Execute the following steps for this recipe:

  1. Import the necessary modules from the Python standard library:
>>> from datetime import datetime
  1. Fetch the current timestamp along with time zone information. Assign it to now:
>>> now = datetime.now().astimezone()
  1. Cast now to a string and print it:
>>> print(str(now))

We get the following output. Your output may differ:

2020-08-12 20:55:48.366130+05:30
  1. Convert now to a string with a specific date-time format using strftime() and print it:
>>> print(now.strftime("%d-%m-%Y %H:%M:%S %Z"))

We get the following output. Your output may differ:

12-08-2020 20:55:48 IST
 

How it works...

In step 1, you import the datetime class from the datetime module. In step 2, you fetch the current timestamp with time zone information and assign it to a new variable, now. The now() method of datetime fetches the current timestamp, but without time zone information. Such objects are called time zone-naive datetime objects. The astimezone() method attaches the system's local time zone to this time zone-naive object, converting it to a time zone-aware object (more information in The datetime object and time zones recipe). In step 3, you cast now to a string and print it. Observe that the output format is fixed and may not be the one you want. The datetime class has a strftime() method, which converts the object to a string in a format you specify. In step 4, you convert now to a string in the format DD-MM-YYYY HH:MM:SS followed by the time zone name. The directives used in step 4 are described as follows:

Directive: Meaning

  • %d: The day of the month as a zero-padded decimal number
  • %m: The month as a zero-padded decimal number
  • %Y: The year with the century as a decimal number
  • %H: The hour (24-hour clock) as a zero-padded decimal number
  • %M: The minute as a zero-padded decimal number
  • %S: The second as a zero-padded decimal number
  • %Z: The time zone name (empty string if the object is naive)

A complete list of the directives that can be given to strftime() can be found at https://docs.python.org/3.7/library/datetime.html#strftime-and-strptime-behavior.
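
Since this recipe mentions sending timestamps as JSON, here is a short sketch that compares the %Z and %z directives and uses isoformat() to produce a string suitable for a JSON payload. The fixed-offset time zone is built by hand so that the output is reproducible, and the payload key is an assumption made for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

# A fixed-offset time zone, built by hand so the output is reproducible.
ist = timezone(timedelta(hours=5, minutes=30), 'IST')
ts = datetime(2020, 8, 12, 20, 55, 48, tzinfo=ist)

print(ts.strftime('%d-%m-%Y %H:%M:%S %Z'))  # 12-08-2020 20:55:48 IST
print(ts.strftime('%d-%m-%Y %H:%M:%S %z'))  # 12-08-2020 20:55:48 +0530
print(ts.isoformat())                       # 2020-08-12T20:55:48+05:30

# datetime objects are not JSON serializable, so convert to a string first.
payload = json.dumps({'timestamp': ts.isoformat()})
print(payload)  # {"timestamp": "2020-08-12T20:55:48+05:30"}
```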

 

Creating a datetime object from a string

This recipe demonstrates the conversion of well-formatted strings into datetime objects. This finds application in reading timestamps from a file. Also, this is helpful while receiving timestamps as JSON data over web APIs.

 

How to do it…

Execute the following steps for this recipe:

  1. Import the necessary modules from the Python standard library:
>>> from datetime import datetime
  1. Create a string representation of a timestamp with date, time, and time zone. Assign it to now_str:
>>> now_str = '13-1-2021 15:53:39 +05:30'
  1. Convert now_str to now, a datetime.datetime object. Print it:
>>> now = datetime.strptime(now_str, "%d-%m-%Y %H:%M:%S %z")
>>> print(now)

We get the following output:

2021-01-13 15:53:39+05:30
  1. Confirm that now is of the datetime type:
>>> print(type(now))

We get the following output:

<class 'datetime.datetime'>
 

How it works...

In step 1, you import the datetime class from the datetime module. In step 2, you create a string holding a valid timestamp and assign it to a new variable, now_str. The datetime class has a strptime() method, which converts a string holding a timestamp in a specific format to a datetime object. In step 3, you convert now_str, a string in the format DD-MM-YYYY HH:MM:SS followed by a UTC offset, to now. In step 4, you confirm that now is indeed an object of the datetime type. The directives used in step 3 are the same as those described in the Converting a datetime object to a string recipe, except for %z, which matches a UTC offset such as +05:30.

 

There's more

When reading a string into a datetime object, the entire string should be consumed with appropriate directives. Consuming a string partially will throw an exception, as shown in the following code snippet. The error message shows what data was not converted and can be used to fix the directives provided to the strptime() method.

Try to convert now_str to a datetime object using the strptime() method. Pass a string with directives for only the date part of the string. Note the error:

>>> now = datetime.strptime(now_str, "%d-%m-%Y")

The output is as follows:

# Note: It's expected to have an error below
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-96-dc92a0358ed8> in <module>
----> 1 now = datetime.strptime(now_str, "%d-%m-%Y")
2 # Note: It's expected to get an error below

/usr/lib/python3.8/_strptime.py in _strptime_datetime(cls, data_string, format)
566 """Return a class cls instance based on the input string and the
567 format string."""
--> 568 tt, fraction, gmtoff_fraction = _strptime(data_string, format)
569 tzname, gmtoff = tt[-2:]
570 args = tt[:6] + (fraction,)

/usr/lib/python3.8/_strptime.py in _strptime(data_string, format)
350 (data_string, format))
351 if len(data_string) != found.end():
--> 352 raise ValueError("unconverted data remains: %s" %
353 data_string[found.end():])
354

ValueError: unconverted data remains: 15:53:39 +05:30
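
When timestamps may arrive in more than one format, the ValueError shown above can be caught and used to fall back to another format. The following is a sketch only: parse_timestamp and KNOWN_FORMATS are hypothetical names, and the list of formats is an assumption about the incoming data.

```python
from datetime import datetime

# Hypothetical list of formats that incoming timestamps may use.
KNOWN_FORMATS = ('%d-%m-%Y %H:%M:%S %z', '%d-%m-%Y %H:%M:%S', '%d-%m-%Y')

def parse_timestamp(text):
    """Try each known format in turn (hypothetical helper)."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue   # this format did not match; try the next one
    raise ValueError(f'No known format matches {text!r}')

print(parse_timestamp('13-1-2021 15:53:39 +05:30'))  # 2021-01-13 15:53:39+05:30
print(parse_timestamp('13-1-2021 15:53:39'))         # 2021-01-13 15:53:39
print(parse_timestamp('13-1-2021'))                  # 2021-01-13 00:00:00
```

Listing the most specific format first means a string is never matched by a shorter format that would leave data unconverted.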
 

The datetime object and time zones

There are two types of datetime objects: time zone-naive and time zone-aware. Time zone-naive objects do not hold time zone information, whereas time zone-aware objects do. This recipe demonstrates multiple time zone-related operations on datetime objects: creating time zone-naive and time zone-aware objects, adding time zone information to time zone-naive objects, removing time zone information from time zone-aware objects, and comparing time zone-aware and time zone-naive objects.

 

How to do it…

Execute the following steps for this recipe:

  1. Import the necessary modules from the Python standard library:
>>> from datetime import datetime
  1. Create a time zone-naive datetime object. Assign it to now_tz_naive and print it:
>>> now_tz_naive = datetime.now()
>>> print(now_tz_naive)

We get the following output. Your output may differ:

2020-08-12 20:55:50.598800
  1. Print the time zone information of now_tz_naive. Note the output:
>>> print(now_tz_naive.tzinfo)

We get the following output:

None
  1. Create a time zone-aware datetime object. Assign it to now_tz_aware and print it:
>>> now_tz_aware = datetime.now().astimezone()
>>> print(now_tz_aware)

We get the following output. Your output may differ:

2020-08-12 20:55:51.004671+05:30
  1. Print the time zone information of now_tz_aware. Note the output:
>>> print(now_tz_aware.tzinfo)

We get the following output. Your output may differ:

IST
  1. Create a new timestamp by adding time zone information to now_tz_naive from now_tz_aware. Assign it to new_tz_aware and print it:
>>> new_tz_aware = now_tz_naive.replace(tzinfo=now_tz_aware.tzinfo)
>>> print(new_tz_aware)

The output is as follows. Your output may differ:

2020-08-12 20:55:50.598800+05:30
  1. Print the timezone information of new_tz_aware using the tzinfo attribute. Note the output:
>>> print(new_tz_aware.tzinfo)

The output is as follows. Your output may differ:

IST
  1. Create a new timestamp by removing timezone information from new_tz_aware. Assign it to new_tz_naive and print it:
>>> new_tz_naive = new_tz_aware.replace(tzinfo=None)
>>> print(new_tz_naive)

The output is as follows. Your output may differ:

2020-08-12 20:55:50.598800
  1. Print the timezone information of new_tz_naive using the tzinfo attribute. Note the output:
>>> print(new_tz_naive.tzinfo)

The output is as follows:

None
 

How it works...

In step 1, you import the datetime class from the datetime module. In step 2, you create a time zone-naive datetime object using the now() method and assign it to a new variable, now_tz_naive. In step 3, you print the time zone information held by now_tz_naive using the tzinfo attribute. Observe that the output is None, as this is a time zone-naive object.

In step 4, you create a time zone-aware datetime object using the now() and astimezone() methods and assign it to a new variable, now_tz_aware. In step 5, you print the time zone information held by now_tz_aware using the tzinfo attribute. Observe that the output is IST and not None, as this is a time zone-aware object.

In step 6, you create a new datetime object by adding time zone information to now_tz_naive. The time zone information is taken from now_tz_aware. You do this using the replace() method (Refer to Modifying datetime objects recipe for more information). You assign this to a new variable, new_tz_aware. In step 7, you print the time zone information held by new_tz_aware. Observe it is the same output as in step 5 as you have taken time zone information from now_tz_aware. Similarly, in step 8 and step 9, you create a new datetime object, new_tz_naive, but this time you remove the time zone information.

 

There's more

You can use ordering operators such as < and > only between two time zone-naive datetime objects or between two time zone-aware datetime objects. You cannot order a time zone-naive datetime object against a time zone-aware datetime object. Doing so will throw an exception. This is demonstrated in the following steps:

  1. Compare two time zone-naive objects, new_tz_naive and now_tz_naive. Note the output:
>>> new_tz_naive <= now_tz_naive

We get the following output:

True
  1. Compare two time zone-aware objects, new_tz_aware and now_tz_aware. Note the output:
>>> new_tz_aware <= now_tz_aware

We get the following output:

True
  1. Compare a time zone-aware object and a time zone-naive object, new_tz_aware, and now_tz_naive. Note the error:
>>> new_tz_aware > now_tz_naive

We get the following output:

-------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-167-a9433bb51293> in <module>
----> 1 new_tz_aware > now_tz_naive
2 # Note: It's expected to get an error below

TypeError: can't compare offset-naive and offset-aware datetimes
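
One way to avoid this error is to make both objects time zone-aware before comparing them. The sketch below uses a hand-built fixed-offset time zone so that the output is reproducible. Treating the naive timestamp as IST is an assumption that only holds if you know where the naive value came from.

```python
from datetime import datetime, timedelta, timezone

ist = timezone(timedelta(hours=5, minutes=30), 'IST')   # fixed UTC+05:30 offset

aware = datetime(2020, 8, 12, 20, 55, 51, tzinfo=ist)
naive = datetime(2020, 8, 12, 20, 55, 50)

# Convert the same instant to UTC; the wall-clock time changes, the instant does not.
aware_utc = aware.astimezone(timezone.utc)
print(aware_utc)            # 2020-08-12 15:25:51+00:00
print(aware_utc == aware)   # True

# Equality between naive and aware objects is simply False (no exception).
print(naive == aware)       # False

# Ordering needs both sides aware; here we assume the naive value is in IST.
naive_as_ist = naive.replace(tzinfo=ist)
print(naive_as_ist < aware) # True
```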
 

Creating a pandas.DataFrame object

Now that we are done with handling date and time, let's move on to handling time series data. The pandas library has a pandas.DataFrame class, which is useful for handling and manipulating such data. This recipe starts by creating these objects.

 

How to do it...

Execute the following steps for this recipe:

  1. Import the necessary modules:
>>> import datetime
>>> import pandas
  1. Create sample time series data as a list of dictionary objects. Assign it to time_series_data:
>>> time_series_data = \
[{'date': datetime.datetime(2019, 11, 13, 9, 0),
'open': 71.8075, 'high': 71.845, 'low': 71.7775,
'close': 71.7925, 'volume': 219512},
{'date': datetime.datetime(2019, 11, 13, 9, 15),
'open': 71.7925, 'high': 71.8, 'low': 71.78,
'close': 71.7925, 'volume': 59252},
{'date': datetime.datetime(2019, 11, 13, 9, 30),
'open': 71.7925, 'high': 71.8125, 'low': 71.76,
'close': 71.7625, 'volume': 57187},
{'date': datetime.datetime(2019, 11, 13, 9, 45),
'open': 71.76, 'high': 71.765, 'low': 71.735,
'close': 71.7425, 'volume': 43048},
{'date': datetime.datetime(2019, 11, 13, 10, 0),
'open': 71.7425, 'high': 71.78, 'low': 71.7425,
'close': 71.7775, 'volume': 45863},
{'date': datetime.datetime(2019, 11, 13, 10, 15),
'open': 71.775, 'high': 71.8225, 'low': 71.77,
'close': 71.815, 'volume': 42460},
{'date': datetime.datetime(2019, 11, 13, 10, 30),
'open': 71.815, 'high': 71.83, 'low': 71.7775,
'close': 71.78, 'volume': 62403},
{'date': datetime.datetime(2019, 11, 13, 10, 45),
'open': 71.775, 'high': 71.7875, 'low': 71.7475,
'close': 71.7525, 'volume': 34090},
{'date': datetime.datetime(2019, 11, 13, 11, 0),
'open': 71.7525, 'high': 71.7825, 'low': 71.7475,
'close': 71.7625, 'volume': 39320},
{'date': datetime.datetime(2019, 11, 13, 11, 15),
'open': 71.7625, 'high': 71.7925, 'low': 71.76,
'close': 71.7875, 'volume': 20190}]
  1. Create a new DataFrame from time_series_data. Assign it to df and print it:
>>> df = pandas.DataFrame(time_series_data)
>>> df

We get the following output:

                 date    open    high     low   close volume
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
  1. Get the list of columns in df:
>>> df.columns.tolist()

We get the following output:

['date', 'open', 'high', 'low', 'close', 'volume']
  1. Create a DataFrame object again using the time_series_data. This time, specify the columns in the order you want:
>>> pandas.DataFrame(time_series_data, 
columns=['close','date', 'open', 'high', 'low', 'volume'])

We get the following output:

    close                date    open    high     low volume
0 71.7925 2019-11-13 09:00:00 71.8075 71.8450 71.7775 219512
1 71.7925 2019-11-13 09:15:00 71.7925 71.8000 71.7800 59252
2 71.7625 2019-11-13 09:30:00 71.7925 71.8125 71.7600 57187
3 71.7425 2019-11-13 09:45:00 71.7600 71.7650 71.7350 43048
4 71.7775 2019-11-13 10:00:00 71.7425 71.7800 71.7425 45863
5 71.8150 2019-11-13 10:15:00 71.7750 71.8225 71.7700 42460
6 71.7800 2019-11-13 10:30:00 71.8150 71.8300 71.7775 62403
7 71.7525 2019-11-13 10:45:00 71.7750 71.7875 71.7475 34090
8 71.7625 2019-11-13 11:00:00 71.7525 71.7825 71.7475 39320
9 71.7875 2019-11-13 11:15:00 71.7625 71.7925 71.7600 20190
 

How it works...

In step 1, you import datetime from the Python standard library, along with the pandas package. In step 2, you create sample time-series data of the kind typically returned by third-party APIs for historical data. This data is a list of dictionaries, and each dictionary has the same set of keys: date, open, high, low, close, and volume. Observe that the value for the date key is a datetime object, the values for the open, high, low, and close keys are float objects, and the value for the volume key is an int object.

In step 3, you create a pandas DataFrame object by directly calling the constructor with time_series_data as an argument and assign the returned object to df. The keys of the dictionaries become the column names of df and their values become the data. In step 4, you fetch the columns of df as a list using the columns attribute and the tolist() method. You can verify that the column names are the same as the keys of the dictionaries in time_series_data.

In step 5, you create a DataFrame with the columns in a specific order by passing a columns argument to the constructor with the required order as a list of strings.
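
As a compact illustration of these steps, the following sketch builds a DataFrame from a two-row list of dictionaries and inspects the resulting columns and data types. The names candles, frame, and reordered are hypothetical and are used only for this example:

```python
import datetime
import pandas

# Hypothetical two-row sample in the same shape as time_series_data.
candles = [
    {'date': datetime.datetime(2019, 11, 13, 9, 0), 'open': 71.8075,
     'high': 71.845, 'low': 71.7775, 'close': 71.7925, 'volume': 219512},
    {'date': datetime.datetime(2019, 11, 13, 9, 15), 'open': 71.7925,
     'high': 71.8, 'low': 71.78, 'close': 71.7925, 'volume': 59252},
]

# Dictionary keys become column names, in the order they appear.
frame = pandas.DataFrame(candles)

# The columns argument selects and orders the columns explicitly.
reordered = pandas.DataFrame(
    candles, columns=['close', 'date', 'open', 'high', 'low', 'volume'])

print(frame.columns.tolist())
print(frame.dtypes)
```

Printing dtypes is a quick way to confirm that the date column holds datetime values and that the volume column holds integers rather than floats.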

 

There's more

When a DataFrame object is created, an index is assigned to it automatically. The index acts as an address for each row. The leftmost column in the preceding example is the index column. By default, the index starts from 0. A custom index can be set by passing an index argument to the DataFrame constructor with the required indices as an iterable. This is shown as follows:

  1. Create a new DataFrame object from time_series_data, with a custom index:
>>> pandas.DataFrame(time_series_data, index=range(10, 20)) 

We get the following output:

                  date    open    high     low   close volume
10 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
11 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
12 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
13 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
14 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
15 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
16 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
17 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
18 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
19 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875 20190

Note the index in the output starts from 10 and goes up to 19. The default index values would have ranged from 0 to 9.
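
With a custom index, it helps to keep label-based and position-based access apart. The following sketch assumes a small hypothetical frame with an index starting at 10 and reads the same cell by label, using loc, and by position, using iloc:

```python
import pandas

# Hypothetical three-row sample with a custom index of 10, 11, 12.
closes = [{'close': 71.7925}, {'close': 71.7625}, {'close': 71.7425}]
frame = pandas.DataFrame(closes, index=range(10, 13))

# loc addresses rows by index label; the first row has the label 10.
by_label = frame.loc[10, 'close']

# iloc addresses rows and columns by position; the first row is at 0.
by_position = frame.iloc[0, 0]

print(frame.index.tolist(), by_label, by_position)
```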

 

DataFrame manipulation—renaming, rearranging, reversing, and slicing

After creating a DataFrame object, you can perform various operations on it. This recipe covers the following operations on DataFrame objects: renaming a column, rearranging columns, reversing the DataFrame, and slicing the DataFrame to extract a row, a column, or a subset of data.

 

Getting ready

Make sure the df object is available in your Python namespace. Refer to the Creating a pandas.DataFrame object recipe of this chapter to set up this object.

 

How to do it…

Execute the following steps for this recipe:

  1. Rename the date column to timestamp for df. Print it:
>>> df.rename(columns={'date':'timestamp'}, inplace=True)
>>> df

We get the following output:

            timestamp    open    high     low   close volume
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
  1. Create a new DataFrame object by rearranging the columns in df:
>>> df.reindex(columns=[
'volume',
'close',
'timestamp',
'high',
'open',
'low'
])

We get the following output:

  volume   close           timestamp    high    open     low
0 219512 71.7925 2019-11-13 09:00:00 71.8450 71.8075 71.7775
1 59252 71.7925 2019-11-13 09:15:00 71.8000 71.7925 71.7800
2 57187 71.7625 2019-11-13 09:30:00 71.8125 71.7925 71.7600
3 43048 71.7425 2019-11-13 09:45:00 71.7650 71.7600 71.7350
4 45863 71.7775 2019-11-13 10:00:00 71.7800 71.7425 71.7425
5 42460 71.8150 2019-11-13 10:15:00 71.8225 71.7750 71.7700
6 62403 71.7800 2019-11-13 10:30:00 71.8300 71.8150 71.7775
7 34090 71.7525 2019-11-13 10:45:00 71.7875 71.7750 71.7475
8 39320 71.7625 2019-11-13 11:00:00 71.7825 71.7525 71.7475
9 20190 71.7875 2019-11-13 11:15:00 71.7925 71.7625 71.7600
  1. Create a new DataFrame object by reversing the rows in df:
>>> df[::-1]

We get the following output:

            timestamp    open    high     low   close volume
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
  1. Extract the close column from df:
>>> df['close']

We get the following output:

0    71.7925
1 71.7925
2 71.7625
3 71.7425
4 71.7775
5 71.8150
6 71.7800
7 71.7525
8 71.7625
9 71.7875
Name: close, dtype: float64
  1. Extract the first row from df:
>>> df.iloc[0]

We get the following output:

timestamp    2019-11-13 09:00:00
open 71.8075
high 71.845
low 71.7775
close 71.7925
volume 219512
Name: 0, dtype: object
  1. Extract a 2 × 2 matrix with the first two rows and first two columns only:
>>> df.iloc[:2, :2]

We get the following output:

            timestamp    open
0 2019-11-13 09:00:00 71.8075
1 2019-11-13 09:15:00 71.7925
 

How it works...

Renaming: In step 1, you rename the date column to timestamp using the rename() method of pandas DataFrame. You pass the columns argument as a dictionary with the existing names to be replaced as keys and their new names as the corresponding values. You also pass the inplace argument as True so that df is modified directly. If it is not passed, the default value is False, meaning a new DataFrame would be created instead of modifying df.

Rearranging: In step 2, you use the reindex() method to create a new DataFrame from df by rearranging its columns. You pass the columns argument with a list of column names as strings in the required order.

Reversing: In step 3, you create a new DataFrame from df with its rows reversed by using the indexing operator with the slice [::-1]. This is similar to the way we reverse regular Python lists.

Slicing: In step 4, you extract the close column by using the indexing operator on df. You pass the column name, close, as the index here. The returned data is a pandas.Series object. You can use the iloc property on DataFrame objects to extract a row, a column, or a subset DataFrame object. In step 5, you extract the first row using iloc with 0 as the index. The returned data is a pandas.Series object. In step 6, you extract a 2x2 subset from df using iloc with [:2, :2] as the index. This means that all data in the rows before position 2 (rows 0 and 1) and in the columns before position 2 (columns 0 and 1) is extracted. The returned data is a pandas.DataFrame object.

For all the operations shown in this recipe where a new DataFrame object is returned, the original DataFrame object remains unchanged.
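
The following sketch illustrates this point on a small hypothetical frame. It calls rename() without the inplace argument, so every operation here returns a new object and the original frame keeps its columns and row order:

```python
import pandas

# Hypothetical three-row sample; the column names mirror the recipe.
frame = pandas.DataFrame({
    'date': ['2019-11-13 09:00', '2019-11-13 09:15', '2019-11-13 09:30'],
    'open': [71.8075, 71.7925, 71.7925],
    'close': [71.7925, 71.7925, 71.7625],
})

renamed = frame.rename(columns={'date': 'timestamp'})  # inplace defaults to False
rearranged = frame.reindex(columns=['close', 'date', 'open'])
reversed_rows = frame[::-1]
subset = frame.iloc[:2, :2]

# The original frame is unchanged by all of the calls above.
print(frame.columns.tolist(), frame.index.tolist())
```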
 

There's more

The iloc property can also be used to extract a column from a DataFrame. This is shown in the following code.

Extract the column at position 4 (the fifth column, close) from df. Observe the output:

>>> df.iloc[:, 4]

We get the following output:

0    71.7925
1 71.7925
2 71.7625
3 71.7425
4 71.7775
5 71.8150
6 71.7800
7 71.7525
8 71.7625
9 71.7875
Name: close, dtype: float64

Note that this output and the output of step 4 are identical.
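
If you prefer not to hardcode the position, one option is to look it up from the column name. The following sketch uses a hypothetical single-row frame and the get_loc() method of the columns index to do so:

```python
import pandas

# Hypothetical single-row sample with the same column order as df.
frame = pandas.DataFrame({
    'timestamp': ['2019-11-13 09:00:00'],
    'open': [71.8075],
    'high': [71.845],
    'low': [71.7775],
    'close': [71.7925],
    'volume': [219512],
})

# Look up the integer position of the close column from its name.
position = frame.columns.get_loc('close')

by_position = frame.iloc[:, position]
by_name = frame['close']

print(position, by_position.equals(by_name))
```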

 

DataFrame manipulation—applying, sorting, iterating, and concatenating

Adding to the previous recipe, this recipe demonstrates more operations that can be performed on DataFrame objects: applying a function to all elements in a column, sorting based on a column, iterating over the rows, and concatenating multiple DataFrame objects vertically and horizontally.

 

Getting ready

Make sure you have followed the previous recipe before trying out this recipe. Ensure you have df in your Python namespace from the previous recipe.

 

How to do it…

Execute the following steps for this recipe:

  1. Import the necessary modules:
>>> import datetime
>>> import random
>>> import pandas
  1. Convert the values in the timestamp column of df to strings in a different date and time format, DD-MM-YYYY HH:MM:SS:
>>> df['timestamp'] = df['timestamp'].apply(
lambda x: x.strftime("%d-%m-%Y %H:%M:%S"))
>>> df

We get the following output:

            timestamp    open    high     low   close volume
0 13-11-2019 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 13-11-2019 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
2 13-11-2019 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
3 13-11-2019 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
4 13-11-2019 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
5 13-11-2019 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
6 13-11-2019 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
7 13-11-2019 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
8 13-11-2019 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
9 13-11-2019 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
  1. Create a new DataFrame object by sorting the close column in ascending order:
>>> df.sort_values(by='close', ascending=True)

We get the following output:

            timestamp    open    high     low   close volume
3 13-11-2019 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
7 13-11-2019 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
2 13-11-2019 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
8 13-11-2019 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
4 13-11-2019 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
6 13-11-2019 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
9 13-11-2019 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
0 13-11-2019 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 13-11-2019 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
5 13-11-2019 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
  1. Create a new DataFrame object by sorting the open column in descending order:
>>> df.sort_values(by='open', ascending=False)

We get the following output:

            timestamp    open    high     low   close volume
6 13-11-2019 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
0 13-11-2019 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
2 13-11-2019 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
1 13-11-2019 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
7 13-11-2019 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
5 13-11-2019 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
9 13-11-2019 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
3 13-11-2019 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
8 13-11-2019 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
4 13-11-2019 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
  1. Iterate over df to find the average of open, close, high, and low values for each row:
>>> for index, row in df.iterrows():
        avg = (row['open'] + row['close'] + row['high'] +
               row['low']) / 4
        print(f"Index: {index} | Average: {avg}")

We get the following output:

Index: 0 | Average: 71.805625
Index: 1 | Average: 71.79124999999999
Index: 2 | Average: 71.781875
Index: 3 | Average: 71.750625
Index: 4 | Average: 71.760625
Index: 5 | Average: 71.795625
Index: 6 | Average: 71.800625
Index: 7 | Average: 71.765625
Index: 8 | Average: 71.76124999999999
Index: 9 | Average: 71.775625
  1. Iterate column-wise over all the values of the first row of df:
>>> for value in df.iloc[0]:
        print(value)

We get the following output:

13-11-2019 09:00:00
71.8075
71.845
71.7775
71.7925
219512
  1. Create a new DataFrame from sample time-series data given as a list of dictionary objects. Assign it to df_new and print it:
>>> df_new = pandas.DataFrame([
{'timestamp': datetime.datetime(2019, 11, 13, 11, 30),
'open': 71.7875,
'high': 71.8075,
'low': 71.77,
'close': 71.7925,
'volume': 18655},
{'timestamp': datetime.datetime(2019, 11, 13, 11, 45),
'open': 71.7925,
'high': 71.805,
'low': 71.7625,
'close': 71.7625,
'volume': 25648},
{'timestamp': datetime.datetime(2019, 11, 13, 12, 0),
'open': 71.7625,
'high': 71.805,
'low': 71.75,
'close': 71.785,
'volume': 37300},
{'timestamp': datetime.datetime(2019, 11, 13, 12, 15),
'open': 71.785,
'high': 71.7925,
'low': 71.7575,
'close': 71.7775,
'volume': 15431},
{'timestamp': datetime.datetime(2019, 11, 13, 12, 30),
'open': 71.7775,
'high': 71.795,
'low': 71.7725,
'close': 71.79,
'volume': 5178}])
>>> df_new

We get the following output:

            timestamp    open    high     low   close volume
0 2019-11-13 11:30:00 71.7875 71.8075 71.7700 71.7925 18655
1 2019-11-13 11:45:00 71.7925 71.8050 71.7625 71.7625 25648
2 2019-11-13 12:00:00 71.7625 71.8050 71.7500 71.7850 37300
3 2019-11-13 12:15:00 71.7850 71.7925 71.7575 71.7775 15431
4 2019-11-13 12:30:00 71.7775 71.7950 71.7725 71.7900 5178
  1. Create a new DataFrame by concatenating df and df_new vertically:
>>> pandas.concat([df, df_new]).reset_index(drop=True)

We get the following output:

             timestamp    open    high     low   close volume
0 13-11-2019 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 13-11-2019 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
2 13-11-2019 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
3 13-11-2019 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
4 13-11-2019 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
5 13-11-2019 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
6 13-11-2019 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
7 13-11-2019 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
8 13-11-2019 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
9 13-11-2019 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
10 2019-11-13 11:30:00 71.7875 71.8075 71.7700 71.7925 18655
11 2019-11-13 11:45:00 71.7925 71.8050 71.7625 71.7625 25648
12 2019-11-13 12:00:00 71.7625 71.8050 71.7500 71.7850 37300
13 2019-11-13 12:15:00 71.7850 71.7925 71.7575 71.7775 15431
14 2019-11-13 12:30:00 71.7775 71.7950 71.7725 71.7900 5178
 

How it works...

In step 1, you import the modules used in this recipe, including the pandas package.

Applying: In step 2, you modify all the values in the timestamp column of df by using the apply() method. This method takes a function as input and applies it to each value. You pass a lambda function here that expects a datetime object as its single input and converts it to a string in the required format using strftime(). (Refer to the Converting a datetime object to a string recipe for more details on strftime().) The apply() method is called on the timestamp column of df, which is a pandas.Series object, so the lambda function is applied to each value in the column. This call returns a new pandas.Series object, which you assign back to the timestamp column of df. Note that after this, the timestamp column of df holds timestamps as string objects, and not as datetime objects as it did earlier.
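
As an alternative to apply() with a lambda function, pandas also offers the dt accessor on datetime columns. The following sketch, which uses a hypothetical stamps series, formats the same values both ways so that you can compare the results:

```python
import datetime
import pandas

# Hypothetical two-value sample of datetime objects.
stamps = pandas.Series([
    datetime.datetime(2019, 11, 13, 9, 0),
    datetime.datetime(2019, 11, 13, 9, 15),
])
fmt = "%d-%m-%Y %H:%M:%S"

# The approach used in the recipe: call strftime() on each value.
via_apply = stamps.apply(lambda x: x.strftime(fmt))

# An alternative: format the whole column through the dt accessor.
via_accessor = stamps.dt.strftime(fmt)

print(via_apply.tolist())
print(via_accessor.tolist())
```

Both calls return a new pandas.Series of strings. The accessor form avoids writing a lambda function, but it works only on columns that hold datetime values.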

Sorting: In step 3, you create a new DataFrame object by sorting the close column of df in ascending order. You use the sort_values() method to perform the sorting. Similarly, in step 4, you create a new DataFrame object by sorting the open column of df in descending order.

Iterating: In step 5, you iterate over df using the iterrows() method to find and print the average of open, close, high, and low values for each row. The iterrows() method iterates over each row as an (index, pandas.Series) pair. In step 6, you iterate over all the values of the first row of df using df.iloc[0]. You get the timestamp, open, high, low, close, and volume column values for the first row as the output.
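
Row-wise iteration is easy to read, but the same per-row average can usually be computed on whole columns at once. The following sketch, on a hypothetical two-row frame, computes the average both with iterrows() and with the mean() method along axis 1. The column-wise version is a substitute technique, not the approach used in the recipe:

```python
import pandas

# Hypothetical two-row sample using the first two candles of df.
frame = pandas.DataFrame({
    'open': [71.8075, 71.7925],
    'high': [71.845, 71.8],
    'low': [71.7775, 71.78],
    'close': [71.7925, 71.7925],
})

# The approach used in the recipe: one (index, Series) pair per row.
loop_averages = []
for index, row in frame.iterrows():
    loop_averages.append(
        (row['open'] + row['close'] + row['high'] + row['low']) / 4)

# Column-wise alternative: average the four columns along each row.
column_averages = frame[['open', 'close', 'high', 'low']].mean(axis=1)

print(loop_averages)
print(column_averages.tolist())
```

The two results can differ in the last decimal places because of floating-point rounding, so compare them with a tolerance rather than with strict equality.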

Concatenation: In step 7, you create a new DataFrame from sample data similar to that used in the Creating a pandas.DataFrame object recipe, and assign it to df_new. In step 8, you use the pandas.concat() function to create a new DataFrame by vertically concatenating df and df_new. This means that a new DataFrame is created with the rows of df_new appended below the rows of df. You pass a list containing df and df_new as an argument to the pandas.concat() function. Also, to create a fresh index starting from 0, you use the reset_index() method with the argument drop passed as True. If you don't use reset_index(), the indices of the concatenated DataFrame would look something like this: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4. (Refer to the Creating a pandas.DataFrame object recipe to know more about the DataFrame index.)
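
The effect of reset_index() on the concatenated index can be seen in the following sketch, which uses two hypothetical single-column frames. It also shows the ignore_index argument of pandas.concat(), which is another way to get a fresh index:

```python
import pandas

# Hypothetical two-row samples, each with the default index 0, 1.
first = pandas.DataFrame({'close': [71.7925, 71.7625]})
second = pandas.DataFrame({'close': [71.785, 71.7775]})

# Without a reset, the original index labels are kept and repeat.
kept = pandas.concat([first, second])

# The approach used in the recipe: drop the old index afterwards.
reset = pandas.concat([first, second]).reset_index(drop=True)

# An alternative: ask concat() to ignore the incoming index labels.
ignored = pandas.concat([first, second], ignore_index=True)

print(kept.index.tolist(), reset.index.tolist(), ignored.index.tolist())
```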

 

There's more

You can also use the pandas.concat() function to concatenate two DataFrame objects horizontally, that is, column-wise, by passing the axis argument with a value of 1 to pandas.concat(). This is shown in the following steps:

  1. Import random module from the Python standard library:
>>> import random
  1. Create a DataFrame object with a single column, open, and random values. Assign it to df1 and print it:
>>> df1 = pandas.DataFrame([random.randint(1,100) for i in 
range(10)], columns=['open'])
>>> df1

We get the following output. Your output may differ:

   open
0 99
1 73
2 16
3 53
4 47
5 74
6 21
7 22
8 2
9 30
  1. Create another DataFrame object with a single column, close, and random values. Assign it to df2 and print it:
>>> df2 = pandas.DataFrame([random.randint(1,100) for i in 
range(10)], columns=['close'])
>>> df2

We get the following output:

   close
0 63
1 84
2 44
3 56
4 25
5 1
6 41
7 55
8 93
9 82
  1. Create a new DataFrame by concatenating df1 and df2 horizontally:
>>> pandas.concat([df1, df2], axis=1)

We get the following output. Your output may differ:

    open  close
0 99 63
1 73 84
2 16 44
3 53 56
4 47 25
5 74 1
6 21 41
7 22 55
8 2 93
9 30 82
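
Horizontal concatenation pairs rows by their index labels, not by their position. The following sketch uses two hypothetical frames whose indices only partly overlap to show what happens to the unmatched rows. The seed value is arbitrary and is set only to make the random values repeatable:

```python
import random
import pandas

random.seed(7)  # arbitrary seed, used only for repeatable values

# Hypothetical samples: opens is indexed 0..2, closes is indexed 1..3.
opens = pandas.DataFrame(
    [random.randint(1, 100) for _ in range(3)], columns=['open'])
closes = pandas.DataFrame(
    [random.randint(1, 100) for _ in range(3)], columns=['close'],
    index=[1, 2, 3])

# Rows are matched on index labels; missing cells are filled with NaN.
combined = pandas.concat([opens, closes], axis=1)

print(combined)
```

In this sketch, the row with label 0 has no close value and the row with label 3 has no open value, so both cells hold NaN. When both frames share the same index, as df1 and df2 do, every row is matched.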
 

Converting a DataFrame into other formats

This recipe demonstrates the conversion of DataFrame objects into other formats, such as .csv files, JSON strings, and pickle files. Converting to a .csv file makes it easier to work further on the data using a spreadsheet application. The JSON format is useful for transmitting DataFrame objects over web APIs. The pickle format is useful for transferring DataFrame objects created in one Python session to another Python session, for example over sockets, without having to recreate them.

 

Getting ready

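Make sure the object df is available in your Python namespace. Refer to the Creating a pandas.DataFrame object recipe of this chapter to set up this object. The outputs shown in this recipe are for df as it stands after the previous two recipes, with a timestamp column that holds strings.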

 

How to do it…

Execute the following steps for this recipe:

  1. Convert and save df as a CSV file:
>>> df.to_csv('dataframe.csv', index=False)
  1. Convert df to a JSON string:
>>> df.to_json()

We get the following output:

'{
"timestamp":{
"0":"13-11-2019 09:00:00","1":"13-11-2019 09:15:00",
"2":"13-11-2019 09:30:00","3":"13-11-2019 09:45:00",
"4":"13-11-2019 10:00:00","5":"13-11-2019 10:15:00",
"6":"13-11-2019 10:30:00","7":"13-11-2019 10:45:00",
"8":"13-11-2019 11:00:00","9":"13-11-2019 11:15:00"},
"open":{
"0":71.8075,"1":71.7925,"2":71.7925,"3":71.76,
"4":71.7425,"5":71.775,"6":71.815,"7":71.775,
"8":71.7525,"9":71.7625},
"high":{
"0":71.845,"1":71.8,"2":71.8125,"3":71.765,
"4":71.78,"5":71.8225,"6":71.83,"7":71.7875,
"8":71.7825,"9":71.7925},
"low":{
"0":71.7775,"1":71.78,"2":71.76,"3":71.735,
"4":71.7425,"5":71.77,"6":71.7775,"7":71.7475,
"8":71.7475,"9":71.76},
"close":{
"0":71.7925,"1":71.7925,"2":71.7625,"3":71.7425,
"4":71.7775,"5":71.815,"6":71.78,"7":71.7525,
"8":71.7625,"9":71.7875},
"volume":{
"0":219512,"1":59252,"2":57187,"3":43048,
"4":45863,"5":42460,"6":62403,"7":34090,
"8":39320,"9":20190}}'
  1. Pickle df to a file:
>>> df.to_pickle('df.pickle')
 

How it works...

In step 1, you use the to_csv() method to save df as a .csv file. You pass dataframe.csv, a file path where the .csv file should be generated, as the first argument and index as False as the second argument. Passing index as False prevents the index from being dumped to the .csv file. If you want to save the DataFrame along with its index, you can pass index as True to the to_csv() method, which is also the default when index is omitted.

In step 2, you use the to_json() method to convert df into a JSON string. You do not pass any additional arguments to the to_json() method.

In step 3, you use the to_pickle() method to pickle (serialize) df to the df.pickle file. Again, you do not pass any additional arguments to the to_pickle() method.

The methods to_csv(), to_json(), and to_pickle() can take more optional arguments than the ones shown in this recipe. Refer to the official docs for complete information on these methods.
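
As an illustration of some of these optional arguments, the following sketch converts a small hypothetical frame without touching the current directory. It relies on to_csv() returning a string when no path is passed, uses the orient argument of to_json(), and writes the pickle file to a temporary folder:

```python
import json
import os
import tempfile
import pandas

# Hypothetical two-row sample with string timestamps.
frame = pandas.DataFrame({
    'timestamp': ['13-11-2019 09:00:00', '13-11-2019 09:15:00'],
    'close': [71.7925, 71.7925],
    'volume': [219512, 59252],
})

# With no path, to_csv() returns the CSV content as a string.
csv_text = frame.to_csv(index=False)
csv_with_index = frame.to_csv()  # index defaults to True

# orient='records' produces a JSON list with one object per row.
json_records = frame.to_json(orient='records')
records = json.loads(json_records)

# Pickle to a temporary folder that is removed afterwards.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'df.pickle')
    frame.to_pickle(path)
    pickle_written = os.path.exists(path)

print(csv_text.splitlines()[0])
print(csv_with_index.splitlines()[0])
print(records[0])
```

The header line of csv_with_index starts with an empty field, which is the unnamed index column that index=False leaves out.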
 

Creating a DataFrame from other formats

In this recipe, you will create DataFrame objects from other formats, such as .csv files, JSON strings, and pickle files. A .csv file created using a spreadsheet application, valid JSON data received over web APIs, or valid pickle objects received over sockets can all be processed further using Python by converting them to DataFrame objects.

Loading pickled data received from untrusted sources can be unsafe. Please use read_pickle() with caution. You can find more details here: https://docs.python.org/3/library/pickle.html. If you are using this function on the pickle file created in the previous recipe, it is perfectly safe to use read_pickle().
 

Getting ready

Make sure you have followed the previous recipe before starting this recipe.

 

How to do it…

Execute the following steps for this recipe:

  1. Create a DataFrame object by reading a CSV file:
>>> pandas.read_csv('dataframe.csv')

We get the following output:

            timestamp    open    high     low   close volume
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
  1. Create a DataFrame object by reading a JSON string:
>>> pandas.read_json("""{
"timestamp": {
"0":"13-11-2019 09:00:00", "1":"13-11-2019 09:15:00",
"2":"13-11-2019 09:30:00","3":"13-11-2019 09:45:00",
"4":"13-11-2019 10:00:00","5":"13-11-2019 10:15:00",
"6":"13-11-2019 10:30:00","7":"13-11-2019 10:45:00",
"8":"13-11-2019 11:00:00","9":"13-11-2019 11:15:00"},

"open":{
"0":71.8075,"1":71.7925,"2":71.7925,"3":71.76,
"4":71.7425,"5":71.775,"6":71.815,"7":71.775,
"8":71.7525,"9":71.7625},

"high":{
"0":71.845,"1":71.8,"2":71.8125,"3":71.765,"4":71.78,
"5":71.8225,"6":71.83,"7":71.7875,"8":71.7825,
"9":71.7925},

"low":{
"0":71.7775,"1":71.78,"2":71.76,"3":71.735,"4":71.7425,
"5":71.77,"6":71.7775,"7":71.7475,"8":71.7475,
"9":71.76},

"close":{
"0":71.7925,"1":71.7925,"2":71.7625,"3":71.7425,
"4":71.7775,"5":71.815,"6":71.78,"7":71.7525,
"8":71.7625,"9":71.7875},

"volume":{
"0":219512,"1":59252,"2":57187,"3":43048,"4":45863,
"5":42460,"6":62403,"7":34090,"8":39320,"9":20190}}
""")

We get the following output:

            timestamp    open    high     low   close volume
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
  1. Create a DataFrame object by unpickling the df.pickle file:
>>> pandas.read_pickle('df.pickle')

We get the following output:

            timestamp    open    high     low   close volume
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
 

How it works...

In step 1, you use the pandas.read_csv() function to create a DataFrame object from a .csv file. You pass dataframe.csv, the file path from where the .csv file should be read, as an argument. Recall that you created dataframe.csv in step 1 of the previous recipe.

In step 2, you use the pandas.read_json() function to create a DataFrame object from a valid JSON string. You pass the JSON string from the output of step 2 in the previous recipe as an argument to this function.

In step 3, you use the pandas.read_pickle() function to create a DataFrame object from a pickle file. You pass df.pickle, the file path from where the pickle file should be read, as an argument to this function. Recall that you created df.pickle in step 3 of the previous recipe.

If you have followed the previous recipe, the outputs of all three steps would be the same DataFrame object, identical to df from the previous recipe.

The functions read_csv(), read_json(), and read_pickle() can take more optional arguments than the ones shown in this recipe. Refer to the official docs for complete information on these functions.
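
The following sketch shows one such optional argument, parse_dates, along with reading from in-memory text instead of files. The sample text and the names from_csv, from_json, and from_pickle are hypothetical. The JSON text is wrapped in io.StringIO, which newer pandas releases appear to prefer over passing a bare string:

```python
import io
import os
import tempfile
import pandas

# Hypothetical CSV text; parse_dates converts the column to datetimes.
csv_text = (
    "timestamp,close,volume\n"
    "2019-11-13 09:00:00,71.7925,219512\n"
    "2019-11-13 09:15:00,71.7925,59252\n"
)
from_csv = pandas.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])

# Hypothetical JSON text in the column-oriented layout used by to_json().
json_text = ('{"close":{"0":71.7925,"1":71.7625},'
             '"volume":{"0":219512,"1":57187}}')
from_json = pandas.read_json(io.StringIO(json_text))

# Round trip through a pickle file created in this same session.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'df.pickle')
    from_csv.to_pickle(path)
    from_pickle = pandas.read_pickle(path)

print(from_csv.dtypes)
print(from_json)
print(from_pickle.equals(from_csv))
```

Without parse_dates, read_csv() would leave the timestamp column as plain strings, so passing it is worth considering whenever a .csv file carries date and time values.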
