You're reading from Data Wrangling with R

Product type: Book
Published in: Feb 2023
Publisher: Packt
ISBN-13: 9781803235400
Edition: 1st Edition
Author (1)
Gustavo R Santos

Gustavo R Santos has worked in the technology industry for 13 years, improving processes, analyzing datasets, and creating dashboards. Since 2020, he has been working as a Data Scientist in the retail industry, wrangling, analyzing, visualizing, and modeling data with modern tools such as R, Python, and Databricks. Gustavo also gives occasional lectures on Data Science concepts at an online school. He has a background in Marketing, is certified as a Data Scientist by the Data Science Academy Brazil, and is pursuing a specialist MBA in Data Science at the University of São Paulo.

Working with Date and Time Objects

Computers are complex machines that can, among many other things, record date and time. The world is so attached to the measurement of time that it drives a great part of our daily lives. We work within a time interval, we are supposed to go to school for a set minimum number of years, and even the value of our work is calculated based on time. Likewise, the business world is run by the wheel of time. As the popular saying goes: time is money.

Dates and times are just like other types of data: they hold valuable information and good business insights if you know how to handle them. During a data exploration that involves a datetime variable, questions such as "What is the busiest month, day, and hour?", "Which days of the week have more traffic?", or "How much was the revenue in the past 3 months?" will arise, and we must know how to deal with those objects to support better analysis.

There are three main ways to work...

Technical requirements

Dataset: We will use the Classic Rock dataset, from FiveThirtyEight. To access the original dataset, go to https://github.com/fivethirtyeight/data/tree/master/classic-rock

All the code can be found in the book’s GitHub repository: https://github.com/PacktPublishing/Data-Wrangling-with-R/tree/main/Part2/Chapter6.

Here are the libraries to be used in this chapter:

library(tidyverse)
library(lubridate)

Introduction to date and time

In R, there are three objects related to date and time: date, time, and datetime. The definitions are as intuitive as the names suggest:

  • date: Stores a calendar date, as YYYY-MM-DD
  • time: Stores a time of day, as HH:MM:SS
  • datetime: A combination of both, as YYYY-MM-DD HH:MM:SS

From now on, I will mostly refer to datetime objects, as these are a combination of both other types and are the most common as well.
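As a minimal sketch of the three object types described above, the snippet below creates one of each. It assumes the lubridate and hms packages are installed (both come with the tidyverse); the variable names and sample values are illustrative only and are not taken from the chapter.

```r
library(lubridate)

# A date: calendar day only (stored with class "Date")
d <- ymd("2023-02-15")

# A datetime: date plus time of day (class "POSIXct"); lubridate defaults to UTC
dt <- ymd_hms("2023-02-15 14:30:00")

# A time of day without a date, using the hms package
tm <- hms::as_hms("14:30:00")

class(d)   # "Date"
class(dt)  # "POSIXct" "POSIXt"
class(tm)  # "hms" "difftime"
```

Checking class() like this is a quick way to confirm which of the three types a variable actually holds before working with it.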

It is important to note that computers count time from January 1, 1970, at 00:00:00 UTC, a reference point known as the Unix epoch. As a side note, UTC stands for Coordinated Universal Time, the successor to Greenwich Mean Time (GMT), and it is the reference from which the world's time zones are defined. Every time zone is expressed as an offset from that reference, adding or subtracting hours.
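To make the idea of counting from January 1, 1970 concrete, here is a small sketch using lubridate. The variable names are illustrative, and the time zone "America/Sao_Paulo" is just an arbitrary example choice.

```r
library(lubridate)

# The reference point: January 1, 1970, at 00:00:00 UTC
epoch <- ymd_hms("1970-01-01 00:00:00", tz = "UTC")

# Internally, a datetime is the number of seconds since that point
as.numeric(epoch)  # 0

# 86,400 seconds (one full day) after the reference point
one_day <- as_datetime(86400)
one_day

# The same instant shown on the clock of another time zone
with_tz(epoch, tzone = "America/Sao_Paulo")
```

Note that with_tz() does not change the underlying instant; it only changes how that instant is displayed.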

In the late 1960s and early 1970s, Unix engineers had to pick a date to use as ground zero, the moment from which computer clocks would start counting. To keep the calculations simple, January 1 was convenient...

Date and time with lubridate

Formatting is the main characteristic that distinguishes dates and times from other types of data. A quick look at a variable holding YYYY-MM-DD values is enough to tell that it is a date. However, as mentioned, computers calculate date and time based on seconds, so it is not unusual to find a dataset that stores a date or time variable as an integer. In those cases, the solution is to consult the data dictionary (the document describing each variable) or the dataset owner and confirm whether that column should be treated as a datetime object or as a regular number. Later in this chapter, we will see this problem in action and how to solve it.
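As a hedged illustration of the integer case, the sketch below converts a column of seconds into datetime objects with lubridate's as_datetime(). The values are made up for this example, and the assumption that they are seconds since January 1, 1970 in UTC is exactly what you would confirm with the data dictionary or the dataset owner.

```r
library(lubridate)

# Hypothetical integer column, assumed to hold seconds since 1970-01-01 UTC
raw_seconds <- c(1402012800, 1402016400)

# Convert the integers into datetime (POSIXct) objects
played_at <- as_datetime(raw_seconds)
played_at

# The conversion is lossless: we can always get the integers back
as.numeric(played_at)
```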

Before that, let’s lay the groundwork by learning some fundamental functions that will help us parse datetime objects, splitting them into separate components. Once again, I will ask you to go over the table in Figure 6.2 to get familiar with the logic of the lubridate library...
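As a short sketch of what splitting a datetime into components can look like, the snippet below uses lubridate accessor functions on a sample value. The datetime itself is invented for illustration and is not part of the chapter's dataset.

```r
library(lubridate)

# Sample datetime used only for illustration
dt <- ymd_hms("2014-06-16 09:45:30")

year(dt)    # 2014
month(dt)   # 6
day(dt)     # 16
hour(dt)    # 9
minute(dt)  # 45
second(dt)  # 30

# Day of the week as a labeled factor (labels depend on the locale)
wday(dt, label = TRUE)
```

Each accessor returns a plain number (or a factor, for labeled weekdays), which makes the pieces easy to use for grouping and filtering.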

Date and time using regular expressions (regexps)

The datetime functions in lubridate can parse dates out of a good number of cases, even from phrases. Observe how the mdy() function can correctly parse only the date, which is in a weird format, by the way:

# Lubridate parsing
mdy("The championship starts on 10/11-2000")
[1] "2000-10-11"

Still, that feature becomes even more powerful when combined with regexps. If we try the same mdy() function on a longer text that contains several dates, this time we get a warning instead of a result: Warning: All formats failed to parse. No formats found. Regular expressions, on the other hand, can pick every date out of a text. Let’s create an example text to help illustrate this exercise:

# Text
t <- "The movie was launched on 10/10/1980. It was a great hype at that time, being the most watched movie on the weeks of 10/10/1980, 10/17/1980, 10/24/1980. Around ten years later, it was chosen as the best picture of the decade. The cast received the prize on 09/20/1990."...
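As a minimal sketch of the regexp approach, the snippet below pulls every MM/DD/YYYY date out of a short sentence and hands the matches to mdy(). It assumes the stringr package (part of the tidyverse); the variable names, the sample sentence, and the pattern are illustrative choices, not necessarily the ones used in the rest of the chapter.

```r
library(stringr)
library(lubridate)

# Short sample text, made up for this sketch
txt <- "The movie was launched on 10/10/1980. The cast received the prize on 09/20/1990."

# Two digits, slash, two digits, slash, four digits
date_pattern <- "\\d{2}/\\d{2}/\\d{4}"

# str_extract_all() returns a list with one element per input string
found <- str_extract_all(txt, date_pattern)[[1]]

# Each match is now a clean string that mdy() can parse
dates <- mdy(found)
dates
```

The regexp isolates the date strings first, so the parsing function only ever sees one well-formed date at a time.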

Practicing

Before starting, keep in mind that the goal of this exercise is to show the possibilities of working with datetime objects. Some of the functions and libraries used here have not been fully covered yet, so you might see new functions in this section. Don’t worry. We will cover all of them in this book, and you can always come back to this chapter later to review the more challenging code.

Let’s practice the use of datetime variables using a dataset from FiveThirtyEight about classic rock. The dataset has observations of songs played on many radio stations during one week of June 2014, which we can use to gain some insights about that period in time.

The variables in this dataset are as follows:

  • SONG RAW: Song title
  • Song Clean: Song title after cleaning up the name, removing nonmeaningful words such as live
  • ARTIST RAW: Artist name
  • ARTIST CLEAN: Artist name after removal of nonmeaningful elements and correcting...
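To give a feel for the kind of question this practice is about, here is a hedged sketch that finds the busiest hour in a tiny, made-up table of plays. The tibble, its column names (song, time), and its values are hypothetical stand-ins, not the real FiveThirtyEight file; the only assumption carried over is that play times arrive as seconds since January 1, 1970 in UTC.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: one row per play, time stored as seconds
plays <- tibble(
  song = c("Song A", "Song B", "Song C", "Song D"),
  time = c(1402012800, 1402016400, 1402016460, 1402102800)
)

plays_by_hour <- plays %>%
  mutate(
    played_at = as_datetime(time),  # integer seconds -> datetime
    play_hour = hour(played_at)     # hour of the day, 0 to 23
  ) %>%
  count(play_hour, sort = TRUE)     # busiest hour first

plays_by_hour
```

The same pattern works for days of the week or months by swapping hour() for wday() or month().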

Summary

We progressed a lot in this chapter and learned so much about datetime objects and variables. Knowing how to use them will enhance your analytical skills, opening room for better insights.

We started this chapter by learning how to create date objects. Next, we learned how to make good use of the lubridate library, and we are now able to parse dates in many different formats.

After that, we moved on to math operations with date and time and the particularities that surround them, including the use of time zones. That was followed by guidance on how to use the customization power of regexps to parse dates out of texts.

To close the chapter, we worked through a practical exercise using a dataset about songs that contains a datetime variable, and saw how that variable can be used to extract insights for analysis.

The end of this chapter also completes the building blocks for data wrangling. With it, we have now worked with the three major object types...

Exercises

  1. What are date, time, and datetime objects?
  2. Name one function used to create a datetime object.
  3. List some of the parsing functions from lubridate to extract periods of time from datetime objects.
  4. What is a period object?
  5. What is an example of usage for the duration object?
  6. How can you add or subtract dates?

Further reading

https://github.com/fivethirtyeight/data/tree/master/classic-rock
