You're reading from Data Wrangling with R

Product type: Book
Published in: Feb 2023
Publisher: Packt
ISBN-13: 9781803235400
Edition: 1st Edition
Author: Gustavo R Santos

Gustavo R Santos has worked in the technology industry for 13 years, improving processes, analyzing datasets, and creating dashboards. Since 2020, he has worked as a data scientist in the retail industry, wrangling, analyzing, visualizing, and modeling data with modern tools such as R, Python, and Databricks. Gustavo also lectures occasionally on data science concepts at an online school. He has a background in marketing, is certified as a data scientist by the Data Science Academy Brazil, and is pursuing a specialist MBA in Data Science at the University of São Paulo.

Grouping and summarizing data

Grouping and summarizing are complementary operations, and they are generally used together: there is little point in grouping a dataset unless you then calculate something for each group or otherwise put the groups to use. That is where summarizing plays its role, reducing the data in each group to a summary statistic, such as a count or an average, that we can interpret.

In the business world, requests such as the average sales by store, the median number of customers by day, or the standard deviation of a distribution are part of a data scientist's routine. These tasks can be performed with the group_by() and summarise() functions from dplyr.
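As a minimal sketch of those requests, the snippet below assumes a small hypothetical tibble named `store_data` with `store`, `day`, `sales`, and `customers` columns (illustrative data, not a dataset from the book). Each `group_by()` + `summarise()` pair collapses the rows to one summary row per group:

```r
library(dplyr)

# Hypothetical daily records for two stores (illustrative data only)
store_data <- tibble(
  store     = c("A", "A", "A", "B", "B", "B"),
  day       = c("Mon", "Tue", "Wed", "Mon", "Tue", "Wed"),
  sales     = c(10, 20, 30, 5, 15, 40),
  customers = c(4, 8, 9, 3, 6, 12)
)

# Average and standard deviation of sales by store: one row per store
sales_by_store <- store_data %>%
  group_by(store) %>%
  summarise(avg_sales = mean(sales), sd_sales = sd(sales))

# Median number of customers by day: one row per day
customers_by_day <- store_data %>%
  group_by(day) %>%
  summarise(median_customers = median(customers))

sales_by_store
customers_by_day
```

Any summary function that returns a single value per group (mean(), median(), sd(), n(), and so on) can be used inside summarise() in the same way.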

Starting with the group_by() function, notice that on its own it adds little value:

library(dplyr)

# Group by workclass without summarizing
df %>% group_by(workclass)

Here is the result.

Figure 8.9 – Dataset grouped but not summarized

We can see in Figure 8.9 that it worked because...
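To check what grouping alone does, the sketch below uses a hypothetical toy tibble named `toy` as a stand-in for `df` (whose contents are not reproduced here). It suggests that group_by() leaves every row in place and only records the grouping column, and that the data is reduced only when summarise() runs:

```r
library(dplyr)

# Hypothetical stand-in for df, with a workclass column
toy <- tibble(
  workclass = c("Private", "Private", "State-gov", "Self-emp"),
  age       = c(25, 35, 40, 50)
)

grouped <- toy %>% group_by(workclass)

nrow(grouped)       # same number of rows as toy: nothing was collapsed
group_vars(grouped) # the grouping column recorded by group_by()
n_groups(grouped)   # number of distinct workclass values

# Only after summarising does each group collapse to a single row
summarised <- grouped %>% summarise(n = n(), avg_age = mean(age))
summarised
```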
