Reader small image

You're reading from  Extending Excel with Python and R

Product typeBook
Published inApr 2024
PublisherPackt
ISBN-139781804610695
Edition1st Edition
Right arrow
Authors (2):
Steven Sanderson
Steven Sanderson
author image
Steven Sanderson

Steven Sanderson, MPH, is an applications manager for the patient accounts department at Stony Brook Medicine. He received his bachelor's degree in economics and his master's in public health from Stony Brook University. He has worked in healthcare in some capacity for just shy of 20 years. He is the author and maintainer of the healthyverse set of R packages. He likes to read material related to social and labor economics and has recently turned his efforts back to his guitar with the hope that his kids will follow suit as a hobby they can enjoy together.
Read more about Steven Sanderson

David Kun
David Kun
author image
David Kun

David Kun is a mathematician and actuary who has always worked in the gray zone between quantitative teams and ICT, aiming to build a bridge. He is a co-founder and director of Functional Analytics and the creator of the ownR Infinity platform. As a data scientist, he also uses ownR for his daily work. His projects include time series analysis for demand forecasting, computer vision for design automation, and visualization.
Read more about David Kun

View More author details
Right arrow

Reading multiple sheets with readxl and a custom function

In Excel, we often encounter workbooks that have multiple sheets in them. These could be stats for different months of the year, table data that follows a specific format month over month, or some other period. The point is that we may want to read all the sheets in a file for one reason or another, and we should not call the read function from a particular package for each sheet. Instead, we should use the power of R to loop through this with purrr.

Let’s build a customized function. To do this, we are going to load the readxl function. If we have it already loaded, then this is not necessary; however, if it is already installed and you do not wish to load the library into memory, then you can call the excel_sheets() function by using readxl::excel_sheets():

Figure 1.4 – Creating a function to read all the sheets into an Excel file at once – read_excel_sheets()

Figure 1.4 – Creating a function to read all the sheets into an Excel file at once – read_excel_sheets()

The new code can be broken down as follows:

 read_excel_sheets <- function(filename, single_tbl) {

This line defines a function called read_excel_sheets that takes two arguments: filename (the name of the Excel file to be read) and single_tbl (a logical value indicating whether the function should return a single table or a list of tables).

Next, we have the following line:

sheets <- readxl::excel_sheets(filename)

This line uses the readxl package to extract the names of all the sheets in the Excel file specified by filename. The sheet names are stored in the sheets variable.

Here’s the next line:

if (single_tbl) {

This line starts an if statement that checks the value of the single_tbl argument.

Now, we have the following:

x <- purrr::map_df(sheets, read_excel, path = filename)

If single_tbl is TRUE, this line uses the purrr package’s map_df function to iterate over each sheet name in sheets and read the corresponding sheet using the read_excel function from the readxl package. The resulting DataFrame are combined into a single table, which is assigned to the x variable.

Now, we have the following line:

} else {

This line indicates the start of the else block of the if statement. If single_tbl is FALSE, the code in this block will be executed.

Here’s the next line:

 x <- purrr::map(sheets, ~ readxl::read_excel(filename, sheet = .x))

In this line, the purrr package’s map function is used to iterate over each sheet name in sheets. For each sheet, the read_excel function from the readxl package is called to read the corresponding sheet from the Excel file specified by filename. The resulting DataFrame are stored in a list assigned to the x variable.

Now, we have the following:

 purrr::set_names(x, sheets)

This line uses the set_names function from the purrr package to set the names of the elements in the x list to the sheet names in sheets.

Finally, we have the following line:

 x

This line returns the value of x from the function, which will be either a single table (data.frame) if single_tbl is TRUE, or a list of tables (data.frame) if single_tbl is FALSE.

In summary, the read_excel_sheets function takes an Excel filename and a logical value indicating whether to return a single table or a list of tables. It uses the readxl package to extract the sheet names from the Excel file, and then reads the corresponding sheets either into a single table (if single_tbl is TRUE) or into a list of tables (if single_tbl is FALSE). The resulting data is returned as the output of the function. To see how this works, let’s look at the following example.

We have a spreadsheet that has four tabs in it – one for each species in the famous Iris dataset and then one sheet called iris, which is the full dataset.

As shown in Figure 1.5, the read_excel_sheets() function has read all four sheets of the Excel file. We can also see that the function has imported the sheets as a list object and has named each item in the list after the name of the corresponding tab in the Excel file. It is also important to note that the sheets must all have the same column names and structure for this to work:

Figure 1.5 – Excel file read by read_excel_sheets()

Figure 1.5 – Excel file read by read_excel_sheets()

In this section, we learned how to write a function that will read all of the sheets in any Excel file. This function will also return them as a named item list, where the names are the names of the tabs in the file itself.

Now that we have learned how to read Excel sheets in R, in the next section, we will cover Python, where we will revisit the same concepts but from the perspective of the Python language.

Previous PageNext Page
You have been reading a chapter from
Extending Excel with Python and R
Published in: Apr 2024Publisher: PacktISBN-13: 9781804610695
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €14.99/month. Cancel anytime

Authors (2)

author image
Steven Sanderson

Steven Sanderson, MPH, is an applications manager for the patient accounts department at Stony Brook Medicine. He received his bachelor's degree in economics and his master's in public health from Stony Brook University. He has worked in healthcare in some capacity for just shy of 20 years. He is the author and maintainer of the healthyverse set of R packages. He likes to read material related to social and labor economics and has recently turned his efforts back to his guitar with the hope that his kids will follow suit as a hobby they can enjoy together.
Read more about Steven Sanderson

author image
David Kun

David Kun is a mathematician and actuary who has always worked in the gray zone between quantitative teams and ICT, aiming to build a bridge. He is a co-founder and director of Functional Analytics and the creator of the ownR Infinity platform. As a data scientist, he also uses ownR for his daily work. His projects include time series analysis for demand forecasting, computer vision for design automation, and visualization.
Read more about David Kun