Reader small image

You're reading from  Extending Excel with Python and R

Product typeBook
Published inApr 2024
PublisherPackt
ISBN-139781804610695
Edition1st Edition
Right arrow
Authors (2):
Steven Sanderson
Steven Sanderson
author image
Steven Sanderson

Steven Sanderson, MPH, is an applications manager for the patient accounts department at Stony Brook Medicine. He received his bachelor's degree in economics and his master's in public health from Stony Brook University. He has worked in healthcare in some capacity for just shy of 20 years. He is the author and maintainer of the healthyverse set of R packages. He likes to read material related to social and labor economics and has recently turned his efforts back to his guitar with the hope that his kids will follow suit as a hobby they can enjoy together.
Read more about Steven Sanderson

David Kun
David Kun
author image
David Kun

David Kun is a mathematician and actuary who has always worked in the gray zone between quantitative teams and ICT, aiming to build a bridge. He is a co-founder and director of Functional Analytics and the creator of the ownR Infinity platform. As a data scientist, he also uses ownR for his daily work. His projects include time series analysis for demand forecasting, computer vision for design automation, and visualization.
Read more about David Kun

View More author details
Right arrow

Reading Excel files to R

In this section, we are going to read data from Excel with a few different R libraries. We need to do this before we can even consider performing any type of manipulation or analysis on the data contained in the sheets of the Excel files.

As mentioned in the Technical requirements section, we are going to be using the readxl, openxlsx, and xlsx packages to read data into R.

Installing and loading libraries

In this section, we are going to install and load the necessary libraries if you do not yet have them. We are going to use the openxlsx, xlsx, readxl, and readxlsb libraries. To install and load them, run the following code block:

pkgs <- c("openxlsx", "xlsx", "readxl")
install.packages(pkgs, dependencies = TRUE)
lapply(pkgs, library, character.only = TRUE)

The lapply() function in R is a versatile tool for applying a function to each element of a list, vector, or DataFrame. It takes two arguments, x and FUN, where x is the list and FUN is the function that is applied to the list object, x.

Now that the libraries have been installed, we can get to work. To do this, we are going to read a spreadsheet built from the Iris dataset that is built into base R. We are going to read the file with three different libraries, and then we are going to create a custom function to work with the readxl library that will read all the sheets of an Excel file. We will call this the read_excel_sheets() function.

Let’s start reading the files. The first library we will use to open an Excel file is openxlsx. To read the Excel file we are working with, you can run the code in the chapter1 folder of this book’s GitHub repository called ch1_create_iris_dataset.R Refer to the following screenshot to see how to read the file into R.

You will notice a variable called f_pat. This is the path to where the Iris dataset was saved as an Excel file – for example, C:/User/UserName/Documents/iris_data.xlsx:

Figure 1.1 – Using the openxlsx package to read the Excel file

Figure 1.1 – Using the openxlsx package to read the Excel file

The preceding screenshot shows how to read an Excel file. This example assumes that you have used the ch1_create_iris_datase.R file to create the example Excel file. In reality, you can read in any Excel file that you would like or need.

Now, we will perform the same type of operation, but this time with the xlsx library. Refer to the following screenshot, which uses the same methodology as with the openxlsx package:

Figure 1.2 – Using the xlsx library and the read.xlsx() function to open the Excel file we’ve created

Figure 1.2 – Using the xlsx library and the read.xlsx() function to open the Excel file we’ve created

Finally, we will use the readxl library, which is part of the tidyverse:

Figure 1.3 – Using the readxl library and the read_excel() function to read the Excel file into memory

Figure 1.3 – Using the readxl library and the read_excel() function to read the Excel file into memory

In this section, we learned how to read in an Excel file with a few different packages. While these packages can do more than simply read in an Excel file, that is what we needed to focus on in this section. You should now be familiar with how to use the readxl::read_excel(), xlsx::read.xlsx(), and openxlsx::read.xlsx() functions.

Building upon our expertise in reading Excel files into R, we’ll now embark on the next phase of our journey: unraveling the secrets of efficiently extracting data from multiple sheets within an Excel file.

Previous PageNext Page
You have been reading a chapter from
Extending Excel with Python and R
Published in: Apr 2024Publisher: PacktISBN-13: 9781804610695
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €14.99/month. Cancel anytime

Authors (2)

author image
Steven Sanderson

Steven Sanderson, MPH, is an applications manager for the patient accounts department at Stony Brook Medicine. He received his bachelor's degree in economics and his master's in public health from Stony Brook University. He has worked in healthcare in some capacity for just shy of 20 years. He is the author and maintainer of the healthyverse set of R packages. He likes to read material related to social and labor economics and has recently turned his efforts back to his guitar with the hope that his kids will follow suit as a hobby they can enjoy together.
Read more about Steven Sanderson

author image
David Kun

David Kun is a mathematician and actuary who has always worked in the gray zone between quantitative teams and ICT, aiming to build a bridge. He is a co-founder and director of Functional Analytics and the creator of the ownR Infinity platform. As a data scientist, he also uses ownR for his daily work. His projects include time series analysis for demand forecasting, computer vision for design automation, and visualization.
Read more about David Kun