R High Performance Programming


Product type Book
Published in Jan 2015
ISBN-13 9781783989263
Pages 176 pages
Edition 1st Edition

Table of Contents (17) Chapters

R High Performance Programming
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Understanding R's Performance – Why Are R Programs Sometimes Slow?
Profiling – Measuring Code's Performance
Simple Tweaks to Make R Run Faster
Using Compiled Code for Greater Speed
Using GPUs to Run R Even Faster
Simple Tweaks to Use Less RAM
Processing Large Datasets with Limited RAM
Multiplying Performance with Parallel Computing
Offloading Data Processing to Database Systems
R and Big Data
Index

Three constraints on computing performance – CPU, RAM, and disk I/O


First, let's see how R programs are executed on a computer. This is a very simplified version of what actually happens, but it is enough for us to understand the performance limitations of R. The following figure illustrates the steps required to execute an R program.

Steps to execute an R program

Take, for example, this simple R program, which loads some data from a CSV file, computes the column sums, and writes the results to another CSV file:

data <- read.csv("mydata.csv")
totals <- colSums(data)
write.csv(totals, "totals.csv")

The following numbered steps correspond to the numbering in the preceding figure:

  1. When we load and run an R program, the R code is first loaded into RAM.

  2. The R interpreter then translates the R code into machine code and loads the machine code into the CPU.

  3. The CPU executes the program.

  4. The program loads the data to be processed from the hard disk into RAM (read.csv() in the example).

  5. The data is loaded in small chunks into the CPU for processing.

  6. The CPU processes the data one chunk at a time, and exchanges chunks of data with RAM until all the data has been processed (in the example, the CPU executes the instructions of the colSums() function to compute the column sums on the data set).

  7. Sometimes, the processed data is stored back onto the hard drive (write.csv() in the example).
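To get a rough feel for how long each stage takes, the example program can be wrapped in system.time() calls. The following is only a minimal sketch: the demo data, its size, and the temporary file names are assumptions made for illustration, and the measured times will vary from machine to machine.

```r
# Hypothetical demo data: 100,000 rows of 5 numeric columns in a temporary file.
set.seed(42)
input_file <- tempfile(fileext = ".csv")
output_file <- tempfile(fileext = ".csv")
demo <- as.data.frame(matrix(runif(5e5), ncol = 5))
write.csv(demo, input_file, row.names = FALSE)

# Step 4: load the data from the disk into RAM.
read_time <- system.time(data <- read.csv(input_file))

# Steps 5 and 6: the CPU computes the column sums.
compute_time <- system.time(totals <- colSums(data))

# Step 7: store the results back onto the disk.
write_time <- system.time(write.csv(totals, output_file))

# Elapsed wall-clock time, in seconds, for each stage.
timings <- c(read = read_time[["elapsed"]],
             compute = compute_time[["elapsed"]],
             write = write_time[["elapsed"]])
print(timings)
```

On many machines, reading the CSV file is likely to take the largest share of the time for a program like this, but that is something to measure rather than assume.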

From this depiction of the computing process, we can see a few places where performance bottlenecks can occur:

  • The speed and performance of the CPU determine how quickly computing instructions, such as colSums() in the example, are executed. This includes the interpretation of the R code into machine code and the actual execution of the machine code to process the data.

  • The size of the RAM available on the computer limits the amount of data that can be processed at any given time. In this example, if the mydata.csv file contains more data than can be held in RAM, the call to read.csv() will fail.

  • The speed of disk input/output (I/O), that is, the speed at which data can be read from or written to the hard disk (read.csv() and write.csv() in the example), affects how quickly the data can be loaded into memory and stored back onto the hard disk.
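One way to relate the RAM and disk constraints to a concrete dataset is to compare the size of the file on disk with the memory occupied by the loaded data. This is a rough sketch under assumed demo data; object.size() gives only an estimate of the memory used by the object, and reading a file can temporarily need more memory than the final object occupies.

```r
# Hypothetical demo data: 10,000 rows of 5 numeric columns in a temporary file.
input_file <- tempfile(fileext = ".csv")
demo <- as.data.frame(matrix(runif(5e4), ncol = 5))
write.csv(demo, input_file, row.names = FALSE)

data <- read.csv(input_file)

# Disk: size of the file that read.csv() has to read, in bytes.
disk_bytes <- file.info(input_file)$size

# RAM: approximate memory occupied by the loaded data frame, in bytes.
ram_bytes <- as.numeric(object.size(data))

cat("On disk:", disk_bytes, "bytes\n")
cat("In RAM: ", ram_bytes, "bytes\n")
```

The two numbers generally differ because a CSV file stores numbers as text, whereas R stores each numeric value in 8 bytes.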

Sometimes, you might encounter these limiting factors one at a time. For example, when a dataset is small enough to be quickly read from the disk and fully stored in RAM, but the computations performed on it are complex, only the CPU constraint is encountered. At other times, you might find them occurring together in various combinations. For example, when a dataset is very large, it takes a long time to load it from the disk, only a small chunk of it can be held in memory at any given time, and any computation on it takes a long time. In either case, these are symptoms of performance problems. To diagnose the problems and find solutions for them, we need to look at what is happening behind the scenes that might be causing these constraints to occur.
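When only a small chunk of a dataset fits in memory at a time, the column sums from the earlier example can still be computed by reading the file piece by piece and accumulating partial results. The following is a minimal sketch of that idea, not a recommended implementation: the demo data, the chunk_size value, and the simple header parsing (which assumes plain column names without embedded commas) are all assumptions made for illustration.

```r
# Hypothetical demo data: 1,000 rows of 5 integer columns in a temporary file.
input_file <- tempfile(fileext = ".csv")
demo <- as.data.frame(matrix(1:5000, ncol = 5))
write.csv(demo, input_file, row.names = FALSE)

chunk_size <- 300  # rows held in memory at a time (an assumed value)

con <- file(input_file, open = "r")

# Read the header line and strip the quotes that write.csv() adds.
col_names <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

# Running totals, one per column.
totals <- numeric(length(col_names))
names(totals) <- col_names

repeat {
  # Read the next chunk of raw lines; an empty result means end of file.
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break
  chunk <- read.csv(text = lines, header = FALSE, col.names = col_names)
  # Add this chunk's column sums to the running totals.
  totals <- totals + colSums(chunk)
}
close(con)

print(totals)
```

This trades speed for memory: the file is still read in full from the disk, but only chunk_size rows are held in RAM at any given time.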

Let's now take a look at how R is designed and how it works, and see what the implications are for its performance.
