
You're reading from Learning R Programming

Product type: Book
Published in: Oct 2016
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781785889776
Edition: 1st Edition
Author: Kun Ren

Kun Ren has used R for nearly 4 years in quantitative trading, along with C++ and C#. He has worked very intensively (8-10 hours or more every day) on useful R packages that the community does not yet offer. He contributes to packages developed by other authors and reports issues to make things work better. He is also a frequent speaker at R conferences in China. Kun also has a strong social media presence. Additionally, he has contributed substantially to various projects, as is evident from his GitHub account and related pages:

  • https://github.com/renkun-ken
  • https://cn.linkedin.com/in/kun-ren-76027530
  • http://renkun.me/
  • http://renkun.me/formattable/
  • http://renkun.me/pipeR/
  • http://renkun.me/rlist/

Chapter 13. High-Performance Computing

In the previous chapter, you learned about a number of built-in functions and various packages tailored for data manipulation. Although these packages rely on different techniques and may be built under a different philosophy, they all make data filtering and aggregating much easier.

However, data processing is more than simple filtering and aggregating. Sometimes it involves simulation and other computation-intensive tasks. Compared to high-performance programming languages such as C and C++, R is much slower. This is due to its dynamic design and to a current implementation that prioritizes stability, ease of use, and power in statistical analysis and visualization over raw performance and language features. Nevertheless, well-written R code can still be fast enough for most purposes.

In this chapter, I'll demonstrate the following techniques to help you write R code with high performance:

  • Measuring code performance

  • Profiling code to find bottlenecks

  • Using built-in functions and...

Understanding code performance issues


From the very beginning, R was designed for statistical computing and data visualization, and it is widely used in academia and industry. For most data analysis purposes, correctness is more important than performance. In other words, getting a correct result in 1 minute is better than getting an incorrect one in 20 seconds. A result produced three times faster is not any more valid than a slower but correct one. Therefore, performance should not be a concern until you are sure that your code is correct.
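The following is a minimal sketch of this order of priorities: confirm that a candidate implementation agrees with a trusted reference, and only then time it. The function `loop_mean()` is a hypothetical example written for illustration, not code from this chapter. `all.equal()` is used because it tolerates the tiny floating-point differences that two correct implementations can legitimately produce.

```r
# Hypothetical loop-based mean, used only to illustrate "correctness first"
loop_mean <- function(x) {
  total <- 0
  for (xi in x) {
    total <- total + xi
  }
  total / length(x)
}

set.seed(1)
x <- rnorm(1000)

# Step 1: check correctness against base R's mean();
# all.equal() allows small floating-point discrepancies
stopifnot(isTRUE(all.equal(loop_mean(x), mean(x))))

# Step 2: only once the results agree is it worth measuring speed
system.time(for (i in 1:100) loop_mean(x))
```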

Let's assume that you are 100 percent sure that your code is correct, but it runs a bit slowly. Is it necessary to optimize the code so that it runs faster? Well, it depends. Before making a decision, it helps to divide the total time spent solving a problem into three parts: development time, execution time, and future maintenance time.

Suppose we have been working on a problem for an hour. Since we didn't take performance...
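To make the trade-off between development time and execution time concrete, here is a back-of-the-envelope sketch. The helper `break_even_runs()` is hypothetical. The numbers in the example are illustrative, not taken from the book. The helper also ignores maintenance cost, which in practice pushes the break-even point further out.

```r
# Hypothetical helper: how many runs before an optimization pays for itself?
# optimize_hours: extra development time spent optimizing (hours)
# seconds_saved:  execution time saved on each run (seconds)
# Note: maintenance cost is deliberately ignored in this simple model
break_even_runs <- function(optimize_hours, seconds_saved) {
  ceiling(optimize_hours * 3600 / seconds_saved)
}

# One extra hour of work that saves 10 seconds per run
break_even_runs(1, 10)  # returns 360
```

If the code will only ever run a handful of times, an hour spent shaving 10 seconds off each run never pays off under this model.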

Profiling code


In the previous section, you learned how to use microbenchmark() to benchmark expressions. This is useful in two situations: when we have several alternative solutions to a problem and want to see which performs better, and when we optimize an expression and want to check whether it actually runs faster than the original code.
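As a reminder of that workflow, the sketch below compares two alternative implementations. The functions `sumsq_loop()` and `sumsq_vec()` are hypothetical examples, not code from this chapter. The sketch assumes the CRAN package microbenchmark may or may not be installed, so the benchmark only runs when `requireNamespace()` finds it. The correctness check runs either way.

```r
# Two hypothetical ways to compute the sum of squares of a vector
sumsq_loop <- function(x) {
  total <- 0
  for (xi in x) total <- total + xi^2
  total
}
sumsq_vec <- function(x) sum(x^2)

x <- runif(10000)

# Make sure both alternatives agree before comparing their speed
stopifnot(isTRUE(all.equal(sumsq_loop(x), sumsq_vec(x))))

# microbenchmark is a CRAN package; benchmark only if it is installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    loop = sumsq_loop(x),
    vectorized = sumsq_vec(x),
    times = 100
  ))
}
```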

However, when we feel that code is slow, it is usually not easy to locate the expression that contributes most to slowing down the entire program. Such an expression is called a "performance bottleneck." To improve code performance, it is best to resolve the bottleneck first.

Fortunately, R provides profiling tools to help us find the bottleneck, that is, the code that takes the most time to run. That code should be the first target when improving performance.

Profiling code with Rprof

R provides a built-in function, Rprof(), for code profiling. When profiling starts, a sampling procedure runs alongside all subsequent code until the profiling is...
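The sketch below shows the basic Rprof() workflow: start the profiler, run the code, stop it with `Rprof(NULL)`, and summarize the samples with `summaryRprof()`. The function `slow_squares()` is a hypothetical, deliberately slow example. It grows its result vector on every iteration, which forces repeated copying. The output file is a temporary file chosen for illustration.

```r
# Hypothetical slow function: growing a vector copies it on every iteration
slow_squares <- function(n) {
  res <- c()
  for (i in seq_len(n)) {
    res <- c(res, i^2)
  }
  res
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)  # start sampling the call stack every 10 ms
result <- slow_squares(20000)
Rprof(NULL)                        # stop profiling

# by.self shows time spent in each function itself (excluding callees)
prof <- summaryRprof(prof_file)
print(head(prof$by.self))

unlink(prof_file)  # clean up the temporary profiling output
```

Because Rprof() samples rather than traces, very short runs may record few or no samples. The workload therefore needs to run long enough relative to `interval`.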

Boosting code performance


In the previous section, we demonstrated how to use profiling tools to identify a performance bottleneck in the code. In this section, you will learn about a number of approaches to boosting code performance.

Using built-in functions

Previously, we demonstrated the performance difference between my_cumsum1(), my_cumsum2(), and the built-in function cumsum(). Although my_cumsum2() is faster than my_cumsum1(), cumsum() is much faster than both when the input vector contains many numbers. Moreover, its performance does not degrade significantly as the input gets longer. If we evaluate cumsum, we can see that it is a primitive function:

cumsum 
## function (x)  .Primitive("cumsum") 
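For a self-contained comparison, here is a hypothetical loop-based cumulative sum, `cumsum_loop()`. It is a sketch written for illustration and is not the chapter's my_cumsum1() or my_cumsum2(). It preallocates its result, so the remaining gap against cumsum() reflects interpreted loop overhead rather than vector growth. `is.primitive()` confirms what printing `cumsum` showed.

```r
# Hypothetical loop-based cumulative sum with a preallocated result
cumsum_loop <- function(x) {
  res <- numeric(length(x))
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[[i]]
    res[[i]] <- total
  }
  res
}

x <- runif(1e5)

# Check that both produce the same result before comparing speed
stopifnot(isTRUE(all.equal(cumsum_loop(x), cumsum(x))))

system.time(cumsum_loop(x))  # interpreted loop
system.time(cumsum(x))       # primitive, runs in compiled code

is.primitive(cumsum)
```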

A primitive function in R is implemented in C, compiled to native machine instructions, and is thus extremely efficient. Another example of a fast built-in function is diff(). Here, we implement the computation of a vector's difference sequence in R:

diff_for <- function(x) { 
  n <- length(x) - 1 
...
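Because the listing above is cut off, here is a separate, hypothetical sketch of the same idea, `diff_loop()`. It is not the chapter's diff_for(). It computes lag-1 differences with an explicit loop. For contrast, `diff_vec()` computes them with one vectorized subtraction of two shifted copies of the vector, and both are checked against diff().

```r
# Hypothetical loop-based lag-1 difference (assumes length(x) >= 1)
diff_loop <- function(x) {
  n <- length(x) - 1
  res <- numeric(n)
  for (i in seq_len(n)) {
    res[[i]] <- x[[i + 1]] - x[[i]]
  }
  res
}

# Vectorized equivalent: drop the first / last element and subtract
diff_vec <- function(x) x[-1] - x[-length(x)]

x <- runif(1e5)
stopifnot(isTRUE(all.equal(diff_loop(x), diff(x))),
          isTRUE(all.equal(diff_vec(x), diff(x))))

system.time(diff_loop(x))
system.time(diff(x))
```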

Summary


In this chapter, you learned when performance may or may not matter, how to measure the performance of R code, how to use profiling tools to identify the slowest part of code, and why such code can be slow. Then, we introduced the most important ways to boost code performance: using built-in functions if possible, taking advantage of vectorization, using the byte-code compiler, using parallel computing, writing code in C++ via Rcpp, and using multi-threading techniques in C++. High-performance computing is quite an advanced topic, and there is still a lot more to learn if you want to apply it in practice. This chapter demonstrates that using R does not necessarily mean writing slow code; we can achieve high performance when we need it.
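As a small recap of two of these techniques, the sketch below byte-compiles a loop with `compiler::cmpfun()` and compares it with the vectorized built-in `sum()`. The function `sum_loop()` is a hypothetical example. Since R 3.4.0 the JIT compiler byte-compiles closures by default, so on recent versions the explicit `cmpfun()` step may show little or no extra gain. The built-in function is typically still the fastest option.

```r
library(compiler)

# Hypothetical scalar loop, the kind of code byte-compilation targets
sum_loop <- function(x) {
  total <- 0
  for (xi in x) total <- total + xi
  total
}

# cmpfun() returns a byte-compiled version of the closure
sum_loop_cmp <- cmpfun(sum_loop)

x <- runif(1e6)
stopifnot(isTRUE(all.equal(sum_loop(x), sum_loop_cmp(x))),
          isTRUE(all.equal(sum_loop(x), sum(x))))

system.time(sum_loop(x))      # may already be JIT-compiled on R >= 3.4.0
system.time(sum_loop_cmp(x))  # explicitly byte-compiled
system.time(sum(x))           # built-in, vectorized alternative
```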

In the next chapter, we will introduce another useful topic: web scraping. To scrape data from web pages, we need to understand how web pages are structured and how to extract data from their source code. You will learn the basic idea and representation of...
