
You're reading from Learning R Programming

Product type: Book

Published in: Oct 2016

Reading Level: Beginner

Publisher: Packt

ISBN-13: 9781785889776

Edition: 1st Edition

Languages

Tools

RStudio

Concepts

Programming Language

Author (1)

Kun Ren

Chapter 13. High-Performance Computing

In the previous chapter, you learned about a number of built-in functions and various packages tailored for data manipulation. Although these packages rely on different techniques and may be built under a different philosophy, they all make data filtering and aggregating much easier.

However, data processing is more than simple filtering and aggregating. Sometimes, it involves simulation and other computation-intensive tasks. Compared to high-performance programming languages such as C and C++, R is much slower, both because of its dynamic design and because its current implementation prioritizes stability, ease of use, and power in statistical analysis and visualization over performance and language features. However, well-written R code can still be fast enough for most purposes.

In this chapter, I'll demonstrate the following techniques to help you write R code with high performance:

Measuring code performance
Profiling code to find bottlenecks
Using built-in functions and...

Understanding code performance issues

From the very beginning, R has been designed for statistical computing and data visualization, and it is widely used in academia and industry. For most data analysis purposes, correctness is more important than performance. In other words, getting a correct result in 1 minute is better than getting an incorrect one in 20 seconds. A result that arrives three times faster is not automatically three times more valid than a slow but correct one. Therefore, performance should not be a concern until you are sure about the correctness of your code.

Let's assume that you are 100 percent sure that your code is correct but it runs a bit slowly. Is it necessary to optimize the code so that it runs faster? Well, it depends. Before making a decision, it is helpful to divide the total problem-solving time into three parts: development, execution, and future maintenance.

Suppose we have been working on a problem for an hour. Since we didn't take performance...

Profiling code

In the previous section, you learned how to use microbenchmark() to benchmark expressions. This is useful in two situations: when we have several alternative solutions to a problem and want to see which performs better, and when we have optimized an expression and want to confirm that it actually runs faster than the original code.

However, when code feels slow, it is usually not easy to locate the expression that contributes most to slowing down the entire program. Such an expression is called a "performance bottleneck." To improve code performance, it is best to resolve the bottleneck first.
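As a minimal sketch of this kind of comparison, the following benchmarks two alternative ways of computing the same result. The functions sum_loop() and sum_builtin() are hypothetical names introduced here for illustration, not code from the book. The sketch assumes the microbenchmark package is installed and falls back to system.time() from base R if it is not.

```r
# Two hypothetical alternatives that compute the same value.
sum_loop <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value
  }
  total
}

sum_builtin <- function(x) sum(x)

x <- as.numeric(1:10000)

# Check that both alternatives agree before comparing their speed.
stopifnot(isTRUE(all.equal(sum_loop(x), sum_builtin(x))))

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  # Each named expression is evaluated 100 times by default.
  bench <- microbenchmark::microbenchmark(
    loop = sum_loop(x),
    builtin = sum_builtin(x)
  )
  print(bench)
} else {
  # Base R fallback: time 100 repeated runs of each alternative.
  print(system.time(for (i in 1:100) sum_loop(x)))
  print(system.time(for (i in 1:100) sum_builtin(x)))
}
```

The exact timings depend on the machine and the R version, so only the relative difference between the two rows is meaningful.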

Fortunately, R provides profiling tools to help us find the bottleneck, that is, the code that runs most slowly, which should be the top focus for improving code performance.

Profiling code with Rprof

R provides a built-in function, Rprof(), for code profiling. When profiling starts, a sampling procedure is running with all subsequent code until the profiling is...
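A typical Rprof() session might look like the following sketch. The function grow_vector() is a hypothetical example of slow code introduced here for illustration; the sampling interval and the vector length are arbitrary choices, and the sketch assumes an R build with profiling support.

```r
# Hypothetical slow function: growing a vector inside a loop
# copies the whole vector on every iteration.
grow_vector <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) {
    x <- c(x, i)
  }
  x
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)  # start sampling roughly every 0.01 seconds
result <- grow_vector(20000)
Rprof(NULL)                        # stop profiling

# summaryRprof() can fail if no samples were recorded (for example,
# when the code finishes too quickly), so the call is guarded here.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) {
  # Time spent in each function itself, excluding the functions it calls.
  print(head(prof$by.self))
}
```

In the by.self table, the functions with the largest self time are the first candidates to examine when looking for a bottleneck.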

Boosting code performance

In the previous section, we demonstrated how to use profiling tools to identify a performance bottleneck in the code. In this section, you will learn about a number of approaches to boosting code performance.

Using built-in functions

Previously, we demonstrated the performance difference between my_cumsum1(), my_cumsum2(), and the built-in function cumsum(). Although my_cumsum2() is faster than my_cumsum1() when the input vector contains many numbers, cumsum() is much faster than both. Also, its performance does not decay significantly as the input gets longer. If we evaluate cumsum, we can see that it is a primitive function:

cumsum 
## function (x)  .Primitive("cumsum")
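The definitions of my_cumsum1() and my_cumsum2() are not shown in this excerpt, so the following sketch uses a hypothetical stand-in, cumsum_loop(), to illustrate the comparison with the built-in cumsum(). The input size and the number of repetitions are arbitrary.

```r
# Hypothetical pure-R cumulative sum with a preallocated result vector.
cumsum_loop <- function(x) {
  y <- numeric(length(x))
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
    y[i] <- total
  }
  y
}

x <- rnorm(100000)

# Both versions should agree before their speed is compared.
stopifnot(isTRUE(all.equal(cumsum_loop(x), cumsum(x))))

is.primitive(cumsum)       # TRUE: cumsum is a primitive function
is.primitive(cumsum_loop)  # FALSE: an ordinary R closure

# Time 20 repeated runs of each version.
system.time(for (k in 1:20) cumsum_loop(x))
system.time(for (k in 1:20) cumsum(x))
```

The loop version is interpreted step by step in R, while cumsum() does the same work in compiled code, which is where the difference in elapsed time comes from.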

A primitive function in R is implemented in C as part of R's core, compiled to native instructions, and is thus extremely efficient. Another built-in function worth comparing against is diff(). Here, we implement the computation of a vector's difference sequence in R:

diff_for <- function(x) { 
  n <- length(x) - 1 ...

Summary

In this chapter, you learned when performance may or may not matter, how to measure the performance of R code, how to use profiling tools to identify the slowest part of the code, and why such code can be slow. Then, we introduced the most important ways to boost code performance: using built-in functions where possible, taking advantage of vectorization, using the byte-code compiler, using parallel computing, writing code in C++ via Rcpp, and using multi-threading techniques in C++. High-performance computing is quite an advanced topic, and there's still a lot more to learn if you want to apply it in practice. This chapter demonstrates that using R does not have to mean slow code; we can achieve high performance when we need it.

In the next chapter, we will introduce another useful topic: web scraping. To scrape data from webpages, we need to understand how web pages are structured and how to extract data from their source code. You will learn the basic idea and representation of...


Author (1)

Kun Ren

Kun Ren has used R for nearly 4 years in quantitative trading, along with C++ and C#, and he has worked very intensively (more than 8-10 hours every day) on useful R packages that the community does not offer yet. He contributes to packages developed by other authors and reports issues to make things work better. He is also a frequent speaker at R conferences in China and has given multiple talks. Kun also has a great social media presence. Additionally, he has substantially contributed to various projects, which is evident from his GitHub account: https://github.com/renkun-ken https://cn.linkedin.com/in/kun-ren-76027530 http://renkun.me/ http://renkun.me/formattable/ http://renkun.me/pipeR/ http://renkun.me/rlist/