
You're reading from The Statistics and Machine Learning with R Workshop

Product type: Book
Published in: Oct 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781803240305
Edition: 1st Edition
Author: Liu Peng

Peng Liu is an Assistant Professor of Quantitative Finance (Practice) at Singapore Management University and an adjunct researcher at the National University of Singapore. He holds a Ph.D. in statistics from the National University of Singapore and has ten years of working experience as a data scientist across the banking, technology, and hospitality industries.

Summary

In this chapter, we touched upon several intermediate data processing techniques, ranging from structured tabular data to unstructured textual data. First, we covered how to transform categorical and numeric variables, including recoding categorical variables using recode(), creating new variables using case_when(), and binning numeric variables using cut(). Next, we looked at reshaping a DataFrame, including converting a long-format DataFrame into a wide format using spread() and back again using gather(). We also delved into working with strings, including how to create, convert, and format string data.
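The variable transformations and reshaping steps recapped above can be sketched on a small data frame. The `scores` data, its columns and values, and the band labels are hypothetical and not taken from the chapter. The sketch assumes the dplyr and tidyr packages are installed, and R 4.1 or later for the native `|>` pipe.

```r
library(dplyr)  # recode(), case_when(), mutate(), select(), tibble()
library(tidyr)  # spread(), gather()

# Hypothetical toy data: two subject scores per student, in long format
scores <- tibble(
  student = c("a", "a", "b", "b", "c", "c"),
  subject = c("math", "art", "math", "art", "math", "art"),
  grade   = c("A", "B", "C", "A", "B", "C"),
  score   = c(92, 81, 65, 88, 74, 59)
)

scores <- scores |>
  mutate(
    # recode(): replace old category labels with new ones
    grade_label = recode(grade, A = "excellent", B = "good", C = "pass"),
    # case_when(): conditions are checked in order; the first TRUE wins
    result = case_when(
      score >= 80 ~ "high",
      score >= 60 ~ "medium",
      TRUE        ~ "low"
    ),
    # cut(): bin a numeric variable into right-closed intervals
    # (0,60], (60,80] and (80,100]
    score_band = cut(score, breaks = c(0, 60, 80, 100),
                     labels = c("0-60", "60-80", "80-100"))
  )

# spread(): long -> wide, one column per subject
wide <- scores |>
  select(student, subject, score) |>
  spread(key = subject, value = score)

# gather(): wide -> long again, stacking every column except student
long <- wide |>
  gather(key = "subject", value = "score", -student)

print(wide)
print(long)
```

In current versions of tidyr, `spread()` and `gather()` are marked as superseded by `pivot_wider()` and `pivot_longer()`, but they still work as shown here.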

In addition, we covered the essentials of the stringr package, which provides many helpful utility functions that simplify string processing. Commonly used functions include str_c(), str_sub(), str_subset(), str_detect(), str_split(), str_count(), and str_replace(). These functions can be combined into a powerful, easy-to-understand string processing pipeline...
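Here is a sketch of the stringr functions named above, each applied once and then chained into a short pipeline. The `files` vector and the "FY" label format are hypothetical illustrations, not examples from the chapter. It assumes the stringr package is installed and R 4.1 or later for the native `|>` pipe.

```r
library(stringr)

# Hypothetical file names (not from the book)
files <- c("report_2021.csv", "summary_2022.txt", "report_2023.csv")

# str_detect(): TRUE/FALSE for each element matching a regex
is_csv <- str_detect(files, "\\.csv$")

# str_subset(): keep only the matching elements
csv_files <- str_subset(files, "\\.csv$")

# str_sub(): extract by position; negative indices count from the end,
# so -8 to -5 is the four-digit year before a three-letter extension
years <- str_sub(files, -8, -5)

# str_split(): split on "_" (returns a list of character vectors)
parts <- str_split(files, "_")

# str_count(): number of occurrences of "2" in each string
n_twos <- str_count(files, "2")

# str_replace(): replace the first match in each string
renamed <- str_replace(files, "report", "final")

# str_c(): concatenate strings element-wise
fy_labels <- str_c("FY", years)

# Combined pipeline: keep CSV files, then pull out the year
csv_years <- files |>
  str_subset("\\.csv$") |>
  str_sub(-8, -5)
csv_labels <- str_c("FY", csv_years)

print(csv_labels)
```

Every step takes and returns a character vector (except `str_split()`, which returns a list), so the functions chain naturally with a pipe.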
