You're reading from Data Wrangling with R

Product type: Book

Published in: Feb 2023

Publisher: Packt

ISBN-13: 9781803235400

Edition: 1st Edition

Concepts: Data Mining

Author: Gustavo R Santos

Working with Numbers

Variables are quantitative when they quantify a measurement of something. Numbers are the representation of those measurements, which will most likely vary for each observation and start to create variation patterns that can tell us a lot about a subject.

In this chapter, we will work with numbers, learning how to handle them in vectors, matrices, or data frames, since there are differences in terms of dimensions and functions available for each data type.

Once we have that covered, it is then time to see how to do math operations in RStudio, not only using basic functions but also creating custom functions, which we will apply to numbers, making our set of tools more powerful so we can deal with many kinds of problems.

When working with numbers, it is hard not to talk about descriptive statistics, such an important step of data exploration. Statistics such as average, median, percentiles, standard deviation, and correlation are all about identifying...

Technical requirements

All the code can be found in the book’s GitHub repository: https://github.com/PacktPublishing/Data-Wrangling-with-R/tree/main/Part2/Chapter5.

Numbers in vectors, matrices, and data frames

A number represents a point in space. You may also have heard of a number being referred to as a scalar when it is followed by a unit of measure. In other words, it is a variable with a single number. When we have more than one number, it is possible to create a line in space, which is referred to as a vector. A collection of vectors put together adds dimensions to the data, forming matrices or data frames. These two structures are similar, but data frames have additional features, such as headers and indexes, that help us work with the information they hold.

We can quickly go over scalar, vector, matrix, and data frame creation in R, which is a simple process. You can understand what is being done by reading the comments:

# Creating a scalar
scalar <- 42
print(scalar)
[1] 42
# Creating a vector
vec <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print(vec)
[1] 1 2 3 4 5 6 7 8 9
# Creating a Matrix
mtrx <- matrix...

Math operations with variables

As part of a data wrangling process, there will be tasks involving mathematical operations with variables, such as adding, multiplying, or calculating the log of numbers. For that reason, working with a data frame or a tibble is recommended, because these objects make it easy to perform such operations on variables.

The most common math operators in R are as follows:

Figure 5.5 – A table with the R language’s math operators
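As a rough companion to the figure, here is a minimal sketch of arithmetic operators and math functions available in base R. The values of a and b are made up for illustration, and the list may not match the figure's table exactly.

```r
# Two example values (hypothetical, chosen only for illustration)
a <- 7
b <- 2

a + b     # addition: 9
a - b     # subtraction: 5
a * b     # multiplication: 14
a / b     # division: 3.5
a ^ b     # exponentiation: 49
a %% b    # modulo (remainder of the division): 1
a %/% b   # integer division: 3
log(a)    # natural logarithm
sqrt(a)   # square root

# Operators are vectorized: they act element by element on a vector
grades <- c(6.5, 8, 9.5)
doubled <- grades * 2   # 13 16 19
```

Because the operators are vectorized, the same syntax works on a single number, a vector, or a data frame column.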

Continuing with the data frame of names and grades created in the last exercise, let’s imagine that the professor offered one extra point to those who wrote a paper. Suppose everyone delivered it; here is how we can add a new column with the extra point:

# Extra point
# Scenario: everyone delivered
df$new_grade = df$grade + 1

Figure 5.6 – One point added to all the students
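For a self-contained version of this step, the sketch below builds a small stand-in data frame. The names, grades, and the delivered flags are assumptions for illustration and are not the book's data. It also shows one possible way to handle the case where only some students delivered the paper.

```r
# Hypothetical data frame standing in for the one from the previous exercise
df <- data.frame(
  name  = c("Alice", "Bob", "Carol"),
  grade = c(7.5, 8.0, 9.0)
)

# Scenario 1: everyone delivered, so add one point to every grade
df$new_grade <- df$grade + 1

# Scenario 2 (assumed): only some students delivered the paper
delivered <- c(TRUE, FALSE, TRUE)   # made-up flags, for illustration only
df$new_grade_cond <- df$grade + ifelse(delivered, 1, 0)

print(df)
```

The addition is applied to the whole column at once, so no loop over the rows is needed.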

If the professor wants to normalize...

Descriptive statistics

Data is everywhere. So, when a dataset is created, it can be understood as a subset of a larger amount of data. Imagine a sales report of the last quarter, or a dataset with ages and heights of elementary students in a county, or even responses to an election poll. All of them are subsets of a larger universe of data. Let’s think about that for a minute – the sales report does not show all the history of sales, the ages and heights are not for all students across the country, and the election poll does not contain responses from every citizen eligible to vote. Hence, these are examples of samples, which were collected from the whole, which is called the population.
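The relation between a sample and the whole it comes from can be sketched with simulated data. Everything below is an assumption made for illustration: the heights are random numbers, not real measurements, and the sizes (10,000 and 100) are arbitrary.

```r
set.seed(42)  # makes the random draws reproducible

# Simulated "population": heights in cm of 10,000 students (assumed values)
population <- rnorm(10000, mean = 130, sd = 8)

# A sample: 100 students drawn at random from that population
heights_sample <- sample(population, size = 100)

# The same statistics computed on the whole and on the subset
mean(population)
mean(heights_sample)
median(heights_sample)
sd(heights_sample)
```

The sample mean will usually land close to the mean of the whole, but not exactly on it, which is why statistics computed on a sample are treated as estimates.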

The population holds the true values of mean, median, maximum, and minimum, and when we refer to these metrics in relation to the population, they are called parameters. If it was possible to have all the data and there was enough computational power to process it, we could just use...

Summary

In this chapter, numbers were on display. The R language is great for dealing with numbers, since it was created as a statistical tool. Statistics is all about numbers, and many of the functions used in this chapter come from base R, so there is no need to install or load any library to use them.

We started the chapter by learning about structures with numbers, such as vectors, matrices, and data frames. That knowledge prepared us for the next section, where we studied many operations to deal with numbers in vectors and data frames, and we learned a good resource for that is the apply family of functions.
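As a quick reminder of how the apply family works, here is a minimal sketch of four commonly used members: apply(), lapply(), sapply(), and tapply(). The input values are made up, and the chapter's own examples may differ.

```r
# Small matrix used for illustration (hypothetical values)
m <- matrix(1:6, nrow = 2)        # 2 rows, 3 columns, filled by column

row_sums  <- apply(m, 1, sum)     # MARGIN = 1: one result per row
col_means <- apply(m, 2, mean)    # MARGIN = 2: one result per column

nums <- list(a = 1:3, b = 4:6)
as_list <- lapply(nums, mean)     # always returns a list
as_vec  <- sapply(nums, mean)     # simplifies to a named vector when it can

grades <- c(7, 9, 8, 5)
group  <- c("A", "B", "A", "B")
by_group <- tapply(grades, group, mean)   # mean grade within each group
```

Note that apply() expects an object with dimensions, such as a matrix or data frame, while lapply() and sapply() iterate over the elements of a list or vector.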

We also went over how descriptive statistics help us understand data and its distribution, which can guide our data wrangling efforts before modeling.
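A small sketch of that idea, using made-up values with one deliberate outlier: the base R summary() function returns the minimum, quartiles, median, mean, and maximum in one call.

```r
# Hypothetical values; 100 is a deliberate outlier
values <- c(1, 2, 3, 4, 100)

s <- summary(values)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
print(s)

s[["Median"]]   # 3
s[["Mean"]]     # 22
```

The large gap between the median and the mean hints at a skewed distribution, which is the kind of signal that can prompt a closer look at the data before modeling.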

Finally, we saw the correlation test and how to interpret its result.
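As an illustrative sketch, and not the chapter's own example, the base R function cor.test() can be run on simulated data. The variables x, y, and z are invented here: y is built from x plus noise, while z is generated independently of x.

```r
set.seed(1)  # makes the random draws reproducible

x <- 1:50
y <- 2 * x + rnorm(50, sd = 5)   # strongly related to x, plus noise
z <- rnorm(50)                   # generated independently of x

strong <- cor.test(x, y)   # Pearson correlation test by default
weak   <- cor.test(x, z)

strong$estimate   # correlation coefficient between x and y
strong$p.value    # p-value of the test
weak$estimate     # correlation coefficient between x and z
```

A coefficient close to 1 indicates a strong positive linear relationship, while a coefficient close to 0 indicates little or no linear relationship between the two variables.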

Exercises

What are a vector, a matrix, and a data frame?
What is the difference between matrices and data frames?
What is data slicing and why is it important for data wrangling?
List the four functions of the apply family.
List three descriptive statistics functions.
How can you display a statistics summary in R?
What does it mean when a correlation is close to 1 and when it is close to 0?

Further reading

Collection of apply functions: https://tinyurl.com/4h8km3c9
How to Fix in R: dim(X) must have a positive length (you might see this error if you use the apply() function on a single column): https://tinyurl.com/yc2ffh45
Dealing with numbers in R: https://tinyurl.com/2p98p7ns
Code for this chapter in GitHub: https://tinyurl.com/2nm6ev48

Author (1)

Gustavo R Santos

Gustavo R Santos has worked in the technology industry for 13 years, improving processes, analyzing datasets, and creating dashboards. Since 2020, he has been working as a data scientist in the retail industry, wrangling, analyzing, visualizing, and modeling data with modern tools such as R, Python, and Databricks. Gustavo also gives occasional lectures on data science concepts at an online school. He has a background in marketing, is certified as a data scientist by the Data Science Academy Brazil, and is pursuing a specialist MBA in data science at the University of São Paulo.