Intermediate Linear Algebra in R

The previous chapter covered the basics of linear algebra and its calculations in R. This chapter goes a step further, extending to intermediate linear algebra and covering topics such as the determinant, rank, and trace of a matrix, eigenvalues and eigenvectors, and principal component analysis (PCA). Besides providing an intuitive understanding of these abstract yet important mathematical concepts, we’ll cover the practical implementations of calculating these quantities in R.

By the end of this chapter, you will have grasped important matrix properties, such as determinant and rank, and gained hands-on experience in calculating these quantities.

In this chapter, we will cover the following topics:

  • Introducing the matrix determinant
  • Introducing the matrix trace
  • Understanding the matrix norm
  • Getting to know eigenvalues and eigenvectors
  • Introducing principal component analysis

Technical requirements

To run the code in this chapter, you will need to have the following:

  • The latest version of the Matrix package, which is 1.5.1 at the time of writing
  • The latest version of the factoextra package, which is 1.0.7 at the time of writing

All the code and data for this chapter are available at https://github.com/PacktPublishing/The-Statistics-and-Machine-Learning-with-R-Workshop/blob/main/Chapter_8/working.R.

Introducing the matrix determinant

The determinant of a matrix is a special scalar value that can be calculated from a matrix. Here, the matrix needs to be square, meaning it has an equal number of rows and columns. For a 2x2 square matrix, the determinant is calculated as the difference between the product of the diagonal elements and the product of the off-diagonal elements.

Mathematically, suppose our 2x2 matrix is A = [a, b; c, d], where the semicolon separates the two rows. Its determinant, |A|, is thus calculated as follows:

det(A) = |A| = ad - bc

Please do not confuse these vertical lines with the absolute value sign. They denote the determinant in the context of a matrix, and the determinant of a matrix can be negative as well.

Let’s say our 2x2 matrix is A = [2, 6; 1, 8]. We can find its determinant like so:

|A| = 2 * 8 - 6 * 1 = 10
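
In R, this calculation is available through the built-in det() function. Here is a quick sketch verifying the result above (note that matrix() fills its values column by column by default):

    A <- matrix(c(2, 1, 6, 8), nrow = 2)  # rows: (2, 6) and (1, 8)
    >>> det(A)
    [1] 10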

Calculating the determinant of a matrix is the easy part, but understanding its use is of equal importance. Before we...

Introducing the matrix trace

The trace is a quantity that only applies to square matrices, such as the covariance matrix often encountered in ML. It is denoted as tr(A) for a square matrix, A, and is calculated as the sum of its diagonal elements. Let’s take a look:

  1. In the following code snippet, we create a 3x3 matrix, A, and use the diag() function to extract the diagonal elements, which we then sum to obtain the trace of the matrix. Note that we first create a data frame consisting of three columns, each holding three elements, and then convert it into matrix form to store in A:
    A <- as.matrix(data.frame("c1"=c(1,2,3), "c2"=c(2,5,2), "c3"=c(-1,8,3)))
    >>> A
         c1 c2 c3
    [1,]  1  2 -1
    [2,]  2  5  8
    [3,]  3  2  3
    >>> diag(A)
    [1] 1 5 3
    >>> sum(diag(A))
    [1] 9
  2. Since there is no built-in...
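
One simple option here is to wrap the calculation above in a small helper function (a minimal sketch; the name trace_m is our own choice):

    trace_m <- function(M) {
      stopifnot(nrow(M) == ncol(M))  # the trace is only defined for square matrices
      sum(diag(M))
    }
    >>> trace_m(A)
    [1] 9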

Understanding the matrix norm

The norm of a matrix is a scalar value that measures the magnitude of the matrix. More generally, a norm is a way to measure the size or length of a vector or a matrix, which is useful when comparing objects that consist of many elements. For example, the weights of a deep neural network are stored in matrices, and we typically constrain the norm of the weights to be small to prevent overfitting. Since the matrix norm generalizes the vector norm, we will first go through the basics of the vector norm.
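
As a preview, base R’s norm() function computes several standard matrix norms directly; these are the matrix analogues of the L1, L-infinity, and L2 vector norms introduced next. A quick sketch reusing the 3x3 matrix A from the previous section:

    >>> norm(A, type = "O")   # one norm: maximum absolute column sum
    [1] 12
    >>> norm(A, type = "I")   # infinity norm: maximum absolute row sum
    [1] 15
    >>> norm(A, type = "F")   # Frobenius norm: square root of the sum of squared entries
    [1] 11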

Understanding the vector norm

Suppose we have a vector, a = [1, 0, -1], and another vector, b = [1, 2, 0]. To assess the similarity between these two vectors, we can argue that they agree in the first element only and differ in the remaining two. To compare these two vectors holistically, we need a single metric – one that summarizes the whole vector...
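
A norm provides exactly such a metric. As a minimal sketch, the three vector norms covered in this chapter can be computed manually (base R’s norm() function expects a matrix, so we work with sums and maxima directly):

    a <- c(1, 0, -1)
    b <- c(1, 2, 0)
    >>> sum(abs(a))           # L1 norm: sum of absolute values
    [1] 2
    >>> sqrt(sum(a^2))        # L2 (Euclidean) norm
    [1] 1.414214
    >>> max(abs(a))           # L-infinity norm: largest absolute value
    [1] 1
    >>> sqrt(sum((a - b)^2))  # L2 norm of the difference: a holistic dissimilarity measure
    [1] 2.236068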

Getting to know eigenvalues and eigenvectors

The eigenvalue, often denoted by the scalar λ, and the eigenvector, often denoted by v, are essential properties of a square matrix, A. Two central ideas are needed to understand their purpose. The first is that the matrix A is a transformation that maps an input vector to an output vector, possibly changing its direction. The second is that an eigenvector is a special vector that does not change direction after going through the transformation induced by A; instead, it gets scaled along its original direction by a multiple given by the corresponding eigenvalue. The following equation sums this up:

Av = λv

These two points capture the essence of eigendecomposition, which represents the original matrix, A, in terms of its eigenvalues and eigenvectors and thus allows easier matrix operations in many cases. Let’s start by understanding a simple...
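
In R, both quantities are returned by the built-in eigen() function. The following minimal sketch verifies the relation Av = λv on an arbitrary 2x2 matrix (our own example, chosen purely for illustration):

    A <- matrix(c(4, 1, 2, 3), nrow = 2)  # rows: (4, 2) and (1, 3)
    e <- eigen(A)
    >>> e$values              # eigenvalues, sorted in decreasing order
    [1] 5 2
    >>> v <- e$vectors[, 1]   # eigenvector paired with the eigenvalue 5
    >>> all.equal(as.vector(A %*% v), e$values[1] * v)  # checks Av = λv
    [1] TRUE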

Introducing principal component analysis

When building an ML model, the dataset used to train the model may contain redundant information in its predictors. This redundancy arises from correlated features in the dataset and needs to be addressed when using certain classes of models. In such cases, PCA is a popular technique, as it reduces the feature dimension of the dataset and thus shrinks the redundancy. The problem of collinearity, in which two or more predictors in a model are linearly correlated, can therefore be relieved via dimension reduction using PCA.

Collinearity among the predictors is often considered a big problem when building an ML model. It can be quantified using the Pearson correlation coefficient, a number between -1 and 1, where a coefficient near 0 indicates that two variables are linearly uncorrelated, and a coefficient near -1 or 1 indicates that they are strongly linearly related.
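
To make this concrete, here is a minimal sketch that runs PCA on simulated data using base R’s prcomp() function (the variables and their relationship are invented purely for illustration):

    # Simulate two nearly collinear predictors and one unrelated predictor
    set.seed(42)
    x1 <- rnorm(100)
    x2 <- x1 + rnorm(100, sd = 0.1)   # almost a copy of x1
    x3 <- rnorm(100)
    X  <- data.frame(x1, x2, x3)

    round(cor(X), 2)   # the x1-x2 coefficient is close to 1

    # PCA on standardized predictors: nearly all of the variance falls on
    # the first two components, so the redundant dimension can be dropped
    pca <- prcomp(X, scale. = TRUE)
    summary(pca)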

When two...

Summary

In this chapter, we covered intermediate linear algebra and its implementations in R. We started by introducing the matrix determinant, a widely used property in numerical analysis. We highlighted the intuition behind the matrix determinant and its connection to matrix rank.

We also covered additional properties, including the matrix trace and norm. In particular, we introduced three popular norms: the L1-norm, the L2-norm, and the L∞-norm. We detailed their mathematical constructs and calculation process.

Next, we covered eigendecomposition, which leads to a set of eigenvalues and eigenvectors of a square matrix. We provided a step-by-step derivation and analysis of the core equation, as well as the approach to compute them.

Finally, we covered PCA, a popular technique that’s used for dimension reduction. Specifically, we highlighted its role in removing collinearity in the dataset and provided a few ways to compute and visualize PCA results.

In...
