Home Data R Machine Learning By Example

R Machine Learning By Example

By Dipanjan Sarkar , Raghav Bali
books-svg-icon Book
Subscription FREE
eBook + Subscription €14.99
eBook €32.99
Print + eBook €41.99
READ FOR FREE Free Trial for 7 days. €14.99 p/m after trial. Cancel Anytime! BUY NOW BUY NOW BUY NOW
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
READ FOR FREE Free Trial for 7 days. €14.99 p/m after trial. Cancel Anytime! BUY NOW BUY NOW BUY NOW
Subscription FREE
eBook + Subscription €14.99
eBook €32.99
Print + eBook €41.99
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
  1. Free Chapter
    Getting Started with R and Machine Learning
About this book
Data science and machine learning are some of the top buzzwords in the technical world today. From retail stores to Fortune 500 companies, everyone is working hard to making machine learning give them data-driven insights to grow their business. With powerful data manipulation features, machine learning packages, and an active developer community, R empowers users to build sophisticated machine learning systems to solve real-world data problems. This book takes you on a data-driven journey that starts with the very basics of R and machine learning and gradually builds upon the concepts to work on projects that tackle real-world problems. You’ll begin by getting an understanding of the core concepts and definitions required to appreciate machine learning algorithms and concepts. Building upon the basics, you will then work on three different projects to apply the concepts of machine learning, following current trends and cover major algorithms as well as popular R packages in detail. These projects have been neatly divided into six different chapters covering the worlds of e-commerce, finance, and social-media, which are at the very core of this data-driven revolution. Each of the projects will help you to understand, explore, visualize, and derive insights depending upon the domain and algorithms. Through this book, you will learn to apply the concepts of machine learning to deal with data-related problems and solve them using the powerful yet simple language, R.
Publication date:
March 2016
Publisher
Packt
Pages
340
ISBN
9781784390846

 

Chapter 1. Getting Started with R and Machine Learning

This introductory chapter will get you started with the basics of R which include various constructs, useful data structures, loops and vectorization. If you are already an R wizard, you can skim through these sections and dive right into the next part which talks about what machine learning actually represents as a domain and the main areas it encompasses. We will also talk about different machine learning techniques and algorithms used in each area. Finally, we will conclude by looking at some of the most popular machine learning packages in R, some of which we will be using in the subsequent chapters.

If you are a data or machine learning enthusiast, surely you would have heard by now that being a data scientist is referred to as the sexiest job of the 21st century by Harvard Business Review.

There is a huge demand in the current market for data scientists, primarily because their main job is to gather crucial insights and information from both unstructured and structured data to help their business and organization grow strategically.

Some of you might be wondering how machine learning or R relate to all this! Well, to be a successful data scientist, one of the major tools you need in your toolbox is a powerful language capable of performing complex statistical calculations and working with various types of data and building models which help you get previously unknown insights and R is the perfect language for that! Machine learning forms the foundation of the skills you need to build to become a data analyst or data scientist, this includes using various techniques to build models to get insights from data.

This book will provide you with some of the essential tools you need to be well versed with both R and machine learning by not only looking at concepts but also applying those concepts in real-world examples. Enough talk; now let's get started on our journey into the world of machine learning with R!

In this chapter, we will cover the following aspects:

  • Delving into the basics of R

  • Understanding the data structures in R

  • Working with functions

  • Controlling code flow

  • Taking further steps with R

  • Understanding machine learning basics

  • Familiarizing yourself with popular machine learning packages in R

 

Delving into the basics of R


It is assumed here that you are at least familiar with the basics of R or have worked with R before. Hence, we won't be talking much about downloading and installations. There are plenty of resources on the web which provide a lot of information on this. I recommend that you use RStudio which is an Integrated Development Environment (IDE), which is much better than the base R Graphical User Interface (GUI). You can visit https://www.rstudio.com/ to get more information about it.

Note

For details about the R project, you can visit https://www.r-project.org/ to get an overview of the language. Besides this, R has a vast arsenal of wonderful packages at its disposal and you can view everything related to R and its packages at https://cran.r-project.org/ which contains all the archives.

You must already be familiar with the R interactive interpreter, often called a Read-Evaluate-Print Loop (REPL). This interpreter acts like any command line interface which asks for input and starts with a > character, which indicates that R is waiting for your input. If your input spans multiple lines, like when you are writing a function, you will see a + prompt in each subsequent line, which means that you didn't finish typing the complete expression and R is asking you to provide the rest of the expression.

It is also possible for R to read and execute complete files containing commands and functions which are saved in files with an .R extension. Usually, any big application consists of several .R files. Each file has its own role in the application and is often called as a module. We will be exploring some of the main features and capabilities of R in the following sections.

Using R as a scientific calculator

The most basic constructs in R include variables and arithmetic operators which can be used to perform simple mathematical operations like a calculator or even complex statistical calculations.

> 5 + 6
[1] 11
> 3 * 2
[1] 6
> 1 / 0
[1] Inf

Remember that everything in R is a vector. Even the output results indicated in the previous code snippet. They have a leading [1] symbol indicating it is a vector of size 1.

You can also assign values to variables and operate on them just like any other programming language.

> num <- 6
> num ^ 2
[1] 36
> num
[1] 6     # a variable changes value only on re-assignment
> num <- num ^ 2 * 5 + 10 / 3
> num
[1] 183.3333

Operating on vectors

The most basic data structure in R is a vector. Basically, anything in R is a vector, even if it is a single number just like we saw in the earlier example! A vector is basically a sequence or a set of values. We can create vectors using the : operator or the c function which concatenates the values to create a vector.

> x <- 1:5
> x
[1] 1 2 3 4 5
> y <- c(6, 7, 8 ,9, 10)
> y
[1]  6  7  8  9 10
> z <- x + y
> z
[1]  7  9 11 13 15

You can clearly in the previous code snippet, that we just added two vectors together without using any loop, using just the + operator. This is known as vectorization and we will be discussing more about this later on. Some more operations on vectors are shown next:

> c(1,3,5,7,9) * 2
[1]  2  6 10 14 18
> c(1,3,5,7,9) * c(2, 4)
[1]  2 12 10 28 18 # here the second vector gets recycled

Output:

> factorial(1:5)
[1]   1   2   6  24 120
> exp(2:10)   # exponential function
[1]     7.389056    20.085537    54.598150   148.413159   403.428793  1096.633158
[7]  2980.957987  8103.083928 22026.465795
> cos(c(0, pi/4))   # cosine function
[1] 1.0000000 0.7071068
> sqrt(c(1, 4, 9, 16))
[1] 1 2 3 4
> sum(1:10)
[1] 55

You might be confused with the second operation where we tried to multiply a smaller vector with a bigger vector but we still got a result! If you look closely, R threw a warning also. What happened in this case is, since the two vectors were not equal in size, the smaller vector in this case c(2, 4) got recycled or repeated to become c(2, 4, 2, 4, 2) and then it got multiplied with the first vector c(1, 3, 5, 7 ,9) to give the final result vector, c(2, 12, 10, 28, 18). The other functions mentioned here are standard functions available in base R along with several other functions.

Tip

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

  • Log in or register to our website using your e-mail address and password.

  • Hover the mouse pointer on the SUPPORT tab at the top

  • Click on Code Downloads & Errata

  • Enter the name of the book in the Search box

  • Select the book for which you're looking to download the code files

  • Choose from the drop-down menu where you purchased this book from

  • Click on Code Download

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows

  • Zipeg / iZip / UnRarX for Mac

  • 7-Zip / PeaZip for Linux

Special values

Since you will be dealing with a lot of messy and dirty data in data analysis and machine learning, it is important to remember some of the special values in R so that you don't get too surprised later on if one of them pops up.

> 1 / 0
[1] Inf
> 0 / 0
[1] NaN
> Inf / NaN
[1] NaN
> Inf / Inf
[1] NaN
> log(Inf)
[1] Inf
> Inf + NA
[1] NA

The main values which should concern you here are Inf which stands for Infinity, NaN which is Not a Number, and NA which indicates a value that is missing or Not Available. The following code snippet shows some logical tests on these special values and their results. Do remember that TRUE and FALSE are logical data type values, similar to other programming languages.

> vec <- c(0, Inf, NaN, NA)
> is.finite(vec)
[1]  TRUE FALSE FALSE FALSE
> is.nan(vec)
[1] FALSE FALSE  TRUE FALSE
> is.na(vec)
[1] FALSE FALSE  TRUE  TRUE
> is.infinite(vec)
[1] FALSE  TRUE FALSE FALSE

The functions are pretty self-explanatory from their names. They clearly indicate which values are finite, which are finite and checks for NaN and NA values respectively. Some of these functions are very useful when cleaning dirty data.

 

Data structures in R


Here we will be looking at the most useful data structures which exist in R and use using them on some fictional examples to get a better grasp on their syntax and constructs. The main data structures which we will be covering here include:

  • Vectors

  • Arrays and matrices

  • Lists

  • Data frames

These data structures are used widely inside R as well as by various R packages and functions, including machine learning functions and algorithms which we will be using in the subsequent chapters. So it is essential to know how to use these data structures to work with data efficiently.

Vectors

Just like we mentioned briefly in the previous sections, vectors are the most basic data type inside R. We use vectors to represent anything, be it input or output. We previously saw how we create vectors and apply mathematical operations on them. We will see some more examples here.

Creating vectors

Here we will look at ways to initialize vectors, some of which we had also worked on previously, using operators such as : and functions such as c. In the following code snippet, we will use the seq family of functions to initialize vectors in different ways.

> c(2.5:4.5, 6, 7, c(8, 9, 10), c(12:15))
 [1]  2.5  3.5  4.5  6.0  7.0  8.0  9.0 10.0 12.0 13.0 14.0 15.0
> vector("numeric", 5)
[1] 0 0 0 0 0
> vector("logical", 5)
[1] FALSE FALSE FALSE FALSE FALSE
> logical(5)
[1] FALSE FALSE FALSE FALSE FALSE
> # seq is a function which creates sequences
> seq.int(1,10)
 [1]  1  2  3  4  5  6  7  8  9 10
> seq.int(1,10,2)
[1] 1 3 5 7 9
> seq_len(10)
 [1]  1  2  3  4  5  6  7  8  9 10

Indexing and naming vectors

One of the most important operations we can do on vectors involves subsetting and indexing vectors to access specific elements which are often useful when we want to run some code only on specific data points. The following examples show some ways in which we can index and subset vectors:

> vec <- c("R", "Python", "Julia", "Haskell", "Java", "Scala")
> vec[1]
[1] "R"
> vec[2:4]
[1] "Python"  "Julia"   "Haskell"
> vec[c(1, 3, 5)]
[1] "R"     "Julia" "Java" 
> nums <- c(5, 8, 10, NA, 3, 11)
> nums
[1]  5  8 10 NA  3 11
> which.min(nums)   # index of the minimum element
[1] 5
> which.max(nums)   # index of the maximum element
[1] 6
> nums[which.min(nums)]  # the actual minimum element
[1] 3
> nums[which.max(nums)]  # the actual maximum element
[1] 11

Now we look at how we can name vectors. This is basically a nifty feature in R where you can label each element in a vector to make it more readable or easy to interpret. There are two ways this can be done, which are shown in the following examples:

> c(first=1, second=2, third=3, fourth=4, fifth=5)

Output:

> positions <- c(1, 2, 3, 4, 5)
> names(positions) 
NULL
> names(positions) <- c("first", "second", "third", "fourth", "fifth")
> positions

Output:

> names(positions)
[1] "first"  "second" "third"  "fourth" "fifth"
> positions[c("second", "fourth")]

Output:

Thus, you can see, it becomes really useful to annotate and name vectors sometimes, and we can also subset and slice vectors using element names rather than values.

Arrays and matrices

Vectors are one dimensional data structures, which means that they just have one dimension and we can get the number of elements they have using the length property. Do remember that arrays may also have a similar meaning in other programming languages, but in R they have a slightly different meaning. Basically, arrays in R are data structures which hold the data having multiple dimensions. Matrices are just a special case of generic arrays having two dimensions, namely represented by properties rows and columns. Let us look at some examples in the following code snippets in the accompanying subsection.

Creating arrays and matrices

First we will create an array that has three dimensions. Now it is easy to represent two dimensions in your screen, but to go one dimension higher, there are special ways in which R transforms the data. The following example shows how R fills the data (column first) in each dimension and shows the final output for a 4x3x3 array:

> three.dim.array <- array(
+     1:32,    # input data
+     dim = c(4, 3, 3),   # dimensions
+     dimnames = list(    # names of dimensions
+         c("row1", "row2", "row3", "row4"),
+         c("col1", "col2", "col3"),
+         c("first.set", "second.set", "third.set")
+     )
+ )
> three.dim.array

Output:

Like I mentioned earlier, a matrix is just a special case of an array. We can create a matrix using the matrix function, shown in detail in the following example. Do note that we use the parameter byrow to fill the data row-wise in the matrix instead of R's default column-wise fill in any array or matrix. The ncol and nrow parameters stand for number of columns and rows respectively.

> mat <- matrix(
+     1:24,   # data
+     nrow = 6,  # num of rows
+     ncol = 4,  # num of columns
+     byrow = TRUE,  # fill the elements row-wise
+ )
> mat

Output:

Names and dimensions

Just like we named vectors and accessed element names, will perform similar operations in the following code snippets. You have already seen the use of the dimnames parameter in the preceding examples. Let us look at some more examples as follows:

> dimnames(three.dim.array)

Output:

> rownames(three.dim.array)
[1] "row1" "row2" "row3" "row4"
> colnames(three.dim.array)
[1] "col1" "col2" "col3"
> dimnames(mat)
NULL
> rownames(mat)
NULL
> rownames(mat) <- c("r1", "r2", "r3", "r4", "r5", "r6")
> colnames(mat) <- c("c1", "c2", "c3", "c4")
> dimnames(mat)

Output:

> mat

Output:

To access details of dimensions related to arrays and matrices, there are special functions. The following examples show the same:

> dim(three.dim.array)
[1] 4 3 3
> nrow(three.dim.array)
[1] 4
> ncol(three.dim.array)
[1] 3
> length(three.dim.array)  # product of dimensions
[1] 36
> dim(mat)
[1] 6 4
> nrow(mat)
[1] 6
> ncol(mat)
[1] 4
> length(mat)
[1] 24

Matrix operations

A lot of machine learning and optimization algorithms deal with matrices as their input data. In the following section, we will look at some examples of the most common operations on matrices.

We start by initializing two matrices and then look at ways of combining the two matrices using functions such as c which returns a vector, rbind which combines the matrices by rows, and cbind which does the same by columns.

> mat1 <- matrix(
+     1:15,
+     nrow = 5,
+     ncol = 3,
+     byrow = TRUE,
+     dimnames = list(
+         c("M1.r1", "M1.r2", "M1.r3", "M1.r4", "M1.r5")
+         ,c("M1.c1", "M1.c2", "M1.c3")
+     )
+ )
> mat1

Output:

> mat2 <- matrix(
+     16:30,
+     nrow = 5,
+     ncol = 3,
+     byrow = TRUE,
+     dimnames = list(
+         c("M2.r1", "M2.r2", "M2.r3", "M2.r4", "M2.r5"),
+         c("M2.c1", "M2.c2", "M2.c3")
+     )
+ )
> mat2

Output:

> rbind(mat1, mat2)

Output:

> cbind(mat1, mat2)

Output:

> c(mat1, mat2)

Output:

Now we look at some of the important arithmetic operations which can be performed on matrices. Most of them are quite self-explanatory from the following syntax:

> mat1 + mat2   # matrix addition

Output:

> mat1 * mat2  # element-wise multiplication

Output:

> tmat2 <- t(mat2)  # transpose
> tmat2

Output:

> mat1 %*% tmat2   # matrix inner product

Output:

> m <- matrix(c(5, -3, 2, 4, 12, -1, 9, 14, 7), nrow = 3, ncol = 3)
> m

Output:

> inv.m <- solve(m)  # matrix inverse
> inv.m

Output:

> round(m %*% inv.m) # matrix * matrix_inverse = identity matrix

Output:

The preceding arithmetic operations are just some of the most popular ones amongst the vast number of functions and operators which can be applied to matrices. This becomes useful, especially in areas such as linear optimization.

Lists

Lists are a special case of vectors where each element in the vector can be of a different type of data structure or even simple data types. It is similar to the lists in the Python programming language in some aspects, if you have used it before, where the lists indicate elements which can be of different types and each have a specific index in the list. In R, each element of a list can be as simple as a single element or as complex as a whole matrix, a function, or even a vector of strings.

Creating and indexing lists

We will get started with looking at some common methods to create and initialize lists in the following examples. Besides that, we will also look at how we can access some of these list elements for further computations. Do remember that each element in a list can be a simple primitive data type or even complex data structures or functions.

> list.sample <- list(
+     1:5,+     c("first", "second", "third"),
+     c(TRUE, FALSE, TRUE, TRUE),
+     cos,
+     matrix(1:9, nrow = 3, ncol = 3)
+ )
> list.sample

Output:

> list.with.names <- list(
+     even.nums = seq.int(2,10,2),
+     odd.nums  = seq.int(1,10,2),
+     languages = c("R", "Python", "Julia", "Java"),
+     cosine.func = cos
+ )
> list.with.names

Output:

> list.with.names$cosine.func
function (x)  .Primitive("cos")
> list.with.names$cosine.func(pi)
[1] -1
>
> list.sample[[4]]
function (x)  .Primitive("cos")
> list.sample[[4]](pi)
[1] -1
>
> list.with.names$odd.nums
[1] 1 3 5 7 9
> list.sample[[1]]
[1] 1 2 3 4 5
> list.sample[[3]]
[1]  TRUE FALSE  TRUE  TRUE

You can see from the preceding examples how easy it is to access any element of the list and use it for further computations, such as the cos function.

Combining and converting lists

Now we will take a look at how to combine several lists together into one single list in the following examples:

> l1 <- list(
+     nums = 1:5,
+     chars = c("a", "b", "c", "d", "e"),
+     cosine = cos
+ )
> l2 <- list(
+     languages = c("R", "Python", "Java"),
+     months = c("Jan", "Feb", "Mar", "Apr"),
+     sine = sin
+ )
> # combining the lists now
> l3 <- c(l1, l2)
> l3

Output:

It is very easy to convert lists in to vectors and vice versa. The following examples show some common ways we can achieve this:

> l1 <- 1:5
> class(l1)
[1] "integer"
> list.l1 <- as.list(l1)
> class(list.l1)
[1] "list"
> list.l1

Output:

> unlist(list.l1)
[1] 1 2 3 4 5

Data frames

Data frames are special data structures which are typically used for storing data tables or data in the form of spreadsheets, where each column indicates a specific attribute or field and the rows consist of specific values for those columns. This data structure is extremely useful in working with datasets which usually have a lot of fields and attributes.

Creating data frames

We can create data frames easily using the data.frame function. We will look at some following examples to illustrate the same with some popular superheroes:

> df <- data.frame(
+     real.name = c("Bruce Wayne", "Clark Kent", "Slade Wilson", "Tony Stark", "Steve Rogers"),
+     superhero.name = c("Batman", "Superman", "Deathstroke", "Iron Man", "Capt. America"),
+     franchise = c("DC", "DC", "DC", "Marvel", "Marvel"),
+     team = c("JLA", "JLA", "Suicide Squad", "Avengers", "Avengers"),
+     origin.year = c(1939, 1938, 1980, 1963, 1941)
+ )
> df

Output:

> class(df)
[1] "data.frame"
> str(df)

Output:

> rownames(df)
[1] "1" "2" "3" "4" "5"
> colnames(df)

Output:

> dim(df)
[1] 5 5

The str function talks in detail about the structure of the data frame where we see details about the data present in each column of the data frame. There are a lot of datasets readily available in R base which you can directly load and start using. One of them is shown next. The mtcars dataset has information about various automobiles, which was extracted from the Motor Trend U.S. Magazine of 1974.

> head(mtcars)   # one of the datasets readily available in R

Output:

Operating on data frames

There are a lot of operations we can do on data frames, such as merging, combining, slicing, and transposing data frames. We will look at some of the important data frame operations in the following examples.

It is really easy to index and subset specific data inside data frames using simplex indexes and functions such as subset.

> df[2:4,]

Output:

> df[2:4, 1:2]

Output:

> subset(df, team=="JLA", c(real.name, superhero.name, franchise))

Output:

> subset(df, team %in% c("Avengers","Suicide Squad"), c(real.name, superhero.name, franchise))

Output:

We will now look at some more complex operations, such as combining and merging data frames.

> df1 <- data.frame(
+     id = c('emp001', 'emp003', 'emp007'),
+     name = c('Harvey Dent', 'Dick Grayson', 'James Bond'),
+     alias = c('TwoFace', 'Nightwing', 'Agent 007')
+ )
> 
> df2 <- data.frame(
+     id = c('emp001', 'emp003', 'emp007'),
+     location = c('Gotham City', 'Gotham City', 'London'),
+     speciality = c('Split Persona', 'Expert Acrobat', 'Gadget Master')
+ )
> df1

Output:

> df2

Output:

> rbind(df1, df2)   # not possible since column names don't match
Error in match.names(clabs, names(xi)) : 
  names do not match previous names
> cbind(df1, df2)

Output:

> merge(df1, df2, by="id")

Output:

From the preceding operations it is evident that rbind and cbind work in the same way as we saw previously with arrays and matrices. However, merge lets you merge the data frames in the same way as you join various tables in relational databases.

           
About the Authors
  • Dipanjan Sarkar

    Dipanjan (DJ) Sarkar is a Data Scientist at Intel, leveraging data science, machine learning, and deep learning to build large-scale intelligent systems. He holds a master of technology degree with specializations in Data Science and Software Engineering. He has been an analytics practitioner for several years now, specializing in machine learning, NLP, statistical methods, and deep learning. He is passionate about education and also acts as a Data Science Mentor at various organizations like Springboard, helping people learn data science. He is also a key contributor and editor for Towards Data Science, a leading online journal on AI and Data Science. He has also authored several books on R, Python, machine learning, NLP, and deep learning.

    Browse publications by this author
  • Raghav Bali

    Raghav Bali has a master's degree (gold medalist) in information technology from International Institute of Information Technology, Bangalore. He is a data scientist at Intel, the world's largest silicon company, where he works on analytics, business intelligence, and application development to develop scalable machine learning-based solutions. He has worked as an analyst and developer in domains such as ERP, finance, and BI with some of the top companies of the world.

    Browse publications by this author
Latest Reviews (16 reviews total)
cannot use packt mobile app to see this. mobile app displays only subscriptions, this is a one time purchase. There is no menu to navigate to one time purchases. mobile app keep displaying ads to subscribe. when you cllick link in email to see video/learn, it redirects to packt website and keeps asking if account info is correct, there is no way to fiind the courses you purched online too.
Great transaction. Book immediately downloadable. Fast shipment of physical copy.
Muy buen contenido, buen equilibrio entre aspectos conceptuales de los algoritmos estudiados y su aplicación práctica
R Machine Learning By Example
Unlock this book and the full library FREE for 7 days
Start now