Introduction to Machine Learning with R

Brett Lantz

February 2016

If science fiction stories are to be believed, the invention of artificial intelligence inevitably leads to apocalyptic wars between machines and their makers. In the early stages, computers are taught to play simple games of tic-tac-toe and chess. Later, machines are given control of traffic lights and communications, followed by military drones and missiles. The machine's evolution takes an ominous turn once the computers become sentient and learn how to teach themselves. Having no more need for human programmers, humankind is then deleted.

(For more resources related to this topic, see here.)

Thankfully, at the time of writing this, machines still require user input.

Though your impressions of machine learning may be colored by these mass-media depictions, today's algorithms are too application-specific to pose any danger of becoming self-aware. The goal of today's machine learning is not to create an artificial brain, but rather to assist us in making sense of the world's massive data stores.

Putting popular misconceptions aside, in this article we will learn the following topics:

  • Installing R packages
  • Loading and unloading R packages

Machine learning with R

Many of the algorithms needed for machine learning with R are not included as part of the base installation. Instead, the algorithms needed for machine learning are available via a large community of experts who have shared their work freely. These must be installed on top of base R manually. Thanks to R's status as free open source software, there is no additional charge for this functionality.

A collection of R functions that can be shared among users is called a package. Free packages exist for each of the machine learning algorithms covered in this book. In fact, this book only covers a small portion of all of R's machine learning packages.

If you are interested in the breadth of R packages, you can view a list at Comprehensive R Archive Network (CRAN), a collection of web and FTP sites located around the world to provide the most up-to-date versions of R software and packages. If you obtained the R software via download, it was most likely from CRAN at http://cran.r-project.org/index.html.

If you do not already have R, the CRAN website also provides installation instructions and information on where to find help if you have trouble.

The Packages link on the left side of the page will take you to a page where you can browse packages in an alphabetical order or sorted by the publication date. At the time of writing this, a total 6,779 packages were available—a jump of over 60% in the time since the first edition was written, and this trend shows no sign of slowing!

The Task Views link on the left side of the CRAN page provides a curated list of packages as per the subject area. The task view for machine learning, which lists the packages covered in this book (and many more), is available at http://cran.r-project.org/web/views/MachineLearning.html.

Installing R packages

Despite the vast set of available R add-ons, the package format makes installation and use a virtually effortless process. To demonstrate the use of packages, we will install and load the RWeka package, which was developed by Kurt Hornik, Christian Buchta, and Achim Zeileis (see Open-Source Machine Learning: R Meets Weka in Computational Statistics 24: 225-232 for more information). The RWeka package provides a collection of functions that give R access to the machine learning algorithms in the Java-based Weka software package by Ian H. Witten and Eibe Frank. More information on Weka is available at http://www.cs.waikato.ac.nz/~ml/weka/

To use the RWeka package, you will need to have Java installed (many computers come with Java preinstalled). Java is a set of programming tools available for free, which allow for the use of cross-platform applications such as Weka. For more information, and to download Java on your system, you can visit http://java.com.

The most direct way to install a package is via the install.packages() function. To install the RWeka package, at the R command prompt, simply type:

> install.packages("RWeka")

R will then connect to CRAN and download the package in the correct format for your OS. Some packages such as RWeka require additional packages to be installed before they can be used (these are called dependencies). By default, the installer will automatically download and install any dependencies.

The first time you install a package, R may ask you to choose a CRAN mirror. If this happens, choose the mirror residing at a location close to you. This will generally provide the fastest download speed.

The default installation options are appropriate for most systems. However, in some cases, you may want to install a package to another location. For example, if you do not have root or administrator privileges on your system, you may need to specify an alternative installation path. This can be accomplished using the lib option, as follows:

> install.packages("RWeka", lib="/path/to/library")

The installation function also provides additional options for installation from a local file, installation from source, or using experimental versions. You can read about these options in the help file, by using the following command:

> ?install.packages

More generally, the question mark operator can be used to obtain help on any R function. Simply type ? before the name of the function.

Loading and unloading R packages

In order to conserve memory, R does not load every installed package by default. Instead, packages are loaded by users as they are needed, using the library() function.

The name of this function leads some people to incorrectly use the terms library and package interchangeably. However, to be precise, a library refers to the location where packages are installed and never to a package itself.

To load the RWeka package we installed previously, you can type the following:

> library(RWeka)

Aside from RWeka, there are several other R packages.

To unload an R package, use the detach() function. For example, to unload the RWeka package shown previously use the following command:

> detach("package:RWeka", unload = TRUE)

This will free up any resources used by the package.

Summary

Machine learning originated at the intersection of statistics, database science, and computer science. It is a powerful tool, capable of finding actionable insight in large quantities of data. Still, caution must be used in order to avoid common abuses of machine learning in the real world.

Conceptually, learning involves the abstraction of data into a structured representation and the generalization of this structure into action that can be evaluated for utility. In practical terms, a machine learner uses data containing examples and features of the concept to be learned and summarizes this data in the form of a model, which is then used for predictive or descriptive purposes. These purposes can be grouped into tasks, including classification, numeric prediction, pattern detection, and clustering. Among the many options, machine learning algorithms are chosen on the basis of the input data and the learning task.

R provides support for machine learning in the form of community-authored packages. These powerful tools are free to download; however, they need to be installed before they can be used.

To learn more about R, you can refer the following books published by Packt Publishing (https://www.packtpub.com/):

Resources for Article:


Further resources on this subject:


You've been reading an excerpt of:

Machine Learning with R - Second Edition

Explore Title