The term machine learning has all sorts of meanings attached to it today, especially after Hollywood’s (and others’) movie studios have gotten into the picture. Films such as Ex Machina have tantalized the imaginations of moviegoers the world over and made machine learning into all sorts of things that it really isn’t. Of course, most of us have to live in the real world, where machine learning actually does perform an incredible array of tasks that have nothing to do with androids that can pass the Turing Test (fooling their makers into believing they’re human). Machine Learning For Dummies provides you with a view of machine learning in the real world and exposes you to the amazing feats you really can perform using this technology. Even though the tasks that you perform using machine learning may seem a bit mundane when compared to the movie version, by the time you finish this book, you realize that these mundane tasks have the power to impact the lives of everyone on the planet in nearly every aspect of their daily lives. In short, machine learning is an incredible technology — just not in the way that some people have imagined.
About This Book
The main purpose of Machine Learning For Dummies is to help you understand what machine learning can and can’t do for you today and what it might do for you in the future. You don’t have to be a computer scientist to use this book, even though it does contain many coding examples. In fact, you can come from any discipline that heavily emphasizes math because that’s how this book focuses on machine learning. Instead of dealing with abstractions, you see the concrete results of using specific algorithms to interact with big data in particular ways to obtain a certain, useful result. The emphasis is on useful because machine learning has the power to perform a wide array of tasks in a manner never seen before.
Part of the emphasis of this book is on using the right tools. This book uses both Python and R to perform various tasks. These two languages have special features that make them particularly useful in a machine learning setting. For example, Python provides access to a huge array of libraries that let you do just about anything you can imagine and more than a few things you can’t. Likewise, R provides an ease of use that few languages can match. Machine Learning For Dummies helps you understand that both languages have their role to play and gives examples of when one language works a bit better than the other to achieve the goals you have in mind.
You also discover some interesting techniques in this book. The most important is that you don’t just see the algorithms used to perform tasks; you also get an explanation of how the algorithms work. Unlike many other books, Machine Learning For Dummies enables you to fully understand what you’re doing, but without requiring you to have a PhD in math. After you read this book, you finally have a basis on which to build your knowledge and go even further in using machine learning to perform tasks in your specific field.
Of course, you might still be worried about the whole programming environment issue, and this book doesn’t leave you in the dark there, either. At the beginning, you find complete installation instructions for both RStudio and Anaconda, which are the Integrated Development Environments (IDEs) used for this book. In addition, quick primers (with references) help you understand the basic R and Python programming that you need to perform. The emphasis is on getting you up and running as quickly as possible and on keeping the examples straightforward and simple so that the code doesn’t become a stumbling block to learning.
To help you absorb the concepts, this book uses the following conventions:
- Text that you’re meant to type just as it appears in the book is in bold. The exception is when you’re working through a step list: Because each step is bold, the text to type is not bold.
- Words that we want you to type in that are also in italics are used as placeholders, which means that you need to replace them with something that works for you. For example, if you see “Type Your Name and press Enter,” you need to replace Your Name with your actual name.
- We also use italics for terms we define. This means that you don’t have to rely on other sources to provide the definitions you need.
- Web addresses and programming code appear in monofont. If you’re reading a digital version of this book on a device connected to the Internet, you can click the live link to visit that website, like this:
- When you need to click command sequences, you see them separated by a special arrow, like this: File ⇒ New File, which tells you to click File and then New File.
Foolish Assumptions
You might find it difficult to believe that we’ve assumed anything about you — after all, we haven’t even met you yet! Although most assumptions are indeed foolish, we made certain assumptions to provide a starting point for the book.
The first assumption is that you’re familiar with the platform you want to use because the book doesn’t provide any guidance in this regard. (Chapter 4 does, however, provide RStudio installation instructions, and Chapter 6 tells you how to install Anaconda.) To give you the maximum information about R and Python with regard to machine learning, this book doesn’t discuss any platform-specific issues. You really do need to know how to install applications, use applications, and generally work with your chosen platform before you begin working with this book.
This book isn’t a math primer. Yes, you see lots of examples of complex math, but the emphasis is on helping you use R, Python, and machine learning to perform analysis tasks rather than learn math theory. However, you do get explanations of many of the algorithms used in the book so that you can understand how the algorithms work. Chapters 1 and 2 guide you through a better understanding of precisely what you need to know in order to use this book successfully.
This book also assumes that you can access items on the Internet. Sprinkled throughout are numerous references to online material that will enhance your learning experience. However, these added sources are useful only if you actually find and use them.
Icons Used in This Book
As you read this book, you encounter icons in the margins that indicate material of interest (or not, as the case may be). Here’s what the icons mean:
Beyond the Book
This book isn’t the end of your R, Python, or machine learning experience — it’s really just the beginning. We provide online content to make this book more flexible and better able to meet your needs. That way, as we receive email from you, we can address questions and tell you how updates to R, Python, or their associated add-ons affect book content. In fact, you gain access to all these cool additions:
- Cheat sheet: You remember using crib notes in school to make a better mark on a test, don’t you? You do? Well, a cheat sheet is sort of like that. It provides you with some special notes about tasks that you can do with R, Python, RStudio, Anaconda, and machine learning that not every other person knows. To view this book’s Cheat Sheet, simply go to www.dummies.com and search for “Machine Learning For Dummies Cheat Sheet” in the Search box. It contains really neat information, such as how to find the algorithms you commonly need for machine learning.
- Updates: Sometimes changes happen. For example, we might not have seen an upcoming change when we looked into our crystal ball during the writing of this book. In the past, this possibility simply meant that the book became outdated and less useful, but you can now find updates to the book at
In addition to these updates, check out the blog posts with answers to reader questions and demonstrations of useful book-related techniques at
- Companion files: Hey! Who really wants to type all the code in the book and reconstruct all those plots manually? Most readers prefer to spend their time actually working with R and Python, performing machine learning tasks, and seeing the interesting things they can do, rather than typing. Fortunately for you, the examples used in the book are available for download, so all you need to do is read the book to learn machine learning usage techniques. You can find these files at
Where to Go from Here
It’s time to start your machine learning adventure! If you’re completely new to machine learning tasks, you should start with Chapter 1 and progress through the book at a pace that allows you to absorb as much of the material as possible. Make sure to read about both R and Python because the book uses both languages as needed for the examples.
If you’re a novice who’s in an absolute rush to get going with machine learning as quickly as possible, you can skip to Chapter 4 with the understanding that you may find some topics a bit confusing later. If you already have RStudio installed, you can skim Chapter 4. Likewise, if you already have Anaconda installed, you can skim Chapter 6. To use this book, you must install R version 3.2.3. The Python version we use is 2.7.11. The examples won’t work with the 3.x version of Python because this version doesn’t support some of the libraries we use.
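If you’re not sure which Python version you have, a quick check along the following lines may help. This is only a minimal sketch: the helper name matches_book_python and the constant REQUIRED_PYTHON are hypothetical names made up for illustration, and the check looks only at the 2.7 series rather than the exact 2.7.11 release mentioned above.

```python
import sys

# The Python series named in the text above (2.7.11 belongs to it).
# REQUIRED_PYTHON and matches_book_python are hypothetical names.
REQUIRED_PYTHON = (2, 7)


def matches_book_python(version_info=None):
    """Return True when the given version is in the Python 2.7 series.

    With no argument, the running interpreter's version is checked.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor numbers, such as (2, 7).
    return tuple(version_info[:2]) == REQUIRED_PYTHON


if __name__ == "__main__":
    if matches_book_python():
        print("Python 2.7.x detected; this matches the series used in the book.")
    else:
        print("Python %d.%d detected; the book's examples target 2.7.11."
              % tuple(sys.version_info[:2]))
```

Running the script prints one of the two messages, depending on your interpreter. You can also pass a version tuple, such as (2, 7, 11), to the function to see how it treats a particular version.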
Readers who have some exposure to both R and Python, and have the appropriate language versions installed, can save reading time by moving directly to Chapter 8. You can always go back to earlier chapters as necessary when you have questions. However, you do need to understand how each technique works before moving to the next one. Every technique, coding example, and procedure has important lessons for you, and you could miss vital content if you start skipping too much information.