Machine learning is everywhere. When you book a flight ticket, an algorithm decides the price you are going to pay for it. When you apply for a loan, machine learning may decide whether you are going to get it or not. When you scroll through your Facebook timeline, it picks which advertisements to show you. Machine learning also plays a big role in your Google search results. It organizes your email inbox and filters out spam, it goes through your résumé before recruiters do when you apply for a job, and, more recently, it has also started to play the role of your personal assistant in the form of Siri and other virtual assistants.
In this book, we will learn about the theory and practice of machine learning. We will understand when and how to apply it. To get started, we will look at a high-level introduction to how machine learning works. You will then be able to differentiate...
Understanding machine learning
You may be wondering how machines actually learn. To answer this question, let's take the example of a fictional company. Space Shuttle Corporation has a few space vehicles to rent. They get applications every day from clients who want to travel to Mars. They are not sure whether those clients will ever return the vehicles; maybe they'll decide to continue living on Mars and never come back. Even worse, some of the clients may be lousy pilots and crash their vehicles on the way. So, the company decides to hire shuttle rent-approval officers whose job is to go through the applications and decide who is worthy of a shuttle ride. Their business, however, grows so big that they need to formalize the shuttle-approval process.
A traditional shuttle company would start by having business rules and hiring junior employees to execute those rules. For example, if you are an alien, then sorry, you cannot rent...
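To make the rule-based approach concrete, here is a minimal sketch of what such hand-written business rules might look like in Python. Only the "no aliens" rule comes from the example above; the `previous_crashes` field, the second rule, and the function name `approve_rental` are hypothetical and are there purely for illustration.

```python
def approve_rental(applicant):
    """Apply hand-written business rules in order and return (approved, reason).

    `applicant` is a plain dict. The field names are illustrative assumptions,
    not part of any real system.
    """
    # Rule from the text: aliens cannot rent a shuttle.
    if applicant["is_alien"]:
        return False, "aliens cannot rent"
    # Hypothetical rule: reject pilots who have crashed a vehicle before.
    if applicant["previous_crashes"] > 0:
        return False, "pilot has crashed before"
    # No rule rejected the application.
    return True, "all rules passed"


# Made-up applications, used only to exercise the rules.
applications = [
    {"name": "Applicant A", "is_alien": True, "previous_crashes": 0},
    {"name": "Applicant B", "is_alien": False, "previous_crashes": 2},
    {"name": "Applicant C", "is_alien": False, "previous_crashes": 0},
]

for application in applications:
    approved, reason = approve_rental(application)
    print(application["name"], "approved" if approved else "rejected", "-", reason)
```

Every rule here was typed in by a person, so changing the policy means editing the code by hand.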
The model development life cycle
When asked to solve a problem using machine learning, data scientists achieve this by following a sequence of steps. In this section, we are going to discuss those iterative steps.
Understanding a problem
The first thing to do when developing a model is to thoroughly understand the problem you are trying to solve. This involves understanding not only what problem you are solving, but also why you are solving it, what impact you expect to have, and what currently available solution you are comparing your new solution to. My understanding of what Box meant when he stated that all models are wrong is that a model is just an approximation of reality that captures one or more angles of it. By understanding the problem you are trying to solve, you can decide which angles of reality you need to model, and which ones you can tolerate...
Introduction to scikit-learn
Since you have already picked up this book, you probably don't need me to convince you why machine learning is important. However, you may still have doubts about why to use scikit-learn in particular. You may encounter names such as TensorFlow, PyTorch, and Spark more often during your daily news consumption than scikit-learn. So, let me convince you of my preference for the latter.
It plays well with the Python data ecosystem
scikit-learn is a Python toolkit built on top of NumPy, SciPy, and Matplotlib. These choices mean that it fits well into your daily data pipeline. As a data scientist, you most likely use Python as your language of choice since it is good for both offline analysis and real-time implementations. You will also be using tools such as pandas to load data from your database and to perform a wide range of transformations on it. Since both pandas and scikit-learn are built on top of NumPy, they play...
Installing the packages you need
It's time to install the packages we will need in this book, but first of all, make sure you have Python installed on your computer. In this book, we will be using Python version 3.6. If your computer comes with Python 2.x installed, then you should upgrade Python to version 3.6 or later. I will show you how to install the required packages using pip, Python's de facto package-management system. If you use another package manager, such as conda from the Anaconda distribution, you can easily find the equivalent installation commands for each of the following packages online.
To install scikit-learn, run the following command:
$ pip install --upgrade scikit-learn==0.22
I will be using version 0.22 of scikit-learn here. You can add the --user switch to the pip command to limit the installation to your own directories. This is important if you do not have root access to your machine or if you do not want to install...
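Once the installation finishes, you may want to confirm which version Python actually picks up. The following is a minimal sketch rather than an official procedure. The helper name `installed_version` is my own, and it assumes that the package exposes a `__version__` attribute, which scikit-learn does. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Many packages expose __version__, but not every module does.
    return getattr(module, "__version__", None)


# scikit-learn is imported under the name "sklearn".
version = installed_version("sklearn")
if version is None:
    print("scikit-learn is not installed (or exposes no version)")
else:
    print("scikit-learn version:", version)
```

If the printed version differs from the one you asked pip to install, it is worth checking that pip and your Python interpreter belong to the same environment.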
Summary
Mastering machine learning is a desirable skill nowadays given its vast application everywhere, from business to academia. Nevertheless, just understanding the theory of it will only take you so far since practitioners also need to understand their tools to be self-sufficient and capable.
In this chapter, we started with a high-level introduction to machine learning and learned when to use each of the machine learning types, from classification and regression to clustering and reinforcement learning. We then learned about scikit-learn and why practitioners recommend it when solving both supervised and unsupervised learning problems. To keep this book self-sufficient, we also covered the basics of data manipulation for those who haven't used libraries such as pandas and Matplotlib before. In the following chapters, we will continue to combine our understanding of the underlying theory of machine learning with more practical examples using scikit-learn.
Further reading
For more information on the topics related to this chapter, please refer to the following links:
- Learn Python Programming – Second Edition, by Fabrizio Romano: https://www.packtpub.com/application-development/learn-python-programming-second-edition
- Hands-On Data Analysis with Pandas, by Stefanie Molin: https://www.packtpub.com/big-data-and-business-intelligence/hands-data-analysis-pandas