Machine Learning with scikit-learn Quick Start Guide
Machine Learning with scikit-learn Quick Start Guide: Classification, regression, and clustering techniques in Python

By Kevin Jolly
Product Details


Publication date : Oct 30, 2018
Length : 172 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781789343700

Machine Learning with scikit-learn Quick Start Guide

Introducing Machine Learning with scikit-learn

Welcome to the world of machine learning with scikit-learn. I'm thrilled that you have chosen this book to begin or further advance your knowledge of the vast field of machine learning. Machine learning can be overwhelming at times, partly because of the large number of tools available on the market. This book simplifies the process of tool selection down to one: scikit-learn.

If I were to tell you what this book can do for you in one sentence, it would be this: the book gives you pipelines that can be implemented to solve a wide range of machine learning problems. True to what this sentence implies, you will learn how to construct an end-to-end machine learning pipeline using some of the most popular algorithms, which are widely used in industry and in professional competitions such as those hosted on Kaggle.

In this introductory chapter, we will go through the following topics:

  • A brief introduction to machine learning
  • What is scikit-learn?
  • Installing scikit-learn
  • Algorithms that you will learn to implement using scikit-learn in this book

Now, let's begin this fun journey into the world of machine learning with scikit-learn!

A brief introduction to machine learning

Machine learning has generated quite the buzz – from Elon Musk fearing the role of unregulated artificial intelligence in society, to Mark Zuckerberg having a view that contradicts Musk's.

So, what exactly is machine learning? Simply put, machine learning is a set of methods that can detect patterns in data and use those patterns to make future predictions. Machine learning has found immense value in a wide range of industries, from finance to healthcare. This translates to a growing demand for people with machine learning skills.

Broadly speaking, machine learning can be categorized into three main types:

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

Scikit-learn is designed to tackle problems pertaining to supervised and unsupervised learning only, and does not support reinforcement learning at present.

Supervised learning

Supervised learning is a form of machine learning in which our data comes with a set of labels or a numeric target variable. These labels or values usually belong to one feature/attribute, which is commonly known as the target variable. For instance, each row of your data could belong to either the category Healthy or the category Not Healthy.

Given a set of features such as weight, blood sugar levels, and age, we can use the supervised machine learning algorithm to predict whether the person is healthy or not.

In the following simple mathematical expression, S is the supervised learning algorithm, X is the set of input features, such as weight and age, and Y is the target variable with the labels Healthy or Not Healthy:

Y = S(X)

Although supervised machine learning is the most common type of machine learning that is implemented with scikit-learn and in the industry, most datasets typically do not come with predefined labels. Unsupervised learning algorithms are first used to cluster data without labels into distinct groups to which we can then assign labels. This is discussed in detail in the following section.
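The idea of a supervised algorithm mapping input features to a target variable can be sketched in a few lines of scikit-learn. This is only an illustration: the six rows of data are made up for the example, and the choice of a k-nearest neighbors classifier is an assumption, not something this section prescribes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data (invented for illustration only).
# Columns: weight (kg), blood sugar level (mg/dL), age (years).
X = np.array([
    [62, 85, 25],
    [70, 90, 31],
    [68, 95, 40],
    [95, 160, 52],
    [102, 180, 60],
    [110, 170, 47],
])
# Target variable: one label per row of X.
y = np.array([
    "Healthy", "Healthy", "Healthy",
    "Not Healthy", "Not Healthy", "Not Healthy",
])

# S plays the role of the supervised learning algorithm.
S = KNeighborsClassifier(n_neighbors=3)
S.fit(X, y)  # learn from the labeled examples

# Predict labels for two new, unlabeled people.
new_people = np.array([[65, 88, 29], [105, 175, 55]])
predictions = S.predict(new_people)
print(predictions)
```

The call to fit is where the labels are used; predict then applies what was learned to rows that have no label.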

Unsupervised learning

Unsupervised learning is a form of machine learning in which the algorithm tries to detect/find patterns in data that do not have an outcome/target variable. In other words, we do not have data that comes with pre-existing labels. Thus, the algorithm will typically use a metric such as distance to group data together depending on how close they are to each other.

As discussed in the previous section, most of the data that you will encounter in the real world will not come with a set of predefined labels and, as such, will only have a set of input features without a target attribute.

In the following simple mathematical expression, U is the unsupervised learning algorithm, while X is a set of input features, such as weight and age:

U(X)

Given this data, our objective is to create groups that could potentially be labeled as Healthy or Not Healthy. The unsupervised learning algorithm uses a metric such as distance to identify how close the points within a group are to each other and how far apart two such groups are. The algorithm then clusters the points into two distinct groups, as illustrated in the following diagram:

Clustering two groups together
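Grouping unlabeled points by distance can be sketched with the k-means algorithm, which this book covers later. The data below is invented for the example, and the parameter values (two clusters, a fixed random_state) are assumptions chosen to keep the result reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data (invented for illustration only).
# Columns: weight (kg), age (years). There is no target variable.
X = np.array([
    [62, 25], [70, 31], [68, 28],
    [98, 58], [104, 63], [110, 55],
])

# U plays the role of the unsupervised learning algorithm.
U = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = U.fit_predict(X)  # one cluster index (0 or 1) per row

print(groups)               # which group each row was assigned to
print(U.cluster_centers_)   # the center of each group
```

The cluster indices carry no meaning of their own; it is up to us to inspect each group and decide whether a name such as Healthy or Not Healthy fits it.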

What is scikit-learn?

Scikit-learn is free and open source software that helps you tackle supervised and unsupervised machine learning problems. It is written mostly in Python and builds on some of the most popular libraries that Python has to offer, namely NumPy and SciPy.

The main reason for scikit-learn's popularity is that most of the world's most widely used machine learning algorithms can be implemented quickly, in a plug-and-play fashion, once you know what the core pipeline looks like. Another reason is that the performance-critical parts of popular classification algorithms, such as logistic regression and support vector machines, are written in Cython. Cython gives these algorithms C-like performance, which makes scikit-learn efficient to use.
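The plug-and-play nature of the library can be illustrated with a minimal sketch in which three different classifiers are trained and scored through exactly the same calls. The dataset (the iris data bundled with scikit-learn), the split ratio, and the parameter values are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Load a small dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows to evaluate each model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Three different algorithms, one shared interface.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "support vector machine": SVC(),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # train
    scores[name] = model.score(X_test, y_test)   # accuracy on held-out rows
    print(f"{name}: {scores[name]:.2f}")
```

Swapping one algorithm for another only changes the line that constructs the model; the split, fit, and score steps stay the same.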

Installing scikit-learn

There are two ways in which you can install scikit-learn on your personal device:

  • By using the pip method
  • By using the Anaconda method

The pip method can be implemented on the macOS/Linux Terminal or the Windows PowerShell, while the Anaconda method will work with the Anaconda prompt.

Choosing between these two methods of installation is pretty straightforward:

  • If you would like all the common Python package distributions for data science to be installed in one environment, the Anaconda method works best
  • If you would like to build your own environment from scratch for scikit-learn, the pip method works best (for advanced users of Python)

This book will be using Python 3.6 for all the code that is displayed throughout every chapter, unless mentioned otherwise.

The pip method

Scikit-learn requires a few packages to be installed on your device before you can install it. These are as follows:

  • NumPy: Version 1.8.2 or greater
  • SciPy: Version 0.13.3 or greater

These can be installed using the pip method by using the following commands:

pip3 install numpy
pip3 install scipy

Next, we can install scikit-learn using the following code:

pip3 install scikit-learn

Additionally, if you already have scikit-learn installed on your device and you simply want to upgrade it to the latest version, you can use the following code:

pip3 install -U scikit-learn

The version of scikit-learn implemented in the book is 0.19.1.
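After installing, it can be useful to confirm which versions ended up in your environment. The following is a minimal sketch; the helper version_tuple is a hypothetical name introduced here, and the minimum versions are the ones quoted above for NumPy and SciPy.

```python
import re

import numpy
import scipy
import sklearn


def version_tuple(version):
    """Turn a version string such as '0.19.1' into a tuple of integers.

    Parsing stops at the first piece that does not start with a digit,
    so suffixes such as 'rc1' or 'dev0' are ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


installed = {
    "numpy": numpy.__version__,
    "scipy": scipy.__version__,
    "scikit-learn": sklearn.__version__,
}
for name, version in installed.items():
    print(f"{name}: {version}")

# Minimum versions quoted in the text for the two dependencies.
minimums = {"numpy": (1, 8, 2), "scipy": (0, 13, 3)}
meets_minimum = {
    name: version_tuple(installed[name]) >= minimum
    for name, minimum in minimums.items()
}
print(meets_minimum)
```

If any import at the top fails, the corresponding package is not installed in the Python environment you are running.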

The Anaconda method

If you have installed Python using the Anaconda distribution, you can install scikit-learn from the Anaconda prompt.

The first step is to install the dependencies:

conda install numpy
conda install scipy

Next, we can install scikit-learn by using the following code:

conda install scikit-learn

Additionally, if you already have scikit-learn installed with the Anaconda distribution, you can upgrade it to the latest version by using the following code in the Anaconda prompt:

conda update scikit-learn

When upgrading or uninstalling a copy of scikit-learn that was installed with Anaconda, avoid using pip, as doing so is likely to leave some of the required files in place without upgrading or removing them. Pick either the pip method or the Anaconda method and stick with it in order to maintain consistency.
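If you are unsure which of the two methods was used to install a package, the package metadata often records it. The sketch below reads the INSTALLER entry through the standard library's importlib.metadata module, which requires Python 3.8 or later (newer than the Python 3.6 used in this book); installer_of is a hypothetical helper name, and not every installer writes this entry, so treat a missing value as unknown.

```python
from importlib import metadata  # available in Python 3.8 and later


def installer_of(package_name):
    """Return the recorded installer of a package (for example 'pip'
    or 'conda'), or None if the package or the record is missing."""
    try:
        dist = metadata.distribution(package_name)
    except metadata.PackageNotFoundError:
        return None
    text = dist.read_text("INSTALLER")  # None when the file is absent
    return text.strip() if text else None


result = installer_of("scikit-learn")
print("scikit-learn was installed by:", result)
```

Knowing the recorded installer tells you which tool to use for the upgrade or removal.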

Additional packages

In this section, we will talk about the packages that we will be installing outside of scikit-learn that will be used throughout this book.

Pandas

To install Pandas, you can use either the pip method or the Anaconda method, as follows:

Pip method:

pip3 install pandas

Anaconda method:

conda install pandas

Matplotlib

To install matplotlib, you can use either the pip method or the Anaconda method, as follows:

Pip method:

pip3 install matplotlib

Anaconda method:

conda install matplotlib

Tree

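Note that scikit-learn's own decision tree module, sklearn.tree, ships with scikit-learn and needs no separate installation. To install the separate tree package, you can use either the pip method or the Anaconda method, as follows: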

Pip method:

pip3 install tree

Anaconda method:

conda install tree

Pydotplus

To install pydotplus, you can use either the pip method or the Anaconda method, as follows:

Pip method:

pip3 install pydotplus

Anaconda method:

conda install pydotplus

Image

To install Image, you can use either the pip method or the Anaconda method, as follows:

Pip method:

pip3 install Image

Anaconda method:

conda install Image
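Once the additional packages are installed, a quick check can confirm that Python is able to find them. This is a minimal sketch; is_installed is a hypothetical helper name, and the list of module names is limited to those whose import names are the same as their package names.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Import names of some of the packages installed in this section.
status = {}
for name in ["pandas", "matplotlib", "pydotplus"]:
    status[name] = is_installed(name)
    print(name, "is installed" if status[name] else "is missing")
```

The check only looks the module up; it does not import it, so it runs quickly even for large packages.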

Algorithms that you will learn to implement using scikit-learn

The algorithms that you will learn about in this book are broadly classified into the following two categories:

  • Supervised learning algorithms
  • Unsupervised learning algorithms

Supervised learning algorithms

Supervised learning algorithms can be used to solve both classification and regression problems. In this book, you will learn how to implement some of the most popular supervised machine learning algorithms: those that are widely used in industry and research, and that have helped solve problems across a wide range of domains. These supervised learning algorithms are as follows:

  • Linear regression: This supervised learning algorithm is used to predict continuous numeric outcomes such as house prices, stock prices, and temperature, to name a few
  • Logistic regression: The logistic regression algorithm is a popular classification algorithm that is especially used in the credit industry in order to predict loan defaults
  • k-Nearest Neighbors: The k-NN algorithm is a classification algorithm that is used to classify data into two or more categories. For example, it can be used to classify houses into expensive and affordable categories based on price, area, bedrooms, and a whole range of other features
  • Support vector machines: The SVM algorithm is a popular classification algorithm that is used in image and face detection, along with applications such as handwriting recognition
  • Tree-Based algorithms: Tree-based algorithms such as decision trees, Random Forests, and Boosted trees are used to solve both classification and regression problems
  • Naive Bayes: The Naive Bayes classifier is a machine learning algorithm that applies Bayes' theorem, under the assumption that the features are independent of each other given the class, to solve classification problems
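As a rough orientation, each algorithm in the list above has a counterpart class in scikit-learn. The mapping below is a sketch under assumptions: where an algorithm has several variants (for example, classifier and regressor versions of the tree-based models, or different Naive Bayes flavors), one representative class has been picked, and later chapters may use a different one.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# One representative scikit-learn class per algorithm in the list.
estimators = {
    "Linear regression": LinearRegression(),
    "Logistic regression": LogisticRegression(),
    "k-Nearest Neighbors": KNeighborsClassifier(),
    "Support vector machines": SVC(),
    "Decision tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Boosted trees": GradientBoostingClassifier(),
    "Naive Bayes": GaussianNB(),
}

for name, estimator in estimators.items():
    # Every estimator exposes the same fit and predict methods.
    print(f"{name}: {type(estimator).__name__}")
```

Because all of these classes share the fit and predict interface, the workflow you learn for one of them carries over to the rest.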

Unsupervised learning algorithms

Unsupervised machine learning algorithms are typically used to cluster points of data based on distance. The unsupervised learning algorithm that you will learn about in this book is as follows:

  • k-means: The k-means algorithm is a popular algorithm that is typically used to segment customers into unique categories based on a variety of features, such as their spending habits. This algorithm is also used to segment houses into categories based on their features, such as price and area.

Summary

This chapter has given you a brief introduction to what machine learning is, for those of you who are just beginning your journey into the world of machine learning. You have learned how scikit-learn fits into the context of machine learning and how you can go about installing the necessary software.

Finally, you had a brief glimpse of all the algorithms that you will learn to implement as you progress through this book, as well as their associated applications in the real world.

In the next chapter, you will learn how to implement your first algorithm: the k-Nearest Neighbors algorithm!


Key benefits

  • Build your first machine learning model using scikit-learn
  • Train supervised and unsupervised models using popular techniques such as classification, regression and clustering
  • Understand how scikit-learn can be applied to different types of machine learning problems

Description

Scikit-learn is a robust machine learning library for the Python programming language. It provides a set of supervised and unsupervised learning algorithms. This book is the easiest way to learn how to deploy, optimize, and evaluate all of the important machine learning algorithms that scikit-learn provides. This book teaches you how to use scikit-learn for machine learning. You will start by setting up and configuring your machine learning environment with scikit-learn. To put scikit-learn to use, you will learn how to implement various supervised and unsupervised machine learning models. You will learn classification, regression, and clustering techniques to work with different types of datasets and train your models. Finally, you will learn about an effective pipeline to help you build a machine learning project from scratch. By the end of this book, you will be confident in building your own machine learning models for accurate predictions.

What you will learn

  • Learn how to work with all scikit-learn's machine learning algorithms
  • Install and set up scikit-learn to build your first machine learning model
  • Employ Unsupervised Machine Learning Algorithms to cluster unlabelled data into groups
  • Perform classification and regression machine learning
  • Use an effective pipeline to build a machine learning project from scratch

Table of Contents

10 Chapters
Preface
Introducing Machine Learning with scikit-learn
Predicting Categories with K-Nearest Neighbors
Predicting Categories with Logistic Regression
Predicting Categories with Naive Bayes and SVMs
Predicting Numeric Outcomes with Linear Regression
Classification and Regression with Trees
Clustering Data with Unsupervised Machine Learning
Performance Evaluation Methods
Other Books You May Enjoy
