
You're reading from Hands-On Predictive Analytics with Python

Product type: Book
Published in: Dec 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789138719
Edition: 1st
Author (1)

Alvaro Fuentes

Alvaro Fuentes is a senior data scientist with a background in applied mathematics and economics. He has more than 14 years of experience in various analytical roles and is an analytics consultant at one of the 'Big Three' global management consulting firms, leading advanced analytics projects in industries such as banking, technology, and consumer goods. Alvaro is also an author and trainer in analytics and data science and has published courses and books, such as 'Become a Python Data Analyst' and 'Hands-On Predictive Analytics with Python'. He has also taught data science and related topics to thousands of students, both on-site and online, through platforms such as Springboard, Simplilearn, Udemy, and BSG Institute, among others.

Predicting Numerical Values with Machine Learning

Let's review what we have done so far: the business problem has been formulated, the data has been acquired and prepared, and we have a good understanding of the features and their possible relationships after applying exploratory data analysis (EDA). Now, it is finally time to build our first predictive models!

However, before building models for predictions, we should understand some of the basic foundational concepts of the field that we'll use in this book: machine learning (ML). We begin by providing a brief overview of what ML is and what the main ML techniques are. This is, of course, not a book on ML; for us, ML is just a tool, so we won't get into the theoretical or technical details that you would find in a typical ML book, which usually dedicates one chapter to each family of models. In addition, ML...

Technical requirements

  • Python 3.6 or higher
  • Jupyter Notebook
  • Recent versions of the following Python libraries: NumPy, pandas, matplotlib, Seaborn, and scikit-learn
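A quick way to check that your environment meets these requirements is a small version probe. This is just a convenience sketch (not from the book); it only reports what is installed and won't fail if a library is missing:

```python
import sys
from importlib import import_module

print("Python", sys.version.split()[0])

# Probe each required library and report its version if available.
for name in ("numpy", "pandas", "matplotlib", "seaborn", "sklearn"):
    try:
        mod = import_module(name)
        print(name, mod.__version__)
    except ImportError:
        print(name, "not installed")
```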

Introduction to ML

Machine learning is a term that has seen an explosion in popularity, mainly because it works. It has produced very good results when applied to many scientific and industrial problems, and it is present, in one form or another, in many technological products and services people use daily. If you interact with the internet, use apps on your smartphone, check your email, or do any telecommunications or banking transactions, then you have definitely interacted with an ML model. This is not a book about ML: we will focus on the basic concepts necessary to use ML as a tool for predictive analytics, we won't delve deeper into this exciting field, and there will be many important things that we leave out. However, because of the huge rise in interest in the subject, there are many excellent resources covering everything from deeply...

Practical considerations before modeling

We now have a basic understanding of some of the most important conceptual and theoretical aspects of ML. In this section, we will talk about some of the practical things we need to do before building a model, including some further data processing that is needed before the data can be fed into model training. We will also introduce our main tool for model building: scikit-learn.

Introducing scikit-learn

If you go to the main web page of scikit-learn, the first things you will read are the following statements about it:

  • Simple and efficient tool for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source...

Multiple linear regression (MLR)

In scikit-learn, ML models are implemented as classes known as estimators: an estimator is any object that learns from data, mainly models or transformers. All estimators have a fit method, which is used to train the estimator on a dataset, like this: estimator.fit(data).

It is important to note that the estimator has two kinds of parameters:

  • Estimator parameters: All the parameters of an estimator can be set when it is instantiated, or later by modifying the corresponding attribute. Some of these estimator parameters correspond to the ML model's hyperparameters. We will talk more about model hyperparameters later.
  • Estimated parameters: When data is fitted with an estimator, parameters are estimated from the data at hand. All the estimated parameters are attributes of the estimator object, ending with an underscore.

Since scikit-learn has a very consistent API using estimators, it...
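The estimator pattern described above can be sketched with a minimal example (the tiny dataset here is invented for illustration; the target follows y = 2x + 1 exactly):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])  # y = 2x + 1

# Estimator parameters are set at instantiation:
model = LinearRegression(fit_intercept=True)

# Training: estimated parameters are computed from the data.
model.fit(X, y)

# Estimated parameters are attributes ending with an underscore:
print(model.coef_)       # slope, approximately [2.0]
print(model.intercept_)  # intercept, approximately 1.0
```

The same fit/predict pattern applies to every estimator in the library, which is what makes the API so consistent.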

Lasso regression

Lasso is a clever modification of the multiple regression model that automatically excludes features that have little relevance to the accuracy of predictions. It uses a regularization strategy to perform variable selection, in order to try to enhance the prediction accuracy of the multiple regression model. The equation that the lasso regression model uses to make predictions is the same as in the multiple regression case: a linear combination of all the features, each multiplied by its own coefficient. The modification is made in the quantity that the algorithm is trying to minimize; if we have P predictors, then the problem now is to find the combination of weights (w) that will minimize the following quantity:
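The equation image did not survive extraction; as a reconstruction consistent with the description above, the standard lasso objective is:

```latex
\min_{w_0,\, w} \; \sum_{i=1}^{N} \left( y_i - w_0 - \sum_{j=1}^{P} w_j x_{ij} \right)^2 \;+\; \alpha \sum_{j=1}^{P} \lvert w_j \rvert
```

Here α controls the strength of the penalty; implementations often scale the first term by a constant factor such as 1/(2N), which does not change the character of the solution.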

Note that the first part of the quantity is almost the same as in the case of the MLR (except for the constant multiplying...
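Lasso's variable selection can be seen in a brief sketch (the synthetic data is invented for illustration; only the first two features are truly relevant to the target):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# The target depends only on the first two features; the other three are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1)
model.fit(X, y)

# The L1 penalty shrinks the coefficients of the irrelevant
# features to (essentially) zero, excluding them from the model.
print(model.coef_)
```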

KNN

KNN is a method that can be used for both regression and classification problems. It belongs to the class of non-parametric models because, unlike parametric models, its predictions are not based on the calculation of any parameters. Examples of parametric models are the regression models we just discussed, where the weights are the parameters. Despite its simplicity (or perhaps because of it), KNN frequently produces very good results, comparable to those produced by more complex and elaborate models. In its most basic implementation, it is easy to understand how it works: for a fixed number K (the number of neighbors) and a given observation whose target value we want to predict, do the following:

  • Find the K data points that are closest in their feature...
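The procedure can be sketched with scikit-learn's KNeighborsRegressor on a tiny invented dataset (chosen so the nearest neighbors are easy to verify by hand):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# K = 2 neighbors; prediction is the average of their target values.
knn = KNeighborsRegressor(n_neighbors=2)
knn.fit(X, y)

# For x = 2.2, the two closest points are x = 2 and x = 3,
# so the prediction is (2 + 3) / 2 = 2.5.
pred = knn.predict([[2.2]])
print(pred)
```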

Training versus testing error

The point of splitting the dataset into training and testing sets was to simulate the situation of using the model to make predictions on data the model has not seen. As we said before, the whole point is to generalize what we have learned from the observed data. The training MSE (or any metric calculated on the training dataset) may give us a biased view of the performance of our model, especially because of the possibility of overfitting. The metrics of performance we get from the training dataset will tend to be too optimistic. Let's take a look again at our illustration of overfitting:

If we calculate the training MSE for these three cases, we will definitely get the lowest (and hence apparently the best) value for the third model, the degree-16 polynomial; as we can see, the model passes through many of the points, making the error for those points exactly 0. However...
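The pattern of training MSE shrinking as model complexity grows can be reproduced with a small sketch (the noisy sine data and degrees here are invented for illustration, echoing the degree-16 polynomial above):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=(20, 1))
y = np.sin(3 * x[:, 0]) + rng.normal(scale=0.2, size=20)

# Training MSE for polynomials of increasing degree: the most
# complex model fits the training points almost exactly.
mses = {}
for degree in (1, 3, 16):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x, y)
    mses[degree] = mean_squared_error(y, model.predict(x))
    print(degree, round(mses[degree], 4))
```

The degree-16 model posts the lowest training MSE, yet this says nothing about how it would perform on unseen data, which is exactly why we hold out a test set.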

Summary

This was a dense chapter! We introduced some of the most important concepts of ML; we know that ML has three main branches, supervised, unsupervised, and reinforcement learning, and that we will be using only supervised learning in this book. Supervised learning has two types of tasks, regression and classification, whose only difference is the type of target we want to predict. We also talked about the very abstract concepts of hypothesis set and learning algorithm, and we even invented our (very bad) pseudo-ML model.

We also talked about the very important concept of generalization, which is the whole point of building ML models: to be able to learn how to map the features to the target using the data we have, and then use this knowledge to make predictions with data that we don't have yet. Cross-validation is a set of techniques to evaluate models; the most basic...

Further reading

  • Friedman, J., Hastie, T., & Tibshirani, R. (2001). The Elements of Statistical Learning. Springer Series in Statistics.
  • Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling. Springer.
  • Pedregosa, F. et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research.
  • Raschka, S., & Mirjalili, V. (2017). Python Machine Learning. Packt Publishing.
  • Weinberger, K. Q., Blitzer, J., & Saul, L. K. (2006). Distance metric learning for large margin nearest neighbor classification. In Advances in Neural Information Processing Systems (pp. 1473-1480).