You're reading from Practical Machine Learning Cookbook

Product type: Book
Published in: Apr 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781785280511
Edition: 1st Edition
Languages: Python
Tools: Apache Spark
Concepts: Machine Learning
Author (1): Atul Tripathi

Chapter 4. Model Selection and Regularization

In this chapter, we will cover the following recipes:

Shrinkage methods - calories burned per day
Dimension reduction methods - Delta's Aircraft Fleet
Principal component analysis - understanding world cuisine

Introduction

Subset selection: The use of labeled examples to induce a model that classifies objects into a finite set of known classes is one of the main challenges of supervised classification in machine learning. Vectors of numeric or nominal features are used to describe the various examples. In the feature subset selection problem, a learning algorithm is faced with the problem of selecting some subset of features upon which to focus its attention, while ignoring the rest.

When fitting a linear regression model, we are interested in the subset of variables that best describes the data. A number of different strategies can be adopted when searching for that subset. If there are m variables and the best regression model consists of p variables, p ≤ m, then a general approach to picking the best subset is to try all possible combinations of p variables and select the model that fits the data best.
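The exhaustive search described above can be sketched in base R. This is a minimal illustration rather than the book's own code: the helper name `best_subset`, the simulated data frame `sim`, and the choice of residual sum of squares as the measure of fit are all assumptions made for the example.

```r
# Exhaustive best-subset search for a fixed subset size p (base R only).
# best_subset() is a hypothetical helper, not a function from any package.
best_subset <- function(data, response, p) {
  predictors <- setdiff(names(data), response)
  stopifnot(p >= 1, p <= length(predictors))
  # Every combination of p predictors out of the m available.
  candidates <- combn(predictors, p, simplify = FALSE)
  # Fit one linear model per combination and record its residual sum of squares.
  rss <- vapply(candidates, function(vars) {
    fit <- lm(reformulate(vars, response = response), data = data)
    sum(residuals(fit)^2)
  }, numeric(1))
  list(variables = candidates[[which.min(rss)]], rss = min(rss))
}

# Simulated data (an assumption for illustration): y depends on x1 and x3 only.
set.seed(1)
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 2 * sim$x1 - 3 * sim$x3 + rnorm(n, sd = 0.1)

result <- best_subset(sim, response = "y", p = 2)
print(result$variables)
```

For m predictors and subset size p, this fits choose(m, p) models, so the amount of work grows quickly as m increases.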

However, there are...

Shrinkage methods - calories burned per day

The basal metabolic rate (BMR) is the central concept for comparing metabolic rates between humans and, in a clinical context, a means of assessing thyroid status. The BMR of mammals varies with body mass, with the same allometric exponent as field metabolic rate and many other physiological and biochemical rates. Fitbit devices use BMR together with the activities performed during the day to estimate the calories burned throughout the day.

Getting ready

In order to perform shrinkage methods, we shall be using a dataset collected from Fitbit and a calories-burned dataset.
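As a rough illustration of what a shrinkage method does, the sketch below computes ridge regression coefficients from the closed form (X'X + lambda I)^(-1) X'y in base R and shows them shrinking as the penalty grows. The helper `ridge_coef`, the simulated predictors, and the lambda values are assumptions for the example; they are not taken from the recipe or from the Fitbit data.

```r
# Ridge regression via its closed form: beta = (X'X + lambda * I)^(-1) X'y.
# ridge_coef() is a hypothetical helper written for this illustration.
ridge_coef <- function(X, y, lambda) {
  X <- scale(X)      # standardize predictors so the penalty treats them equally
  y <- y - mean(y)   # center the response so no intercept is needed
  p <- ncol(X)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
}

# Simulated data (an assumption): 30 rows, three predictors, one of them irrelevant.
set.seed(42)
n <- 30
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("a", "b", "c")))
y <- 1.5 * X[, "a"] - 2 * X[, "b"] + rnorm(n)

# One column of coefficients per penalty value; lambda = 0 is ordinary least squares.
lambdas <- c(0, 1, 10, 100)
coefs <- sapply(lambdas, function(l) ridge_coef(X, y, l))
colnames(coefs) <- paste0("lambda=", lambdas)
print(round(coefs, 3))
```

The glmnet package fits the same family of models with `glmnet(x, y, alpha = 0)` for ridge and `alpha = 1` for the lasso. Its objective scales the squared-error term by the number of observations, so its lambda values are not directly comparable to the ones used here.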

Step 1 - collecting and describing data

The dataset titled fitbit_export_20160806.csv, which is in CSV format, shall be used. The dataset is in standard format. There are 30 rows of data and 10 variables. The numeric variables are as follows:

Calories Burned
Steps
Distance
Floors
Minutes Sedentary
Minutes Lightly Active
Minutes Fairly Active
ExAng
Minutes Very Active
Activity Calories
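Before modeling, it can help to confirm that a file matches the description above. The following is a small sketch in base R; `describe_csv` is a hypothetical helper, and the assumption that the file sits in the working directory is ours, not the book's.

```r
# Hypothetical helper: read a CSV export and report its shape and numeric columns.
describe_csv <- function(path) {
  data <- read.csv(path, stringsAsFactors = FALSE, check.names = FALSE)
  list(
    rows = nrow(data),
    columns = ncol(data),
    numeric_columns = names(data)[vapply(data, is.numeric, logical(1))]
  )
}

# File name taken from the text; its location is an assumption.
path <- "fitbit_export_20160806.csv"
if (file.exists(path)) {
  print(describe_csv(path))
}
```

If a column that should be numeric is missing from `numeric_columns`, it was read as text (for example, because values contain thousands separators) and would need converting before it can be used in a regression.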

The...

Dimension reduction methods - Delta's Aircraft Fleet

Fleet planning is part of the strategic planning process for any airline. A fleet is the total number of aircraft that an airline operates, as well as the specific aircraft types that make up that total. Airline selection criteria for aircraft acquisition are based on technical and performance characteristics, economic and financial impact, environmental regulations and constraints, marketing considerations, and political realities. Fleet composition is a critical long-term strategic decision for an airline. Each aircraft type has different technical performance characteristics, for example, the capacity to carry a payload over a maximum flight distance, or range. Fleet composition therefore affects the airline's financial position, its operating costs, and especially its ability to serve specific routes.

Getting ready

In order to perform dimension reduction, we shall be using a dataset collected on the Delta Airlines aircraft fleet.

Step 1 - collecting and describing...

Principal component analysis - understanding world cuisine

Food is a powerful symbol of who we are. There are many types of food identification, such as ethnic, religious, and class identifications. Ethnic food preferences become identity markers in the presence of gustatory foreigners, such as when one goes abroad, or when those foreigners visit the home shores.

Getting ready

In order to perform principal component analysis, we shall be using the Epicurious recipe dataset.

Step 1 - collecting and describing data

The dataset titled epic_recipes.txt shall be used. The dataset is in standard format.
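The general shape of a principal component analysis in R can be sketched with `prcomp` from the base `stats` package. Because the layout of epic_recipes.txt is not described here, the example uses the built-in `USArrests` data purely as a stand-in; the variable names `pca`, `var_explained`, and `scores` are assumptions for the illustration.

```r
# USArrests ships with base R; it is only a stand-in for the recipe dataset.
# Centering and scaling put all variables on a comparable footing before PCA.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Share of the total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(var_explained), 3))

# Coordinates of each observation on the first two components.
scores <- pca$x[, 1:2]
print(head(scores))
```

The cumulative variance is a common guide for deciding how many components to keep, and the first two columns of `pca$x` are what a two-dimensional scatter plot of the observations would show.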

How to do it...

Let's get into the details.

Step 2 - exploring data

The first step is to load the following packages:

    > install.packages(c("ggplot2", "glmnet"))  # one-time install of both packages
    > library(ggplot2)  # plotting
    > library(glmnet)   # ridge and lasso regression

Note

Version info: Code for this page was tested in R version 3.3.2 (2016-10-31)

Let's explore the data and understand the relationships among the variables. We'll begin by importing...

Author (1)

Atul Tripathi

Atul Tripathi has spent more than 11 years in the fields of machine learning and quantitative finance. He has a total of 14 years of experience in software development and research. He has worked on advanced machine learning techniques, such as neural networks and Markov models. While working on these techniques, he has solved problems related to image processing, telecommunications, human speech recognition, and natural language processing. He has also developed tools for text mining using neural networks. In the field of quantitative finance, he has developed models for Value at Risk, Extreme Value Theorem, Option Pricing, and Energy Derivatives using Monte Carlo simulation techniques.