You're reading from  Practical Machine Learning Cookbook

Product type: Book
Published in: Apr 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781785280511
Edition: 1st Edition
Author (1)
Atul Tripathi

Atul Tripathi has spent more than 11 years in the fields of machine learning and quantitative finance. He has a total of 14 years of experience in software development and research. He has worked on advanced machine learning techniques, such as neural networks and Markov models. While working on these techniques, he has solved problems related to image processing, telecommunications, human speech recognition, and natural language processing. He has also developed tools for text mining using neural networks. In the field of quantitative finance, he has developed models for Value at Risk, Extreme Value Theorem, Option Pricing, and Energy Derivatives using Monte Carlo simulation techniques.

Chapter 4. Model Selection and Regularization

In this chapter, we will cover the following recipes:

  • Shrinkage methods - calories burned per day
  • Dimension reduction methods - Delta's Aircraft Fleet
  • Principal component analysis - understanding world cuisine

Introduction


Subset selection: The use of labeled examples to induce a model that classifies objects into a finite set of known classes is one of the main challenges of supervised classification in machine learning. Vectors of numeric or nominal features are used to describe the various examples. In the feature subset selection problem, a learning algorithm is faced with the problem of selecting some subset of features upon which to focus its attention, while ignoring the rest.

When fitting a linear regression model, we are interested in the subset of variables that best describes the data. A number of different strategies can be adopted when searching for that subset. If there are m variables and the best regression model consists of p variables, p ≤ m, then a general approach to picking the best subset is to try all possible combinations of p variables and select the model that fits the data best.
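The exhaustive search described above can be sketched in base R. This is a minimal illustration on simulated data, not the book's own code: the data frame `X`, the response `y`, and the helper `best_subset()` are hypothetical names introduced here, and residual sum of squares is assumed as the measure of fit.

```r
# Exhaustive best-subset search for a fixed subset size p (illustrative sketch).
# The data are simulated; all names here are hypothetical.
set.seed(42)
n <- 60
m <- 5
X <- as.data.frame(matrix(rnorm(n * m), nrow = n))
names(X) <- paste0("x", 1:m)
# In this simulation only x1 and x3 drive the response.
y <- 2 * X$x1 - 3 * X$x3 + rnorm(n, sd = 0.5)

best_subset <- function(X, y, p) {
  # Enumerate all choose(m, p) subsets of the predictor names.
  combos <- combn(names(X), p, simplify = FALSE)
  rss <- vapply(combos, function(vars) {
    d <- data.frame(y = y, X[, vars, drop = FALSE])
    fit <- lm(y ~ ., data = d)
    sum(residuals(fit)^2)  # residual sum of squares of this candidate model
  }, numeric(1))
  list(vars = combos[[which.min(rss)]],
       rss = min(rss),
       n_models = length(combos))
}

res <- best_subset(X, y, p = 2)
print(res$vars)      # the pair of predictors with the lowest RSS
print(res$n_models)  # choose(5, 2) = 10 candidate models were fitted
```

Residual sum of squares is only comparable between models of the same size, so this sketch fixes p. The number of candidate models grows combinatorially with m, which is why exhaustive search quickly becomes expensive.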

However, there are...

Shrinkage methods - calories burned per day


The basal metabolic rate (BMR) is the key concept for comparing metabolic rates among humans and, in a clinical context, serves as a means of determining thyroid status. The BMR of mammals scales with body mass with the same allometric exponent as field metabolic rate and many other physiological and biochemical rates. Fitbit devices use BMR, together with the activities performed during the day, to estimate the calories burned throughout the day.

Getting ready

In order to perform shrinkage methods, we shall be using a dataset collected from Fitbit and a calories-burned dataset.
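To make the idea of shrinkage concrete, here is a minimal sketch of ridge regression computed from its closed form in base R. It uses simulated data rather than the Fitbit dataset, and the helper `ridge_coef()` and the grid of penalty values are assumptions made for illustration only.

```r
# Ridge regression via its closed form (illustrative sketch on simulated data).
set.seed(1)
n <- 30
X <- scale(matrix(rnorm(n * 4), nrow = n))  # standardized predictors
y <- as.vector(X %*% c(3, -2, 0, 0) + rnorm(n))
y <- y - mean(y)                            # centre y so no intercept is needed

ridge_coef <- function(X, y, lambda) {
  p <- ncol(X)
  # Solve (X'X + lambda * I) beta = X'y
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
}

lambdas <- c(0, 1, 10, 100)
coefs <- sapply(lambdas, function(l) ridge_coef(X, y, l))
colnames(coefs) <- paste0("lambda=", lambdas)
print(round(coefs, 3))

# The L2 norm of the coefficient vector decreases as lambda grows.
norms <- sqrt(colSums(coefs^2))
print(round(norms, 3))
```

With `lambda = 0` the solution is the ordinary least squares fit, and larger penalties pull every coefficient toward zero. Packages such as glmnet scale the penalty differently, so the lambda values used here should not be expected to match theirs.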

Step 1 - collecting and describing data

The dataset titled fitbit_export_20160806.csv, which is in CSV format, shall be used. The dataset is in standard format. There are 30 rows of data and 10 variables. The numeric variables are as follows:

  • Calories Burned
  • Steps
  • Distance
  • Floors
  • Minutes Sedentary
  • Minutes Lightly Active
  • Minutes Fairly Active
  • ExAng
  • Minutes Very Active
  • Activity Calories

The...

Dimension reduction methods - Delta's Aircraft Fleet


Fleet planning is part of the strategic planning process for any airline. A fleet is the total set of aircraft that an airline operates, described by both the number of aircraft and the specific aircraft types that comprise it. Airline selection criteria for aircraft acquisition are based on technical and performance characteristics, economic and financial impact, environmental regulations and constraints, marketing considerations, and political realities. Fleet composition is a critical long-term strategic decision for an airline. Each aircraft type has different technical performance characteristics, for example, the capacity to carry a payload over a maximum flight distance or range. Fleet composition affects the airline's financial position, operating costs, and especially its ability to serve specific routes.

Getting ready

In order to perform dimension reduction, we shall be using a dataset collected on the Delta Airlines aircraft fleet.
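As a rough illustration of what dimension reduction looks like in practice, the sketch below applies principal component analysis with base R's `prcomp()` to simulated data. It does not use the Delta dataset: the six columns, the two hidden factors behind them, and the 90% variance threshold are all assumptions chosen for the example.

```r
# Dimension reduction with PCA (illustrative sketch on simulated data).
set.seed(7)
n <- 40
# Two hidden factors generate six correlated, made-up measurements.
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- data.frame(
  a = f1 + rnorm(n, sd = 0.1),
  b = 2 * f1 + rnorm(n, sd = 0.1),
  c = -f1 + rnorm(n, sd = 0.1),
  d = f2 + rnorm(n, sd = 0.1),
  e = 3 * f2 + rnorm(n, sd = 0.1),
  f = f1 + f2 + rnorm(n, sd = 0.1)
)

# Centre and scale so that every variable contributes on the same footing.
pca <- prcomp(dat, center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Keep the smallest number of components that reaches 90% of the variance.
k <- which(cum_var >= 0.90)[1]
reduced <- pca$x[, 1:k, drop = FALSE]
print(round(cum_var, 3))
print(dim(reduced))
```

The matrix `reduced` holds the observations expressed in the first `k` principal components, which can stand in for the original columns in later analysis. The threshold is a judgment call and would normally be chosen by inspecting the cumulative variance.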

Step 1 - collecting and describing...

Principal component analysis - understanding world cuisine


Food is a powerful symbol of who we are. There are many types of food identification, such as ethnic, religious, and class identifications. Ethnic food preferences become identity markers in the presence of gustatory foreigners, such as when one goes abroad, or when those foreigners visit the home shores.

Getting ready

In order to perform principal component analysis, we shall be using the Epicurious recipe dataset.

Step 1 - collecting and describing data

The dataset titled epic_recipes.txt shall be used. The dataset is in standard format.

How to do it...

Let's get into the details.

Step 2 - exploring data

The first step is to load the following packages:

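    > install.packages(c("ggplot2", "glmnet"))  # one-time install; skip if already installed
    > library(ggplot2)  # plotting
    > library(glmnet)   # ridge, lasso, and elastic-net regression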

Note

Version info: Code for this page was tested in R version 3.3.2 (2016-10-31)

Let's explore the data and understand the relationships among the variables. We'll begin by importing...
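As a hedged illustration of the import step, the sketch below reads a small CSV with `read.csv()` and inspects its structure. The file is a temporary one written by the example itself. Its three rows are invented placeholders and not values from fitbit_export_20160806.csv, and the `Date` column is an assumption.

```r
# Importing a CSV and inspecting its structure (illustrative sketch).
# The rows below are invented placeholders, not values from the Fitbit export,
# and the Date column is an assumption.
csv_text <- c(
  "Date,Calories Burned,Steps,Distance",
  "2016-07-01,2500,8000,5.9",
  "2016-07-02,2700,10500,7.8",
  "2016-07-03,2300,6100,4.5"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# check.names = TRUE (the default) turns "Calories Burned" into Calories.Burned.
fitbit <- read.csv(tmp, stringsAsFactors = FALSE)
str(fitbit)

# A first look at how two of the numeric variables move together.
print(cor(fitbit$Calories.Burned, fitbit$Steps))
```

To read the real file, the temporary path would be replaced with the location of the export on disk.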
