
You're reading from Microsoft Azure Machine Learning

Product type: Book
Published in: Jun 2015
Reading level: Intermediate
ISBN-13: 9781784390792
Edition: 1st Edition
Authors (2):
Sumit Mund

Sumit Mund is a BI/analytics consultant with about a decade of industry experience. He works in his own company, Mund Consulting Ltd., where he is a director and lead consultant. He is an expert in machine learning, predictive analytics, C#, R, and Python programming; he also has an active interest in Artificial Intelligence. He has extensive experience working with most of Microsoft Data Analytics tools and also on Big Data platforms, such as Hadoop and Spark. He is a Microsoft Certified Solution Expert (MCSE in Business Intelligence). Sumit regularly engages on social media platforms through his tweets, blogs, and LinkedIn profile, and often gives talks at industry conferences and local user group meetings.

Christina Storm

Data exploration and preparation


In your experiment, drag in the Flight Delays Data sample dataset and click the Visualize option to explore it. You will find that some columns have many missing values. To clean them, add a Clean Missing Data module and choose Replace using MICE as the cleaning mode.
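Outside the Studio, the "how much is missing?" check from the exploration step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sample rows are made up, the `Year` header is a guess at the name of the year column, and `missing_counts` is a hypothetical helper. It does not implement MICE; it only reports which columns would need cleaning.

```python
import csv
import io

# Made-up sample rows (assumption); the real dataset is far larger.
SAMPLE_CSV = """Year,DayOfWeek,OriginAirportID,DestAirportID,DepDelay,DepDel15,ArrDelay
2013,4,12478,12892,-3,0,1
2013,5,14869,12478,,,
2013,6,11433,13303,25,1,
"""


def missing_counts(rows):
    """Return {column: number of rows whose value is empty} (hypothetical helper)."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value is None or value.strip() == "":
                counts[column] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts = missing_counts(rows)
for column, n in counts.items():
    print(f"{column}: {n} missing of {len(rows)}")
```

Columns with a high count are the ones the Clean Missing Data module would fill in.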

Certain columns, such as DayOfWeek, OriginAirportID, and DestAirportID, hold numeric values but are really categorical variables: the numbers are codes, not quantities. Use the Metadata Editor module to mark them as Categorical.
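To see why the distinction matters, consider how a categorical code is commonly presented to a learner: each distinct value becomes its own 0/1 indicator, so that day 7 is not treated as "seven times" day 1. The sketch below is a minimal illustration of that idea in plain Python. The `one_hot` helper and the sample values are assumptions, and nothing here is claimed to be what the Studio does internally.

```python
def one_hot(values):
    """Encode each value as a list of 0/1 indicators, one per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories]
               for value in values]
    return categories, encoded


# Made-up DayOfWeek codes (assumption): labels for days, not magnitudes.
day_of_week = [1, 3, 7, 3]
categories, encoded = one_hot(day_of_week)

print(categories)
for code, indicators in zip(day_of_week, encoded):
    print(code, indicators)
```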

Feature selection

Before you start developing the model, it is important to select or generate the set of variables with the most predictive power and to remove redundant or unimportant features. In this case, all the data points are from the same year, so the year column is not required. We want to predict delays before the journey starts, so the DepDel15 and DepDelay columns must be left out: their values are not known until departure. Again, both the ArrDelay and...
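As a rough illustration of the column removal described so far, the following plain-Python sketch drops the year and departure-delay columns from a list of row dictionaries. The `Year` key, the sample rows, and the `drop_columns` helper are assumptions made for the example; the columns discussed in the truncated part of the text are deliberately left out.

```python
# Columns the text says to exclude; "Year" is an assumed header name.
COLUMNS_TO_DROP = {"Year", "DepDelay", "DepDel15"}


def drop_columns(rows, to_drop):
    """Return copies of the rows without the named columns (hypothetical helper)."""
    return [{key: value for key, value in row.items() if key not in to_drop}
            for row in rows]


# Made-up sample rows (assumption).
rows = [
    {"Year": 2013, "DayOfWeek": 4, "OriginAirportID": 12478,
     "DestAirportID": 12892, "DepDelay": -3, "DepDel15": 0},
    {"Year": 2013, "DayOfWeek": 6, "OriginAirportID": 11433,
     "DestAirportID": 13303, "DepDelay": 25, "DepDel15": 1},
]

selected = drop_columns(rows, COLUMNS_TO_DROP)
print(list(selected[0].keys()))
```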
