
You're reading from Microsoft Azure Machine Learning

Product type: Book
Published in: Jun 2015
Reading level: Intermediate
ISBN-13: 9781784390792
Edition: 1st Edition
Authors (2):
Sumit Mund

Sumit Mund is a BI/analytics consultant with about a decade of industry experience. He works in his own company, Mund Consulting Ltd., where he is a director and lead consultant. He is an expert in machine learning, predictive analytics, C#, R, and Python programming; he also has an active interest in Artificial Intelligence. He has extensive experience working with most of Microsoft Data Analytics tools and also on Big Data platforms, such as Hadoop and Spark. He is a Microsoft Certified Solution Expert (MCSE in Business Intelligence). Sumit regularly engages on social media platforms through his tweets, blogs, and LinkedIn profile, and often gives talks at industry conferences and local user group meetings.

Christina Storm

Advanced data preprocessing


ML Studio also provides advanced data preprocessing options. Some of the common ones are discussed briefly here.

Removing outliers

Outliers are data points that lie distinctly apart from the rest of the data. If present in your dataset, they can distort your predictive model and make its predictions unreliable. In many cases, it is a good idea to clip or remove them.

ML Studio comes with the Clip Values module, which detects outliers and lets you clip them or replace them with a threshold, the mean, the median, or a missing value. By default, it is applied to all numeric columns, but you can restrict it to one or more selected columns. You can find it under Data Transformation | Scale and Reduce in the module palette.
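
To make the idea of clipping concrete outside the ML Studio interface, here is a minimal Python sketch of the same kind of operation. It is an illustration only and not the Clip Values module's implementation or API: the function name clip_values, its parameters, and the sample ages are hypothetical, and computing the mean and median from the in-range values only is a choice made for this sketch.

```python
from statistics import mean, median


def clip_values(values, lower, upper, replace_with="threshold"):
    """Replace values outside [lower, upper] in a single numeric column.

    Hypothetical helper for illustration; not the ML Studio API.
    replace_with is one of "threshold", "mean", "median", or "missing".
    """
    if replace_with not in ("threshold", "mean", "median", "missing"):
        raise ValueError(f"unknown replacement mode: {replace_with}")

    # Assumption of this sketch: the mean and median are computed from
    # the in-range values only, so the outliers do not skew them.
    in_range = [v for v in values if v is not None and lower <= v <= upper]

    substitute = None  # used for "missing"; None stands for a missing value
    if replace_with == "mean":
        substitute = mean(in_range)
    elif replace_with == "median":
        substitute = median(in_range)

    result = []
    for v in values:
        if v is None or lower <= v <= upper:
            result.append(v)  # keep in-range and already-missing values
        elif replace_with == "threshold":
            # Pull the outlier back to the nearest boundary.
            result.append(lower if v < lower else upper)
        else:
            result.append(substitute)
    return result


# Hypothetical sample column: 250 is an implausible age.
ages = [22, 35, 41, 29, 38, 250]
clipped_to_threshold = clip_values(ages, lower=15, upper=95)
clipped_to_median = clip_values(ages, lower=15, upper=95, replace_with="median")
```

Clipping to the threshold keeps the row but caps its influence, whereas replacing with a missing value defers the decision to a later missing-data step. Which option suits a dataset depends on whether the outlier is a data-entry error or a genuine extreme observation.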

Data normalization

Often, different columns in a dataset come in different scales; for example, you may have a dataset with two columns: age, with values ranging from 15 to 95, and annual...
