You're reading from  F# for Machine Learning Essentials

Product type: Book
Published in: Feb 2016
Reading Level: Expert
ISBN-13: 9781783989348
Edition: 1st Edition
Author (1)
Sudipta Mukherjee

Sudipta Mukherjee was born in Kolkata and migrated to Bangalore. He is an electronics engineer by education and a computer engineer/scientist by profession and passion. He graduated in 2004 with a degree in electronics and communication engineering. He has a keen interest in data structures, algorithms, text processing, natural language processing tools development, programming languages, and machine learning at large. His first book on Data Structure using C has been received quite well. Parts of the book can be read on Google Books. The book was also translated into simplified Chinese, available from Amazon.cn. This is Sudipta's second book with Packt Publishing. His first book, .NET 4.0 Generics, was also received very well. During the last few years, he has been hooked on the functional programming style. His book on functional programming, Thinking in LINQ, was released in 2014. He lives in Bangalore with his wife and son. Sudipta can be reached via e-mail at sudipto80@yahoo.com and via Twitter at @samthecoder.

Grubbs' test for multivariate data using Mahalanobis distance


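Grubbs' test can be applied to multivariate data by first reducing each observation to a single value with the following transformation, the squared Mahalanobis distance from the sample mean:

$$y_i^2 = (x_i - \bar{x})^{\top} S^{-1} (x_i - \bar{x})$$

where $x_i$ is the $i$-th observation in $X$, $\bar{x}$ is the sample mean vector, and $S$ is the covariance matrix of $X$.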

The y-squared values are computed from a given data matrix $X$ in two steps: the covariance matrix of $X$ is calculated first, and the transformation is then applied to each observation.
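The two steps (covariance matrix, then y-squared values) can be sketched in F# as follows. This is a minimal sketch under stated assumptions, not the book's listing: it uses only the F# core library, takes the sample covariance (divisor n - 1), and handles only two-dimensional observations so that the matrix inverse has a closed form. The names `meanVector`, `covarianceMatrix`, `inverse2x2`, and `ySquared` are hypothetical. Each value is computed as $(x - \bar{x})^{\top} S^{-1} (x - \bar{x})$.

```fsharp
// Minimal sketch (assumptions: 2-D data, sample covariance with divisor n - 1).

/// Column-wise mean of a list of observations (each a float list).
let meanVector (xs: float list list) : float list =
    let n = float (List.length xs)
    xs
    |> List.reduce (fun a b -> List.map2 (+) a b)
    |> List.map (fun s -> s / n)

/// Sample covariance matrix, returned as a list of rows.
let covarianceMatrix (xs: float list list) : float list list =
    let n = float (List.length xs)
    let mu = meanVector xs
    // Centre every observation on the mean vector.
    let centred = xs |> List.map (fun x -> List.map2 (-) x mu)
    let d = List.length mu
    [ for i in 0 .. d - 1 ->
        [ for j in 0 .. d - 1 ->
            (centred |> List.sumBy (fun c -> List.item i c * List.item j c)) / (n - 1.0) ] ]

/// Closed-form inverse of a 2x2 matrix; fails on a singular matrix.
let inverse2x2 (m: float list list) : float list list =
    match m with
    | [ [ a; b ]; [ c; d ] ] ->
        let det = a * d - b * c
        if det = 0.0 then failwith "Covariance matrix is singular"
        [ [ d / det; -b / det ]; [ -c / det; a / det ] ]
    | _ -> failwith "Expected a 2x2 matrix"

/// Pairs every observation with its squared Mahalanobis distance
/// from the sample mean.
let ySquared (xs: float list list) : (float list * float) list =
    let mu = meanVector xs
    let sInv = inverse2x2 (covarianceMatrix xs)
    xs
    |> List.map (fun x ->
        let diff = List.map2 (-) x mu
        // S^-1 * diff: one dot product per row of S^-1.
        let sInvDiff =
            sInv |> List.map (fun row -> List.sum (List.map2 ( * ) row diff))
        x, List.sum (List.map2 ( * ) diff sInvDiff))

// The four points that appear in the output listing of this section.
let points = [ [ 2.0; 2.0 ]; [ 2.0; 5.0 ]; [ 6.0; 5.0 ]; [ 100.0; 345.0 ] ]
let ys = ySquared points
ys |> List.iter (fun (x, y2) -> printfn "%A -> %.4f" x y2)
```

With a positive-definite covariance matrix, a squared Mahalanobis distance is never negative, so this sketch is not expected to reproduce the negative figures in the output listing shown in this section. It should still rank [100.0; 345.0] as the point farthest from the mean.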

The input consists of the following four two-dimensional points: [2.0; 2.0], [2.0; 5.0], [6.0; 5.0], and [100.0; 345.0].

This produces the following output:

ys = [([2.0; 2.0], -48066176.91); ([2.0; 5.0], -48066176.91);
 ([6.0; 5.0], -2584692.113); ([100.0; 345.0], -2.097348892e+12)]

Now, Grubbs' test for univariate data can be applied to these generated values:

[-48066176.91; -48066176.91; -2584692.113; -2.097348892e+12]

The z-scores of these values are:

[0.5773335755; 0.5773335755; 0.5773836562; 1.732050807]

As you can see, the z-score of the last entry (about 1.73) is considerably larger than the z-scores of the other three entries (about 0.58 each). This indicates that the last element in the multivariate dataset, [100; 345], is anomalous.
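The z-score step can be sketched in F# as follows. The listed z-scores are consistent with taking the absolute deviation from the mean and dividing by the population standard deviation (divisor n). That reading is inferred from the numbers and is not stated in the text, and the function name `zScores` is hypothetical.

```fsharp
// Minimal sketch: absolute z-scores using the population standard deviation.
// The choice of divisor n (rather than n - 1) is an assumption inferred
// from the z-scores listed in the text.
let zScores (values: float list) : float list =
    let n = float (List.length values)
    let mean = List.sum values / n
    // Population variance: mean of the squared deviations.
    let variance = (values |> List.sumBy (fun v -> (v - mean) ** 2.0)) / n
    let sd = sqrt variance
    values |> List.map (fun v -> abs (v - mean) / sd)

// The generated values listed in the text.
let generatedValues =
    [ -48066176.91; -48066176.91; -2584692.113; -2.097348892e+12 ]

let zs = zScores generatedValues
printfn "%A" zs
```

With four values and the population standard deviation, an absolute z-score cannot exceed the square root of 3 (about 1.732), so the last entry sits at the largest value this measure can take for a sample of this size.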

