Reader small image

You're reading from  50 Algorithms Every Programmer Should Know - Second Edition

Product typeBook
Published inSep 2023
PublisherPackt
ISBN-139781803247762
Edition2nd Edition
Right arrow
Author (1)
Imran Ahmad
Imran Ahmad
author image
Imran Ahmad

Imran Ahmad has been a part of cutting-edge research about algorithms and machine learning for many years. He completed his PhD in 2010, in which he proposed a new linear programming-based algorithm that can be used to optimally assign resources in a large-scale cloud computing environment. In 2017, Imran developed a real-time analytics framework named StreamSensing. He has since authored multiple research papers that use StreamSensing to process multimedia data for various machine learning algorithms. Imran is currently working at Advanced Analytics Solution Center (A2SC) at the Canadian Federal Government as a data scientist. He is using machine learning algorithms for critical use cases. Imran is a visiting professor at Carleton University, Ottawa. He has also been teaching for Google and Learning Tree for the last few years.
Read more about Imran Ahmad

Right arrow

Cosine distance

Euclidean and Manhattan distance measures do not perform well in high-dimensional space. In a high-dimensional problem space, cosine distance more accurately reflects the closeness between two data points in a multidimensional problem space. The cosine distance measure is calculated by measuring the cosine angle created by two points connected to a reference point. If the data points are close, then the angle will be narrow, irrespective of the dimensions they have. On the other hand, if they are far away, then the angle will be large:

Chart Description automatically generated

Textual data can almost be considered a highly dimensional space. As the cosine distance measure works very well with h-dimensional spaces, it is a good choice when dealing with textual data.Note that in the preceding figure, the cosine of the angle between A(2,5) and B(4.4) is the cosine distance. The reference between these points is the origin—that is, X(0,0). But in reality, any point in the problem space...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
50 Algorithms Every Programmer Should Know - Second Edition
Published in: Sep 2023Publisher: PacktISBN-13: 9781803247762

Author (1)

author image
Imran Ahmad

Imran Ahmad has been a part of cutting-edge research about algorithms and machine learning for many years. He completed his PhD in 2010, in which he proposed a new linear programming-based algorithm that can be used to optimally assign resources in a large-scale cloud computing environment. In 2017, Imran developed a real-time analytics framework named StreamSensing. He has since authored multiple research papers that use StreamSensing to process multimedia data for various machine learning algorithms. Imran is currently working at Advanced Analytics Solution Center (A2SC) at the Canadian Federal Government as a data scientist. He is using machine learning algorithms for critical use cases. Imran is a visiting professor at Carleton University, Ottawa. He has also been teaching for Google and Learning Tree for the last few years.
Read more about Imran Ahmad