You're reading from  Learning Bayesian Models with R

Product type: Book
Published in: Oct 2015
Reading Level: Beginner
Publisher: Packt
ISBN-13: 9781783987603
Edition: 1st Edition
Author (1)
Hari Manassery Koduvely

Dr. Hari M. Koduvely is an experienced data scientist working at the Samsung R&D Institute in Bangalore, India. He has a PhD in statistical physics from the Tata Institute of Fundamental Research, Mumbai, India, and post-doctoral experience from the Weizmann Institute, Israel, and Georgia Tech, USA. Prior to joining Samsung, he worked for Amazon and Infosys Technologies, developing machine learning-based applications for their products and platforms. He also has several publications on Bayesian inference and its applications in areas such as recommendation systems and predictive health monitoring. His current interest is in developing large-scale machine learning methods, particularly for natural language understanding.

Spark – in-memory distributed computing


One of the issues with Hadoop is that the output of every MapReduce operation is written to the hard disk. A large data processing job therefore involves many disk reads and writes, which makes processing in Hadoop very slow. Network latency, the time required to shuffle data between nodes, adds to the problem. Another disadvantage is that one cannot run real-time queries against the files stored in HDFS. For machine learning problems, MapReduce does not keep data in memory from one iteration of the training phase to the next, so each iteration has to read its input from disk again. All this makes Hadoop a less than ideal platform for machine learning.
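The cost of re-reading data on every training iteration can be illustrated with a toy sketch. It uses only the Python standard library and is not Hadoop or Spark code: the dataset, the `DiskCounter` helper, and the one-parameter gradient descent are all assumptions made up for this illustration. It only counts how often the input file is read when each iteration reloads it, compared with reading it once and keeping it in memory.

```python
import json
import os
import tempfile


def write_dataset(path, values):
    """Persist the toy dataset to disk as JSON."""
    with open(path, "w") as f:
        json.dump(values, f)


class DiskCounter:
    """Hypothetical helper: loads the dataset and counts the disk reads."""

    def __init__(self, path):
        self.path = path
        self.reads = 0

    def load(self):
        self.reads += 1
        with open(self.path) as f:
            return json.load(f)


def gradient_step(data, theta, lr):
    """One gradient-descent step towards the mean of the data."""
    grad = sum(theta - x for x in data) / len(data)
    return theta - lr * grad


def fit_reload_each_iteration(source, iterations, lr=0.5):
    """Disk-bound style: every iteration re-reads its input from disk."""
    theta = 0.0
    for _ in range(iterations):
        data = source.load()
        theta = gradient_step(data, theta, lr)
    return theta


def fit_cached(source, iterations, lr=0.5):
    """In-memory style: read the input once, then reuse it."""
    data = source.load()
    theta = 0.0
    for _ in range(iterations):
        theta = gradient_step(data, theta, lr)
    return theta


def compare(iterations=10):
    """Run both variants on the same file and report estimates and read counts."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.json")
        write_dataset(path, [1.0, 2.0, 3.0, 4.0, 5.0])
        reload_src = DiskCounter(path)
        cached_src = DiskCounter(path)
        theta_reload = fit_reload_each_iteration(reload_src, iterations)
        theta_cached = fit_cached(cached_src, iterations)
    return theta_reload, reload_src.reads, theta_cached, cached_src.reads


if __name__ == "__main__":
    theta_a, reads_a, theta_b, reads_b = compare(10)
    print("reload each iteration:", theta_a, "disk reads:", reads_a)
    print("cached in memory:     ", theta_b, "disk reads:", reads_b)
```

Both variants compute the same estimate, but the first performs one disk read per iteration while the second performs a single read. On a real cluster the gap would be larger, since each read may also involve network transfer between nodes.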

A solution to this problem was developed at the AMPLab of the University of California, Berkeley, in 2009. It came out of the PhD work of Matei Zaharia, a Romanian-born computer scientist. His paper Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (reference 4 in...
