You're reading from  Apache Mahout Essentials

Product type: Book
Published in: Jun 2015
Reading level: Intermediate
ISBN-13: 9781783554997
Edition: 1st Edition
Author (1)
Jayani Withanawasam

Jayani Withanawasam is an R&D engineer and a senior software engineer at Zaizi Asia, where she focuses on applying machine learning techniques to provide smart content management solutions. She is currently pursuing an MSc degree in artificial intelligence at the University of Moratuwa, Sri Lanka, and has completed her BE in software engineering (with first class honors) at the University of Westminster, UK. She has more than 6 years of industry experience and has worked in areas such as machine learning, natural language processing, and semantic web technologies. She is passionate about working with semantic technologies and big data.

How Apache Mahout works


Let's take a look at the various components of Mahout.

The high-level design

The following table represents the high-level design of a Mahout implementation. Machine learning applications access the API, which provides support for implementing different machine learning techniques, such as clustering, classification, and recommendations.

If the application requires preprocessing of text input (for example, stop word removal and stemming), this can be done with Apache Lucene. Apache Hadoop provides the data processing and storage layer that makes scalable processing possible.
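
To make the two preprocessing steps mentioned above concrete, here is a minimal plain-Java sketch of stop word removal and stemming. It deliberately does not use Apache Lucene's analyzers. The class name `ToyTextPreprocessor`, the tiny stop word list, and the two suffix rules are all assumptions made for illustration, and a real stemmer applies many more rules than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: this is NOT Lucene's analyzer API.
final class ToyTextPreprocessor {

    // Hypothetical, deliberately tiny stop word list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    private ToyTextPreprocessor() {
    }

    // Lowercases the text, splits on non-letters, drops stop words, then stems.
    static List<String> preprocess(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (raw.isEmpty() || STOP_WORDS.contains(raw)) {
                continue;
            }
            tokens.add(stem(raw));
        }
        return tokens;
    }

    // Naive suffix stripping: removes a trailing "ing" or a plural "s".
    static String stem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }
}
```

Calling `ToyTextPreprocessor.preprocess("The clustering of the documents")` drops "the" and "of" and reduces the remaining words to `cluster` and `document`. In a real application, this work would be delegated to Lucene rather than written by hand.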

Performance is further optimized through Java Collections and the Mahout-Math library. The Mahout-integration library contains utilities, for example, for displaying the data and results.

The distribution

MapReduce is a programming paradigm that enables parallel processing. When it is applied to machine learning, we assign one MapReduce engine to each algorithm (and one master is assigned to each MapReduce engine).

Input is provided as Hadoop sequence files, which consist of binary key-value pairs. The master node manages the mappers and reducers. Once the input is represented as sequence files and sent to the master, the master splits the data and assigns the pieces to different mappers, which run on other nodes. It then collects the intermediate results from the mappers and sends them to the relevant reducers for further processing. Lastly, the final outcome is generated.
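
The flow described above (split, map, collect the intermediate key-value pairs, reduce) can be sketched in a single Java process. This is only a simulation under stated assumptions: it does not use Hadoop or sequence files, the class `ToyMapReduce` and its method names are hypothetical, and word counting stands in for a real machine learning algorithm.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process sketch of the master/mapper/reducer flow; it does not use Hadoop.
final class ToyMapReduce {

    private ToyMapReduce() {
    }

    // Master: split the input records round-robin into one chunk per mapper.
    static List<List<String>> split(List<String> records, int mappers) {
        if (mappers < 1) {
            throw new IllegalArgumentException("mappers must be >= 1");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < mappers; i++) {
            chunks.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            chunks.get(i % mappers).add(records.get(i));
        }
        return chunks;
    }

    // Mapper: emit an intermediate (word, 1) pair for every word in its chunk.
    static List<Map.Entry<String, Integer>> map(List<String> chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : chunk) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
                }
            }
        }
        return pairs;
    }

    // Group intermediate values by key so that each key reaches a single reducer.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reducer: combine all values collected for one key into a final value.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the whole flow sequentially; on a cluster the mappers would run in parallel.
    static Map<String, Integer> run(List<String> records, int mappers) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (List<String> chunk : split(records, mappers)) {
            intermediate.addAll(map(chunk));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(intermediate).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }
}
```

For example, `ToyMapReduce.run(Arrays.asList("a b a", "b c", "a"), 2)` sends the first and third records to one mapper and the second record to the other, then reduces the grouped pairs to the counts a=3, b=2, and c=1. The intermediate pairs play the role that key-value records play in the sequence files mentioned above.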
