Apache Mahout Cookbook

Whether you’re a beginner or an advanced user of Apache Mahout, this cookbook will expand your skills through a host of recipes, illustrations, and real-world examples, taking your data mining to a new level of capability.

Piero Giacomelli

Book Details

ISBN 13: 9781849518024
Paperback: 250 pages

Book Description

The rise of the Internet and social networks has created a new demand for software that can analyze large datasets, which can scale up to 10 billion rows. Apache Hadoop was created to handle such heavy computational tasks, and Mahout has gained recognition for providing data mining classification algorithms that can be used with datasets of this kind.

"Apache Mahout Cookbook" provides a fresh, scope-oriented approach to the Mahout world for both beginners and advanced users. The book gives an insight into how to write different data mining algorithms for use in the Hadoop environment and how to choose the one that best suits the task at hand.

"Apache Mahout Cookbook" looks at the various Mahout algorithms available and gives the reader a fresh, solution-centered approach to solving different data mining tasks. The recipes start easy and become progressively more complex. A step-by-step approach guides the developer through the different tasks involved in mining a huge dataset. You will also learn how to code your own Mahout data mining algorithm and how to determine the best one for a particular task. In addition, a whole chapter is dedicated to loading data into Mahout from an external RDBMS. A lot of attention is also given to using your data mining algorithm inside your own code so that it can run in a Hadoop environment. Theoretical aspects of the algorithms are covered for information purposes, but every chapter is written to let the developer get into the code as quickly and smoothly as possible. This means that with every recipe, the book provides code that you can reuse through Maven, as well as the Maven Mahout source code.

By the end of this book, you will be able to code your own procedures to perform various data mining tasks with different algorithms, and to evaluate and choose the best ones for your tasks.

Table of Contents

Chapter 1: Mahout is Not So Difficult!
Introduction
Installing Java and Hadoop
Setting up a Maven and NetBeans development environment
Coding a basic recommender
Chapter 2: Using Sequence Files – When and Why?
Introduction
Creating sequence files from the command line
Generating sequence files from code
Reading sequence files from code
Chapter 3: Integrating Mahout with an External Datasource
Introduction
Importing an external datasource into HDFS
Exporting data from HDFS to RDBMS
Creating a Sqoop job to deal with RDBMS
Importing data using Sqoop API
Chapter 4: Implementing the Naïve Bayes classifier in Mahout
Introduction
Using the Mahout text classifier to demonstrate the basic use case
Using the Naïve Bayes classifier from code
Using Complementary Naïve Bayes from the command line
Coding the Complementary Naïve Bayes classifier
Chapter 5: Stock Market Forecasting with Mahout
Introduction
Preparing data for logistic regression
Predicting GOOG movements using logistic regression
Using adaptive logistic regression in Java code
Using logistic regression on large-scale datasets
Using Random Forest to forecast market movements
Chapter 6: Canopy Clustering in Mahout
Introduction
Command-line-based Canopy clustering
Command-line-based Canopy clustering with parameters
Using Canopy clustering from the Java code
Coding your own cluster distance evaluation
Chapter 7: Spectral Clustering in Mahout
Introduction
Using EigenCuts from the command line
Using EigenCuts from Java code
Creating a similarity matrix from raw data
Using spectral clustering with image segmentation
Chapter 8: K-means Clustering
Introduction
Using K-means clustering from Java code
Clustering traffic accidents using K-means
K-means clustering using MapReduce
Using K-means clustering from the command line
Chapter 9: Soft Computing with Mahout
Introduction
Frequent Pattern Mining with Mahout
Creating metrics for Frequent Pattern Mining
Using Frequent Pattern Mining from Java code
Using LDA for creating topics
Chapter 10: Implementing the Genetic Algorithm in Mahout
Introduction
Setting up Mahout for using GA
Using the genetic algorithm over graphs
Using the genetic algorithm from Java code

What You Will Learn

  • Configure a full development environment for Mahout from scratch with NetBeans and Maven
  • Handle sequence files for better performance
  • Query and store results in an RDBMS with Sqoop
  • Use logistic regression to predict the next step
  • Understand text mining of raw data with Naïve Bayes
  • Create and understand clusters
  • Customize Mahout to evaluate different cluster algorithms
  • Use the MapReduce approach to solve real-world data mining problems
