Learning Apache Mahout

Product type Book
Published in Mar 2015
ISBN-13 9781783555215
Pages 250 pages
Edition 1st Edition

Preface

Learning Apache Mahout aims to provide a strong foundation in machine learning using Mahout. It is ideal for learning the core concepts of machine learning and the basics of Mahout. The book progresses from the basics of Mahout and machine learning to feature engineering and the implementation of various machine learning algorithms in Mahout. Algorithm usage is explained with examples that use both the Mahout command line and its Java API. The book concludes with two chapters of end-to-end case studies. Ideally, chapters 1, 2, and 3 should be read sequentially, chapters 4 to 8 in any order, and chapters 9 and 10 after chapters 1 to 8 have been completed.

What this book covers

Chapter 1, Introduction to Mahout, covers the setup of the learning environment, installation, and the configuration of the various tools required for this book. It will discuss the need for a machine learning library such as Mahout and introduce the basics of Mahout with command line and code examples.

Chapter 2, Core Concepts in Machine Learning, covers the fundamental concepts in machine learning. It will discuss the important steps involved in a machine learning project, such as data processing, model training, and measuring model efficacy, and provide an intuitive explanation of different algorithms.

Chapter 3, Feature Engineering, covers the most important phase of a machine learning project, feature extraction and representation. It will discuss common data preprocessing tasks, manual and automated feature transformation, feature selection, and dimensionality reduction.

Chapter 4, Classification with Mahout, covers classification algorithms implemented in Mahout. It will discuss the important phases of building a classifier, such as preprocessing data, creating a train and test set, and measuring model efficacy. The algorithms that will be covered are logistic regression, random forest, and naïve Bayes.

Chapter 5, Frequent Pattern Mining and Topic Modeling, covers algorithms for frequent pattern mining and topic modeling. This chapter will provide an intuitive explanation of the algorithms and include both command line and code examples, while also providing practical examples.

Chapter 6, Recommendation with Mahout, covers algorithms to build recommender systems in Mahout. It will discuss item-based and user-based recommenders. This chapter will provide an intuitive explanation of the algorithms and include both command line and code examples, while also providing practical examples.

Chapter 7, Clustering with Mahout, covers algorithms to perform clustering in Mahout. It will discuss algorithms such as k-means, fuzzy k-means, and streaming k-means. This chapter will provide an intuitive explanation of the algorithms and include both command line and code examples, while also providing practical examples.

Chapter 8, New Paradigm in Mahout, covers the porting of Mahout on top of Apache Spark. It will discuss the installation and configuration of Mahout and Spark, explain the important concepts of Spark and Mahout binding, and cover some basic examples.

Chapter 9, Case Study – Churn Analytics and Customer Segmentation, covers the steps involved in a machine learning project from start to finish. It will discuss all the important steps that need to be performed for a successful machine learning project, and will walk through the process using two use cases from customer analytics: churn analytics and customer segmentation.

Chapter 10, Case Study – Text Analytics, covers the steps involved in a text analytics project. It will discuss the vector space model of representing text, text clustering, and classification.

What you need for this book

For this book, you will need the following software:

  • Java 1.6 or higher

  • Maven 2.0 or higher

  • Hadoop 1.2.1

  • Eclipse with Maven plug-in

  • Mahout 0.9

  • Python

  • R

We will cover each piece of software needed for this book in the corresponding chapters. All the examples in the book were coded on the Ubuntu 12.04 LTS release.
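
Before starting, it can help to confirm which of the listed tools are already installed. The following is a minimal sketch, not part of the book's example code: it assumes Python 3.3 or later (for `shutil.which`), and the executable names in `REQUIRED_TOOLS` are assumptions about a typical installation that may differ on your system. It only checks that each executable is on the PATH; it does not verify versions, and Eclipse is left out because it is usually launched as a desktop application.

```python
import shutil

# Hypothetical mapping from the software listed above to the executable
# names usually found on PATH; adjust these for your own installation.
REQUIRED_TOOLS = {
    "Java": "java",
    "Maven": "mvn",
    "Hadoop": "hadoop",
    "Mahout": "mahout",
    "Python": "python",
    "R": "R",
}


def find_missing(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The `which` parameter exists only so that the lookup can be replaced in tests; in normal use, call `find_missing()` with no arguments.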

Who this book is for

If you are a Java developer and want to use Mahout and machine learning to solve Big Data Analytics use cases, then this book is ideal for you. It is also well suited to self-learners who want to learn the fundamental concepts of machine learning and the practical implementations of Mahout. Some familiarity with shell scripts, Python, and R is helpful, but no prior experience is required.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The output of the sed command is saved to the new file adult.data.csv."

Any command-line input or output is written as follows:

sudo pip install pandas

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "Once the search results are displayed hit Install and follow the instructions."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to , and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at if you are having a problem with any aspect of the book, and we will do our best to address it.
