
You're reading from  Big Data Analytics with Java

Product type: Book
Published in: Jul 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781787288980
Edition: 1st Edition
Author (1)
RAJAT MEHTA

The author is a VP (Technical Architect) in technology at JP Morgan Chase in New York. He is a Sun Certified Java Developer and has worked on Java-related technologies for more than 16 years. For the past few years, his role has heavily involved using the big data stack and running analytics on it. He is also a contributor to various open source projects available on his GitHub repository and a frequent writer for dev magazines.

Chapter 8. Ensembling on Big Data

Have you used a Kinect while playing video games on a Microsoft Xbox? It detects your motion remarkably smoothly while you play, letting you control and interact with a game without an external device such as a game controller. But how does it do that? How does the device detect the user's motion from the camera and predict the command that the motion suggests? Some users on various forums have claimed that a random forest machine learning algorithm runs behind it; one such discussion is at https://www.quora.com/Why-did-Microsoft-decide-to-use-Random-Forests-in-the-Kinect. Though I am not sure how true this claim is, the example at least shows the scale at which this powerful machine learning algorithm has the potential to be used. Random forests are perhaps one of the best machine learning algorithms because of the accuracy they bring in the predicted results and because of their implicit feature...

Ensembling


Imagine that a group of friends is deciding which movie to see together. Each friend picks a movie from a set of, say, five or six movies. At the end, all the votes are collected and counted, and the movie with the most votes is picked and watched. What just happened is a real-life example of the ensembling approach. Basically, multiple entities act on a problem, and each gives its selection out of a collection of discrete choices (in the case of a classification problem). The selection suggested by the largest number of entities is chosen as the predicted choice.
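The movie vote above can be sketched in a few lines of Java. This is only an illustrative sketch: the class name `MajorityVote`, the method `vote`, and the movie labels are hypothetical, and the tie-breaking rule (the label seen first wins) is an assumption that the text does not specify.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MajorityVote {

    /** Returns the label chosen by the most voters; ties go to the label seen first (an assumption). */
    static String vote(List<String> predictions) {
        if (predictions.isEmpty()) {
            throw new IllegalArgumentException("at least one prediction is required");
        }
        // LinkedHashMap keeps labels in first-seen order, so tie-breaking is deterministic.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : predictions) {
            counts.merge(label, 1, Integer::sum);
        }
        String winner = null;
        int best = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            // Strict comparison: a later label must beat, not just match, the current best.
            if (entry.getValue() > best) {
                best = entry.getValue();
                winner = entry.getKey();
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        // Each element is one friend's (or one classifier's) choice.
        List<String> votes = Arrays.asList("Movie A", "Movie B", "Movie A", "Movie C", "Movie A");
        System.out.println(vote(votes)); // prints "Movie A"
    }
}
```

In a machine learning setting, each element of the list would be the class label predicted by one model for the same input row.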

That was the general approach to ensembling. From the perspective of machine learning, it just means that multiple machine learning programs act on a problem that can be either a classification or a regression problem. The output from each machine learning algorithm is collected. The results from all the algorithms are then analyzed with different approaches like voting, averaging...
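For a regression problem, where each model outputs a number rather than a class, the combining step mentioned here can be a plain average. The following is a minimal sketch under stated assumptions: the class `AveragingEnsemble`, the method `predict`, and the three hand-written "models" are hypothetical stand-ins for trained regressors, and an unweighted mean is just one possible way to combine them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

class AveragingEnsemble {

    /** Averages the numeric predictions of all models for a single input value x. */
    static double predict(List<DoubleUnaryOperator> models, double x) {
        if (models.isEmpty()) {
            throw new IllegalArgumentException("at least one model is required");
        }
        double sum = 0.0;
        for (DoubleUnaryOperator model : models) {
            sum += model.applyAsDouble(x);
        }
        return sum / models.size();
    }

    public static void main(String[] args) {
        // Three hypothetical models; in practice these would be trained regressors.
        List<DoubleUnaryOperator> models = Arrays.asList(
                x -> 2.0 * x,        // predicts 10.0 for x = 5
                x -> 2.0 * x + 3.0,  // predicts 13.0 for x = 5
                x -> 2.0 * x - 3.0); // predicts 7.0 for x = 5
        System.out.println(predict(models, 5.0)); // prints 10.0
    }
}
```

Here the errors of the second and third models cancel out in the average, which is the intuition behind combining several imperfect models.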

Summary


In this chapter, we learnt about a very popular machine learning approach called ensembling. We learnt how, in a random forest, a group of decision trees can be built, trained, and run in parallel on a dataset. Their results are then combined, by voting to pick the most-voted class in the case of classification, or by averaging in the case of regression. We also learnt how a group of weak decision tree learners can be trained sequentially, with each step boosting the results of the previous model in the workflow by minimizing an error function using techniques such as gradient descent. We saw how powerful these approaches are and what advantages they have over simpler approaches. Finally, we ran the two ensembling approaches on a real-world dataset provided by Lending Club and analyzed the accuracy of our results.

In the next chapter, we will cover the concept of clustering using the k-means algorithm. We will...
