Have you ever used a Kinect while playing video games on a Microsoft Xbox? It is remarkably smooth at detecting your motion while you play, letting you control and interact with a game without any external device such as a game controller. But how does it do that? How does the device detect the user's motion from the camera and predict the command that the motion represents? Users on various forums have claimed that a powerful random forest machine learning algorithm runs behind it (see https://www.quora.com/Why-did-Microsoft-decide-to-use-Random-Forests-in-the-Kinect). Though I am not sure how true this claim is, the example at least demonstrates the scale at which this powerful machine learning algorithm has the potential to be used. Random forests are perhaps one of the best machine learning algorithms because of the accuracy of their predictions and because of their implicit feature...
Imagine that a group of friends is deciding which movie to see together. Each friend votes for their movie of choice from a set of, say, five or six movies. At the end, all the votes are collected and counted, and the movie with the maximum votes is picked and watched. What just happened is a real-life example of the ensembling approach: multiple entities act on a problem and each gives its selection from a collection of discrete choices (as in a classification problem). The selection suggested by the maximum number of entities is chosen as the predicted choice.
This explanation was a general view of ensembling. From the perspective of machine learning, it simply means that multiple machine learning programs act on a problem, which can be either a classification or a regression problem. The output from each machine learning algorithm is collected, and the results from all the algorithms are then combined using different approaches like voting, averaging...
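The two combination strategies mentioned above, voting for classification and averaging for regression, can be sketched in a few lines of plain Java. This is a minimal illustration of the combining step only; the class name, movie titles, and numbers are made up for the example:

```java
import java.util.*;

public class EnsembleCombiner {

    // Classification: pick the prediction made by the most models (majority vote).
    static String majorityVote(List<String> predictions) {
        Map<String, Integer> counts = new HashMap<>();
        for (String p : predictions) {
            counts.merge(p, 1, Integer::sum);   // tally each model's vote
        }
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    // Regression: average the numeric predictions of all models.
    static double average(double[] predictions) {
        double sum = 0;
        for (double p : predictions) sum += p;
        return sum / predictions.length;
    }

    public static void main(String[] args) {
        // Five "models" (friends) each pick a movie; "Movie A" wins 3 votes to 1 and 1.
        List<String> votes = Arrays.asList("Movie A", "Movie B", "Movie A", "Movie C", "Movie A");
        System.out.println(majorityVote(votes));                 // Movie A
        System.out.println(average(new double[]{10.0, 12.0, 11.0})); // 11.0
    }
}
```

The same two helpers would sit at the end of any ensemble: run each trained model on an input, collect the outputs, and reduce them with one of these functions.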
In this chapter, we learnt about ensembling, a very popular approach in machine learning. We saw how, in a random forest, a group of decision trees can be built, trained, and run on a dataset in parallel, and how their results can be combined, by voting for the best classification or by averaging the results in the case of regression. We also learnt how a group of weak decision tree learners can be trained sequentially, one after the other, with every step boosting the results of the previous model in the workflow by minimizing an error function using techniques such as gradient descent. Finally, we ran both ensembling approaches on a real-world dataset provided by Lending Club, analyzed the accuracy of our results, and saw the advantages of these approaches over simpler ones.
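The sequential boosting idea recapped above can be sketched in plain Java. This is a toy illustration, not the Lending Club pipeline from the chapter: the weak learner here is a hypothetical one-split regression stump, each round fits the residual errors of the ensemble so far (the negative gradient of squared error), and the data, learning rate, and round count are made-up values:

```java
import java.util.*;

public class BoostingSketch {

    // Weak learner: a one-split regression stump on a single feature.
    static class Stump {
        double threshold, leftValue, rightValue;
        double predict(double x) { return x <= threshold ? leftValue : rightValue; }
    }

    // Fit the stump that minimizes squared error against the current targets.
    static Stump fit(double[] x, double[] y) {
        Stump best = new Stump();
        double bestErr = Double.MAX_VALUE;
        for (double t : x) {                                   // candidate thresholds
            double leftSum = 0, rightSum = 0;
            int leftN = 0, rightN = 0;
            for (int i = 0; i < x.length; i++) {
                if (x[i] <= t) { leftSum += y[i]; leftN++; }
                else           { rightSum += y[i]; rightN++; }
            }
            double lv = leftN > 0 ? leftSum / leftN : 0;       // mean of each side
            double rv = rightN > 0 ? rightSum / rightN : 0;
            double err = 0;
            for (int i = 0; i < x.length; i++) {
                double p = x[i] <= t ? lv : rv;
                err += (y[i] - p) * (y[i] - p);
            }
            if (err < bestErr) {
                bestErr = err;
                best.threshold = t; best.leftValue = lv; best.rightValue = rv;
            }
        }
        return best;
    }

    // Ensemble prediction: the sum of all the scaled weak learners.
    static double predict(List<Stump> model, double learningRate, double x) {
        double sum = 0;
        for (Stump s : model) sum += learningRate * s.predict(x);
        return sum;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6};                       // toy feature values
        double[] y = {1.1, 0.9, 1.0, 3.0, 3.1, 2.9};           // toy targets
        double learningRate = 0.5;                             // made-up shrinkage value
        List<Stump> model = new ArrayList<>();
        double[] residual = y.clone();
        for (int round = 0; round < 20; round++) {             // sequential boosting rounds
            Stump s = fit(x, residual);                        // fit the current errors
            model.add(s);
            for (int i = 0; i < x.length; i++)                 // shrink the residuals
                residual[i] -= learningRate * s.predict(x[i]);
        }
        System.out.printf("prediction at x=5: %.2f%n", predict(model, learningRate, 5.0));
    }
}
```

Each stump alone is a weak model, but because every round targets what the ensemble so far got wrong, the summed prediction at x = 5 converges toward the mean of the right-hand group, about 3.0. A production gradient-boosting implementation differs mainly in using deeper trees, generic loss functions, and regularization.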
In the next chapter, we will cover the concept of clustering using the k-means algorithm. We will...