Reader small image

You're reading from  Machine Learning with Scala Quick Start Guide

Product typeBook
Published inApr 2019
Reading LevelIntermediate
PublisherPackt
ISBN-139781789345070
Edition1st Edition
Languages
Right arrow
Authors (2):
Md. Rezaul Karim
Md. Rezaul Karim
author image
Md. Rezaul Karim

Md. Rezaul Karim is a researcher, author, and data science enthusiast with a strong computer science background, coupled with 10 years of research and development experience in machine learning, deep learning, and data mining algorithms to solve emerging bioinformatics research problems by making them explainable. He is passionate about applied machine learning, knowledge graphs, and explainable artificial intelligence (XAI). Currently, he is working as a research scientist at Fraunhofer FIT, Germany. He is also a PhD candidate at RWTH Aachen University, Germany. Before joining FIT, he worked as a researcher at the Insight Centre for Data Analytics, Ireland. Previously, he worked as a lead software engineer at Samsung Electronics, Korea.
Read more about Md. Rezaul Karim

Ajay Kumar N
Ajay Kumar N
author image
Ajay Kumar N

Ajay Kumar N has experience in big data, and specializes in cloud computing and various big data frameworks, including Apache Spark and Apache Hadoop. His primary language of choice is Python, but he also has a special interest in functional programming languages such as Scala. He has worked extensively with NumPy, pandas, and scikit-learn, and often contributes to open source projects related to data science and machine learning.
Read more about Ajay Kumar N

View More author details
Right arrow

Random forest for supervised learning

In this section, we'll see how to use RF to solve both regression and classification problems. We'll use DT implementation from the Spark ML package in Scala. Although both GBT and RF are ensembles of trees, the training processes are different. For instance, RF uses the bagging technique to perform the example, while GBT uses boosting. Nevertheless, there are several practical trade-offs between both the ensembles that can pose a dilemma about what to choose. However, RF would be the winner in most of the cases. Here are some justifications:

  • GBTs train one tree at a time, but RF can train multiple trees in parallel. So the training time is lower with RF. However, in some special cases, training and using a smaller number of trees with GBTs is faster and more convenient.
  • RFs are less prone to overfitting. In other words, RFs reduces...
lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Machine Learning with Scala Quick Start Guide
Published in: Apr 2019Publisher: PacktISBN-13: 9781789345070

Authors (2)

author image
Md. Rezaul Karim

Md. Rezaul Karim is a researcher, author, and data science enthusiast with a strong computer science background, coupled with 10 years of research and development experience in machine learning, deep learning, and data mining algorithms to solve emerging bioinformatics research problems by making them explainable. He is passionate about applied machine learning, knowledge graphs, and explainable artificial intelligence (XAI). Currently, he is working as a research scientist at Fraunhofer FIT, Germany. He is also a PhD candidate at RWTH Aachen University, Germany. Before joining FIT, he worked as a researcher at the Insight Centre for Data Analytics, Ireland. Previously, he worked as a lead software engineer at Samsung Electronics, Korea.
Read more about Md. Rezaul Karim

author image
Ajay Kumar N

Ajay Kumar N has experience in big data, and specializes in cloud computing and various big data frameworks, including Apache Spark and Apache Hadoop. His primary language of choice is Python, but he also has a special interest in functional programming languages such as Scala. He has worked extensively with NumPy, pandas, and scikit-learn, and often contributes to open source projects related to data science and machine learning.
Read more about Ajay Kumar N