You're reading from Machine Learning with Apache Spark Quick Start Guide

Product type: Book
Published in: Dec 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789346565
Edition: 1st Edition
Author: Jillur Quddus

Jillur Quddus is a lead technical architect, polyglot software engineer and data scientist with over 10 years of hands-on experience in architecting and engineering distributed, scalable, high-performance, and secure solutions used to combat serious organized crime, cybercrime, and fraud. Jillur has extensive experience of working within central government, intelligence, law enforcement, and banking, and has worked across the world including in Japan, Singapore, Malaysia, Hong Kong, and New Zealand. Jillur is both the founder of Keisan, a UK-based company specializing in open source distributed technologies and machine learning, and the lead technical architect at Methods, the leading digital transformation partner for the UK public sector.

Artificial Intelligence and Machine Learning

In this chapter, we will define what we mean by artificial intelligence, machine learning, and cognitive computing. We will study common classes of algorithms within the field of machine learning and its broader applications, including the following:

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning
  • Deep learning
  • Natural language processing
  • Cognitive computing
  • Apache Spark's machine learning library, MLlib, and how it can be used to implement these algorithms within machine learning pipelines

Artificial intelligence

Artificial intelligence is a broad term given to the theory and application of machines that exhibit intelligent behavior. Artificial intelligence encompasses many applied fields of study, including machine learning and, within machine learning, deep learning, as illustrated in Figure 3.1:

Figure 3.1: Artificial intelligence overview

Machine learning

Machine learning is an applied field of study within the broader subject of artificial intelligence. It focuses on learning from data by detecting patterns, trends, and relationships in order to make predictions and ultimately deliver actionable insights that support decision making. Machine learning models can be split into three main types: supervised learning, unsupervised learning, and reinforcement learning.

Supervised learning

In supervised learning, the goal is to learn a function that maps inputs x to outputs y given a labeled set of input-output pairs D, where D is referred to as the training set and N is the number of input-output pairs in the training set:

$$D = \{(x_i, y_i)\}_{i=1}^{N}$$

In simple applications of supervised...
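As an illustration of learning a function from a labeled training set, the following minimal sketch fits a straight line to a handful of input-output pairs using ordinary least squares. The choice of a linear model, the helper name `fit_line`, and the toy data are assumptions made for this example; they are not taken from the book.

```python
def fit_line(training_set):
    """Learn f(x) = slope * x + intercept from (x, y) pairs by least squares."""
    n = len(training_set)
    mean_x = sum(x for x, _ in training_set) / n
    mean_y = sum(y for _, y in training_set) / n
    # Covariance of x and y, and variance of x (both left unnormalized,
    # since the 1/n factors cancel in the ratio below).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    var = sum((x - mean_x) ** 2 for x, _ in training_set)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Hypothetical training set D with N = 4 labeled pairs drawn from y = 2x + 1.
D = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

f = fit_line(D)      # the learned function
prediction = f(5.0)  # predict the output for an unseen input
```

Here the labels y are what make the problem supervised: the learned function is judged by how well it reproduces known outputs before it is applied to unseen inputs.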

Deep learning

In deep learning, a subfield within the broader field of machine learning, the goal is still to learn a function but by employing an architecture that mimics the neural architecture found in the human brain in order to learn from experience using a hierarchy of concepts or representations. This enables us to develop more complex and powerful functions in order to predict outcomes better.

Many machine learning models employ a two-layer architecture, in which some function maps an input directly to an output. The human brain, however, processes information through multiple layers, in other words, a neural network. By mimicking natural neural networks, artificial neural networks (ANNs) offer the ability to learn complex non-linear representations with no restrictions on the input features and are ideally suited to a wide variety of exciting use cases, including speech, image...
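To make the idea of layered, non-linear representations concrete, here is a minimal sketch of a forward pass through a tiny network with one hidden layer. The weights are set by hand (an assumption for illustration; a real network would learn them from data) so that the network computes XOR, a function that no single linear mapping from input to output can represent.

```python
def relu(z):
    """Rectified linear unit, a common non-linear activation."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . inputs + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked, hypothetical weights: two hidden neurons, one output neuron.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUT_W = [[1.0, -2.0]]
OUT_B = [0.0]


def network(x1, x2):
    """Forward pass: input layer -> hidden layer (ReLU) -> linear output."""
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, relu)
    output = dense(hidden, OUT_W, OUT_B, lambda z: z)
    return output[0]


# Evaluate the network on all four binary input combinations.
xor_table = {(a, b): network(a, b) for a in (0.0, 1.0) for b in (0.0, 1.0)}
```

The hidden layer builds an intermediate representation of the inputs, and the output layer combines it; stacking more such layers is what gives deeper networks their hierarchy of representations.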

NLP

Natural language processing (NLP) refers to a family of computer science disciplines, including machine learning, linguistics, information engineering, and data management, used to analyze and understand natural languages, including speech and text. NLP can be applied to a wide variety of real-world use cases, including the following:

  • Named entity recognition (NER): Automatically identifying and parsing entities from text, including people, physical addresses, and email addresses
  • Relationship extraction: Automatically identifying the types of relationships between parsed entities
  • Machine translation and transcription: Automatically translating from one natural language to another, for example, from English to Chinese
  • Searching: Automatically searching across vast collections of structured, semi-structured, and unstructured documents and objects in order to fulfill a natural language query
  • Speech recognition...
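As a deliberately simplistic illustration of the entity-extraction use case listed above, the sketch below pulls email addresses out of free text with a regular expression. This is a rule-based stand-in, not a statistical NER model, and the pattern, function name, and sample sentence are assumptions made for this example.

```python
import re

# A simplified email pattern; real-world addresses can be more varied.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


sample = "Contact alice@example.com or bob.smith@example.org for details."
emails = extract_emails(sample)
```

Production NER systems typically learn to recognize entities such as people and addresses from annotated text, because such entities rarely follow a pattern as regular as an email address.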

Cognitive computing

Similar to NLP, cognitive computing refers to a family of computer science disciplines, including machine learning, deep learning, NLP, statistics, business intelligence, data engineering, and information retrieval, that together are used to develop systems that simulate human thought processes. Real-world implementations of cognitive systems include chatbots and virtual assistants (such as Amazon Alexa, Google Assistant, and Microsoft Cortana) that understand natural human language and provide contextual conversation interfaces, including question-answering, personalized recommendations, and information retrieval systems.

Machine learning pipelines in Apache Spark

To end this chapter, we will look at how Apache Spark can be used to implement the algorithms discussed above by examining how its machine learning library, MLlib, works under the hood. MLlib provides a suite of tools designed to make machine learning accessible, scalable, and easy to deploy.

Note that as of Spark 2.0, the MLlib RDD-based API is in maintenance mode. The examples in this book will use the DataFrame-based API, which is now the primary API for MLlib. For more information, please visit https://spark.apache.org/docs/latest/ml-guide.html.

At a high level, the typical implementation of machine learning models can be thought of as an ordered pipeline of algorithms, as follows:

  1. Feature extraction, transformation, and selection
  2. Train a predictive model based on these feature vectors and labels
  3. Make...
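The ordered stages above can be sketched in plain Python as a chain of three functions. This is a toy illustration of the pipeline idea only; the function names, the word-count model, and the sample data are hypothetical and are not MLlib's API.

```python
from collections import Counter


def extract_features(text):
    """Stage 1: turn raw text into a bag-of-words feature vector."""
    return Counter(text.lower().split())


def train(examples):
    """Stage 2: fit a toy model that stores word counts per label."""
    model = {}
    for text, label in examples:
        model.setdefault(label, Counter()).update(extract_features(text))
    return model


def predict(model, text):
    """Stage 3: pick the label whose word counts best overlap the new features."""
    features = extract_features(text)
    scores = {
        label: sum(counts[word] * n for word, n in features.items())
        for label, counts in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical labeled training data: (raw text, label).
training_set = [
    ("spark mllib pipeline", "spark"),
    ("spark dataframe api", "spark"),
    ("chatbot virtual assistant", "cognitive"),
    ("virtual assistant conversation", "cognitive"),
]

model = train(training_set)
prediction = predict(model, "a new spark pipeline")
```

Note that the same feature extraction step runs at both training and prediction time. Keeping every stage in one ordered pipeline is what guarantees that new data is processed exactly as the training data was.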

Summary

In this chapter, we have defined what is meant by artificial intelligence, machine learning, and cognitive computing. We have explored common machine learning algorithms at a high level, including deep learning and ANNs, and looked at Apache Spark's machine learning library, MLlib, and how it can be used to implement these algorithms within machine learning pipelines.

In the next chapter, we will start developing, deploying, and testing supervised machine learning models applied to real-world use cases using PySpark and MLlib.
