You're reading from  Machine Learning with Apache Spark Quick Start Guide

Product type: Book
Published in: Dec 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789346565
Edition: 1st Edition
Author (1)
Jillur Quddus

Jillur Quddus is a lead technical architect, polyglot software engineer and data scientist with over 10 years of hands-on experience in architecting and engineering distributed, scalable, high-performance, and secure solutions used to combat serious organized crime, cybercrime, and fraud. Jillur has extensive experience of working within central government, intelligence, law enforcement, and banking, and has worked across the world including in Japan, Singapore, Malaysia, Hong Kong, and New Zealand. Jillur is both the founder of Keisan, a UK-based company specializing in open source distributed technologies and machine learning, and the lead technical architect at Methods, the leading digital transformation partner for the UK public sector.

What this book covers

Chapter 1, The Big Data Ecosystem, provides an introduction to the current big data ecosystem. With the multitude of on-premises and cloud-based technologies, tools, services, libraries, and frameworks available in the big data, artificial intelligence, and machine learning space (and growing every day!), it is vitally important to understand the logical function of each layer within the big data ecosystem so that we may understand how they integrate with each other in order to ultimately architect and engineer end-to-end data intelligence and machine learning pipelines. This chapter also provides a logical introduction to Apache Spark within the context of the wider big data ecosystem.

Chapter 2, Setting Up a Local Development Environment, provides a detailed and hands-on guide to installing, configuring, and deploying a local Linux-based development environment on your personal desktop, laptop, or cloud-based infrastructure. You will learn how to install and configure all the software services required for this book in one self-contained location, including installing and configuring prerequisite programming languages (Java JDK 8 and Python 3), a distributed data processing and analytics engine (Apache Spark 2.3), a distributed real-time streaming platform (Apache Kafka 2.0), and a web-based notebook for interactive data insights and analytics (Jupyter Notebook).

Chapter 3, Artificial Intelligence and Machine Learning, provides a concise theoretical summary of the various applied subjects that fall under the artificial intelligence field of study, including machine learning, deep learning, and cognitive computing. This chapter also provides a logical introduction to how end-to-end data intelligence and machine learning pipelines may be architected and engineered using Apache Spark and its machine learning library, MLlib.

Chapter 4, Supervised Learning Using Apache Spark, provides a hands-on guide to engineering, training, validating, and interpreting the results of supervised machine learning algorithms using Apache Spark through real-world use-cases. The chapter describes and implements commonly used classification and regression techniques including linear regression, logistic regression, classification and regression trees (CART), and random forests.

Chapter 5, Unsupervised Learning Using Apache Spark, provides a hands-on guide to engineering, training, validating, and interpreting the results of unsupervised machine learning algorithms using Apache Spark through real-world use-cases. The chapter describes and implements commonly used unsupervised techniques including hierarchical clustering, K-means clustering, and dimensionality reduction via Principal Component Analysis (PCA).

Chapter 6, Natural Language Processing Using Apache Spark, provides a hands-on guide to engineering natural language processing (NLP) pipelines using Apache Spark through real-world use-cases. The chapter describes and implements commonly used NLP techniques, including feature transformers such as tokenization, stemming, lemmatization, and normalization, and feature extractors such as the bag of words and Term Frequency-Inverse Document Frequency (TF-IDF) algorithms.

Chapter 7, Deep Learning Using Apache Spark, provides a hands-on exploration of the exciting and cutting-edge world of deep learning! The chapter uses third-party deep learning libraries in conjunction with Apache Spark to train and interpret the results of Artificial Neural Networks (ANNs) including Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) applied to real-world use-cases.

Chapter 8, Real-Time Machine Learning Using Apache Spark, extends the deployment of machine learning models beyond batch processing in order to learn from data, make predictions, and identify trends in real time! The chapter provides a hands-on guide to engineering and deploying real-time stream processing and machine learning pipelines using Apache Spark and Apache Kafka to transport, transform, and analyze data streams as they are being created around the world.
