You're reading from Apache Spark 2.x Machine Learning Cookbook
Product type: Book | Published: Sep 2017 | Reading level: Intermediate | Publisher: Packt | ISBN-13: 9781783551606 | Edition: 1st
Authors (5):

Mohammed Guller
Author of Big Data Analytics with Spark - http://www.apress.com/9781484209653

Siamak Amirghodsi
Siamak Amirghodsi (Sammy) is interested in building advanced technical teams, executive management, Spark, Hadoop, big data analytics, AI, deep learning nets, TensorFlow, cognitive models, swarm algorithms, real-time streaming systems, quantum computing, financial risk management, trading signal discovery, econometrics, long-term financial cycles, IoT, blockchain, probabilistic graphical models, cryptography, and NLP.

Shuen Mei
Shuen Mei is a big data analytics platforms expert with 15+ years of experience in designing, building, and executing large-scale, enterprise-distributed financial systems with mission-critical low-latency requirements. He is certified in Apache Spark and the Cloudera Big Data platform (Developer, Admin, and HBase), and is also a certified AWS solutions architect with an emphasis on petabyte-range real-time data platform systems.

Meenakshi Rajendran
Meenakshi Rajendran is experienced in the end-to-end delivery of data analytics and data science products for leading financial institutions. She holds a master's degree in business administration and is a certified PMP with over 13 years of experience in global software delivery environments. Her areas of research and interest are Apache Spark, cloud, regulatory data governance, machine learning, Cassandra, and managing global data teams at scale.

Broderick Hall
Broderick Hall is a hands-on big data analytics expert who holds a master's degree in computer science and has 20 years of experience in designing and developing complex enterprise-wide software applications with real-time and regulatory requirements at a global scale. He is a deep learning early adopter and is currently working on a large-scale cloud-based data platform with deep learning net augmentation.

Chapter 4. Common Recipes for Implementing a Robust Machine Learning System

In this chapter, we will cover:

  • Spark's basic statistical API to help you build your own algorithms
  • ML pipelines for real-life machine learning applications
  • Normalizing data with Spark
  • Splitting data for training and testing
  • Common operations with the new Dataset API
  • Creating and using RDD versus DataFrame versus Dataset from a text file in Spark 2.0
  • LabeledPoint data structure for Spark ML
  • Getting access to Spark cluster in Spark 2.0+
  • Getting access to Spark cluster pre-Spark 2.0
  • Getting access to SparkContext vis-a-vis SparkSession object in Spark 2.0
  • New model export and PMML markup in Spark 2.0
  • Regression model evaluation using Spark 2.0
  • Binary classification model evaluation using Spark 2.0
  • Multilabel classification model evaluation using Spark 2.0
  • Multiclass classification model evaluation using Spark 2.0
  • Using the Scala Breeze library to do graphics in Spark 2.0

Introduction


In every line of business, from running a small company to creating and managing a mission-critical application, there is a set of common tasks that needs to be included as part of almost every workflow. This is true even for building robust machine learning systems. In Spark machine learning, some of these tasks range from splitting the data for model development (train, test, validate) to normalizing input feature vector data to creating ML pipelines via the Spark API. We provide a set of recipes in this chapter to enable the reader to think about what is actually required to implement an end-to-end machine learning system.

This chapter attempts to demonstrate a number of common tasks that are present in any robust Spark machine learning system implementation. To avoid redundantly referencing these common tasks in every recipe covered in this book, we have factored out such common tasks as short...

Spark's basic statistical API to help you build your own algorithms


In this recipe, we cover Spark's basic statistical API, including multivariate summary statistics (that is, Statistics.colStats), correlation, stratified sampling, hypothesis testing, random data generation, kernel density estimators, and much more, which can be applied to extremely large datasets while taking advantage of both parallelism and resiliency via RDDs.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  3. Import the necessary packages for the Spark session to gain access to the cluster and log4j.Logger to reduce the amount of output produced by Spark:
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.sql.SparkSession
import org.apache.log4j.Logger
import org.apache.log4j.Level
  4. Set the output level to ERROR to reduce Spark...
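
The remaining steps are part of the locked content, but the core idea of this recipe can be sketched in a few lines. The following minimal, self-contained sketch (the toy vectors and the application name are invented for illustration and are not the book's data) shows Statistics.colStats() computing column-wise summary statistics over an RDD of vectors:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("colStatsSketch").getOrCreate()

// Hypothetical toy data standing in for a large RDD[Vector]
val data = spark.sparkContext.parallelize(Seq(
  Vectors.dense(1.0, 10.0, 100.0),
  Vectors.dense(2.0, 20.0, 200.0),
  Vectors.dense(3.0, 30.0, 300.0)
))

// Column-wise (multivariate) summary computed in parallel across the cluster
val summary: MultivariateStatisticalSummary = Statistics.colStats(data)
println(summary.mean)        // mean of each column
println(summary.variance)    // variance of each column
println(summary.numNonzeros) // non-zero count per column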

ML pipelines for real-life machine learning applications


This is the first of two recipes that cover the ML pipeline in Spark 2.0. For a more advanced treatment of ML pipelines, with additional details such as the API and parameter extraction, see later chapters in this book.

In this recipe, we attempt to have a single pipeline that can tokenize text, use HashingTF (an old trick) to map term frequencies, run a regression to fit a model, and then predict which group a new term belongs to (for example, news filtering, gesture classification, and so on).

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  3. Import the necessary packages for the Spark session to gain access to the cluster and log4j.Logger to reduce the amount of output produced by Spark:

 

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression...
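
The rest of the recipe is locked, but as a rough, self-contained sketch of the idea (the toy documents, labels, and parameter values below are invented for illustration and are not the book's data), a Tokenizer, HashingTF, and LogisticRegression can be chained into a single Pipeline like this:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("pipelineSketch").getOrCreate()

// Hypothetical training data: (id, text, label)
val training = spark.createDataFrame(Seq(
  (0L, "spark rdd dataset", 1.0),
  (1L, "cooking pasta recipe", 0.0),
  (2L, "spark ml pipeline", 1.0),
  (3L, "gardening tips", 0.0)
)).toDF("id", "text", "label")

val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features").setNumFeatures(1000)
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

// The pipeline runs tokenization, term-frequency hashing, and the classifier as one unit
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
val model = pipeline.fit(training)

// Predict the group a new piece of text belongs to
val test = spark.createDataFrame(Seq((4L, "new spark streaming article"))).toDF("id", "text")
model.transform(test).select("id", "probability", "prediction").show(false)

The fitted PipelineModel applies all three stages to new text in a single transform() call, which is what makes pipelines convenient for real-life applications.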

Normalizing data with Spark


In this recipe, we normalize (scale) the data prior to feeding it into an ML algorithm. There are a good number of ML algorithms, such as Support Vector Machine (SVM), that work with scaled input vectors rather than with the raw values.

How to do it...

  1. Go to the UCI Machine Learning Repository and download the http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data file.
  2. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  3. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  4. Import the necessary packages for the Spark session to gain access to the cluster and log4j.Logger to reduce the amount of output produced by Spark:
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.feature.MinMaxScaler
  5. Define a method to parse wine data into a tuple:
def parseWine(str: String): (Int, Vector...
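
For readers who only need the gist, here is a minimal sketch of the scaling step on its own (the tiny feature vectors below are placeholders rather than the parsed wine data): MinMaxScaler learns the per-feature minimum and maximum from the input and rescales each feature to the [0, 1] range.

import org.apache.spark.ml.feature.MinMaxScaler
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("minMaxSketch").getOrCreate()

// Hypothetical feature vectors standing in for the parsed wine records
val df = spark.createDataFrame(Seq(
  (1, Vectors.dense(14.2, 1.7, 2.4)),
  (2, Vectors.dense(13.2, 1.8, 2.1)),
  (3, Vectors.dense(12.4, 0.9, 1.9))
)).toDF("label", "features")

val scaler = new MinMaxScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")   // each feature rescaled to [0, 1]

val scaled = scaler.fit(df).transform(df)
scaled.select("features", "scaledFeatures").show(false)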

Splitting data for training and testing


In this recipe, you will learn to use Spark's API to split your input data into different datasets that can be used for the training and validation phases. It is common to use an 80/20 split, but other ways of splitting the data can be considered as well, based on your preference.

How to do it...

  1. Go to the UCI Machine Learning Repository and download the http://archive.ics.uci.edu/ml/machine-learning-databases/00359/NewsAggregatorDataset.zip file.

 

  2. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  3. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  4. Import the necessary packages for the Spark session to gain access to the cluster and log4j.Logger to reduce the amount of output produced by Spark:
import org.apache.spark.sql.SparkSession
import org.apache.log4j.{ Level, Logger}
  5. Set the output level to ERROR to reduce Spark's logging output:
Logger.getLogger("org"...
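
As a minimal sketch of the splitting step itself (the file path below is a placeholder for wherever you unzip the downloaded dataset), one common way is Spark's randomSplit(), which divides a Dataset by the supplied weights; passing a seed makes the split reproducible:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("splitSketch").getOrCreate()

// Hypothetical path to the unzipped news aggregator file
val data = spark.read.textFile("../data/newsCorpora.csv")

// 80/20 split with a fixed seed for reproducibility
val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

println(s"total=${data.count()}, train=${train.count()}, test=${test.count()}")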

Common operations with the new Dataset API


In this recipe, we cover the Dataset API, which is the preferred way of working with data in Spark 2.0 and beyond. In Chapter 3, Spark's Three Data Musketeers for Machine Learning - Perfect Together, we covered three detailed recipes for the Dataset, and in this chapter we cover some of the common, repetitive operations that are required to work with this new API. Additionally, we demonstrate the query plan generated by the Spark SQL Catalyst optimizer.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. We will use a JSON data file named cars.json, which has been created for this example:
name,city
Bears,Chicago
Packers,Green Bay
Lions,Detroit
Vikings,Minnesota
  3. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  4. Import the necessary packages for the Spark session to get access to the cluster and log4j.Logger to reduce the amount of output produced...
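
As a minimal sketch of the kind of operations this recipe covers (the Team case class and rows below are made up for illustration), a typed Dataset supports filter and map with ordinary Scala lambdas, and explain(true) prints the logical and physical plans produced by the Catalyst optimizer:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("datasetOpsSketch").getOrCreate()
import spark.implicits._

// Hypothetical typed record; any small case class will do for the sketch
case class Team(name: String, city: String)

val ds = Seq(Team("Bears", "Chicago"), Team("Packers", "Green Bay"), Team("Lions", "Detroit")).toDS()

// Common operations: a typed filter followed by a typed map
val filtered = ds.filter(_.city.startsWith("C")).map(t => t.name.toUpperCase)
filtered.show(false)

// Inspect the query plan generated by the Catalyst optimizer
filtered.explain(true)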

Creating and using RDD versus DataFrame versus Dataset from a text file in Spark 2.0


In this recipe, we explore the differences in creating an RDD, a DataFrame, and a Dataset from a text file, and their relationship to each other, via a short sample code:

Dataset: spark.read.textFile()
RDD: spark.sparkContext.textFile()
DataFrame: spark.read.text()

Note

Assume that spark is the name of the SparkSession object.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  3. Import the necessary packages for the Spark session to gain access to the cluster and log4j.Logger to reduce the amount of output produced by Spark:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
  4. We also define a case class to host the data used:
case class Beatle(id: Long, name: String)
  5. Set the output level to ERROR to reduce Spark's logging output:
Logger.getLogger("org").setLevel...
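
A short, self-contained sketch of the three creation calls listed above (the file path is a placeholder for any plain text file) makes the difference in return types explicit:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("rddDfDsSketch").getOrCreate()
import spark.implicits._

// Hypothetical path to a plain text file
val path = "../data/sample.txt"

val rdd = spark.sparkContext.textFile(path)   // RDD[String]
val df  = spark.read.text(path)               // DataFrame with a single "value" column
val ds  = spark.read.textFile(path)           // Dataset[String]

println(rdd.count())
df.printSchema()
ds.map(_.length).show(5)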

LabeledPoint data structure for Spark ML


LabeledPoint is a data structure that has been around since the early days of Spark ML; it pairs a feature vector with a label so that it can be used in supervised learning algorithms. We demonstrate a short recipe that uses LabeledPoint, the Seq data structure, and a DataFrame to run a logistic regression for binary classification of the data. The emphasis here is on LabeledPoint; the regression algorithms are covered in more depth in Chapter 5, Practical Machine Learning with Regression and Classification in Spark 2.0 - Part I, and Chapter 6, Practical Machine Learning with Regression and Classification in Spark 2.0 - Part II.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  3. Import the necessary packages for SparkContext to get access to the cluster:
import org.apache.spark.ml.feature.LabeledPoint...
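
A minimal sketch of the idea (the four labeled points and the classifier parameters are invented for illustration and are not the book's data): a Seq of LabeledPoint converts directly to a DataFrame with label and features columns, which is exactly the shape LogisticRegression expects:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("labeledPointSketch").getOrCreate()
import spark.implicits._

// Hypothetical binary-labeled points: label first, then the feature vector
val data = Seq(
  LabeledPoint(1.0, Vectors.dense(2.0, 3.0)),
  LabeledPoint(0.0, Vectors.dense(0.5, 0.2)),
  LabeledPoint(1.0, Vectors.dense(3.1, 2.2)),
  LabeledPoint(0.0, Vectors.dense(0.1, 0.9))
).toDF()   // columns: label, features

val lr = new LogisticRegression().setMaxIter(10)
val model = lr.fit(data)
model.transform(data).select("label", "prediction").show()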

Getting access to Spark cluster in Spark 2.0


In this recipe, we demonstrate how to get access to a cluster using a single point of access named SparkSession. Spark 2.0 abstracts multiple contexts (such as SQLContext and HiveContext) into a single entry point, SparkSession, which allows you to access all of Spark's functionality in a unified way.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  3. Import the necessary packages. In Spark 2.x, SparkSession is the preferred entry point to the cluster:
import org.apache.spark.sql.SparkSession
  4. Create Spark's configuration and SparkSession so we can have access to the cluster:
val spark = SparkSession
.builder
.master("local[*]") // if using a cluster: master("spark://master:7077")
.appName("myAccesSparkCluster20")
.config("spark.sql.warehouse.dir", ".")
.getOrCreate()

The preceding...
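
As a short supplementary note to this recipe: once the SparkSession exists, the older entry points can still be reached from it, which is handy when mixing new and legacy code. A minimal usage sketch, continuing from the spark value created above:

// The legacy contexts hang off the unified SparkSession
val sc = spark.sparkContext     // SparkContext, for RDD-based code
val sqlCtx = spark.sqlContext   // SQLContext, kept for backward compatibility
println(sc.master)              // for example, local[*] in this configuration
spark.stop()                    // release cluster resources when done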

Getting access to Spark cluster pre-Spark 2.0


This is a pre-Spark 2.0 recipe, but it will be helpful for those who want to quickly compare and contrast cluster access when porting pre-Spark 2.0 programs to Spark 2.0's new paradigm.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  3. Import the necessary packages for SparkContext to get access to the cluster:
import org.apache.spark.{SparkConf, SparkContext}
  4. Create Spark's configuration and SparkContext so we can have access to the cluster:
val conf = new SparkConf()
.setAppName("MyAccessSparkClusterPre20")
.setMaster("local[4]") // if using a cluster: setMaster("spark://MasterHostIP:7077")
.set("spark.sql.warehouse.dir", ".")

val sc = new SparkContext(conf)

The preceding code utilizes the setMaster() function to set the cluster master location. As you can see, we are running the code in...
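
To round out the comparison with the previous recipe, here is a minimal sketch (the toy RDD is just for illustration) of how pre-2.0 code typically obtained SQL functionality: a separate SQLContext had to be constructed on top of the SparkContext, whereas in Spark 2.0 everything hangs off the SparkSession:

import org.apache.spark.sql.SQLContext

// Pre-Spark 2.0 style: SQL support requires its own context built on sc
val sqlContext = new SQLContext(sc)

val rdd = sc.parallelize(1 to 10)
println(rdd.sum())   // simple RDD action to confirm the context works

sc.stop()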

Getting access to SparkContext vis-a-vis SparkSession object in Spark 2.0


In this recipe, we demonstrate how to get hold of the SparkContext using the SparkSession object in Spark 2.0. The recipe demonstrates the creation, usage, and back-and-forth conversion between RDD and Dataset. The reason this is important is that, even though we prefer the Dataset going forward, we must still be able to use and augment legacy (pre-Spark 2.0) code that mostly utilizes RDDs.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  3. Import the necessary packages for the Spark session to gain access to the cluster and log4j.Logger to reduce the amount of output produced by Spark:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import scala.util.Random
  4. Set the output level to ERROR to reduce Spark's logging output:
Logger.getLogger("org").setLevel...
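
The remaining steps are locked, but the heart of the recipe can be sketched as follows (the random numbers are placeholder data): the SparkContext is obtained from the session, an RDD is converted to a Dataset with toDS(), and the .rdd accessor converts it back:

import org.apache.spark.sql.SparkSession
import scala.util.Random

val spark = SparkSession.builder.master("local[*]").appName("rddToDsSketch").getOrCreate()
import spark.implicits._

// The SparkContext is reachable directly from the session
val sc = spark.sparkContext

// RDD -> Dataset
val rdd = sc.parallelize(Seq.fill(5)(Random.nextInt(100)))
val ds = rdd.toDS()

// Dataset -> RDD
val backToRdd = ds.rdd

ds.show()
println(backToRdd.count())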

New model export and PMML markup in Spark 2.0


In this recipe, we explore the model export facility available in Spark 2.0 that uses the Predictive Model Markup Language (PMML). This standard XML-based language allows you to export and run your models on other systems (some limitations apply). You can explore the There's more... section for more information.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  3. Import the necessary packages for SparkContext to get access to the cluster:
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.clustering.KMeans
  4. Create Spark's configuration and SparkSession:
val spark = SparkSession
.builder
.master("local[*]")   // if using a cluster: master("spark://master:7077")
.appName("myPMMLExport")
.config("spark.sql.warehouse.dir", ".")
.getOrCreate()
  5. We read...
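
The rest of the recipe is locked, but as a minimal sketch of the PMML export idea (the four 2-D points and the value of k are invented here; the book's recipe reads its data from a file): MLlib's KMeansModel mixes in PMMLExportable, so a trained model can be serialized to PMML as a string or written out to a path:

// Hypothetical tiny dataset; this continues from the spark session created above
val points = spark.sparkContext.parallelize(Seq(
  Vectors.dense(1.0, 1.0), Vectors.dense(1.5, 2.0),
  Vectors.dense(8.0, 8.0), Vectors.dense(9.0, 8.5)
))

val model = KMeans.train(points, k = 2, maxIterations = 10)

println(model.toPMML())                 // the PMML document as a String
model.toPMML("/tmp/kmeansModel.pmml")   // or write it to a local path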

Regression model evaluation using Spark 2.0


In this recipe, we explore how to evaluate a regression model (a regression decision tree in this example). Spark provides the RegressionMetrics facility, which offers statistical measures such as Mean Squared Error (MSE), R-squared, and so on, right out of the box.

The objective in this recipe is to understand the metrics that Spark provides out of the box. It is best to concentrate on step 8, since we cover regression in more detail in Chapter 5, Practical Machine Learning with Regression and Classification in Spark 2.0 - Part I, and Chapter 6, Practical Machine Learning with Regression and Classification in Spark 2.0 - Part II.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  3. Import the necessary packages for SparkContext to get access to the cluster:
import org.apache.spark...
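
The remaining steps are locked, but a compact, self-contained sketch of what RegressionMetrics reports looks like this (the (prediction, label) pairs are made-up numbers rather than the output of the recipe's decision tree):

import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("regressionMetricsSketch").getOrCreate()

// Hypothetical (prediction, label) pairs
val predictionAndLabels = spark.sparkContext.parallelize(Seq(
  (2.5, 3.0), (0.0, -0.5), (2.0, 2.0), (8.0, 7.0)
))

val metrics = new RegressionMetrics(predictionAndLabels)
println(s"MSE  = ${metrics.meanSquaredError}")
println(s"RMSE = ${metrics.rootMeanSquaredError}")
println(s"R2   = ${metrics.r2}")
println(s"MAE  = ${metrics.meanAbsoluteError}")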

Binary classification model evaluation using Spark 2.0


In this recipe, we demonstrate the use of the BinaryClassificationMetrics facility in Spark 2.0 and its application to evaluating a model that has a binary outcome (for example, a logistic regression).

The purpose here is not to showcase the regression itself, but to demonstrate how to go about evaluating it using common metrics such as receiver operating characteristic (ROC), Area Under ROC Curve, thresholds, and so on.

We recommend that you concentrate on step 8, since we cover regression in detail in Chapter 5, Practical Machine Learning with Regression and Classification in Spark 2.0 - Part I, and Chapter 6, Practical Machine Learning with Regression and Classification in Spark 2.0 - Part II.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  3. Import the necessary packages...
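
As a minimal, self-contained sketch of the evaluation step (the (score, label) pairs are invented; in the recipe they come from the fitted logistic regression), BinaryClassificationMetrics produces the ROC curve and the areas under the ROC and precision-recall curves:

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("binaryMetricsSketch").getOrCreate()

// Hypothetical (score, label) pairs
val scoreAndLabels = spark.sparkContext.parallelize(Seq(
  (0.9, 1.0), (0.8, 1.0), (0.4, 0.0), (0.3, 1.0), (0.1, 0.0)
))

val metrics = new BinaryClassificationMetrics(scoreAndLabels)
println(s"Area under ROC = ${metrics.areaUnderROC()}")
println(s"Area under PR  = ${metrics.areaUnderPR()}")
metrics.roc().collect().foreach(println)   // (false positive rate, true positive rate) points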

Multiclass classification model evaluation using Spark 2.0


In this recipe, we explore MulticlassMetrics, which allows you to evaluate a model that classifies the output into more than two labels (for example, red, blue, green, purple, do-not-know). It highlights the use of the confusion matrix (confusionMatrix) and model accuracy.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  3. Import the necessary packages for SparkContext to get access to the cluster:
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
  4. Create Spark's configuration and SparkSession:
val spark = SparkSession
.builder
.master("local[*]")
.appName("myMulticlass...
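
The remaining code is locked, but the essence of MulticlassMetrics can be sketched as follows (the three-class (prediction, label) pairs are invented for illustration): it exposes the confusion matrix, overall accuracy, and per-label precision and recall:

import org.apache.spark.mllib.evaluation.MulticlassMetrics

// Hypothetical (prediction, label) pairs over three classes; continues from the spark session above
val predictionAndLabels = spark.sparkContext.parallelize(Seq(
  (0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (1.0, 0.0), (2.0, 1.0), (0.0, 0.0)
))

val metrics = new MulticlassMetrics(predictionAndLabels)
println(metrics.confusionMatrix)            // rows = actual labels, columns = predictions
println(s"accuracy = ${metrics.accuracy}")
metrics.labels.foreach { l =>
  println(s"label $l: precision = ${metrics.precision(l)}, recall = ${metrics.recall(l)}")
}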

Multilabel classification model evaluation using Spark 2.0


In this recipe, we explore multilabel classification evaluation with MultilabelMetrics in Spark 2.0, which should not be mixed up with the previous recipe dealing with multiclass classification (MulticlassMetrics). The key to this recipe is to focus on evaluation metrics such as Hamming loss, accuracy, F1-measure, and so on, and what they measure.

How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
  3. Import the necessary packages for SparkContext to get access to the cluster:
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.evaluation.MultilabelMetrics
import org.apache.spark.rdd.RDD
  4. Create Spark's configuration and SparkSession:
val spark = SparkSession
.builder
.master("local[*]")
.appName("myMultilabel")
.config("spark.sql.warehouse.dir", ".")
.getOrCreate()
  5. We create the...
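
The rest of the recipe is locked, but a minimal sketch of the evaluation itself (the (predicted labels, true labels) arrays below are invented; each record can carry several labels at once) shows the metrics the recipe focuses on:

import org.apache.spark.mllib.evaluation.MultilabelMetrics

// Hypothetical (predicted labels, true labels) pairs; continues from the spark session above
val predictionAndLabels = spark.sparkContext.parallelize(Seq(
  (Array(0.0, 1.0), Array(0.0, 2.0)),
  (Array(0.0, 2.0), Array(0.0, 2.0)),
  (Array(1.0),      Array(1.0, 2.0))
))

val metrics = new MultilabelMetrics(predictionAndLabels)
println(s"Hamming loss = ${metrics.hammingLoss}")
println(s"Accuracy     = ${metrics.accuracy}")
println(s"F1-measure   = ${metrics.f1Measure}")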

Using the Scala Breeze library to do graphics in Spark 2.0


In this recipe, we will use the scatter() and plot() functions from the Scala Breeze linear algebra library (part of ScalaNLP) to draw a scatter plot from two-dimensional data. Once the results are computed on the Spark cluster, either the actionable data can be brought back to the driver for drawing, or a JPEG or GIF can be generated on the backend and pushed forward for speed (an approach popular with GPU-based analytical databases such as MapD).

How to do it...

  1. First, we need to download the necessary ScalaNLP library. Download the JAR from the Maven repository available at https://repo1.maven.org/maven2/org/scalanlp/breeze-viz_2.11/0.12/breeze-viz_2.11-0.12.jar.
  2. Place the JAR in the C:\spark-2.0.0-bin-hadoop2.7\examples\jars directory on a Windows machine:
  3. In macOS, please put the JAR in its correct path. For our setting examples, the path is /Users/USERNAME/spark/spark-2.0.0-bin-hadoop2.7/examples/jars/.
  4. The following is the sample screenshot showing the JARs:
  5. Start...
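
The remaining steps are locked, but the plotting call itself can be sketched in a few lines of pure Breeze (the random points are placeholder data assumed to have been collected back to the driver, and the output file name is arbitrary):

import breeze.linalg.DenseVector
import breeze.plot._

// Hypothetical two-dimensional data collected back to the driver (for example, via rdd.collect())
val rand = new scala.util.Random(42)
val x = DenseVector.fill(100)(rand.nextDouble())
val y = DenseVector.fill(100)(rand.nextDouble())

val fig = Figure("Scatter sketch")
val p = fig.subplot(0)
p += scatter(x, y, _ => 0.01)   // third argument sets each point's size in data units
p.xlabel = "x"
p.ylabel = "y"
fig.saveas("scatter.png")       // writes a PNG; use fig.refresh() for an interactive window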