Big Data Analytics with Java

Product type: Book
Published in: Jul 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781787288980
Edition: 1st Edition
Languages: Java
Tools: Apache Spark, Hadoop
Concepts: Big Data
Author: RAJAT MEHTA

Index

A

Activation Function / Perceptron
advanced visualization technique
- about / Advanced visualization technique
- prefuse / Prefuse
- IVTK Graph toolkit / IVTK Graph toolkit
Alternating Least Square (ALS) / Alternating least square – collaborative filtering
Apache Kafka
- about / Apache Kafka
- IoT sensors, integration / Apache Kafka
- social media real-time analytics / Apache Kafka
- healthcare analytics / Apache Kafka
- log analytics / Apache Kafka
- risk aggregation, in finance / Apache Kafka
Apache Spark
- about / Apache Spark
- concepts / Concepts
- transformations / Transformations
- actions / Actions
- Spark Java API / Spark Java API
- samples, Java 8 used / Spark samples using Java 8
- data, loading / Loading data
- data operations / Data operations – cleansing and munging
- data, analyzing / Analyzing data – count, projection, grouping, aggregation, and max/min
- common transformations, on Spark RDDs / Analyzing data – count, projection, grouping, aggregation, and max/min
- actions, on RDDs / Actions on RDDs
- paired RDDs / Paired RDDs
- data, saving / Saving data
- results, collecting / Collecting and printing results
- results, printing / Collecting and printing results
- programs, executing on Hadoop / Executing Spark programs on Hadoop
- subprojects / Apache Spark sub-projects
- machine learning modules / Spark machine learning modules
- Apache Mahout / Mahout – a popular Java ML library
- Deeplearning4j / Deeplearning4j – a deep learning library
- Apriori algorithm, implementation / Implementation of the Apriori algorithm in Apache Spark
- FP-Growth algorithm, executing / Running FP-Growth on Apache Spark
Apache Spark, machine learning modules
- MLlib Java API / MLlib Java API
- machine learning libraries / Other machine learning libraries
Apache Spark machine learning API
- about / The new Spark ML API
- machine learning algorithms / The new Spark ML API
- features handling tools / The new Spark ML API
- model selection / The new Spark ML API
- tuning tools / The new Spark ML API
- utility methods / The new Spark ML API
Apriori algorithm
- implementation, in Apache Spark / Implementation of the Apriori algorithm in Apache Spark
- using / Implementation of the Apriori algorithm in Apache Spark
- disadvantages / Implementation of the Apriori algorithm in Apache Spark
artificial neural network / Introduction to neural networks

B

bagging / Bagging
bag of words / Bag of words
bar chart
- about / Bar charts
- dataset, creating / Bar charts
base project setup / Base project setup
- default Kafka configurations, used / Base project setup
- Maven Java project, for Spark Streaming / Base project setup
Bayes theorem / Bayes theorem
bidirected graph / Refresher on graphs
big data
- data analytics on / Why data analytics on big data?
- for data analytics / Big data for analytics
- bigger pay package, for Java developers / Big data – a bigger pay package for Java developers
- Hadoop, basics / Basics of Hadoop – a Java sub-project
- analytical products / Basics of Hadoop – a Java sub-project
- batch products / Basics of Hadoop – a Java sub-project
- streaming / Basics of Hadoop – a Java sub-project
- machine learning libraries / Basics of Hadoop – a Java sub-project
- NoSQL / Basics of Hadoop – a Java sub-project
- search / Basics of Hadoop – a Java sub-project
big data stack
- HDFS / Basics of Hadoop – a Java sub-project
- Spark / Basics of Hadoop – a Java sub-project
- Impala / Basics of Hadoop – a Java sub-project
- MapReduce / Basics of Hadoop – a Java sub-project
- Sqoop / Basics of Hadoop – a Java sub-project
- Oozie / Basics of Hadoop – a Java sub-project
- Flume / Basics of Hadoop – a Java sub-project
- Kafka / Basics of Hadoop – a Java sub-project
- Yarn / Basics of Hadoop – a Java sub-project
binary classification dataset / What are the feature types that can be extracted from the datasets?
boosting / Boosting
bootstrapping / Bagging
box plots / Box plots

C

charts
- used, in big data analytics / Using charts in big data analytics
- for initial data exploration / Using charts in big data analytics
- for data visualization and reporting / Using charts in big data analytics
clustering
- about / Clustering
- customer segmentation / Clustering
- search engines / Clustering
- data exploration / Clustering
- epidemic breakout zones, finding / Clustering
- biology / Clustering
- news categorization / Clustering
- news, summarization / Clustering
- types / Types of clustering
- hierarchical clustering / Hierarchical clustering
- K-means clustering / K-means clustering
- k-means clustering, bisecting / Bisecting k-means clustering
- for customer segmentation / Clustering for customer segmentation
clustering algorithm
- changing / Changing the clustering algorithm
code
- diving into / Diving into the code
cold start problem / Content-based recommendation systems
collaborative recommendation systems
- about / Collaborative recommendation systems
- advantages / Advantages
- disadvantages / Disadvantages
- collaborative filtering / Alternating least square – collaborative filtering
common transformations, on Spark RDDs
- Filter / Analyzing data – count, projection, grouping, aggregation, and max/min
- Map / Analyzing data – count, projection, grouping, aggregation, and max/min
- FlatMap / Analyzing data – count, projection, grouping, aggregation, and max/min
- other transformations / Analyzing data – count, projection, grouping, aggregation, and max/min
Conditional FP-Tree / Efficient market basket analysis using FP-Growth algorithm
Conditional Pattern / Efficient market basket analysis using FP-Growth algorithm
Conditional Patterns Base / Efficient market basket analysis using FP-Growth algorithm
conditional probability / Conditional probability
content-based recommendation systems
- about / Content-based recommendation systems
- Euclidean Distance / Content-based recommendation systems
- Pearson Correlation / Content-based recommendation systems
- dataset / Dataset
- content-based recommender, on MovieLens dataset / Content-based recommender on MovieLens dataset
- collaborative recommendation systems / Collaborative recommendation systems
content-based recommender
- on MovieLens dataset / Content-based recommender on MovieLens dataset
context
- building / Building SparkConf and context
customer segmentation / Customer segmentation
- clustering / Clustering for customer segmentation

D

data
- cleaning / Data cleaning and munging, Cleaning and munging the data
- munging / Data cleaning and munging, Cleaning and munging the data
- unwanted data, filtering / Data cleaning and munging
- missing data, handling / Data cleaning and munging
- incomplete data, handling / Data cleaning and munging
- discarding / Data cleaning and munging
- constant value, filling / Data cleaning and munging
- average value, populating / Data cleaning and munging
- nearest neighbor approach / Data cleaning and munging
- converting, to proper format / Data cleaning and munging
- basic analysis, with Spark SQL / Basic analysis of data with Spark SQL
- parsing / Load and parse data
- loading / Load and parse data
- Spark-SQL way / Analyzing data – the Spark-SQL way
- Spark SQL, for data exploration and analytics / Spark SQL for data exploration and analytics
- Apriori algorithm / Market basket analysis – Apriori algorithm
- Full Apriori algorithm / Full Apriori algorithm
- preparing / Preparing the data
- formatting / Formatting the data
- storing / Storing the data
data analytics
- on big data / Why data analytics on big data?
- distributed computing, on Hadoop / Distributed computing on Hadoop
- HDFS concepts / HDFS concepts
- Apache Spark / Apache Spark
data exploration / Data exploration, Data exploration
- of text data / Data exploration of text data
dataframe / Dataframe and datasets
DataNode / Main components of HDFS
dataset / Dataset, Dataset
- URL, for downloading / All India seasonal and annual average temperature series dataset
- fields / All India seasonal and annual average temperature series dataset
- data / All India seasonal and annual average temperature series dataset
- reference link / Predicting house prices using linear regression
- data, munging / Data cleaning and munging
- full batch approach / Accuracy of multi-layer perceptrons
- partial batch approach / Accuracy of multi-layer perceptrons
dataset, linear regression
- data, cleaning / Data cleaning and munging
- exploring / Exploring the dataset
- number of rows / Exploring the dataset
- average price per zipcode, sorting by highest on top / Exploring the dataset
- linear regression model, executing / Running and testing the linear regression model
- linear regression model, testing / Running and testing the linear regression model
dataset, logistic regression
- data, cleaning / Data cleaning and munging
- data, munging / Data cleaning and munging
- data, missing / Data cleaning and munging
- categorical data / Data cleaning and munging
- data exploration / Data exploration
- executing / Running and testing the logistic regression model
- testing / Running and testing the logistic regression model
dataset object / Training and testing the model
datasets / Datasets, Dataframe and datasets
- airports dataset / Datasets
- routes dataset / Datasets
- airlines dataset / Datasets
datasets splitting
- features selected / Choosing the best features for splitting the datasets
- Gini Impurity / Choosing the best features for splitting the datasets
data transfer techniques
- Flume / Getting and preparing data in Hadoop
- FTP / Getting and preparing data in Hadoop
- Kafka / Getting and preparing data in Hadoop
- HBase / Getting and preparing data in Hadoop
- Hive / Getting and preparing data in Hadoop
- Impala / Getting and preparing data in Hadoop
data visualization
- with Java JFreeChart / Data visualization with Java JFreeChart
- charts, used in big data analytics / Using charts in big data analytics
decision tree
- about / What is a decision tree?
- for classification / What is a decision tree?
- for regression / What is a decision tree?
- building / Building a decision tree
- datasets splitting, features selected / Choosing the best features for splitting the datasets
- advantages / Advantages of using decision trees
- disadvantages / Disadvantages of using decision trees
- dataset / Dataset
- data exploration / Data exploration
- data, cleaning / Cleaning and munging the data
- data, munging / Cleaning and munging the data
- model, training / Training and testing the model
- model, testing / Training and testing the model
deep learning
- about / Deep learning
- advantages / Advantages and use cases of deep learning
- use cases / Advantages and use cases of deep learning
- no feature engineering required / Advantages and use cases of deep learning
- accuracy / Advantages and use cases of deep learning
- information / More information on deep learning
Deeplearning4j / Deeplearning4j
- about / Deeplearning4j – a deep learning library
- references / Deeplearning4j
- data, compressing / Compressing data
- Avro / Avro and Parquet
- Parquet / Avro and Parquet
distributed computing
- on Hadoop / Distributed computing on Hadoop

E

edges / Refresher on graphs
efficient market basket analysis
- FP-Growth algorithm, used / Efficient market basket analysis using FP-Growth algorithm
ensembling
- about / Ensembling
- voting / Ensembling
- averaging / Ensembling
- machine learning algorithm, used / Ensembling
- types / Types of ensembling
- bagging / Bagging
- boosting / Boosting
- advantages / Advantages and disadvantages of ensembling
- disadvantages / Advantages and disadvantages of ensembling
- random forest / Random forests
- Gradient boosted trees (GBTs) / Gradient boosted trees (GBTs)

F

feature selection
- filter methods / How do you select the best features to train your models?
- Pearson correlation / How do you select the best features to train your models?
- chi-square / How do you select the best features to train your models?
- wrapper method / How do you select the best features to train your models?
- forward selection / How do you select the best features to train your models?
- backward elimination / How do you select the best features to train your models?
- embedded method / How do you select the best features to train your models?
FP-Growth algorithm
- used, for efficient market basket analysis / Efficient market basket analysis using FP-Growth algorithm
- transaction dataset / Efficient market basket analysis using FP-Growth algorithm
- frequency of items, calculating / Efficient market basket analysis using FP-Growth algorithm
- priority, assigning to items / Efficient market basket analysis using FP-Growth algorithm
- array items, by priority / Efficient market basket analysis using FP-Growth algorithm
- FP-Tree, building / Efficient market basket analysis using FP-Growth algorithm
- frequent patterns, identifying from FP-Tree / Efficient market basket analysis using FP-Growth algorithm
- conditional patterns, mining / Efficient market basket analysis using FP-Growth algorithm
- conditional patterns, from leaf node Diapers / Efficient market basket analysis using FP-Growth algorithm
- executing, on Apache Spark / Running FP-Growth on Apache Spark
Frequent Item sets / Efficient market basket analysis using FP-Growth algorithm
Frequent Pattern Mining
- reference link / Running FP-Growth on Apache Spark
Full Apriori algorithm
- about / Full Apriori algorithm
- dataset / Full Apriori algorithm
- apriori implementation / Full Apriori algorithm

G

Gradient boosted trees (GBTs)
- about / Advantages and disadvantages of ensembling, Gradient boosted trees (GBTs)
- dataset, used / Classification problem and dataset used
- issues, classifying / Classification problem and dataset used
- data exploration / Data exploration
- random forest model, training / Training and testing our random forest model
- random forest model, testing / Training and testing our random forest model
- gradient boosted tree model, testing / Training and testing our gradient boosted tree model
- gradient boosted tree model, training / Training and testing our gradient boosted tree model
graph analytics
- about / Graph analytics
- path analytics / Graph analytics
- connectivity analytics / Graph analytics
- community analytics / Graph analytics
- centrality analytics / Graph analytics
- GraphFrames / GraphFrames
- GraphFrames, used for building a graph / Building a graph using GraphFrames
- on airports / Graph analytics on airports and their flights
- on flights / Graph analytics on airports and their flights
- datasets / Datasets
- on flights data / Graph analytics on flights data
graphs
- refresher / Refresher on graphs
- representing / Representing graphs
- adjacency matrix / Representing graphs
- adjacency list / Representing graphs
- common terminology / Common terminology on graphs
- common algorithms / Common algorithms on graphs
- plotting / Plotting graphs
graphs, common algorithms
- breadth-first search / Common algorithms on graphs
- depth-first search / Common algorithms on graphs
- Dijkstra shortest path / Common algorithms on graphs
- PageRank algorithm / Common algorithms on graphs
graphs, common terminology
- vertices / Common terminology on graphs
- edges / Common terminology on graphs
- degrees / Common terminology on graphs
- indegrees / Common terminology on graphs
- outdegrees / Common terminology on graphs
GraphStream library
- reference link / Plotting graphs

H

Hadoop
- basics / Basics of Hadoop – a Java sub-project
- features / Basics of Hadoop – a Java sub-project
- distributed computing on / Distributed computing on Hadoop
- core / Distributed computing on Hadoop
- HDFS / Distributed computing on Hadoop
Hadoop Distributed File System (HDFS) / Real-time SQL queries using Impala
- about / Distributed computing on Hadoop
- open source / Design and architecture of HDFS
- immense scalability, for large amounts of data / Design and architecture of HDFS
- failover support / Design and architecture of HDFS
- fault tolerance / Design and architecture of HDFS
- data locality / Design and architecture of HDFS
- NameNode / Main components of HDFS
- DataNode / Main components of HDFS
handwritten digit recognition
- using CNN / Handwritten digit recognition using CNN
HBase / Real-time data processing
HDFS concepts
- about / HDFS concepts
- architecture / Design and architecture of HDFS
- design / Design and architecture of HDFS
- components / Main components of HDFS
- simple commands / HDFS simple commands
hierarchical clustering / Hierarchical clustering
histogram
- about / Histograms
- using / When would you use a histogram?
- creating, JFreeChart used / How to make histograms using JFreeChart?
human neuron
- dendrite / Introduction to neural networks
- cell body / Introduction to neural networks
- axon terminal / Introduction to neural networks
hyperplane / Scatter plots, What is simple linear regression?

I

Impala
- used, for real-time SQL queries / Real-time SQL queries using Impala
- advantages / Real-time SQL queries using Impala
- flight delay analysis / Flight delay analysis using Impala
- Apache Kafka / Apache Kafka
- Spark Streaming / Spark Streaming, Typical uses of Spark Streaming
- trending videos / Trending videos
Iris dataset
- reference link / Flower species classification using multi-Layer perceptrons
IVTK Graph toolkit
- about / IVTK Graph toolkit
- other libraries / Other libraries

J

JFreeChart API
- dataset loading, Apache Spark used / Simple single Time Series chart
- chart object, creating / Simple single Time Series chart
- dataset object, filling / Bar charts
- chart component, creating / Bar charts

K

k-means clustering / K-means clustering
- bisecting / Bisecting k-means clustering

L

linear regression
- about / Linear regression
- using / Where is linear regression used?
- used, for predicting house prices / Predicting house prices using linear regression
- dataset / Dataset
line charts / Line charts
logistic regression
- about / Logistic regression
- mathematical functions, used / Which mathematical functions does logistic regression use?
- Gradient ascent or descent / Which mathematical functions does logistic regression use?
- Stochastic gradient descent / Which mathematical functions does logistic regression use?
- used for / Where is logistic regression used?
- heart disease, predicting / Where is logistic regression used?
- dataset / Dataset

M

machine learning
- about / What is machine learning?
- example / Real-life examples of machine learning
- at Netflix / Real-life examples of machine learning
- spam filter / Real-life examples of machine learning
- handwriting detection, on cheques submitted via ATMs / Real-life examples of machine learning
- type / Type of machine learning
- supervised learning / Type of machine learning
- unsupervised learning / Type of machine learning
- semi-supervised learning / Type of machine learning
- supervised learning, case study / A small sample case study of supervised and unsupervised learning
- unsupervised learning, case study / A small sample case study of supervised and unsupervised learning
- issues / Steps for machine learning problems
- model, selecting / Choosing the machine learning model
- training/test set / Choosing the machine learning model
- cross validation / Choosing the machine learning model
- features extracted from datasets / What are the feature types that can be extracted from the datasets?
- categorical features / What are the feature types that can be extracted from the datasets?
- numerical features / What are the feature types that can be extracted from the datasets?
- text features / What are the feature types that can be extracted from the datasets?
- features, selecting to train models / How do you select the best features to train your models?
- analytics, executing on big data / How do you run machine learning analytics on big data?
- data, preparing in Hadoop / Getting and preparing data in Hadoop
- data, obtaining in Hadoop / Getting and preparing data in Hadoop
- models, storing on big data / Training and storing models on big data
- models, training on big data / Training and storing models on big data
- Apache Spark machine learning API / Apache Spark machine learning API
massive graphs
- on big data / Massive graphs on big data
- graph analytics / Graph analytics
- graph analytics, on airports / Graph analytics on airports and their flights
maths stats
- min / Box plots
- max / Box plots
- mean / Box plots
- median / Box plots
- lower quartile / Box plots
- upper quartile / Box plots
- outliers / Box plots
mean squared error (MSE) / Bisecting k-means clustering
median value / Box plots
MNIST database
- reference link / Handwritten digit recognition using CNN
model
- selecting / Training and storing models on big data
- training / Training and storing models on big data, Training and testing the model
- storing / Training and storing models on big data
- testing / Training and testing the model
multi-layer perceptron
- about / Multi-layer perceptrons
- accuracy / Accuracy of multi-layer perceptrons
- used, for flower species classification / Flower species classification using multi-layer perceptrons
multiple linear regression / What is simple linear regression?

N

N-grams
- about / N-grams
- examples / N-grams
NameNode / Main components of HDFS
Natural Language Processing (NLP) / What are the feature types that can be extracted from the datasets?, Concepts for sentimental analysis
Naïve Bayes algorithm
- about / Naive Bayes algorithm
- advantages / Advantages of Naive Bayes
- disadvantages / Disadvantages of Naive Bayes
neural networks / Introduction to neural networks

O

OpenFlights airports database
- reference link / Datasets

P

paired RDDs
- about / Paired RDDs
- transformations / Transformations on paired RDDs
perceptron
- about / Perceptron
- issues / Problems with perceptrons
- Logical AND / Problems with perceptrons
- Logical OR / Problems with perceptrons
- sigmoid neuron / Sigmoid neuron
- multi-layer perceptron / Multi-layer perceptrons
PFP (Parallel FP-Growth) / Running FP-Growth on Apache Spark
prefuse
- about / Prefuse
- reference link / Prefuse

R

random forest / Random forests
real-time analytics
- about / Real-time analytics
- fraud analytics / Real-time analytics
- sensor data analysis (Internet of Things) / Real-time analytics
- recommendations, giving to users / Real-time analytics
- in healthcare / Real-time analytics
- ad-processing / Real-time analytics
- big data stack / Big data stack for real-time analytics
real-time data ingestion / Real-time data ingestion and storage
- Apache Kafka / Real-time data ingestion and storage
- Apache Flume / Real-time data ingestion and storage
- HBase / Real-time data ingestion and storage
- Cassandra / Real-time data ingestion and storage
real-time data processing / Real-time data processing
- Spark Streaming / Real-time data processing
- Storm / Real-time data processing
real-time SQL queries
- on big data / Real-time SQL queries on big data
- Impala / Real-time SQL queries on big data
- Apache Drill / Real-time SQL queries on big data
- Impala, used / Real-time SQL queries using Impala
real-time storage / Real-time data ingestion and storage
Recency, Frequency, and Monetary (RFM) / Customer segmentation
recommendation system
- about / Recommendation systems and their types
- types / Recommendation systems and their types
- content-based recommendation systems / Content-based recommendation systems
Resilient Distributed Dataset (RDD) / Concepts, Dataframe and datasets

S

scatter plots / Scatter plots
sentiment analysis
- about / Sentimental analysis
- concepts / Concepts for sentimental analysis
- tokenization / Tokenization
- stemming / Stemming
- N-grams / N-grams
- term presence / Term presence and Term Frequency
- term frequency / Term presence and Term Frequency
- Term Frequency and Inverse Document Frequency (TF-IDF) / TF-IDF
- bag of words / Bag of words
- dataset / Dataset
- text data, data exploration / Data exploration of text data
- on dataset / Sentimental analysis on this dataset
sigmoid neuron / Sigmoid neuron
simple linear regression / Linear regression, What is simple linear regression?
smoothing factor / Disadvantages of Naive Bayes
Solr / Real-time data processing
spam detector model / Type of machine learning
SparkConf
- building / Building SparkConf and context
Spark ML / Apache Spark machine learning API
Spark SQL
- used, for basic analysis on data / Basic analysis of data with Spark SQL
- SparkConf, building / Building SparkConf and context
- context, building / Building SparkConf and context
- dataframe / Dataframe and datasets
- datasets / Dataframe and datasets
- data, loading / Load and parse data
- data, parsing / Load and parse data
Spark Streaming
- about / Spark Streaming, Typical uses of Spark Streaming
- use cases / Typical uses of Spark Streaming
- data collection, in real time / Typical uses of Spark Streaming
- storage, in real time / Typical uses of Spark Streaming
- predictive analytics, in real time / Typical uses of Spark Streaming
- windowed calculations / Typical uses of Spark Streaming
- cumulative calculations / Typical uses of Spark Streaming
- base project setup / Base project setup
stemming / Stemming
stop words removal / Stop words removal
Storm / Spark Streaming
sum of mean squared errors (SMEs) / Bisecting k-means clustering
supervised learning
- about / Type of machine learning
- classification / Type of machine learning
- regression / Type of machine learning
Support Vector Machine (SVM) / SVM or Support Vector Machine

T

tendency / Content-based recommendation systems
term frequency
- about / Term presence and Term Frequency
- example / Term presence and Term Frequency
Term Frequency and Inverse Document Frequency (TF-IDF) / TF-IDF
- about / TF-IDF
- term frequency / TF-IDF
- inverse document frequency / TF-IDF
time series chart
- about / Time Series chart
- All India seasonal and annual average temperature series dataset / All India seasonal and annual average temperature series dataset
- simple single time series chart / Simple single Time Series chart
- multiple time series, on single chart window / Multiple Time Series on a single chart window
tokenization
- about / Tokenization
- regular expression, used / Tokenization
- pre-trained model, used / Tokenization
- stop words removal / Stop words removal
trending videos
- about / Trending videos
- sentiment analysis, at real time / Sentiment analysis in real time

V

vertices / Refresher on graphs
Visualization Toolkit (VTK)
- about / IVTK Graph toolkit
- URL / IVTK Graph toolkit

W

windowed calculations / Trending videos


About the author

The author is a VP (Technical Architect) in technology at JP Morgan Chase in New York. He is a Sun Certified Java developer and has worked on Java-related technologies for more than 16 years. For the past few years, his role has heavily involved using the big data stack and running analytics on it. He also contributes to various open source projects, available on his GitHub repository, and writes frequently for developer magazines.

Other recommended products

Related to this chapter

Machine Learning with scikit-learn Quick Start Guide

Scikit-learn is a robust machine learning library for the Python programming language. It provides a set of supervised and unsupervised learning algorithms. This book is the easiest way to learn how to deploy, optimize and evaluate all the important machine learning algorithms that scikit-learn provides.

BookOct 2018172 pages

Apache Spark 2.x for Java Developers

Apache Spark is the buzzword in the big data industry right now, especially with the increasing need for real-time streaming and data processing. While Spark is built on Scala, the Spark Java API exposes all the Spark features available in the Scala version for Java developers. This book will show you how you can implement various functionalities of the Apache Spark framework in Java, without stepping out of your comfort zone.

BookJul 2017350 pages

Apache Spark Quick Start Guide

Apache Spark is a ?exible in-memory framework that allows processing of both batch and real-time data. Its unified engine has made it quite popular for big data use cases. This book will help you to quickly get started with Apache Spark 2.0 and write efficient big data applications for a variety of use cases.

BookJan 2019154 pages

Mastering Machine Learning with Spark 2.x

The purpose of machine learning is to build systems that learn from data. With the meteoric rise of machine learning, developers are now keen on finding out how can they make their Spark applications smarter. The book commences by defining machine learning primitives by the MLlib and H2O libraries. You will learn how to use Binary classification to detect the Higgs Boson particle in the huge amount of data produced by CERN particle collider and classify daily health activities using ensemble Methods for Multi-Class Classification. Finally, you will build different pattern mining models using MLlib, perform complex manipulation of DataFrames using Spark and Spark SQL, and deploy your app in a Spark streaming environment.

BookAug 2017340 pages

Apache Spark 2.x Machine Learning Cookbook

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. This book begins with a quick overview of setting up the necessary IDEs to facilitate the execution of code examples that will be covered in various chapters. It also highlights some key issues developers face while working with machine learning algorithms on the Spark platform. We progress by uncovering the various Spark APIs and the implementation of ML algorithms with developing classification systems, recommendation engines, text analytics, clustering, and learning systems. Toward the final chapters, we’ll focus on building high-end applications and explain various unsupervised methodologies and challenges to tackle when implementing with big data ML systems.

BookSep 2017666 pages

Learning Spark SQL

In the past year, Apache Spark has been increasingly adopted for development of distributed applications. Spark SQL APIs provides an optimized interface that helps developers build such applications quickly and easily. However, designing web-scale production applications using Spark SQL APIs can be a complex task. Understanding the design and implementation best practices for Spark SQL API based applications before you start your project will help you avoid these problems and ensure that your project is a success. Learning Spark SQL gives an insight into the engineering practices used to design and build real-world Spark-based applications. The hands-on examples will give you the required confidence to work on any future projects you encounter in Spark SQL.

Book · Sep 2017 · 452 pages

Mastering Apache Spark 2.x

Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality, such as graph processing, machine learning, and stream processing. This book will familiarize you with the newest features in Apache Spark 2.x and take you on an exciting journey through complex Big Data processing, analytics, streaming analytics, and advanced machine learning with Apache Spark. During the course of the book, you will leverage different functionalities and modules of Apache Spark, such as Spark SQL, Spark MLlib, Spark Streaming, and SparkML, to build efficient data processing solutions. By the end of this book, you will have all the knowledge necessary to use Apache Spark effectively in your day-to-day tasks.

Book · Jul 2017 · 354 pages

Hands-On Deep Learning with Apache Spark

Deep learning is a subset of machine learning in which data sets with several layers of complexity can be processed. This book teaches you the different techniques by which deep learning solutions can be implemented at scale on Apache Spark. This will help you gain experience implementing your deep learning models in many real-world use cases.

Book · Jan 2019 · 322 pages

Machine Learning with Spark

Spark ML is the machine learning module of Spark. It uses in-memory RDDs to train clustering, classification, and regression models faster.

Book · Apr 2017 · 532 pages

Machine Learning with Scala Quick Start Guide

Scala is a highly scalable programming language that integrates object-oriented and functional programming, which makes it easy to build scalable and complex big data applications. This book is a handy guide for machine learning developers and data scientists who want to train effective machine learning models using this popular language.

Book · Apr 2019 · 220 pages

Learning PySpark

This book will get you to grips with the Spark Python API. You’ll explore how Python can be used with Spark to build scalable and reliable data-intensive applications.

Book · Feb 2017 · 274 pages

Apache Spark 2.x Cookbook

Apache Spark has become one of the hottest platforms and most sought-after skill sets in the fields of Big Data, analytics, and data science. Apache Spark 2.x comes with a series of improvements in performance, scalability, and operational and production readiness for the structured processing of massive datasets. This book offers a systematic, hands-on approach to using its improved programming APIs and expanded SQL functionality, and to implementing distributed machine learning applications with Spark ML. Over the course of the chapters, you will explore the power of Spark DataFrames/Datasets, harness MLlib for data mining, and analyze complex problems with iterative or multi-stage Spark scripts and associated toolsets such as Spark SQL, Spark Streaming, and GraphX.

Book · May 2017 · 294 pages
