Mastering Predictive Analytics with Python

Product type: Book

Published in: Aug 2016

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781785882715

Edition: 1st Edition

Languages: Python

Concepts: Predictive Analytics

Author: Joseph Babcock

Index

A

A/B testing
- models, iterating / Iterating on models through A/B testing
- experimental allocation / Experimental allocation – assigning customers to experiments
- sample size, deciding / Deciding a sample size
- multiple hypothesis testing / Multiple hypothesis testing
adjacency matrix / Where agglomerative clustering fails
affinity propagation
- cluster numbers, selecting automatically / Affinity propagation – automatically choosing cluster numbers
agglomerative clustering
- about / Agglomerative clustering
- failures / Where agglomerative clustering fails
Alternating Least Squares (ALS) / Case Study: Training a Recommender System in PySpark
Amazon Web Services (AWS) / Working in the cloud
analytic pipeline
- data splitting / Modeling layer
- parameter tuning / Modeling layer
- model performance / Modeling layer
- model persistence / Modeling layer
analytic solution, advanced
- designing / Designing an advanced analytic solution
- data layer / Data layer: warehouses, lakes, and streams
- modeling layer / Modeling layer
- deployment layer / Deployment layer
- reporting layer / Reporting layer
application layer / Deployment layer
area under curve (AUC)
- about / Evaluating classification models, Evaluating changes in model performance
auto-regressive moving average (ARMA) / Time series data

B

back-propagation
- about / Parameter fitting with back-propagation
boosting
- about / Fitting an SVM to the census data, Boosting – combining small models to improve accuracy
broker / Persisting information with database systems

C

categorical data
- similarity metrics / Similarity metrics for categorical data
- normalizing / Similarity metrics for categorical data
Celery library
- URL / The web application
Classification and Regression Trees (CART) algorithm / Decision trees
classification models
- evaluating / Evaluating classification models
- improving / Strategies for improving classification models
client layer / Deployment layer
client requests
- handling / Clients and making requests
- GET requests, implementing / The GET requests
- POST request, implementing / The POST request
- HEAD request, implementing / The HEAD request
- PUT request, implementing / The PUT request
- DELETE request, implementing / The DELETE request
communication
- guidelines / Guidelines for communication
- terms, translating to business values / Translate terms to business values
- results, visualizing / Visualizing results
convexity
- about / Jointly optimizing all parameters with second-order methods
convolutional network
- about / Convolutional networks and rectified units
- input layer / Convolutional networks and rectified units
- convolutional layer / Convolutional networks and rectified units
- rectifying layer / Convolutional networks and rectified units
- downsampling layer / Convolutional networks and rectified units
- fully connected layer / Convolutional networks and rectified units
correlation similarity metrics
- about / Correlation similarity metrics and time series
covariance / Correlation similarity metrics and time series
curl command
- about / The architecture of a prediction service
- URL / The architecture of a prediction service

D

database systems
- using / Persisting information with database systems
data layer / Designing an advanced analytic solution
decision trees
- about / Decision trees
dendrograms / Agglomerative clustering
deployment layer / Deployment layer
digit recognition / The TensorFlow library and digit recognition
distance metrics
- about / Similarity and distance metrics
- numerical distance metrics / Numerical distance metrics
- time series / Correlation similarity metrics and time series
- blending / Similarity metrics for categorical data
Dow Jones Industrial Average (DJIA) / Correlation similarity metrics and time series
Driver / Creating the SparkContext
Dynamic Time Warping (DTW) / Correlation similarity metrics and time series

E

e-mail campaigns, case study
- about / Case study: targeted e-mail campaigns
- data input and transformation / Data input and transformation
- sanity checking / Sanity checking
- model development / Model development
- scoring / Scoring
- visualization and reporting / Visualization and reporting
Executors / Creating the SparkContext

F

false positive rate (FPR)
- about / Evaluating classification models
familywise error rate (FWER) / Multiple hypothesis testing
Flask
- URL / Application – the engine of the predictive services

G

Gaussian kernel
- about / Fitting an SVM to the census data
Gauss-Markov theorem / Linear regression
Generalized Estimating Equations (GEE)
- about / Generalized estimating equations
generalized linear models (GLMs)
- about / Generalized linear models, Logistic regression
geospatial data
- about / Working with geospatial data
- loading / Loading geospatial data
- cloud, working in / Working in the cloud
gradient boosted decision trees
- about / Gradient boosted decision trees
- versus, support vector machines and logistic regression / Comparing classification methods
gradient boosted machine (GBM) / Evaluating changes in model performance
graphical user interface (GUI) / Cleaning textual data
graphics processing unit (GPU) / The TensorFlow library and digit recognition

H

H2O
- URL / Joining signals and correlation
Hadoop Distributed File System (HDFS) / Creating an RDD
hierarchical clustering / Agglomerative clustering
hinge loss
- about / Separating nonlinear boundaries with support vector machines
horizontal scaling / Server – the web traffic controller
HTTP Status Codes / The GET requests
hypertext transfer protocol (HTTP)
- about / The architecture of a prediction service

I

images
- about / Images
- image data, cleaning / Cleaning image data
- thresholding, for highlighting objects / Thresholding images to highlight objects
- dimensionality reduction, for image analysis / Dimensionality reduction for image analysis
Indicator Function / Extracting features from textual data
Internet Movie Database
- URL / Exploring categorical and numerical data in IPython
IPython notebook
- about / Exploring categorical and numerical data in IPython
- installing / Installing IPython notebook
- interface / The notebook interface
- data, loading / Loading and inspecting data
- data, inspecting / Loading and inspecting data
- basic manipulations / Basic manipulations – grouping, filtering, mapping, and pivoting
- Matplotlib, charting with / Charting with Matplotlib
iteratively reweighted least squares (IRLS)
- about / Jointly optimizing all parameters with second-order methods

K

K-means++ / K-means clustering
K-means clustering
- about / K-means clustering
k-medoids
- about / k-medoids
kernel function
- about / Separating nonlinear boundaries with support vector machines

L

Labeled RDD / Streaming clustering in Spark
Latent Dirichlet Allocation (LDA)
- about / Latent Dirichlet Allocation
Latent Semantic Indexing (LSI) / Principal component analysis
linear regression
- about / Linear regression
- data, preparing / Data preparation
- evaluation / Model fitting and evaluation
- model, fitting / Model fitting and evaluation
- statistical significance / Statistical significance of regression outputs
- Generalized Estimating Equations (GEE) / Generalized estimating equations
- mixed effects models / Mixed effects models
- time series data / Time series data
- generalized linear models / Generalized linear models
- regularization, applying to linear models / Applying regularization to linear models
linkage metric / Where agglomerative clustering fails
link functions
- Logit / Generalized linear models
- Poisson / Generalized linear models
- Exponential / Generalized linear models
logistic regression
- about / Logistic regression
- multiclass logistic classifiers / Multiclass logistic classifiers: multinomial regression
- dataset, formatting for classification problems / Formatting a dataset for classification problems
- stochastic gradient descent (SGD) / Learning pointwise updates with stochastic gradient descent
- parameters, optimizing with second-order methods / Jointly optimizing all parameters with second-order methods
- model, fitting / Fitting the model
- versus, support vector machines and gradient boosted decision trees / Comparing classification methods
logistic regression service
- as case study / Case study – logistic regression service
- database, setting up / Setting up the database
- web server, setting up / The web server
- web application, setting up / The web application
- model, training / The flow of a prediction service – training a model
- on-demand and bulk prediction, obtaining / On-demand and bulk prediction
Long Short-Term Memory (LSTM) networks / Optimizing the learning rate

M

Matplotlib
- charting with / Charting with Matplotlib
message passing / Affinity propagation – automatically choosing cluster numbers
MNIST (Modified National Institute of Standards and Technology) database / The MNIST data
modeling layer / Modeling layer
model performance
- checking, with diagnostic / Checking the health of models with diagnostics
- changes, evaluating / Evaluating changes in model performance
- changes in feature importance, evaluating / Changes in feature importance
- unsupervised model performance, changes / Changes in unsupervised model performance
models
- iterating, through A/B testing / Iterating on models through A/B testing
multiclass logistic classifiers
- about / Multiclass logistic classifiers: multinomial regression
multidimensional scaling (MDS) / Numerical distance metrics
multinomial regression / Multiclass logistic classifiers: multinomial regression

N

Natural Language Toolkit (NLTK) library / Cleaning textual data
neural networks
- patterns, learning with / Learning patterns with neural networks
- perceptron / A network of one – the perceptron
- perceptrons, combining / Combining perceptrons – a single-layer neural network
- single-layer neural network / Combining perceptrons – a single-layer neural network
- parameter fitting, with back-propagation / Parameter fitting with back-propagation
- discriminative, versus generative models / Discriminative versus generative models
- gradients, vanishing / Vanishing gradients and explaining away
- belief networks, pretraining / Pretraining belief networks
- regularizing, dropout used / Using dropout to regularize networks
- convolutional networks / Convolutional networks and rectified units
- rectified units / Convolutional networks and rectified units
- data compressing, with autoencoder networks / Compressing data with autoencoder networks
- learning rate, optimizing / Optimizing the learning rate
neurons / Combining perceptrons – a single-layer neural network
Newton methods
- about / Jointly optimizing all parameters with second-order methods
non-relational database / Persisting information with database systems
numerical distance metrics
- about / Numerical distance metrics

O

Ordinary Least Squares (OLS) / Linear regression

P

prediction service
- architecture / The architecture of a prediction service
- server, using / Server – the web traffic controller
- application, setting up / Application – the engine of the predictive services
- information, persisting with database systems / Persisting information with database systems
Principal Component Analysis (PCA)
- about / Principal component analysis
- Latent Dirichlet Allocation (LDA) / Latent Dirichlet Allocation
- dimensionality reduction, using in predictive modeling / Using dimensionality reduction in predictive modeling
pseudo-residuals / Gradient boosted decision trees
PySpark
- URL / Joining signals and correlation, Introduction to PySpark
- about / Introduction to PySpark, Scaling out with PySpark – predicting year of song release
- SparkContext, creating / Creating the SparkContext
- RDD, creating / Creating an RDD
- Spark DataFrame, creating / Creating a Spark DataFrame
- example / Scaling out with PySpark – predicting year of song release
- classifier models, implementing / Case study: fitting classifier models in pyspark
Python requests library
- URL / The GET requests

R

RabbitMQ
- URL / The web application
random forest
- about / Random forest
RDD
- creating / Creating an RDD
receiver operating characteristic (ROC) curve
- about / Logistic regression, Evaluating classification models, Evaluating changes in model performance
recommender system training, in PySpark
- case study / Case Study: Training a Recommender System in PySpark
Rectified Linear Unit (ReLU) / Convolutional networks and rectified units
Recurrent Neural Networks (RNNs) / Optimizing the learning rate
Redis
- URL / Setting up the database
relational database / Persisting information with database systems
reporting layer / Reporting layer
reporting service
- about / Case Study: building a reporting service
- report server, setting up / The report server
- report application, setting up / The report application
- visualization layer, using / The visualization layer
Resilient Distributed Datasets (RDDs) / Introduction to PySpark, Streaming clustering in Spark

S

second-order methods
- about / Formatting a dataset for classification problems
- parameters, optimizing / Jointly optimizing all parameters with second-order methods
server
- used, for communicating with external systems / Server – the web traffic controller
similarity metrics
- about / Similarity and distance metrics
- correlation similarity metrics / Correlation similarity metrics and time series
- for categorical data / Similarity metrics for categorical data
Singular Value Decomposition (SVD) / Numerical distance metrics, Principal component analysis
social media feeds, case study
- about / Case study: sentiment analysis of social media feeds
- data input and transformation / Data input and transformation
- sanity checking / Sanity checking
- model development / Model development
- scoring / Scoring
- visualization and reporting / Visualization and reporting
soft-margin formulation / Separating nonlinear boundaries with support vector machines
Spark
- streaming clustering / Streaming clustering in Spark
SparkContext
- creating / Creating the SparkContext
Spark DataFrame
- creating / Creating a Spark DataFrame
spectral clustering / Where agglomerative clustering fails
statsmodels
- URL / Model fitting and evaluation
stochastic gradient descent (SGD)
- about / Formatting a dataset for classification problems, Learning pointwise updates with stochastic gradient descent
streaming clustering
- about / Streaming clustering in Spark
support-vector networks
- about / Separating nonlinear boundaries with support vector machines
support vector machine (SVM)
- nonlinear boundaries, separating / Separating nonlinear boundaries with support vector machines
- fitting, to census data / Fitting an SVM to the census data
- boosting / Boosting – combining small models to improve accuracy
- versus, logistic regression and gradient boosted decision trees / Comparing classification methods

T

TensorFlow library
- about / The TensorFlow library and digit recognition
- MNIST data / The MNIST data
- network, constructing / Constructing the network
term frequency-inverse document frequency (tf-idf) / Extracting features from textual data
textual data
- working with / Working with textual data
- cleaning / Cleaning textual data
- features, extracting from / Extracting features from textual data
- dimensionality reduction, used for simplifying datasets / Using dimensionality reduction to simplify datasets
time series
- about / Correlation similarity metrics and time series
time series analysis
- about / Time series analysis
- cleaning and converting / Cleaning and converting
- time series diagnostics / Time series diagnostics
- signals and correlation, joining / Joining signals and correlation
transformations and operations
- URL / Creating an RDD
tree methods
- about / Tree methods
- decision trees / Decision trees
- random forest / Random forest
true positive rate (TPR)
- about / Evaluating classification models

U

units / Combining perceptrons – a single-layer neural network
Unweighted Pair Group Method with Arithmetic Mean (UPGMA) / Agglomerative clustering

V

vertical scaling / Server – the web traffic controller

W

Web Server Gateway Interface (WSGI)
- about / The architecture of a prediction service

X

XGBoost
- URL / Joining signals and correlation

Author: Joseph Babcock

Joseph Babcock has spent more than a decade working with big data and AI in the e-commerce, digital streaming, and quantitative finance domains. Through his career he has worked on recommender systems, petabyte scale cloud data pipelines, A/B testing, causal inference, and time series analysis. He completed his PhD studies at Johns Hopkins University, applying machine learning to the field of drug discovery and genomics.