
You're reading from  Mastering Predictive Analytics with Python

Product type: Book
Published in: Aug 2016
Reading level: Intermediate
ISBN-13: 9781785882715
Edition: 1st Edition
Author: Joseph Babcock

Joseph Babcock has spent more than a decade working with big data and AI in the e-commerce, digital streaming, and quantitative finance domains. Through his career he has worked on recommender systems, petabyte scale cloud data pipelines, A/B testing, causal inference, and time series analysis. He completed his PhD studies at Johns Hopkins University, applying machine learning to the field of drug discovery and genomics.

Chapter 8. Sharing Models with Prediction Services

Thus far, we have examined how to build a variety of models with data sources ranging from standard 'tabular' data to text and images. However, this only accomplishes part of our goal in business analysis: we can generate predictions from a dataset, but we cannot easily share the results with colleagues or with other software systems within a company. Nor can we easily replicate the results as new data becomes available, or scale the analysis to larger datasets over time, without manually re-running the sorts of analyses discussed in previous chapters. We will also have difficulty using our models in a public setting, such as a company's website, without revealing the details of the analysis through the model parameters exposed in our code.

To overcome these challenges, the following chapter will describe how to build 'prediction services', web applications that encapsulate and automate the core components of data transformation, model fitting,...

The architecture of a prediction service


Now with a clear goal in mind—to share and scale the results of our predictive modeling using a web application—what are the components required to accomplish this objective?

The first is the client: this could be either a web browser or simply a user entering a curl command in the terminal (see Aside). In either case, the client sends requests using the Hypertext Transfer Protocol (HTTP), a standard convention for retrieving or transmitting information over a network (Berners-Lee, Tim, Roy Fielding, and Henrik Frystyk. Hypertext transfer protocol--HTTP/1.0. No. RFC 1945. 1996). An important feature of the HTTP standard is that the client and server do not have to 'know' anything about how the other is implemented (for example, which programming language is used to write these components): as long as both follow the HTTP standard, the messages exchanged between them keep the same format.
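To make the point about implementation independence concrete, the following is a minimal sketch of an HTTP/1.0-style request as plain text. It is deliberately simplified (no validation or header folding), and the `/score` path and `Host` value are hypothetical placeholders rather than endpoints defined in this chapter. The idea is that any client able to produce this text, and any server able to read it, can communicate regardless of the language each is written in.

```python
def build_request(method, path, headers, body=""):
    """Serialize a request as text: request line, headers, blank line, body."""
    lines = ["{} {} HTTP/1.0".format(method, path)]
    lines.extend("{}: {}".format(name, value) for name, value in headers.items())
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_request(raw):
    """Recover the method, path, version, headers, and body from the text alone."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, path, version, headers, body


# Hypothetical request to a scoring endpoint (the path is an assumption).
raw = build_request("GET", "/score?user=1", {"Host": "localhost"})
print(repr(raw))
print(parse_request(raw))
```

Nothing in `parse_request` depends on how `build_request` was implemented; it relies only on the shared message format.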

The next component is the server, which receives HTTP...

Clients and making requests


When a client issues requests to the server and the downstream application, we might potentially have a major design problem: how do we know in advance what kind of requests we might receive? If we had to re-implement a new set of standard requests every time we developed a web application, it would be difficult to reuse code and write generic services that other programs could call, since their requests would potentially have to change for every web application a client might interact with.

This is the problem solved by the HTTP standard, which describes a standard language and format in which requests are sent between servers and clients, allowing us to rely upon a common command syntax that can be consumed by many different applications. While we could, in theory, issue some of these commands (such as GET, described below) to our prediction service by pasting a URL into the address bar of our browser, this will only cover a subset of the kinds of requests...
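As an illustration of issuing requests from code rather than from the address bar, here is a small sketch using Python's standard `urllib` module. It only constructs the request objects and does not send them, so it runs without a server. The base URL, the `/score` and `/train` paths, and the payload fields are assumptions made for the example, not endpoints defined by the text.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # hypothetical address of the prediction service


def make_get(path, params):
    """Build a GET request: parameters travel in the URL's query string."""
    return Request("{}{}?{}".format(BASE_URL, path, urlencode(params)), method="GET")


def make_post(path, payload):
    """Build a POST request: the payload travels in the request body as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


get_request = make_get("/score", {"user": 1})
post_request = make_post("/train", {"rows": 3})
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

Passing either object to `urllib.request.urlopen` would actually send it, assuming a service is listening at that address. A browser's address bar can only produce the first kind of request, which is why a programmatic client or curl is needed for the rest.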

Server – the web traffic controller


To run our prediction service, we need to communicate with external systems to receive requests to train a model, score new data, evaluate existing performance, or provide model parameter information. The web server performs this function, accepting incoming HTTP requests and forwarding them on to our web application either directly or through whatever middleware may be used.

Though we could have made many different choices of server in illustrating this example, we have chosen the CherryPy library because unlike other popular servers such as Apache Tomcat or Nginx, it is written in Python (allowing us to demonstrate its functionality inside a notebook) and is scalable, processing many requests in only a few milliseconds (http://www.aminus.org/blogs/index.php/2006/12/23/cherrypy_3_has_fastest_wsgi_server_yet.). The server is attached to a particular port, or endpoint (this is usually given in the format url:port), to which we direct requests that are then...
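The hand-off from server to web application can be sketched without any third-party dependency. The example below does not use CherryPy; instead it assumes the WSGI calling convention, which Python web servers commonly use to pass a request to an application, and plays the server's role by hand with the standard library's `wsgiref` helpers. The `application` function and the `/ping` path are hypothetical.

```python
import json
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Hypothetical web application: report which method and path were requested."""
    body = json.dumps(
        {"method": environ["REQUEST_METHOD"], "path": environ["PATH_INFO"]}
    ).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


def forward(method, path):
    """Play the server's role: build the request environment and call the application."""
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    chunks = application(environ, start_response)
    return captured["status"], b"".join(chunks)


print(forward("GET", "/ping"))
```

A real server would additionally listen on a port, parse the incoming HTTP text into `environ`, and write the returned bytes back to the client; only the forwarding step is shown here.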

Persisting information with database systems


Our prediction service will use data in a number of ways. When we start the service, we have standard configurations we would like to retrieve (for example, the model parameters), and we might also like to log records of the requests that the application responds to for debugging purposes. As we score data or prepare trained models, we would ideally like to store these somewhere in case the prediction service needs to be restarted. Finally, as we will discuss in more detail, a database can allow us to keep track of application state (such as which tasks are in progress). For all these uses, a number of database systems can be applied.
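The uses listed above (stored configuration, persisted models, and application state) can be sketched with Python's built-in `sqlite3` module. This is a stand-in chosen so the example runs anywhere, not necessarily the database system used later in the chapter, and the table names, column names, and status strings are all hypothetical.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_params (name TEXT PRIMARY KEY, params TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_state (task_id TEXT PRIMARY KEY, status TEXT)"
    )
    return conn


def save_params(conn, name, params):
    """Persist model parameters as JSON so they survive a service restart."""
    conn.execute(
        "INSERT OR REPLACE INTO model_params VALUES (?, ?)", (name, json.dumps(params))
    )
    conn.commit()


def load_params(conn, name):
    """Return the stored parameters, or None if no model with this name exists."""
    row = conn.execute(
        "SELECT params FROM model_params WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def set_status(conn, task_id, status):
    """Record application state, such as whether a task is still in progress."""
    conn.execute("INSERT OR REPLACE INTO task_state VALUES (?, ?)", (task_id, status))
    conn.commit()


def get_status(conn, task_id):
    row = conn.execute(
        "SELECT status FROM task_state WHERE task_id = ?", (task_id,)
    ).fetchone()
    return row[0] if row else None


conn = open_store()
save_params(conn, "logistic", {"intercept": 0.1, "weights": [0.5, -0.25]})
set_status(conn, "task-1", "PENDING")
print(load_params(conn, "logistic"), get_status(conn, "task-1"))
```

Passing a file path instead of `":memory:"` to `open_store` keeps the data on disk, which is what allows a restarted service to pick up where it left off.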

Databases are generally categorized into two groups: relational and non-relational. Relational databases are probably familiar to you, as they are used in most business data warehouses. Data is stored in the form of tables, often with facts (such as purchases or search events) containing columns (such as user account...

Case study – logistic regression service


As an illustration of the architecture covered previously, let us look at an example of a prediction service that implements a logistic regression model. The model is trained and scores new data using information passed through URLs (either through the web browser or by invoking curl on the command line), which illustrates how these components fit together. We will also examine how we can interactively test these components using the same IPython notebooks as before, while still being able to deploy the resulting code seamlessly as an independent application.
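As a rough sketch of what scoring from information passed through a URL can look like, the function below reads feature values from a query string and applies the logistic function to a linear combination of them. The coefficients, the feature names `age` and `balance`, and the `/score` URL are invented for illustration; they are not the model or endpoints developed in the case study.

```python
import math
from urllib.parse import parse_qs, urlsplit

# Hypothetical fitted parameters; a real service would load these from storage.
COEFFICIENTS = {"age": 0.5, "balance": -0.25}
INTERCEPT = 0.0


def score_from_url(url):
    """Parse features from the URL's query string and return a probability."""
    query = parse_qs(urlsplit(url).query)
    features = {name: float(values[0]) for name, values in query.items()}
    # Linear predictor: intercept plus weighted sum; missing features count as 0.
    z = INTERCEPT + sum(
        weight * features.get(name, 0.0) for name, weight in COEFFICIENTS.items()
    )
    # The logistic function maps the linear predictor to the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


print(score_from_url("http://localhost:5000/score?age=2&balance=0"))
```

The same URL could be pasted into a browser or passed to curl; the service only sees the query string, so the scoring logic is independent of which client sent it.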

Our first task is to set up the databases used to store the information used in modeling, as well as the results and model parameters.

Setting up the database

As a first step in our application, we will set up the database to store our training data and models, and scores obtained for new data. The examples for this exercise consist of data from a marketing campaign, where the objective was to...

Summary


In this chapter, we described the three components of a basic prediction service: a client, the server, and the web application. We discussed how this design allows us to share the results of predictive modeling with other users or software systems, and scale our modeling horizontally and modularly to meet the demands of various use cases. Our code examples illustrate how to create a prediction service with generic model and data parsing functions that can be reused as we try different algorithms for a particular business use case. By utilizing background tasks through Celery worker threads and distributed training and scoring on Spark, we showed how to potentially scale this application to large datasets while providing intermediate feedback to the client on task status. We also showed how an on-demand prediction utility could be used to generate real-time scores for streams of data through a REST API.

Using this prediction service framework, in the next chapter we will extend this...
