You're reading from  Scala for Data Science

Product type: Book
Published in: Jan 2016
Reading level: Intermediate
ISBN-13: 9781785281372
Edition: 1st Edition
Author (1)
Pascal Bugnion

Pascal Bugnion is a data engineer at the ASI, a consultancy offering bespoke data science services. Previously, he was the head of data engineering at SCL Elections. He holds a PhD in computational physics from Cambridge University. Besides Scala, Pascal is a keen Python developer. He has contributed to NumPy, matplotlib and IPython. He also maintains scikit-monaco, an open source library for Monte Carlo integration. He currently lives in London, UK.

Chapter 13. Web APIs with Play

In the first 12 chapters of this book, we introduced basic tools and libraries for anyone wanting to build data science applications: we learned how to interact with SQL and MongoDB databases, how to build fast batch processing applications using Spark, how to apply state-of-the-art machine learning algorithms using MLlib, and how to build modular concurrent applications in Akka.

In the last chapters of this book, we will branch out to look at a web framework: Play. You might wonder why a web framework would feature in a data science book; surely such topics are best left to software engineers or web developers. Data scientists, however, rarely exist in a vacuum. They often need to communicate results or insights to stakeholders. As compelling as an ROC curve may be to someone well versed in statistics, it may not carry as much weight with less technical people. Indeed, it can be much easier to sell insights when they are accompanied by an engaging visualization...

Client-server applications


A website works through the interaction between two computers: the client and the server. If you enter the URL www.github.com/pbugnion/s4ds/graphs in a web browser, your browser queries one of the GitHub servers. The server will look through its database for information concerning the repository that you are interested in. It will serve this information as HTML, CSS, and JavaScript to your computer. Your browser is then responsible for interpreting this response in the correct way.

If you look at the URL in question, you will notice that there are several graphs on that page. Unplug your internet connection and you can still interact with the graphs. All the information necessary for interacting with the graphs was transferred, as JavaScript, when you loaded that webpage. When you play with the graphs, the CPU cycles necessary to make those changes happen are spent on your computer, not a GitHub server. The code is executed client-side. Conversely, when you request...

Introduction to web frameworks


This section is a brief introduction to how modern web applications are designed. Go ahead and skip it if you already feel comfortable writing backend code.

Loosely, a web framework is a set of tools and code libraries for building web applications. To understand what a web framework provides, let's take a step back and think about what you would need to do if you did not have one.

You want to write a program that listens on port 80 and sends HTML (or JSON or XML) back to clients that request it. This is simple if you are serving the same file back to every client: just load the HTML from file when you start the server, and send it to clients who request it.
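The static case can be sketched in a few lines without any framework. The example below is a minimal illustration, assuming the HTTP server that ships with the JDK (com.sun.net.httpserver). The StaticServer object and its start method are names invented for this sketch, and the port is a parameter because binding to port 80 usually requires elevated privileges.

```scala
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import java.net.InetSocketAddress
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of any framework discussed in this chapter.
object StaticServer {

  // Start a server that answers every request with the same HTML page.
  // Passing port 0 lets the operating system pick a free port.
  def start(port: Int, html: String): HttpServer = {
    // The page is encoded once, up front, and reused for every request.
    val body = html.getBytes(StandardCharsets.UTF_8)
    val server = HttpServer.create(new InetSocketAddress(port), 0)
    server.createContext("/", new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = {
        exchange.getResponseHeaders.set("Content-Type", "text/html; charset=utf-8")
        exchange.sendResponseHeaders(200, body.length.toLong)
        val out = exchange.getResponseBody
        try out.write(body) finally out.close()
      }
    })
    server.start()
    server
  }
}
```

Calling StaticServer.start(8080, "<h1>Hello</h1>") would serve the same page at every URL on port 8080 until stop(0) is called on the returned server.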

So far, so good. But what if you now want to customize the HTML based on the client request? You might choose to respond differently based on part of the URL that the client entered in their browser, or based on specific elements in the HTTP request. For instance, the product page on amazon.com is different to the...

Model-View-Controller architecture


Many web frameworks impose program architectures: it is difficult to provide wires to bind disparate components together without making some assumptions about what those components are. The Model-View-Controller (MVC) architecture is particularly popular on the Web, and it is the architecture the Play framework assumes. Let's look at each component in turn:

  • The model is the data underlying the application. For example, I expect the application underlying GitHub has models for users, repositories, organizations, pull requests and so on. In the Play framework, a model is often an instance of a case class. The core responsibility of the model is to remember the current state of the application.

  • Views are representations of a model or a set of models on the screen.

  • The controller handles client interactions, possibly changing the model. For instance, if you star a project on GitHub, the controller will update the relevant models. Controllers normally carry very...
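The three roles above can be sketched in plain Scala. This is only an illustration of the division of responsibilities, not Play code: the MvcSketch object, the Repository case class, and the render and star functions are all hypothetical names.

```scala
object MvcSketch {

  // Model: remembers the current state of the application.
  final case class Repository(name: String, stars: Int)

  // View: a representation of the model, here a single line of text.
  def render(repo: Repository): String =
    s"${repo.name} (${repo.stars} stars)"

  // Controller: handles a client interaction by producing an updated model.
  // It carries no state of its own.
  def star(repo: Repository): Repository =
    repo.copy(stars = repo.stars + 1)
}
```

Because the model is an immutable case class, the controller returns a new value rather than mutating the old one, which keeps the view a pure function of the model.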

Single page applications


The client-server duality adds a degree of complication to the elegant MVC architecture. Where should the model reside? What about the controller? Traditionally, the model and the controller ran almost entirely on the server, which just pushed the relevant HTML view to the client.

The growth in client-side JavaScript frameworks, such as AngularJS, has resulted in a gradual shift to putting more code in the client. Both the controller and a temporary version of the model typically run client-side. The server just functions as a web API: if, for instance, the user updates the model, the controller will send an HTTP request to the server informing it of the change.

It then makes sense to think of the program running server-side and the one running client-side as two separate applications: the server persists data in databases, for instance, and provides a programmatic interface to this data, usually as a web service returning JSON or XML data. The client-side program maintains...

Building an application


In this chapter and the next, we will build a single-page application that relies on an API written in Play. We will build a webpage that looks like this:

The user enters the name of someone on GitHub and can view a list of their repositories and a chart summarizing what language they use. You can find the application deployed at app.scala4datascience.com. Go ahead and give it a whirl.

To get a glimpse of the innards, enter app.scala4datascience.com/api/repos/odersky in your browser. This returns a JSON array like:

[{"name":"dotty","language":"Scala","is_fork":true,"size":14653},
{"name":"frontend","language":"JavaScript","is_fork":true,"size":392},
{"name":"legacy-svn-scala","language":"Scala","is_fork":true,"size":296706},
...

We will build the API in this chapter, and write the front-end code in the next chapter.

The Play framework


The Play framework is a web framework built on top of Akka. It has a proven track record in industry, and is thus a reliable choice for building scalable web applications.

Play is an opinionated web framework: it expects you to follow the MVC architecture, and it has a strong opinion about the tools you should be using. It comes bundled with its own JSON and XML parsers, with its own tools for accessing external APIs, and with recommendations for how to access databases.

Web applications are much more complex than the command line scripts we have been developing in this book, because there are many more components: the backend code, routing information, HTML templates, JavaScript files, images, and so on. The Play framework makes strong assumptions about the directory structure for your project. Building that structure from scratch is both mind-numbingly boring and easy to get wrong. Fortunately, we can use Typesafe Activator to bootstrap the project (you can also download...

Dynamic routing


Routing, as we saw, is the mapping of HTTP requests to Scala handlers. Routes are stored in conf/routes. A route is defined by an HTTP verb, followed by the end-point, followed by a Scala function:

# verb    end-point                 Scala handler
GET       /                         controllers.Application.index

We learnt to add new routes by just adding lines to the routes file. We are not limited to static routes, however. The Play framework lets us include wildcards in routes. The value of the wildcard can be passed as an argument to the controller. To see how this works, let's create a controller method that takes the name of a person as argument. In the Application class in app/controllers/Application.scala, add:

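// app/controllers/Application.scala

package controllers

import play.api.mvc._

class Application extends Controller {

  ...

  // Respond with a plain-text greeting built from the `name` wildcard.
  def hello(name: String) = Action {
    Ok(s"hello, $name")
  }
}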

We can now define a route handled by this controller:

# conf/routes
GET  /hello/:name             controllers.Application.hello(name...
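To make the wildcard mechanism concrete, here is a small sketch of how a pattern such as /hello/:name could be matched against a request path to extract the wildcard value. This is not how Play does it (Play generates its router from conf/routes at compile time). RouteSketch and matchRoute are hypothetical names used only for illustration.

```scala
object RouteSketch {

  // Match a concrete path against a pattern such as "/hello/:name".
  // Returns the wildcard values keyed by name, or None if the path
  // does not match the pattern.
  def matchRoute(pattern: String, path: String): Option[Map[String, String]] = {
    val patternParts = pattern.split("/").filter(_.nonEmpty)
    val pathParts = path.split("/").filter(_.nonEmpty)
    if (patternParts.length != pathParts.length) None
    else {
      val pairs = patternParts.zip(pathParts)
      // Literal segments must agree exactly; wildcard segments match anything.
      val matches = pairs.forall { case (p, s) => p.startsWith(":") || p == s }
      if (!matches) None
      else Some(pairs.collect { case (p, s) if p.startsWith(":") => p.drop(1) -> s }.toMap)
    }
  }
}
```

With this sketch, matching the path /hello/Jim against /hello/:name yields a map binding name to Jim, which is the value a handler like hello would receive as its argument.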

Actions


We have talked about routes, and how to pass parameters to controllers. Let's now talk about what we can do with the controller.

The method defined in the route must return a play.api.mvc.Action instance. The Action type is a thin wrapper around the type Request[A] => Result, where Request[A] represents an HTTP request and Result represents an HTTP response.
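The idea of an action as a wrapped function can be sketched with simplified stand-in types. The Request, Result and Action classes below are deliberately minimal, hypothetical versions written for this illustration. They are not Play's classes, which carry far more information.

```scala
object ActionSketch {

  // Simplified stand-ins, NOT the types from play.api.mvc.
  final case class Request[A](path: String, body: A)
  final case class Result(status: Int, body: String)

  // An action is little more than a function from a request to a result.
  final case class Action[A](handler: Request[A] => Result) {
    def apply(request: Request[A]): Result = handler(request)
  }

  // An action that greets whoever is named in the request body.
  val hello: Action[String] =
    Action[String](request => Result(200, s"hello, ${request.body}"))
}
```

Applying ActionSketch.hello to a request is just a function call, which is what makes actions easy to test in isolation from the HTTP layer.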

Composing the response

An HTTP response, as we saw in Chapter 7, Web APIs, is composed of:

  • The status code (such as 200 for a successful response, or 404 for a missing page).

  • The response headers, a key-value list of metadata about the response.

  • The response body. This can be HTML for web pages, JSON, XML, plain text, or many other formats. This is generally the part that we are really interested in.

The Play framework defines a play.api.mvc.Result object that represents a response. The object contains a header attribute with the status code and the headers, and a body attribute containing the body.
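The shape of such a response can be sketched as a pair of case classes. The types below are simplified stand-ins invented for this illustration (the body is a plain String, and the ok and notFound helpers are hypothetical). They only mirror the header and body split described above.

```scala
object ResponseSketch {

  // Simplified stand-ins for illustration; Play's own types are richer.
  final case class ResponseHeader(status: Int, headers: Map[String, String])

  final case class Result(header: ResponseHeader, body: String) {
    // Return a copy of this result with extra headers added.
    def withHeaders(extra: (String, String)*): Result =
      copy(header = header.copy(headers = header.headers ++ extra))
  }

  // Helpers that fix the status code and a default content type.
  def ok(body: String): Result =
    Result(ResponseHeader(200, Map("Content-Type" -> "text/plain")), body)

  def notFound(body: String): Result =
    Result(ResponseHeader(404, Map("Content-Type" -> "text/plain")), body)
}
```

A response is then built by starting from a helper and refining it, for example ResponseSketch.ok("hello").withHeaders("Cache-Control" -> "no-cache").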

The simplest way to generate...

Interacting with JSON


JSON, as we discovered in previous chapters, is becoming the de facto format for communicating structured data over HTTP. If you develop a web application or a web API, it is likely that you will have to consume or emit JSON, or both.

In Chapter 7, Web APIs, we learned how to parse JSON through json4s. The Play framework includes its own JSON parser and emitter. Fortunately, it behaves in much the same way as json4s.

Let's imagine that we are building an API that summarizes information about GitHub repositories. Our API will emit a JSON array listing a user's repositories when queried about a specific user (much like the GitHub API, but with just a subset of fields).

Let's start by defining a model for the repository. In Play applications, models are normally stored in the folder app/models, in the models package:

// app/models/Repo.scala

package models

// Case class parameters are public vals by default,
// so no explicit `val` is needed.
case class Repo(
  name: String,
  language: String,
  isFork: Boolean,
  size: Long
)
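Before wiring this model into Play, it may help to see what converting it to JSON involves. The sketch below uses only the standard library and a hand-written type class. JsonSketch, JsonWriter, toJson and the writers are hypothetical names, standing in for the facilities that Play's JSON library provides. The output field names follow the sample response shown earlier (is_fork rather than isFork).

```scala
object JsonSketch {

  final case class Repo(name: String, language: String, isFork: Boolean, size: Long)

  // A minimal type class: how to turn a T into a JSON string.
  trait JsonWriter[T] { def write(value: T): String }

  // Quote a string, escaping backslashes and double quotes only.
  // A real JSON library also handles control characters and Unicode.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  implicit val repoWriter: JsonWriter[Repo] = new JsonWriter[Repo] {
    def write(r: Repo): String = {
      val fields = Seq(
        "name" -> quote(r.name),
        "language" -> quote(r.language),
        "is_fork" -> r.isFork.toString,
        "size" -> r.size.toString
      )
      fields.map { case (k, v) => quote(k) + ":" + v }.mkString("{", ",", "}")
    }
  }

  // Any list of writable values is itself writable, as a JSON array.
  implicit def listWriter[T](implicit w: JsonWriter[T]): JsonWriter[List[T]] =
    new JsonWriter[List[T]] {
      def write(xs: List[T]): String = xs.map(w.write).mkString("[", ",", "]")
    }

  def toJson[T](value: T)(implicit w: JsonWriter[T]): String = w.write(value)
}
```

After import JsonSketch._, calling toJson on a List[Repo] produces a JSON array of objects. The point of the type class is that the model stays free of any serialization code.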

Let's add a route...

Querying external APIs and consuming JSON


So far, we have learnt how to provide the user with a dummy JSON array of repositories in response to a request to /api/repos/:username. In this section, we will replace the dummy data with the user's actual repositories, downloaded from GitHub.

In Chapter 7, Web APIs, we learned how to query the GitHub API using Scala's Source.fromURL method and scalaj-http. It should come as no surprise that the Play framework implements its own library for interacting with external web services.

Let's edit the Api controller to fetch information about a user's repositories from GitHub, rather than using dummy data. When called with a username as argument, the controller will:

  1. Send a GET request to the GitHub API for that user's repositories.

  2. Interpret the response, converting the body from a JSON object to a List[Repo].

  3. Convert from the List[Repo] to a JSON array, forming the response.

We start by giving the full code listing before explaining the thornier parts in detail...

Creating APIs with Play: a summary


In the last section, we deployed an API that responds to GET requests. Since this is a lot to take in, let's summarize how to go about API creation:

  1. Define appropriate routes in conf/routes, using wildcards in the URL as needed.

  2. Create Scala case classes in app/models to represent the models used by the API.

  3. Define Writes[T] instances to write models to JSON or XML so that they can be returned by the API.

  4. Bind the routes to controllers. If the controllers need to do more than a trivial amount of work, wrap the work in a future to avoid blocking the server.
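The last step can be sketched with standard-library futures. The example below is an illustration only: FutureSketch, countLanguages and languageSummary are hypothetical names, and the counting function stands in for genuinely slow work such as a database query or a call to an external API. In a Play controller, a future like this would typically be returned from an asynchronous action rather than awaited.

```scala
import scala.concurrent.{ExecutionContext, Future}

object FutureSketch {

  // Stand-in for slow work: count how often each language appears.
  def countLanguages(languages: List[String]): Map[String, Int] =
    languages.groupBy(identity).map { case (lang, xs) => lang -> xs.size }

  // Run the work on the given execution context, then turn the outcome
  // into a response body. The calling thread is never blocked.
  def languageSummary(languages: List[String])(
      implicit ec: ExecutionContext): Future[String] =
    Future(countLanguages(languages)).map { counts =>
      counts.toList
        .sortBy(_._1) // sort by language name for a stable output
        .map { case (lang, n) => s"$lang: $n" }
        .mkString(", ")
    }
}
```

The function returns immediately with a Future[String]. The summary is computed on whichever thread the execution context provides, and further transformations can be chained with map or flatMap.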

There are many more useful components of the Play framework that you are likely to need, such as using Slick to access SQL databases. We do not, unfortunately, have time to cover these in this introduction. The Play framework has extensive, well-written documentation that will fill the gaping holes in this tutorial.

REST APIs: best practice


As the Internet matures, REST (representational state transfer) APIs are emerging as the most reliable design pattern for web APIs. An API is described as RESTful if it follows these guiding principles:

  • The API is designed as a set of resources. For instance, the GitHub API provides information about users, repositories, followers, etc. Each user, or repository, is a specific resource. Each resource can be addressed through a different HTTP end-point.

  • The URLs should be simple and should identify the resource clearly. For instance, api.github.com/users/odersky is simple and tells us clearly that we should expect information about the user Martin Odersky.

  • There is no world resource that contains all the information about the system. Instead, top-level resources contain links to more specialized resources. For instance, the user resource in the GitHub API contains links to that user's repositories and that user's followers, rather than having all that information embedded...
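The principle of linking to specialized resources instead of embedding them can be sketched as follows. RestSketch, UserResource and userResource are hypothetical names chosen for this illustration. The URL layout follows the api.github.com/users/odersky example above.

```scala
object RestSketch {

  // A top-level resource that points to related resources by URL
  // rather than embedding their contents.
  final case class UserResource(
    login: String,
    reposUrl: String,
    followersUrl: String
  )

  // Build the resource for a user, deriving the links from the user's own URL.
  def userResource(baseUrl: String, login: String): UserResource = {
    val self = s"$baseUrl/users/$login"
    UserResource(login, s"$self/repos", s"$self/followers")
  }
}
```

A client that needs the repositories follows reposUrl in a second request. The user resource itself stays small regardless of how many repositories or followers the user has.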

Summary


In this chapter, we introduced the Play framework as a tool for building web APIs. We built an API that returns a JSON array of a user's GitHub repositories. In the next chapter, we will build on this API and construct a single-page application to represent this data graphically.
