Packt+ | Advance your knowledge in tech

You're reading from Scala for Data Science

Product typeBook

Published inJan 2016

Reading LevelIntermediate

Publisher

ISBN-139781785281372

Edition1st Edition

Languages

Scala

Concepts

Application Development

Author (1)

Pascal Bugnion

Chapter 13. Web APIs with Play

In the first 12 chapters of this book, we introduced basic tools and libraries for anyone wanting to build data science applications: we learned how to interact with SQL and MongoDB databases, how to build fast batch processing applications using Spark, how to apply state-of-the-art machine learning algorithms using MLlib, and how to build modular concurrent applications in Akka.

In the last chapters of this book, we will branch out to look at a web framework: Play. You might wonder why a web framework would feature in a data science book; surely such topics are best left to software engineers or web developers. Data scientists, however, rarely exist in a vacuum. They often need to communicate results or insights to stakeholders. As compelling as an ROC curve may be to someone well versed in statistics, it may not carry as much weight with less technical people. Indeed, it can be much easier to sell insights when they are accompanied by an engaging visualization...

Client-server applications

A website works through the interaction between two computers: the client and the server. If you enter the URL www.github.com/pbugnion/s4ds/graphs in a web browser, your browser queries one of the GitHub servers. The server will look though its database for information concerning the repository that you are interested in. It will serve this information as HTML, CSS, and JavaScript to your computer. Your browser is then responsible for interpreting this response in the correct way.

If you look at the URL in question, you will notice that there are several graphs on that page. Unplug your internet connection and you can still interact with the graphs. All the information necessary for interacting with the graphs was transferred, as JavaScript, when you loaded that webpage. When you play with the graphs, the CPU cycles necessary to make those changes happen are spent on your computer, not a GitHub server. The code is executed client-side. Conversely, when you request...

Introduction to web frameworks

This section is a brief introduction to how modern web applications are designed. Go ahead and skip it if you already feel comfortable writing backend code.

Loosely, a web framework is a set of tools and code libraries for building web applications. To understand what a web framework provides, let's take a step back and think about what you would need to do if you did not have one.

You want to write a program that listens on port 80 and sends HTML (or JSON or XML) back to clients that request it. This is simple if you are serving the same file back to every client: just load the HTML from file when you start the server, and send it to clients who request it.

So far, so good. But what if you now want to customize the HTML based on the client request? You might choose to respond differently based on part of the URL that the client put in his browser, or based on specific elements in the HTTP request. For instance, the product page on amazon.com is different to the...

Model-View-Controller architecture

Many web frameworks impose program architectures: it is difficult to provide wires to bind disparate components together without making some assumptions about what those components are. The Model-View-Controller (MVC) architecture is particularly popular on the Web, and it is the architecture the Play framework assumes. Let's look at each component in turn:

The model is the data underlying the application. For example, I expect the application underlying GitHub has models for users, repositories, organizations, pull requests and so on. In the Play framework, a model is often an instance of a case class. The core responsibility of the model is to remember the current state of the application.
Views are representations of a model or a set of models on the screen.
The controller handles client interactions, possibly changing the model. For instance, if you star a project on GitHub, the controller will update the relevant models. Controllers normally carry very...

Single page applications

The client-server duality adds a degree of complication to the elegant MVC architecture. Where should the model reside? What about the controller? Traditionally, the model and the controller ran almost entirely on the server, which just pushed the relevant HTML view to the client.

The growth in client-side JavaScript frameworks, such AngularJS, has resulted in a gradual shift to putting more code in the client. Both the controller and a temporary version of the model typically run client-side. The server just functions as a web API: if, for instance, the user updates the model, the controller will send an HTTP request to the server informing it of the change.

It then makes sense to think of the program running server-side and the one running client-side as two separate applications: the server persists data in databases, for instance, and provides a programmatic interface to this data, usually as a web service returning JSON or XML data. The client-side program maintains...

Building an application

In this chapter and the next, we will build a single-page application that relies on an API written in Play. We will build a webpage that looks like this:

The user enters the name of someone on GitHub and can view a list of their repositories and a chart summarizing what language they use. You can find the application deployed at app.scala4datascience.com. Go ahead and give it a whirl.

To get a glimpse of the innards, type app.scala4datascience.com/api/repos/odersky. This returns a JSON object like:

[{"name":"dotty","language":"Scala","is_fork":true,"size":14653},
{"name":"frontend","language":"JavaScript","is_fork":true,"size":392},
{"name":"legacy-svn-scala","language":"Scala","is_fork":true,"size":296706},
...

We will build the API in this chapter, and write the front-end code in the next chapter.

The Play framework

The Play framework is a web framework built on top of Akka. It has a proven track record in industry, and is thus a reliable choice for building scalable web applications.

Play is an opinionated web framework: it expects you to follow the MVC architecture, and it has a strong opinion about the tools you should be using. It comes bundled with its own JSON and XML parsers, with its own tools for accessing external APIs, and with recommendations for how to access databases.

Web applications are much more complex than the command line scripts we have been developing in this book, because there are many more components: the backend code, routing information, HTML templates, JavaScript files, images, and so on. The Play framework makes strong assumptions about the directory structure for your project. Building that structure from scratch is both mind-numbingly boring and easy to get wrong. Fortunately, we can use Typesafe activators to bootstrap the project (you can also download...

Dynamic routing

Routing, as we saw, is the mapping of HTTP requests to Scala handlers. Routes are stored in conf/routes. A route is defined by an HTTP verb, followed by the end-point, followed by a Scala function:

// verb   // end-point              // Scala handler
GET       /                         controllers.Application.index

We learnt to add new routes by just adding lines to the routes file. We are not limited to static routes, however. The Play framework lets us include wild cards in routes. The value of the wild card can be passed as an argument to the controller. To see how this works, let's create a controller that takes the name of a person as argument. In the Application object in app.controllers, add:

// app/controllers/Application.scala

class Application extends Controller {

  ...

  def hello(name:String) = Action {
    Ok(s"hello, $name")
  }
}

We can now define a route handled by this controller:

// conf/routes
GET  /hello/:name             controllers.Application.hello(name...

Actions

We have talked about routes, and how to pass parameters to controllers. Let's now talk about what we can do with the controller.

The method defined in the route must return a play.api.mvc.Action instance. The Action type is a thin wrapper around the type Request[A] => Result, where Request[A] identifies an HTTP request and Result is an HTTP response.

Composing the response

An HTTP response, as we saw in Chapter 7, Web APIs, is composed of:

the status code (such as 200 for a successful response, or 404 for a missing page)
the response headers, a key-value list indicating metadata related to the response
The response body. This can be HTML for web pages, or JSON, XML or plain text (or many other formats). This is generally the bit that we are really interested in.

The Play framework defines a play.api.mvc.Result object that symbolizes a response. The object contains a header attribute with the status code and the headers, and a body attribute containing the body.

The simplest way to generate...

Interacting with JSON

JSON, as we discovered in previous chapters, is becoming the de-facto language for communicating structured data over HTTP. If you develop a web application or a web API, it is likely that you will have to consume or emit JSON, or both.

In Chapter 7, Web APIs, we learned how to parse JSON through json4s. The Play framework includes its own JSON parser and emitter. Fortunately, it behaves in much the same way as json4s.

Let's imagine that we are building an API that summarizes information about GitHub repositories. Our API will emit a JSON array listing a user's repositories when queried about a specific user (much like the GitHub API, but with just a subset of fields).

Let's start by defining a model for the repository. In Play applications, models are normally stored in the folder app/models, in the models package:

// app/models/Repo.scala

package models

case class Repo (
  val name:String,
  val language:String,
  val isFork: Boolean,
  val size: Long
)

Let's add a route...

Querying external APIs and consuming JSON

So far, we have learnt how to provide the user with a dummy JSON array of repositories in response to a request to /api/repos/:username. In this section, we will replace the dummy data with the user's actual repositories, dowloaded from GitHub.

In Chapter 7, Web APIs, we learned how to query the GitHub API using Scala's Source.fromURL method and scalaj-http. It should come as no surprise that the Play framework implements its own library for interacting with external web services.

Let's edit the Api controller to fetch information about a user's repositories from GitHub, rather than using dummy data. When called with a username as argument, the controller will:

Send a GET request to the GitHub API for that user's repositories.
Interpret the response, converting the body from a JSON object to a List[Repo].
Convert from the List[Repo] to a JSON array, forming the response.

We start by giving the full code listing before explaining the thornier parts in detail...

Creating APIs with Play: a summary

In the last section, we deployed an API that responds to GET requests. Since this is a lot to take in, let's summarize how to go about API creation:

Define appropriate routes in /conf/routes, using wildcards in the URL as needed.
Create Scala case classes in /app/models to represent the models used by the API.
Create Write[T] methods to write models to JSON or XML so that they can be returned by the API.
Bind the routes to controllers. If the controllers need to do more than a trivial amount a work, wrap the work in a future to avoid blocking the server.

There are many more useful components of the Play framework that you are likely to need, such as, for instance, how to use Slick to access SQL databases. We do not, unfortunately, have time to cover these in this introduction. The Play framework has extensive, well-written documentation that will fill the gaping holes in this tutorial.

Rest APIs: best practice

As the Internet matures, REST (representational state transfer) APIs are emerging as the most reliable design pattern for web APIs. An API is described as RESTful if it follows these guiding principles:

The API is designed as a set of resources. For instance, the GitHub API provides information about users, repositories, followers, etc. Each user, or repository, is a specific resource. Each resource can be addressed through a different HTTP end-point.
The URLs should be simple and should identify the resource clearly. For instance, api.github.com/users/odersky is simple and tells us clearly that we should expect information about the user Martin Odersky.
There is no world resource that contains all the information about the system. Instead, top-level resources contain links to more specialized resources. For instance, the user resource in the GitHub API contains links to that user's repositories and that user's followers, rather than having all that information embedded...

Summary

In this chapter, we introduced the Play framework as a tool for building web APIs. We built an API that returns a JSON array of a user's GitHub repositories. In the next chapter, we will build on this API and construct a single-page application to represent this data graphically.

References

This Wikipedia page gives information on semantic URLs: https://en.wikipedia.org/wiki/Semantic_URL and http://apiux.com/2013/04/03/url-design-restful-web-services/.
For a much more in depth discussion of the Play framework, I suggest Play Framework Essentials by Julien Richard-Foy.
REST in Practice: Hypermedia and Systems Architecture, by Jim Webber, Savas Parastatidis and Ian Robinson describes how to architect REST APIs.

The rest of the chapter is locked

You have been reading a chapter from

Scala for Data Science

Published in: Jan 2016Publisher: ISBN-13: 9781785281372

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Author (1)

Pascal Bugnion

Pascal Bugnion is a data engineer at the ASI, a consultancy offering bespoke data science services. Previously, he was the head of data engineering at SCL Elections. He holds a PhD in computational physics from Cambridge University. Besides Scala, Pascal is a keen Python developer. He has contributed to NumPy, matplotlib and IPython. He also maintains scikit-monaco, an open source library for Monte Carlo integration. He currently lives in London, UK.
Read more about Pascal Bugnion

Personalised recommendations for you

Based on your interests and search pattern

C++ Programming for Linux Systems

This book covers the essential system programming tools and helps you explore the features of C++20. It emphasizes important details to maintain code quality and tackle everyday challenges of developing software for high performance, optimization, and more.

BookSep 2023288 pages

Expert C++

Discover advanced programming techniques, the latest features of C++17 and C++20, and best practices for memory management, debugging, testing, and large-scale application design with Expert C++. Ideal for experienced developers advancing to proficient programmers and building professional-grade C++ applications.

BookAug 2023604 pages

iOS 17 Programming for Beginners

iOS 17 Programming for Beginners, Eighth Edition is your comprehensive guide to learning the art of iOS app development. Whether you dream of creating the next chart-topping app or simply want to enhance your programming skills, this book is your trusted companion on this exciting journey.

BookOct 2023604 pages4

Developer Career Masterplan

Written by industry experts that have spent the last 20+ years helping developers grow their career path towards senior developer positions and beyond. This book provides a comprehensive guide, sharing examples and stories from their global careers. By the end, you’ll have the knowledge to create a clear career progression plan as a technical professional.

BookSep 2023310 pages

Refactoring with C#

In Refactoring with C#, you’ll explore the process of safely refactoring modern .NET code using Visual Studio features, advanced unit tests, AI assistance, and custom Roslyn analyzers.

BookNov 2023434 pages

Python Real-World Projects

Amplify your developer journey by curating a dynamic project portfolio that outshines traditional resumes. Delve into the Python realm through immersive projects, mastering core concepts while constructing comprehensive modules and applications. From data acquisition prowess to impactful data visualization, Python Real-World Projects arms you with essential skills to beat the competition.

BookSep 2023478 pages5

The MVVM Pattern in .NET MAUI

The MVVM Pattern in .NET MAUI enables developers to master MVVM principles and effectively apply them to .NET MAUI. This book uses real-life examples and covers complex problems to help you successfully apply MVVM with .NET MAUI to confidently develop robust and high-performing cross-platform apps.

BookNov 2023386 pages

Extending Microsoft Business Central with Power Platform

Extending Business Central with the Power Platform is a step-by-step guide for Business Central professionals to create solutions that automate business processes, explain complex workflow approvals, and integrate with hundreds of other systems, without traditional development. It’ll guide you in customizing Business Central with Power Platform.

BookAug 2023458 pages5

Extending Microsoft Business Central with Power Platform

Extending Business Central with the Power Platform is a step-by-step guide for Business Central professionals to create solutions that automate business processes, explain complex workflow approvals, and integrate with hundreds of other systems, without traditional development. It’ll guide you in customizing Business Central with Power Platform.

BookAug 2023458 pages5

Quantum Computing Algorithms

The book emphasizes intuitive ideas behind quantum algorithms in ways that other books don’t cover, striking a careful balance between no math and too much math. To get the most from this book, you should be comfortable with basic algebra and writing simple computer code. No prior understanding of quantum physics is needed to get started.

BookSep 2023342 pages

Python – Complete Python, Django, Data Science and ML Guide

Unlock Python's full potential with this 50+ hour course! From programming to web and game development, data manipulation, and machine learning, gain the skills required to succeed in various Python-related careers. With practical tasks, hands-on experience, and a strong foundation in Python, you'll be ready to tackle real-world challenges and take advantage of the many opportunities this versatile language offers.

VideoNov 202350 hours 30 minutes5

Python – Complete Python, Django, Data Science and ML Guide

Unlock Python's full potential with this 50+ hour course! From programming to web and game development, data manipulation, and machine learning, gain the skills required to succeed in various Python-related careers. With practical tasks, hands-on experience, and a strong foundation in Python, you'll be ready to tackle real-world challenges and take advantage of the many opportunities this versatile language offers.

VideoNov 202350 hours 30 minutes5