Nowadays, topics such as cloud computing and mobile device service feeds, as well as other data sources driven by cutting-edge, scalable, stateless, and modern technologies such as RESTful web services, leave the impression that REST was invented recently. Well, to be honest, it definitely was not! In fact, REST has been here since the end of the 20th century.
This chapter will walk you through REST's fundamental principles, and it will explain how REST couples with the HTTP protocol. You will look into the five key principles that need to be considered while turning an HTTP application into a RESTful-service-enabled application. You will also look at the differences in describing RESTful and classic SOAP-based web services. Finally, you will learn how to utilize already existing infrastructure for your benefit.
In this chapter, we will cover the following topics:
- REST fundamentals
- REST with HTTP
- Essential differences in the description and discovery of RESTful services compared to classical SOAP-based services
- Taking advantage of existing infrastructure
REST actually dates back to 1999, when a request for comments was submitted to the Internet Engineering Task Force (IETF: http://www.ietf.org/) as RFC 2616: "Hypertext Transfer Protocol -- HTTP/1.1." One of its authors, Roy Fielding, later defined a set of principles built around the HTTP and URI standards. This gave birth to REST as we know it today.
Note
These definitions were given in Chapter 5, Representational State Transfer (REST), of Fielding's dissertation called Architectural Styles and the Design of Network-based Software Architectures. The dissertation is still available at http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm.
Let's look at the key principles around the HTTP and URI standards. Sticking to them will make your HTTP application a RESTful-service-enabled application:
- Everything is a resource
- Each resource is identifiable by a unique identifier (URI)
- Use the standard HTTP methods
- Resources can have multiple representations
- Communicate statelessly
To understand the first principle, that everything is a resource, one must think of data as being represented in a specific format rather than as a physical file. Each piece of data available on the Internet has a format that can be described by a content type. For example, JPEG images; MPEG videos; HTML, XML, and text documents; and binary data are all resources with the following content types: image/jpeg, video/mpeg, text/html, text/xml, and application/octet-stream.
Since the Internet contains so many different resources, they all should be accessible via URIs and should be identified uniquely. Furthermore, the URIs can be in a human-readable format, despite the fact that their consumers are more likely to be software programs rather than ordinary humans.
Human-readable URIs keep data self-descriptive and ease further development against it. This helps you to reduce the risk of logical errors in your programs to a minimum.
Here are a few examples of such URIs:
These human-readable URIs expose different types of resources in a straightforward manner. In the example, it is quite clear that the media types of these resources are as follows:
- Images
- Videos
- XML documents
- Some kinds of binary archive documents
The native HTTP protocol (RFC 2616) defines eight actions, also known as HTTP verbs:
- GET
- POST
- PUT
- DELETE
- HEAD
- OPTIONS
- TRACE
- CONNECT
The first four of them feel natural in the context of resources, especially when defining actions for resource data manipulation. Let's draw a parallel with relational SQL databases, where data manipulation is summarized as CRUD (short for Create, Read, Update, and Delete), after the corresponding types of SQL statements: INSERT, SELECT, UPDATE, and DELETE, respectively. In the same manner, if you apply the REST principles correctly, the HTTP verbs should be used as shown here:
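| HTTP verb | Action | Response status code |
| --- | --- | --- |
| GET | Requests an existing resource | "200 OK" if the resource exists, "404 Not Found" if it does not |
| PUT | Updates a resource or creates it at an identifier provided by the client | "201 Created" if a new resource is created, "200 OK" if an existing one is updated |
| POST | Creates a resource with an identifier generated at server side, or updates a resource with an existing identifier provided by the client | "201 Created" if a new resource is created, "200 OK" if an existing one is updated |
| DELETE | Deletes a resource | "200 OK" or "204 No Content" if the resource is deleted, "404 Not Found" if it does not exist |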
Note that a resource can be created by either the POST or the PUT HTTP verb. When a resource has to be created under a specific URI with an identifier provided by the client, then PUT is the appropriate action:
    PUT /data/documents/balance/22082014 HTTP/1.1
    Content-Type: text/xml
    Host: www.mydatastore.com

    <?xml version="1.0" encoding="utf-8"?>
    <balance date="22082014">
      <Item>Sample item</Item>
      <price currency="EUR">100</price>
    </balance>

    HTTP/1.1 201 Created
    Content-Type: text/xml
    Location: /data/documents/balance/22082014
However, in your application, you may want to leave it up to the server REST application to decide where to place the newly created resource, and thus create it under an appropriate but still unknown or non-existing location.
For instance, in our example, we might want the server to create the date part of the URI based on the current date. In such cases, it is perfectly fine to send a POST request to the main resource URI and let the server respond with the location of the newly created resource:
    POST /data/documents/balance HTTP/1.1
    Content-Type: text/xml
    Host: www.mydatastore.com

    <?xml version="1.0" encoding="utf-8"?>
    <balance date="22082014">
      <Item>Sample item</Item>
      <price currency="EUR">100</price>
    </balance>

    HTTP/1.1 201 Created
    Content-Type: text/xml
    Location: /data/documents/balance/22082014
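The two creation styles can be sketched as a small dispatch function. This is only an illustration under stated assumptions, not a full server: the `handle` function, the in-memory `Map` used as storage, and the `ddmmyyyy` helper are hypothetical names introduced here, and the status codes chosen for updates and deletions are one common convention rather than the only valid one.

```javascript
// In-memory stand-in for the server's storage (hypothetical, for illustration).
const store = new Map();
const COLLECTION = '/data/documents/balance';

// Formats a Date as DDMMYYYY, the identifier style used in the requests above.
function ddmmyyyy(date) {
  const dd = String(date.getUTCDate()).padStart(2, '0');
  const mm = String(date.getUTCMonth() + 1).padStart(2, '0');
  return `${dd}${mm}${date.getUTCFullYear()}`;
}

// Returns { status, headers } for a request without touching the network.
function handle(method, path, body, now = new Date()) {
  if (method === 'PUT' && path.startsWith(COLLECTION + '/')) {
    // The client chose the identifier: create or replace at exactly this URI.
    const created = !store.has(path);
    store.set(path, body);
    return created
      ? { status: 201, headers: { Location: path } }
      : { status: 200, headers: {} };
  }
  if (method === 'POST' && path === COLLECTION) {
    // The server chooses the identifier, here derived from the current date.
    const location = `${COLLECTION}/${ddmmyyyy(now)}`;
    store.set(location, body);
    return { status: 201, headers: { Location: location } };
  }
  if (method === 'GET') {
    return store.has(path)
      ? { status: 200, headers: {}, body: store.get(path) }
      : { status: 404, headers: {} };
  }
  if (method === 'DELETE') {
    // Map.delete returns true only if the resource existed.
    return { status: store.delete(path) ? 204 : 404, headers: {} };
  }
  return { status: 405, headers: {} };
}
```

With PUT, the caller already knows the final URI; with POST, it has to read the `Location` header of the response to learn where the server placed the new resource.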
A key feature of a resource is that it may be represented in a form different from the one in which it is stored. Thus, it can be requested or posted in different representations. As long as the specified format is supported, the REST-enabled endpoint should use it. In the preceding example, we posted an XML representation of a balance, but if the server supported the JSON format, the following request would have been valid as well:
    POST /data/documents/balance HTTP/1.1
    Content-Type: application/json
    Host: www.mydatastore.com

    {
      "balance": {
        "date": "22082014",
        "Item": "Sample item",
        "price": {
          "-currency": "EUR",
          "#text": "100"
        }
      }
    }

    HTTP/1.1 201 Created
    Content-Type: application/json
    Location: /data/documents/balance/22082014
Resource manipulation operations through HTTP requests should always be considered atomic. All modifications of a resource should be carried out within an HTTP request in isolation. After the request execution, the resource is left in a final state, which implicitly means that partial resource updates are not supported. You should always send the complete state of the resource.
Back to the balance example, updating the price field of a given balance would mean posting a complete JSON document that contains all of the balance data, including the updated price field. Posting only the updated price is not stateless, as it implies that the application is aware that the resource has a price field, that is, it knows its state.
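A minimal sketch of this "complete state only" rule follows. The `replaceBalance` function, the `REQUIRED_FIELDS` list, and the simplified shape of the balance object are assumptions made for the example; a real service would validate against its own schema.

```javascript
// Fields a complete balance document must carry (assumed for this sketch).
const REQUIRED_FIELDS = ['date', 'Item', 'price'];

// Replaces the stored balance with the submitted document, or rejects
// the request when the document is not a complete representation.
function replaceBalance(store, id, doc) {
  const missing = REQUIRED_FIELDS.filter((field) => !(field in doc));
  if (missing.length > 0) {
    return { status: 400, missing };
  }
  store.set(id, { ...doc }); // the whole state is overwritten, never merged
  return { status: 200, missing: [] };
}

const balances = new Map([
  ['22082014', { date: '22082014', Item: 'Sample item', price: { currency: 'EUR', value: 100 } }],
]);

// Sending only the changed field is rejected...
const partial = replaceBalance(balances, '22082014', {
  price: { currency: 'EUR', value: 120 },
});

// ...while the complete document carrying the new price is accepted.
const full = replaceBalance(balances, '22082014', {
  date: '22082014',
  Item: 'Sample item',
  price: { currency: 'EUR', value: 120 },
});

console.log(partial.status, full.status);
```

Because every accepted request carries the full document, the stored resource never depends on what a previous request happened to contain.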
Another requirement for your RESTful application is to be stateless. Once deployed in a production environment, incoming requests are likely to be served through a load balancer, which ensures scalability and high availability. Once the application is exposed via a load balancer, the idea of keeping its state at server side is compromised. This doesn't mean that you are not allowed to keep the state of your application. It just means that you should keep it in a RESTful way. For example, keep a part of the state within the URI.
The statelessness of your RESTful API isolates the caller against changes at the server side. Thus, the caller is not expected to communicate with the same server in consecutive requests. This allows easy application of changes within the server infrastructure, such as adding or removing nodes.
Tip
Remember that it is your responsibility to keep your RESTful APIs stateless, as the consumers of the API would expect it to be.
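One way to keep part of the state within the URI is paging. The sketch below is an assumption-laden illustration: the `/data/items` path, the `page` and `size` query parameters, and the `listItems` function are hypothetical and do not belong to the balance service discussed in this chapter.

```javascript
// Hypothetical data set; in a real service this would come from shared storage.
const ITEMS = Array.from({ length: 25 }, (_, i) => `item-${i + 1}`);

// A stateless handler: the page to return is read from the URI itself,
// so no per-client session has to be remembered between requests.
function listItems(requestUrl) {
  const url = new URL(requestUrl, 'http://www.mydatastore.com');
  const page = Number(url.searchParams.get('page') || '1');
  const size = Number(url.searchParams.get('size') || '10');
  const start = (page - 1) * size;
  const items = ITEMS.slice(start, start + size);
  const hasMore = start + size < ITEMS.length;
  return {
    items,
    // The "next" state is handed back to the caller as another URI.
    next: hasMore ? `${url.pathname}?page=${page + 1}&size=${size}` : null,
  };
}

// Any node behind the load balancer gives the same answer for the same URI.
const nodeA = listItems('/data/items?page=3&size=10');
const nodeB = listItems('/data/items?page=3&size=10');
console.log(JSON.stringify(nodeA) === JSON.stringify(nodeB));
```

Since the caller carries the position in the URI, it does not matter which server node handles the next request.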
Now that you know that REST is around 15 years old, a sensible question would be, "Why has it become so popular only recently?" My answer to the question is that we as developers usually reject simple, straightforward approaches, and most of the time, prefer spending more time on turning already complex solutions into even more complex and sophisticated ones.
Take classical SOAP web services for example. Their WS-* specifications are so numerous, and sometimes so loosely defined, that in order to make solutions from different vendors interoperable, a separate specification, the WS-I Basic Profile, has been introduced. It defines extra interoperability rules in order to ensure that all WS-* specifications in SOAP-based web services can work together.
When it comes to transporting binary data with classical web services over HTTP, things get even more complex, as SOAP-based web services provide different ways of transporting binary data. Each way is defined in other sets of specifications such as SOAP with Attachment References (SwaRef) and Message Transmission Optimization Mechanism (MTOM). All this complexity was caused mainly because the initial idea of the web service was to execute business logic remotely, not to transport large amounts of data.
Well, I personally think that when it comes to data transfer, things should not be that complex. This is where REST comes into play, by introducing the concept of resources and a standard means for manipulating them.
Now that we've covered the main REST principles, let's dive deeper into what can be achieved when they are followed:
- Separation of the representation and the resource
- Visibility
- Reliability
- Scalability
- Performance
A resource is just a set of information, and as defined by principle 4, it can have multiple representations; however, its state is atomic. It is up to the caller to specify the desired media type with the Accept header in the HTTP request, and then it is up to the server application to handle the representation accordingly and return the appropriate content type of the resource and a relevant HTTP status code:
- HTTP 200 OK in the case of success
- HTTP 400 Bad Request if an unsupported content type is requested or for any other invalid request
- HTTP 500 Internal Server Error when something unexpected happens during the request processing
For instance, let's assume that at server side, we have balance resources stored in an XML format. We can have an API that allows a consumer to request the resource in various formats, such as application/json, application/zip, application/octet-stream, and so on.
It would be up to the API itself to load the requested resource, transform it into the requested type (for example, JSON or XML), and either use ZIP to compress it or directly flush it to the HTTP response output.
The caller can make use of the Accept HTTP header to specify the expected media type of the response data. So, if we want to request our balance data inserted in the previous section in XML format, the following request should be executed:
    GET /data/balance/22082014 HTTP/1.1
    Host: my-computer-hostname
    Accept: text/xml

    HTTP/1.1 200 OK
    Content-Type: text/xml
    Content-Length: 140

    <?xml version="1.0" encoding="utf-8"?>
    <balance date="22082014">
      <Item>Sample item</Item>
      <price currency="EUR">100</price>
    </balance>
To request the same balance in JSON format, the Accept header needs to be set to application/json:
    GET /data/balance/22082014 HTTP/1.1
    Host: my-computer-hostname
    Accept: application/json

    HTTP/1.1 200 OK
    Content-Type: application/json
    Content-Length: 120

    {
      "balance": {
        "date": "22082014",
        "Item": "Sample item",
        "price": {
          "-currency": "EUR",
          "#text": "100"
        }
      }
    }
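The server-side half of this exchange can be sketched as a lookup from media type to renderer. Everything named here (`represent`, `renderers`, the flat `balance` object, and the simplified JSON shape) is an assumption made for the example. Quality values in the Accept header are ignored and XML escaping is omitted to keep the sketch short.

```javascript
// The stored resource, independent of any particular representation.
const balance = { date: '22082014', item: 'Sample item', price: 100, currency: 'EUR' };

// One renderer per supported media type (no XML escaping in this sketch).
const renderers = new Map([
  ['application/json', (b) => JSON.stringify({
    balance: { date: b.date, Item: b.item, price: { currency: b.currency, value: b.price } },
  })],
  ['text/xml', (b) =>
    '<?xml version="1.0" encoding="utf-8"?>' +
    `<balance date="${b.date}"><Item>${b.item}</Item>` +
    `<price currency="${b.currency}">${b.price}</price></balance>`],
]);

// Picks the first media type listed in the Accept header that the server supports.
function represent(resource, acceptHeader) {
  const requested = acceptHeader
    .split(',')
    .map((part) => part.split(';')[0].trim()); // drop parameters such as q=0.8
  const mediaType = requested.find((type) => renderers.has(type));
  if (!mediaType) {
    // 400 follows this chapter's list; HTTP also defines 406 Not Acceptable.
    return { status: 400, headers: {}, body: '' };
  }
  const body = renderers.get(mediaType)(resource);
  return {
    status: 200,
    headers: { 'Content-Type': mediaType, 'Content-Length': Buffer.byteLength(body) },
    body,
  };
}

console.log(represent(balance, 'text/xml').headers['Content-Type']);
console.log(represent(balance, 'application/json').body);
```

The stored object never changes; only the renderer selected by the caller's Accept header does, which is the separation between resource and representation described above.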
REST is designed to be visible and simple. Visibility of the service means that every aspect of it should be self-descriptive and follow the natural HTTP language according to principles 3, 4, and 5.
Visibility in the context of the outer world would mean that monitoring applications would be interested only in the HTTP communication between the REST service and the caller. Since the requests and responses are stateless and atomic, nothing more is needed to follow the behavior of the application and to understand whether anything has gone wrong.
Tip
Remember that caching reduces the visibility of your RESTful applications and in general should be avoided, unless it is needed for serving resources requested by a large number of callers. In such cases, caching may be an option, after carefully evaluating the possible consequences of serving obsolete data.
Before talking about reliability, we need to define which HTTP methods are safe and which are idempotent in the REST context. So, let's first define what safe and idempotent methods are:
- An HTTP method is considered to be safe provided that when requested, it does not modify or cause any side effects on the state of the resource
- An HTTP method is considered to be idempotent if its response is always the same, no matter how many times it is requested
The following table shows which HTTP methods are safe and which are idempotent:
| HTTP Method | Safe | Idempotent |
| --- | --- | --- |
| GET | Yes | Yes |
| POST | No | No |
| PUT | No | Yes |
| DELETE | No | Yes |
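The idempotency column can be checked with a small experiment that looks at the server state left behind by repeated calls. This is a sketch under assumptions: the `createStore`, `put`, `post`, and `del` helpers are hypothetical, and the `/data/notes` collection is invented for the example.

```javascript
// Hypothetical in-memory collection keyed by URI.
function createStore() {
  return { resources: new Map(), nextId: 1 };
}

// PUT: store the body at the URI chosen by the client (idempotent).
function put(store, uri, body) {
  store.resources.set(uri, body);
}

// POST: the server allocates a new URI on every call (not idempotent).
function post(store, collectionUri, body) {
  const uri = `${collectionUri}/${store.nextId++}`;
  store.resources.set(uri, body);
  return uri;
}

// DELETE: removing an already removed resource changes nothing (idempotent).
function del(store, uri) {
  store.resources.delete(uri);
}

const s = createStore();

put(s, '/data/balance/22082014', 'v1');
put(s, '/data/balance/22082014', 'v1'); // repeated: still one resource
const afterPut = s.resources.size;

post(s, '/data/notes', 'hello');
post(s, '/data/notes', 'hello'); // repeated: a second resource appears
const afterPost = s.resources.size;

del(s, '/data/notes/1');
del(s, '/data/notes/1'); // repeated: nothing further changes
const afterDelete = s.resources.size;

console.log(afterPut, afterPost, afterDelete); // prints: 1 3 2
```

This is why a caller that receives no answer can retry a PUT or DELETE without risk, whereas retrying a POST may leave a duplicate behind.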
So far, I have often stressed the importance of having stateless implementation and stateless behavior for a RESTful web application. The World Wide Web (WWW) is an enormous universe, containing a huge amount of data and a lot of users eager to get that data. The evolution of the WWW has brought the requirement that applications should scale easily as their load increases. Scaling applications that have a state is hardly possible, especially when zero or close-to-zero downtime is needed.
That's why being stateless is crucial for any application that needs to scale. In the best-case scenario, scaling your application would only require you to put another piece of hardware behind a load balancer. There would be no need for the different nodes to sync between each other, as they should not care about the state at all. Scalability is all about serving all your clients in an acceptable amount of time. Its main idea is to keep your application running and to prevent Denial of Service (DoS) caused by a huge amount of incoming requests.
Scalability should not be confused with performance of an application. Performance is measured by the time needed for a single request to be processed, not by the total number of requests that the application can handle. The asynchronous non-blocking architecture and event-driven design of Node.js make it a logical choice for implementing an application that scales and performs well.
If you are familiar with SOAP web services, you may have heard of the Web Services Description Language (WSDL). It is an XML description of the interface of the service and defines an endpoint URL for invocation. It is mandatory for a SOAP web service to be described by such a WSDL definition.
Similar to SOAP web services, RESTful services can also make use of a description language, called WADL. WADL stands for Web Application Description Language. Unlike WSDL for SOAP web services, a WADL description of a RESTful service is optional, that is, consuming the service has nothing to do with its description.
Here is a sample part of a WADL file that describes the GET operation of our balance service:

    <application xmlns="http://wadl.dev.java.net/2009/02"
                 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                 xmlns:service="http://localhost:8080/data/balance">
      <grammars>
        <include href="balance.xsd"/>
        <include href="error.xsd"/>
      </grammars>
      <resources base="http://localhost:8080/data/balance/">
        <resource path="{date}">
          <method name="GET">
            <request>
              <param name="date" type="xsd:string" style="template"/>
            </request>
            <response status="200">
              <representation mediaType="application/xml" element="service:balance"/>
              <representation mediaType="application/json" />
            </response>
            <response status="404">
              <representation mediaType="application/xml" element="service:balance"/>
            </response>
          </method>
        </resource>
      </resources>
    </application>
This extract of a WADL file shows how an application that exposes resources is described. Basically, each resource must be a part of an application. The resources element provides the base URI with the base attribute, each resource gives its location relative to that base with the path attribute, and each supported HTTP method is described in a method element. Additionally, an optional doc element can be used at resource and application level to provide additional documentation about the service and its operations.
Though WADL is optional, it significantly reduces the effort of discovering RESTful services.
The best part of developing and distributing RESTful applications is that the infrastructure needed is already out there waiting restlessly for you. As RESTful applications use the existing web space heavily, you need to do nothing more than follow the REST principles when developing. In addition, there are plenty of libraries available out there for any platform, and I do mean any given platform. This eases development of RESTful applications, so you just need to choose the preferred platform for you and start developing.
In this chapter, you learned about the foundations of REST, looking at the five key principles that turn a web application into a REST-enabled application. We made a brief comparison between RESTful services and classical SOAP web services, and finally took a look at how RESTful services are described and how we can simplify the discovery of the services we develop.
Now that you know the basics, we are ready to dive into the Node.js way of implementing RESTful services. In the next chapter, you will learn about the essentials of Node.js and the accompanying tools that are necessary to use and understand in order to build a real-life fully-fledged web service.