Mastering Akka

5 (3 reviews total)
By Christian Baxter
  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Building a Better Reactive App

About this book

For a programmer, writing multi-threaded applications is critical as it is important to break large tasks into smaller ones and run them simultaneously. Akka is a distributed computing toolkit that uses the abstraction of the Actor model, enabling developers to build correct, concurrent, and distributed applications using Java and Scala with ease.

The book begins with a quick introduction that simplifies concurrent programming with actors. We then proceed to master all aspects of domain-driven design. We’ll teach you how to scale out with Akka remoting/clustering. Finally, we introduce Conductr as a means to deploy to and manage microservices across a cluster.

Publication date:
October 2016
Publisher
Packt
Pages
436
ISBN
9781786465023

 

Chapter 1. Building a Better Reactive App

This book is meant to be geared towards the more experienced Scala and Akka developers looking to build reactive applications on top of the Akka platform.

This book is written for an engineer who has already leveraged Akka in the 2.3.x series and below to build reactive applications. You have a firm understanding of the actor model and how the Akka framework leverages actors to build highly scalable, concurrent, asynchronous, event-driven, and fault-tolerant applications. You've seen the new changes rolled out in Akka 2.4.2 and are curious about how some of these new features such as Akka Streams and Akka HTTP can be leveraged within your reactive applications.

This book will serve as a guide for an engineer who wants to take a functional but flawed reactive application and, through a series of refactors, make improvements to it. It will help you understand what some of the common pitfalls are when building Akka applications. Throughout the various chapters in the book, you will learn how to use Akka and some of the newer features to address the following shortcomings:

  • Building a more domain-centric model using domain-driven design

  • Using event sourcing and Akka Persistence for high throughput persistence

  • Understanding reactive streams and how Akka makes use of them in Akka Streams and Akka HTTP

  • Decomposing a monolith into a set of fully decoupled and independent services

 

Understanding the initial example app


Imagine you woke up one morning and decided that you were going to take down the mighty Amazon.com. They've spread themselves too thin in trying to sell anything and everything the world has to offer. You see an opportunity back in their original space of online book selling and have started a company to challenge them in that area.

Over the past few months, you and your team of engineers have built out a simple Minimum Viable Product (MVP) reactive bookstore application build on top of the Akka 2.3.x series. As this is an MVP, it's pretty basic, but it has served its purpose of getting something to the market quickly to establish a user base and get good feedback to iterate on. The current application covers the following subdomains within the overall domain of a bookstore application:

  • User management

  • Book (inventory) management

  • Credit card processing

  • Sales order processing

The code that represents the initial application can be found in the initial-example-app folder within the code distribution for this book. The application is an sbt multi-project build with the following individual projects:

  • common: Common utilities and a shared domain model

  • user-services: User management related services

  • book-services: Book management related services

  • credit-services: Credit card processing services

  • sales-services: Sales order processing services

  • server: A single project that aggregates the individual service projects and contains a main method to launch the server

If you were to look at the different projects from a dependency view, they would look like this:

The individual services subprojects were set up to avoid any direct code dependencies to each other. If services in different modules need to communicate with each other, then they use the shared domain model (entities and messages) from the common project as the protocol. The initial intention was to allow each module to eventually be built and deployed independent of each other, even though currently it's built together and deployed as a monolith.

Each service module is made up of HTTP endpoint classes, services, and, in some cases, Data Access Objects (DAOs). The endpoint classes are built on top of the Unfiltered library and allow inbound, REST-oriented HTTP requests to be serviced asynchronously, using Netty under the hood. The service business logic is modeled using the actor model and implemented with Akka actors that are called from the endpoints. Service actors that need to talk to the relational Postgres db do so via DAOs that reside within the same .scala files as the services that use them. The DAOs use Lightbend's Slick library with the Plain SQL approach to talk to Postgres.

The way the application is currently structured, an inbound request that talks to the db would be handled as follows:

  1. The HTTP request comes in and is handled by Netty's NIO channel handling code on top of its own thread pool.

  2. Netty passes the code off to the Unfiltered framework for handling, still using Netty's thread pool.

  3. Unfiltered looks up a service actor via actor selection and uses the ask pattern to send it a message, returning a Future that will hold the result of the service call.

  4. The service actor that receives the message is running on the actor system's main Fork/Join thread pool.

  5. The actor talks to the Postgres db via the Slick DAO. The SQL itself runs within Slick's AsycExecutor system, on top of another separate thread pool.

  6. The actor sends a response back to the sender (the Future from the endpoint) using the pipe pattern.

  7. The Future in the endpoint, which runs on the actor system's dispatcher, is completed, which results in a response being communicated through Unfiltered and Netty and then back into the wire.

As you can see from the preceding steps, there are a few different thread pools involved in the servicing of the request, but it's done completely asynchronously. The only real blocking done in this flow is the JDBC calls done via Slick (sadly, JDBC has yet to incorporate async calls into the API). Thankfully though, those blocking calls are isolated behind Slick's own thread pool.

If you've played with Akka long enough, you know it's taboo to block in the actor system's main Fork/Join pool. Akka is built around the concept of using very few threads to do a lot of work. If you start blocking the dispatcher threads themselves, then your actors can suffer from thread starvation and fall behind in the processing of their mailboxes. This will lead to higher latency in call times, upsetting end users, and nobody wins when that happens. We avoided such problems with this app, so we can pat ourselves on the back for that.

You should take a little time in going through the example code to understand how everything is wired together. Understanding this example app is critical as this serves as the foundation for our progressive refactoring. Spending a little time upfront getting familiar with things will help as different sections are discussed in the upcoming chapters.

The app itself is not perfect. It has intentional shortcomings to give us something to refactor. It was certainly beyond the scope of this book to have me build out a fully functioning, coherent, and production-ready storefront application. The code is just a medium in which to communicate some of the flawed ways in which an Akka reactive application could be put together. It was purposely built as a lead-in to discover some of the newer features in the Akka toolkit as a way to solve some common shortcomings. View it as such, with an open mind, and you will have already taken the first step in our refactoring journey.

Tip

Detailed steps to download the code bundle are mentioned in the Preface of this book. Please have a look. The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Akka. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

 

Working with the example application


Now that you have an understanding of the initial code, we can build it and then get it up and running. It's assumed that you already have Scala and sbt installed. Assuming you have those two initial requirements installed, we can get started on getting the example app functional.

Setting up Docker

Throughout this book, we will be using Docker to handle setting up any additional applications (such as Postgres) and for running the bookstore application itself (within a container). For those unfamiliar with Docker, it is a containerization platform that will let you package and run your applications, ensuring that they run and behave the same no matter what the environment is that they are running on. This means that when you are testing things locally, on your Mac or Windows computer, the components will run and behave the same as when they eventually get deployed to whatever production environment you run (say some Linux distribution on Amazon's Elastic Compute Cloud). You package up all of the application components and their dependencies into a neat little container that can then be run anywhere Docker itself is running.

The decision to use Docker here should make set up simpler (as Docker will handle the majority of it). Also, you won't clutter up your computer with these applications as they will only run as Docker containers instead of being directly installed. When it comes to your Docker installation, you have two possible options:

  • Install Docker Toolbox, which will install the docker engine, docker-machine and docker-compose, which are necessary for running the bookstore application.

  • Install one of the native Docker apps (Docker for Windows or Docker for Mac), both of which will also work for running the bookstore application.

The biggest difference between these two options will be what local host address Docker uses when binding applications to ports. When using Docker Toolbox, docker-machine is used, which will by default bind applications to the local address of 192.168.99.100. When using one of the native Docker apps, the loopback address of 127.0.0.1 (localhost) will be used instead. 

If you already have Docker installed, then you can use that pre-existing installation. If you don't have Docker installed, and you are on a Mac, then please read through the link from below to help you decide between Docker for Mac and Docker Toolbox: https://docs.docker.com/docker-for-mac/docker-toolbox/.

For Windows users, you should check out the following link, reading through the section titled What to know before you install, to see if your computer can support the requirements of Docker for Windows. If so, then go ahead and install that flavor. If not, then callback to using Docker Toolbox: https://docs.docker.com/docker-for-windows/.

Adding the boot2docker hosts entry

Because we gave you a choice in which Docker flavor to run, and because each different flavor will bind to different local addresses, we need a consistent way to refer to the host address that is being used by Docker.  The easiest way to do this is to add an entry to your hosts file, setting up an alias for a host called boot2docker.  We can then use that alias going forward to when referring to the local Docker bind address, both in the scripts provided in the code content for this book and in any examples in the book content.

The entry we need to add to this file will be the same regardless of if you are on Windows or a Mac.  This is the format of the entry that you will need to add to that file:

<docker_ip>     boot2docker

You will need to replace the <docker_ip> portion of that line with whatever local host your Docker install is using.  So for example, if you installed the native Docker app, then the line would look like this:

127.0.0.1       boot2docker

And if you installed Docker Toolkit and are thus using docker-machine, then the line would look like this:

192.168.99.100  boot2docker

The location of that file will be different depending on if you are running Windows or are on a Mac.  If you are on Windows, then the file an be found at the following location: C:\Windows\System32\Drivers\etc\hosts.

If you are running in a Mac, then the file can be found here: /etc/hosts.

Understanding the bookstore Postgres schema

The initial example app uses Postgres for its persistence needs. It's a good choice for a relational database as it's lightweight and fast (for a relational database at least). It also has fantastic support for JSON fields, so much so that people have been using it for a document store too.

We will use Docker to handle setting up Postgres locally for us, so no need to go out and install it yourself. There are also setup scripts provided as part of the code for this chapter that will handle setting up the schema and database tables that the bookstore application needs. I've included an ERD diagram of that schema below for reference as I feel it's important in understanding the table relationships between the entities for the initial version of the bookstore app.

If you are interested in the script that was used to create the tables from this diagram, then you can find it in the sql directory under the intial-example-app root folder from the code distribution, in a file called example-app.sql.

Running bash scripts on Windows

If you are using a Mac, you can skip reading this section. It only pertains to running the .sh scripts used to start up the app on Windows.

As part of the code content for each chapter, there are some bash scripts that handle building and running the bookstore application. As bash is not native to the Windows operating system, you will have to decide how you want to build and start the bookstore application, choosing from one of the the following possibilities:

  1. If you are using Git for Windows, then you have Git BASH installed locally and you should be able to use that tool to run these fairly simple scripts.

  2. If you are on Windows 10, then you can use the new Windows Subsystem for Linux and install a bash shell. Check out this link for instructions: http://www.howtogeek.com/249966/how-to-install-and-use-the-linux-bash-shell-on-windows-10.

  3. You can install cygwin.

  4. As a last resort, if none of the above options work then you can look at the .sh files referenced and just run the commands (which are a mix of sbt and Docker commands) individually yourself. There's not a lot of them per file, so this is not a bad last resort.

Starting up the example application

Now that the database is up and running, we can get the Scala code built and then packaged into a Docker container (along with Java8 and Postgres, via docker-compose) so we can run and play with it locally. First, make sure that you have Docker up and running locally. If you are running one of the native Docker apps, then look for the whale in your system tray. If it's not there, then go and start it up and make sure it shows there before continuing. If you are running Docker Toolbox, then fire up the Docker Quickstart Terminal, which will start up a local docker-machine session within a terminal window with a whale as ASCII art at the top of it. Stay in that window for the remainder of the rest of the following commands as that's the only window where you can run Docker-related commands.

From a terminal window within the root of the initial-example-app folder run the following command to get the app all packaged up into a Docker container:

docker-build.sh

This script will instruct sbt to build and package the application. The script will then build a docker image, tag it and store it in the local docker repository. This script could take a while to run initially, as it will need to download a bunch of Docker-related dependencies, so be patient. Once that completes, you can then run the following command in that same terminal window:

launch.sh

This command will also take a while initially as it pulls down all of the components of our container, including Postgres. Once this command completes, you will have the bookstore initial example application container up and running locally, which you can verify by running the following command:

docker ps

That will print out a process list for the containers running under Docker. You should see two rows in that list, one for the bookstore and one for Postgres. If you want to log into Postgres via the psql client, to maybe look at the db before and after interacting with the app, then you can do so by executing the following command:

docker run -it --rm --network initialexampleapp_default postgres psql -h postgres -U docker

When prompted for the password, enter docker. Once in the database, you can switch to the schema used by the example app by running the following command from within psql:

\c akkaexampleapp

From there, you can interact with any of the tables described in the ERD diagram shown earlier.

Tip

If you want to stop a Docker container, use the docker stop command, supplying the name of the container you want to stop. Then, use the docker rm command to remove the stopped container or docker restart if you want to start it up again.

Interacting with the example application endpoints

Once the app is up and running, we can start interacting with its REST-like API in an effort to see what it can do. The interactions will be broken down by subdomain within the app (represented by the -services projects), detailing the capabilities of the endpoint(s) within that subdomain. We will use the httpie utility to execute our HTTP requests. Here are the installation instructions for each platform.

Installing httpie on Mac OS X

You can install httpie on your Mac via homebrew. The command to install is as follows:

$ brew install httpie

Installing httpie on Windows

The installation on Windows is going to be a bit more complicated as you will need Python, curl, and pip. The full instructions are too long to include directly in this book and can be found at: http://jaspreetchahal.org/setting-up-httpie-and-curl-on-windows-environment/.

Interacting with the user endpoint

The first thing we can do when playing with the app's endpoints is to create a new BookstoreUser entity that will be stored in the StoreUser Postgres table. If you cd into the json folder under the initial-example-app root, there will be a user.json file that contains the following json object:

{ 
  "firstName": "Chris", 
  "lastName": "Baxter", 
  "email": "[email protected]" 
} 

In order to create a user with these fields, you can execute the following httpie command when in the json folder:

http -v POST boot2docker:8080/api/user   < user.json

Here, you can see that we are making use of the hosts file alias we created in section Adding the boot2docker hosts entry.  This let us make HTTP calls to the bookstore app container that is running in Docker regardless of what local address it is bound to.

The -v option supplied in that command will allow you to see the entire request that was sent (headers, body, path, and params), which can be helpful if it becomes necessary to debug issues. We won't supply this param on the remainder of the example requests, but you can if you feel you want to see the full request and response. The < symbol implies that we want to send the contents of the user.json file as the POST body. The resulting user.json will look like the following:

{ 
  "meta": { 
    "statusCode": 200 
  },  
  "response": { 
    "createTs": "2016-04-13T00:00:00.000Z", 
    "deleted:":false, 
    "email": "[email protected]",  
    "firstName": "Chris",  
    "id": 1,  
    "lastName": "Baxter",  
    "modifyTs": "2016-04-13T00:00:00.000Z" 
  } 
} 

This response structure is going to be the standard for endpoint responses. The "meta" section mirrors the HTTP status code and can optionally contain error information if the request was not successful. The "response" section will be there if the request was successful and can contain either a single object as JSON or an array of objects. Notice that the ID of the new user is also returned in case you want to look that user up later.

You should add a few more JSON files of your own to that directory representing more users to create and run the same command referenced earlier (albeit with a different file name) to create the additional users. If you happen to try and create a user with the same e-mail as an existing user, you will get an error.

If you want to view a user that you have created, as long as you know the ID, you can run the following command to do so, using user ID 1 as the example:

http boot2docker:8080/api/user/1

Notice on this request that we don't include an explicit HTTP request verb. That's because httpie assumes a GET request if you do not include a verb.

You can also look up a user by e-mail address with the following request:

http boot2docker:8080/api/user [email protected]

The httpie client uses the param==value convention to supply query params for requests. In this example, the query string would be: ?email=chris%40masteringakka.com.

You can make changes to a user's basic info (firstName, lastName, email) by executing the following command:

http PUT boot2docker:8080/api/user/1 < user-edit.json

The included user-edit.json file contains a set of request json to change the initially created user's e-mail address. As with the creation, if you pick an e-mail here that is already in use by another user, you will get an error.

If at any time you decide you want to delete a user, you can do so with the following request, using user ID 1 as the example:

http DELETE boot2docker:8080/api/user/1

This will perform a soft delete against the database, so the record will still exist but it won't come back on lookups anymore.

Interacting with the Book endpoint

The Book endpoint is for taking actions against the Book entity. Books are what are added to sales orders, so in order to test sales orders, we will need to create a few books first. To create a Book, you can run the following command:

http POST boot2docker:8080/api/book < book.json

As with the BookstoreUser entity, you should create a few more book.json files of your own and run the command to create those books too. Once you are satisfied, you can view a book that you have created by running the following command (using book ID 1 as the example):

http boot2docker:8080/api/book/1

Books support the concept of tags. These tags represent categories that the book is associated with. For example, the 20000 Leagues Under the Sea book that is represented in the book.json file is initially tagged as fiction and sci-fi. The Book endpoint allows you to add additional tags to the book, and it also allows you to remove a tag. Adding the tag ocean to book ID 1 can be done with the following command:

http PUT boot2docker:8080/api/book/1/tag/ocean

If you decide that you would like to remove that tag, then you can execute the following command to do so:

http DELETE boot2docker:8080/api/book/1/tag/ocean

If you want to look up books that match a set of input tags, you can run the following command:

http boot2docker:8080/api/book tag==fiction

This endpoint request supports supplying the tag param multiple times. The query on the backend uses an AND condition across all of the tags supplied, so if you supply multiple tags, then the books that match must have each of the tags supplied. An example of supplying multiple tags would be as follows:

http boot2docker:8080/api/book tag==fiction tag==scifi

You can also look up a book by author by executing a request like this:

http boot2docker:8080/api/book author==Verne

This request supports partial matching on the author, so you don't have to supply the complete author name to get matches.

The last concept that we can test out related to book management is allocating inventory to the book once it's been created. Books get created initially with a 0 inventory amount. If a book does not have any available inventory, it can not be included on any sale orders. Since not being able to sell books would be bad for business, we need the ability to allocate available inventory for a book in the system.

To indicate that we have five copies of book ID 1 in stock, the request would be as follows:

http PUT boot2docker:8080/api/book/1/inventory/5

Like the BookstoreUser entity, you can also perform a soft delete for a book. You can do so by executing the following request, using book ID 1 as the example:

http DELETE boot2docker:8080/api/book/1

Now that we have users and books created and we have a book with inventory, we can move on to pushing sales orders through the system.

Interacting with the Order endpoint

If you want to run a profitable business, at some point, you need to start taking in money. For our bookstore app, this is done by accepting sales orders for the books that we are keeping in inventory. All of the playing with the user and book-related endpoints was done so that we could create SalesOrder entities in the system. A SalesOrder is tied to a BookstoreUser (by user ID) and has 1-n line items, each for a book (by book ID).

The request to create a SalesOrder also contains the credit card info so that we can first charge that card, which is where our money will come from. In the OrderManager code, before moving forward with creating the order, we first call over to the CreditCardTransactionHandler service to charge the card and keep a persistent record of the transaction. As we don't actually own the logic for charging the card ourselves, we call out over HTTP to a fake third-party service (implemented in PretentCreditCardService in the server project) to simulate this interaction.

Within the JSON directory, there is an order.json file that has the valid JSON in it for creating a new SalesOrder within the system. This file assumes that we have already created a BookstoreUser with ID of 1 and a book with an ID of 1, and we have added inventory to that book, which we did in the previous two sections. To create the new order, execute the following command:

http POST :8080/api/order < order.json

As long as the userId supplied and bookId supplied exist and that Book has inventory available, then the order should be successfully created. Each SalesOrderLineItem will draw down inventory (atomically) for the book that item is for in the amount tied to the quantity input for that line item. If you run that same command enough times, you should eventually exhaust all of the available inventory on that book, and you should start getting errors on the creation. This can be fixed by adding more inventory back on the book.

If you want to view a previously created SalesOrder, as long as you know the ID (which is returned in the create response JSON), then you can make the following request (using order ID 1 as the example):

http boot2docker:8080/api/order/1

If you want to lookup all of the orders for a particular user ID, then you can execute the following request:

http boot2docker:8080/api/order userId==1

The Order endpoint also supports looking up SalesOrders that contains a line item for a particular book by its ID value. Using book ID 1 as the example, that request looks like this:

http boot2docker:8080/api/order bookId==1

Lastly, you can also look up SalesOrders that have line items for books with a particular tag. That kind of request, using fiction as the tag, would be as follows:

http boot2docker:8080/api/order bookTag==fiction

Unlike the search books by tag functionality, this request only supports supplying a single bookTag param.

 

So what's wrong with this application?


By this point, you've had some time to interact with the example app to see what it can do. You've also looked at the code enough to see how everything is coded. In fact, maybe you've coded something similar to this yourself when building reactive apps on top of Akka. So now, the million dollar question is, "What's actually wrong with this app?"

The short answer is probably nothing. Wrong is a very black and white word, and when it comes to coding and application design, you're dealing more with shades of gray. This app may suit some needs perfectly well. For example, if high scalability is not a concern, you have a small development team and/or if the app's functionality doesn't need to expand much more.

This wouldn't be much of a book if we left it at that though. The long answer is that while nothing is absolutely wrong, there is a lot that we can improve upon to help our app and team continue to grow and scale. I'll break down some of the areas that I think can be made better in the following sections. This will help serve as a primer for some of the refactors that we will do in the upcoming chapters.

Understanding the meanings of scalability

To me, scalability is a nebulous term. I think a lot of people, when they hear scalability, immediately begin to think of things such as performance, throughput, queries per second, and the likes. These types of areas address the runtime characteristics of whatever application you have deployed. This is certainly a big aspect of the scalability umbrella and very important to your app's growth, but it's not the only thing you should be thinking of.

Another key area of scalability, that I think of when discussing the topic, is how well your application codebase will scale to the growth of both in-app functionality and to the growth of the development team. When your team is small (like me as the single developer of this example app) and the feature set of the application is minimal (again, like this example app), then codebase scalability is probably not the first thing on your mind. However, if you expect your company to be successful and grow, then your codebase needs to grow along with it; or else you run the risk of becoming impediment to the business as opposed to an enabler of the business. Therefore, there are some decisions that you can make earlier on in the growth process to help enable the codebase to scale with the growth of the business and development team.

There's a lot to discuss related to these two areas of application scalability, and we will break them down in more detail in the subsequent sections.

The scalability cube

If you haven't encountered Martin L. Abbot and Michael T. Fisher's excellent book, The Art of Scalability, then you should give it a look some time. This book covers all aspects of scaling a business and the technology that goes along with it. In many a meeting, I've referenced materials from this book when discussing how to architect software components. There's a ton of valuable lessons in here for beginners all the way up to the more seasoned technologists.

In the book, the authors discuss the concept of the the Three Dimensions of Scalability for a running application. Those dimensions are represented in the following diagram of the cube:

X axis scaling

The X axis scaling is the one that most people are familiar with. You run multiple copies of the application code on different servers and put a load balancer in front to partition inbound traffic. This kind of scaling gives you high availability, in that, if one server dies, the app can still serve traffic as the load balancer will redirect that traffic to the other nodes. This technique also gives your better per-node throughput as each node is only handling a percentage of the traffic. If you see your nodes are struggling to handle the current traffic rate, simply add another node to ease the burden a little on the existing nodes.

This is the kind of scaling that a monolith, like the example app, is most likely to use. It's pretty simple to keep scaling out by just adding more application nodes, but it seems a little inefficient in terms of resource usage. For example, if in your monolith, it's really only one of the services that is receiving the bulk of the additional load, you still need to deploy every other service into the new server even though they do not need the additional headroom.

In this kind of deployment, your services can't really have their own individual scaling profile. They are all scaling out together because they are co-deployed in a monolith. When you are sizing out the new instance node to deploy into in terms of CPU and RAM, you are stuck with a more generic profile as this node will handle traffic for all of the services. If certain services were more CPU heavy and others were more RAM heavy, you end up having to pick a node that has both a lot of RAM and a high number of CPUs as opposed to being able to choose between either one. These kinds of decisions can be cost prohibitive in the cloud-based world where changes in either of those two areas cost more money and more do when they need to be coupled together.

Microservices and Y axis scaling

The Y axis scaling approach addresses this exact kind of problem. With y axis scaling, you break up the application around functional boundaries and then deploy these different functional areas separately and independent of one another. This way, if functional area A needs high CPU and functional area B needs high RAM, you can select individual node instances that are the best suited to those needs. In addition, if functional area A receives the bulk of the traffic, then you can increase the number of nodes that handle that area without having to do the same thing for functional areas that receive less traffic.

If you've heard of the Microservices approach to building software (and it would be hard not to have, given how much technical literature on the Web is dedicated to it), then you are familiar with an example of using Y axis scaling to build software. This kind of approach achieves the goal of small independent services that can scale independent of each other, but it comes with additional complexity around the deployment and management of those components. In a simple monolithic, X axis style deployment, you know that every component is on every node, so the load balancer can send traffic to any of the available nodes.

In a Microservice deployment, you need to know where in the node set service A lives (and it should be in multiple nodes to give high availability) so that you can route traffic accordingly. This service location concern complicates these kinds of deployments and can involve bringing in another moving part (software component) to handle it, which further increases the complexity. Many times though, the benefits gained from decoupled, independent services outweigh these additional complexities enough to make this kind of approach worth pursuing.

Z axis scaling

In Z axis scaling, you take the same component and duplicate it across an entire set of nodes, but you make each node responsible for only a subset of the requests or data. This is commonly referred to as sharding, and it is quite often seen in database-related technologies. If you are running in-memory caching on the nodes, then this kind of approach eliminates duplicate cached data in each node (which can occur with X axis scaling). Only the node responsible for each data set (defined by your partitioning scheme) will receive traffic for that data.

When using Akka Cluster Sharding, you can be sure only one actor instance is receiving requests for a particular entity or piece of data, and this is an example of z axis scaling.

Monolith versus microservices

Our initial example app is a monolith, and even though that approach now carries negative connotations, it's not necessarily a bad thing. When starting out on a new application, a monolith-first kind of approach may actually be the right choice. It's simpler to code, build, deploy, and deal with when it's in production as opposed to a more sophisticated (but complicated) microservices deployment. Sometimes, trying to start out with something like microservices can lead to the team and application collapsing under the weight of the additional complexity of that pattern.

Within an Agile development approach, getting the software in the hands of users quickly so you can get feedback and iterate further is going to be way more important then having a fancy, new fangled architecture. The product team is not going to want to hear that they have deployment impediments, because you still can't figure out how to get all of your decoupled services to communicate together. At the end of the day, as an engineer, you're there to build and deploy a product. If your initial architecture prevents doing this easily, then it will be hard for you and your business to be successful.

In the beginning stages of a new business, it's important to be able to iterate and change features easily in response to feedback. Take too long (due to a complicated initial architecture) and you risk being passed by a competitor, or dropped by those very users who provided that valuable feedback. That's why it's not necessarily bad to start out with something like a monolith. It's simple to deploy and scale initially with an Xx axis style approach, and that can suit most needs just fine in the beginning.

Scale issues with our monolith

In the case of our bookstore application, we followed the more simple monolith-first approach, but now, the rubber is finally hitting the road. We have traction with users. Our product feature set is growing and so is the team and the complexity of the application. Our monolith is starting to become an impediment to our agility and that's going to be a problem. Because of this, we are going to embrace the microservices style of small, independent, decoupled services. The subsections to follow will detail how we arrived at this decision.

Issues with the size of deployments

Currently, we need to build and deploy the entire codebase even if we are touching a single line within a single service. The less code you have to deploy, the less risky the deployment is. Also, as the monolith continues to grow, so does the build and test cycle associated with it. If all you have to do is change a single line buried deep down within a service, but the ensuing build and test cycle takes 30 minutes because it's rebuilding everything, this will eventually become a problem. If you have an issue in production and need to get a hotfix out as soon as possible, you don't want this monolith-side effect to get in the way of that. This is where a microservices-like approach will help alleviate that problem.

Supporting different runtime characteristics of services

Now that our app has been in production for a while, we are starting to better understand the runtime usage patterns of our different sets of services. We know which services are being hit the most in the normal app flows, and we would like to be able to have more instances of these critical services available compared to other less important services. Our current monolith does not support us in doing this, but switching to a microservices-like system will enable it.

The pain of a shared domain model

The shared domain model in the app (in the common project) is going to be an issue when it comes to isolating the deployments. Changes to this shared model will necessitate full deployments, and that will prevent our goal of smaller isolated deployments. When we initially designed the example app structure, we thought we were being forward looking in separating services into different projects and then allowing them to communicate via the shared domain model. We eventually wanted to package and deploy these projects separately, but now, in hindsight, the shared domain model is actually going to make this harder as opposed to enabling it to happen.

If we do end up packaging and separating the services fully for deployments, each one will have to have a copy of the shared domain library code available to it at runtime in order to run. If we change the domain model and then only deploy one of the services, you run the risk of having issues when that service communicates with another service that was not rebuilt and redeployed after the model change.

If we had decided to use Akka remoting to handle remote communication between actor services, then we could run into issues with Java serialization when deserializing the messages and result types exchanged by the services.

We could work around this by using a different serialization scheme (such as protobuf), but this is certainly more work, and there are more flexible ways to communicate between our services.

We should try and decouple our components and modules as much as possible. We need an approach that allows them to communicate indirectly, outside of the normal request/response cycle of a user interaction with the application. This most likely means some form of event based interaction between our modules, with schemas and versioning for those events, so that other modules can consume them safely even as the model continues to evolve.

We can fall back on direct communication over HTTP (with versioning of endpoints), if necessary to support some interactions. We should only use that as a last resort though if an indirect approach just won't fit a certain situation well enough. Direct communications like this create the kind of coupling that we are trying to avoid, so they should used sparingly.

Issues with our relational database usage

When we built out our bookstore application, the team decided to use a relational database to store data, selecting Postgres as one to use. As far as relational databases go, Postgres is a solid choice. As I mentioned earlier, it's fast and has a lot of great additional features such as JSON column type support. But now that we are moving towards a microservices approach, is it going to still be the right choice for our application? I see the following shortcomings with our current usage of Postgres that will likely lead to us moving away from it as we evolve our application.

Sharing the same schema across all services

The Microservices approach promotes the shared nothing model of software development. This means that each microservice should not share any of its code models or database schemas with any other component in the system. Sharing creates coupling and we are trying to decouple or components as much as possible. If you try and have fully decoupled services, but they end up sharing a single database and schema underneath them, then you're going to end up in trouble pretty quickly. If you make a change to the schema, it's highly possible that change is going to ripple through multiple services, causing you to have to recode and redeploy more than you intended. When this happens, you're right back to where you were with your monolith and don't actually have service independence.

A traditional relational database model is designed around having a highly related and normalized model that will span all subdomains within your business. You will more than likely have database entities from one subdomain related (via foreign key) to an entity from another subdomain. We see it in the example app's schema where we have foreign keys from SalesOrderHeader to StoreUser and from SalesOrderLineItem to Book. These cross-subdomain relationships will end up causing problems if we are trying to do a share nothing microservices type model. If we are going to go down this path, then we will need to consider alternatives to a relational model.

Fixing a single point of failure

When building out complex systems, you are only ever as strong as your weakest link. In our current application, that weak link is our Postgres database because it's not highly available; in fact, it's a single point of failure.

With our current deployment model, the app itself is highly available because we have duplicated it across a set of nodes and put a load balancer in front (X axis scaling). We can survive the failure of a node because we have others that can pick up the slack for it until we get it back online. Unfortunately, we cannot say the same for Postgres. Currently, we are only rolling out a single Postgres instance, and so, if that goes down, it's game over for our application.

Postgres certainly supports techniques to eliminate it being a single point of failure. You can start by setting it up with a node as a hot standby using log shipping. In the case of a failure in the master, you can cut over to the secondary node with only minimal data loss. You can't write to that secondary node (it's master/slave, not master/master). However, if you put a little work into your application layer, you could leverage that secondary for reads and ease the burden on the master node a little as long as you can deal with potential replication lag (stale data) when performing reads.

This kind of model is better than the single point of failure we had before, but it still seems prone to the database itself having to deal with a lot of activity as our user base grows, especially the master node. We need to make sure we size that instance correctly (vertically scaled) to allow it to handle the load we expect to happen as our user base grows.

We could try and ease this burden by sharding (Zz axis scaling) the data in Postgres, but as this is not natively supported, we would need to roll out our own solution. If we distribute the data to a bunch of Postgres instances, we can no longer rely on the auto-generated keys in the tables to be globally unique across all of the database nodes. Because of this, we would have to do something like generating the keys in the application layer (as GUIDs perhaps) as opposed to letting the database generate them. We would then have to write our own shard-routing logic in the application layer to consistently hash the key to determine which node to store it or retrieve it from. In addition, for queries that look up more than one record, we would have to write out our own logic to distribute that query across all shards (a global query) as the matching records will likely be in multiple shards.

A custom sharding solution like this could certainly work, but this seems like a lot more complexity being put on our code base. If we don't get this shard routing logic right, then the consequences are pretty bad as we could miss data, and the app will act as if it didn't exist even though it might. There must be something we can more easily do to give us high availability and avoid having all of the data stored in one single location.

Avoiding cross-domain transactions

Another potential problem that has crept up with our usage of a relational database is that we are performing a database transaction that crosses service-domain boundaries. You can see this transaction within the OrderManager service when it's creating a new SalesOrder. If you look at the code in the DAO class, you can see these three steps being executed in a single transaction:

  1. Insert the SalesOrderHeader record.

  2. Insert each SalesOrderLineItem record for the order.

  3. For each book (on each line item), decrement the inventory for that book.

We coded this using a transaction because we felt it was required to have strong consistency between the number of sales for a book and the remaining inventory for that book. The code does check to make sure that the inventory is available before attempting to write to the database, but that inventory could be sold out from underneath us after we checked it and before we commit it.

The statement to decrement inventory uses an optimistic concurrency-checking technique to ensure this does not happen. That statement looks like this:

update Book set inventoryAmount = inventoryAmount - $  {item.quantity} where id = ${item.bookId} and inventoryAmount >=  ${item.quantity}

The key there is the where clause, where we are checking to make sure that the row we are about to apply our atomic decrement to still has at least the quantity we plan on deducting from it. If it doesn't, then our code fails the transaction explicitly by applying a filter on the result, making sure the number of rows updated is 1 and not 0. The code that handles that responsibility is as follows:

insert. 
andThen(decrementInv). 
filter(_ == 1) 

This is all coded soundly and works as expected, but is this the best way that we can be handling the requirement of keeping inventory aligned with sales quantities?

I think the main issue here is the fact that we are executing a transaction that really spans two separate subdomains within our application; sales order processing and inventory management. We did this because we thought that the strong consistency gained from an atomicity, consistency, isolation, and durability (ACID) transaction was the only way to make this work properly. The problem with this approach though is that it's not going to scale, both from a performance perspective and from a code design and deployment perspective.

These kinds of ACID transactions are heavy weight for the database and can start to cripple it if they are happening at a high frequency. We obviously expect and want sales to be happening at a very high frequency, so there's a clear conflict of interest here. Also, currently, we are sort of benefitting from the fact that there is a shared single database under the app. What if we decided to keep using a relational database, but separated it out so that each service had its own schema or db instance? How would we make something like this multitable, cross-domain transaction work then?

If we were faced with such a problem, we'd probably need to look into getting distributed transactions (XA transactions) working across the different databases, and that's not a good direction to be forced into. While it would allow us to keep our strong consistency guarantees and ACID compliance, XA transactions can be initially difficult to set up and get working correctly in your code. In addition, they are a big performance drain, as the two-phase commit involves longer lock durations than the same transaction would in a local only mode, also increasing the possibility of deadlocks. Distributed transactions are also tied to the availability of multiple systems (databases in our case), so if either of those systems is not available, you cannot proceed with your transaction.

So, we need to be able to support high throughput handling of sales orders while at the same time be able to properly keep inventory in sync with the sales of our books. Also, we want to avoid crossing over into another domain's responsibility when processing that sales order. There must be a technique that will allow us to do this and fit well into our proposed microservices model for developing our services.

Understanding the CAP theorem

When designing the way sales orders were handled, we were sure that ACID level consistency was the proper way to handle things. Now that we are faced with the issues discussed in the previous section, that decision is starting to look more like a problem than a solution. We do want some level of consistency in the data, but strong consistency is not the only game in town, and there is another model that can help us avoid being burned by ACID.

We also have realized that our current Postgres deployment sets the db up as a single point of failure. Ideally, we need to embrace a model where the data is distributed across a set of nodes (with replication) so that we can get both high availability and be able to deal with the temporary loss of a node within that cluster.

The CAP theorem, also known as Brewer's theorem after the University of California Berkley computer scientist Eric Brewer, is a way to think about consistency guarantees within a distributed system. The theorem states that it's not possible for a distributed system to supply all three of the following simultaneously:

  • Consistency: Do all my nodes see (on a read) the same exact data after a write has occurred?

  • Availability: Do all my requests get a response?

  • Partition tolerance: Will my system continue to operate in the face of arbitrary loss of parts of the system?

We all want a system that is consistent, highly available, and has partition failure tolerance, but this theorem states that the best you can do is two out of the three. There are a few databases out there now that can give us high availability and partition tolerance, sacrificing strong consistence for an eventually consistent model instead. These kinds of databases do not support the atomic, consistent, isolated, and durable guarantees that an ACID compliant database will give you. Instead, these databases give you guarantees of basically available, soft state, eventually consistent, or BASE.

It's a bit of a mental shift to embrace this new model of eventual consistency, but the tradeoff of a highly available system with partition failure tolerance can mitigate that change. We need to find a database that fits in this space as that will best support our shift to a share nothing microservices like model.

Cassandra to the rescue

Apache's Cassandra is a distributed key/value document store that also supports queries of that data using secondary indexes. From the CAP theorem, Cassandra's model gives you both high availability and partition tolerance. Cassandra achieves this by having a cluster of nodes where a specific range of keys is assigned to multiple nodes in the cluster. This way, if you need to look up a key, there will be multiple possible nodes that the request could be serviced by. If one of the nodes goes down, then you will still be able to have that request serviced by another cluster member that also handles that range of keys.

The thing to be careful of in Cassandra is that the data in the cluster is only eventually consistent. If nodes A, D, and F in my cluster house the key foo, and I update that key in node A, and then I read it again, and the read goes to node D, it's not guaranteed yet that my update has been received by that node D, which can lead to a stale read. Cassandra offers the ability to tune the consistency model to make it more consistent, but this comes at the price of latency, so be careful if you decide to go this route.

So, how can we apply Cassandra and its model to our initial cross-domain transaction problem? Well, we could use Akka Persistence (which works with Cassandra) to first write the SalesOrder into the system in an event-sourced manner, with an initial status of pending as it's awaiting inventory allocation. The Book subsystem could be listening on the event stream for SalesOrder activity, and when it sees a new one, it can see what books and quantities are on it and reserve inventory for it (if available), resulting in a new InventoryReserved event for that SalesOrder ID. The Order subsystem is in turn listening on the event stream for inventory-related activity and will update its status to approved and start the process of packing and shipping the order once it sees that inventory is available.

So, using Cassandra here, we get a database that is very fast in writing SalesOrder into the system. It's also a database that is highly available and can handle node failure, which are guarantees that our current Postgres database can't make. Then, leveraging Akka Persistence and using an event-sourced model on top of Cassandra, we can use an eventually consistent approach to get the SalesOrder and book inventory systems working together.

This approach eliminates direct interaction between those services and also does away with that nasty cross-domain transaction. It allows us to better scale the runtime performance of the app and the codebase itself, which are both big wins for the future health of our app.

Assessing the application's domain model

If you have looked at the example app's code, you have probably seen the following structure related to entities and services:

  • Entities (such as Book, BookstoreUser,and SalesOrder) are modeled as very simple case classes without any business logic

  • Services (ending with the *Manager suffix) are set up to handle the business logic for those entities

This is a common paradigm seen in software development, so it's not like we were going off the rails with this approach. It's pretty simple to develop and understand, and can be a good approach to use when your problem domain and codebase are small and simple. The only problem is that its modeling of our problem domain is not entirely representative of how things work in the real world of book sales. In fact, models like this have been referred to as being anemic, in that, they only weakly resemble the problem domain they are trying to represent.

The domain-driven design (DDD) is a newer approach to software modeling that aims to have a more representative modeling of software components. The term was coined by Eric Evans in his book of the same name. The goal of a DDD approach is to model the software components after representations within the domain. These domain representations will encapsulate the business logic and functions of those business entities entirely. In doing so, you have the business entities in your system that are much richer representations of their real-life counterparts.

In our current example app, this means something such as Book, which is a very simple case class, becomes an actor that accepts messages that allow you to do things to that book as part of the user interactions with our app. The DDD approach has its own set of building blocks, such as aggregates and bounded contexts, that we can use to remodel our current app into something that better represents the business domain.

This kind of approach is a bit more complex than the simpler model the app currently uses. Its true benefit is realized within complex business domains as the one-to-one relationship between the software and the domain concepts eases the development burden of that complex domain. This is a bit of a stretch for the relatively simple mode in our example app, but we're going to give it a shot anyway as part of one of our refactors. At the very least, we can explore a different way of modeling software, one that might really benefit our app if it starts to get more complex in what it's doing.

Recognizing poorly written actors

If you happened to look at the logic within the SalesOrderManager actor, you will notice a fairly complicated actor, at least in terms of the other actors in this app. This actor needs to work with a bunch of the other services in the app to first gather some data (to perform validations) before it talks to the database to create the order. The bulk of the work is laid out in the createOrder method and is as follows:

val bookMgrFut = lookup(BookMgrName) 
val userMgrFut = lookup(UserManagerName) 
val creditMgrFut = lookup(CreditHandlerName) 
for{ 
  bookMgr <- bookMgrFut 
  userMgr <- userMgrFut 
  creditMgr <- creditMgrFut 
  (user, lineItems) <- loadUser(request, userMgr). 
  zip(buildLineItems(request, bookMgr)) 
  total = lineItems.map(_.cost).sum 
  creditTxn <- chargeCreditCard(request, total, creditMgr) 
  order = SalesOrder(0, user.id, creditTxn.id,  
  SalesOrderStatus.InProgress, total,  
  lineItems, new Date, new Date) 
  daoResult <- dao.createSalesOrder(order) 
} yield daoResult 

This all looks nice and neat, and the code is pretty well organized. Readability is enhanced by delegating a lot of the work into separate methods instead of directly in the body of the for comprehension. So what's wrong with an approach like this?

Mixing actors and Futures is a topic that has received much chatter out there on the Internet. A lot of people call it anti-pattern, and I agree mostly. I think a little Future usage, like how the other actors use the DAO and then pipe the result back to the sender, is okay, but this clearly crosses the line.

One of the biggest concerns with mixing Futures and actors is the fear that you might accidentally close over mutable scope (variables) and access them in an unsafe way. Once the actor code execution hits the first Future callback, you are done processing that message and the actor moves on to the next one in the mailbox. If you close over something that is mutable (sender() being the classic example), you run the risk of you trying to access it while another thread (the one processing the next message in the mailbox) is also accessing it. This basically eliminates one of the biggest benefits of actors, in that, you get serialized mailbox handling and thus don't have to worry about concurrent modifications to internal state. This particular actor doesn't have issues with mishandling of mutable state, but that alone doesn't mean it's a good use of actors. So, much work is being done outside of the context of message handling via the mailbox, that coded as is, it's not really worth using an actor.

On top of that, there is a fair amount of using the ask pattern here (the ? operator). This pattern involved making a request/response semantic out of what is normally a one way messaging pattern with tell (!). The Akka team pushes you to try and limit as it leads more into mixing Futures with your actors, which can lead to undesired behavior. In addition, a short-lived actor instance is created behind the Future so that the receiver has an actual sender() ActorRef to send a response back to. This creation of short-lived instance is inefficient, especially when you consider the total number of times it's happening within the servicing of a CreateOrder request.

We need to clean up this actor so that we don't use it to set a precedent for Future actors that also have complex workflows. For me, when faced with a rather involved flow like this one, I use a pattern of an actor per request in combination with using Akka's finite-state machine (FSM). Using FSM, we can design all of the different aspects of the flow as states and then progress through them as we react to the different data we're loading and processing.

An approach like this makes the code more intuitive as you just need to understand the different states and what triggers are allowing you to move between them, eventually reaching a termination point. As a side effect, this approach also allows me to get rid of ask and focus completely on tell when communicating with other actors. Having cleaner, more intuitive, and idiomatic actor code is a big win, and we will jump right into this refactor in Chapter 2, Simplifying Concurrent Programming with Actors.

Replacing our HTTP libraries

Within the example app, we have the need for both inbound and outbound HTTP handling. These two needs are met by unfiltered and dispatch, respectively. These two sister libraries have personally accomplished a lot for me in my Scala development projects. Unfortunately, neither is as actively maintained as we would like and that can be a problem moving forward. For instance, if you suddenly needed a new feature within this app, such as HTTP/2 support, you might be stuck waiting a while to get it. If you are going all in on using a third-party library, it's always a good practice to use one that is very actively maintained and with a lot of people using it. This means that things such as bugs will be fixed early and often, and there is also a lot of community information out there to help you if and when you get stuck.

Fortunately, starting with release version 2.4.2, Akka now includes full support for both inbound and outbound non-blocking HTTP handling. The HTTP support is based on the excellent spray library and is fully integrated with Akka's Reactive Streams project. These modules (Akka Streams and Akka HTTP) had been available separately before, with their own versioning scheme, but now they are available and versioned along with the other core Akka projects. Also, before being folded into the core Akka repo, these modules had been tagged as experimental. Now these modules are no longer tagged as experimental, with akka-http being the only exception as of the writing of this book.

Being an Akka library, and thus built on top of actors, Akka HTTP will be a much better fit with our current use of actors then either the Unfiltered or Dispatch libraries were. We can eliminate some extra thread pools that unfiltered and dispatch were using, using the actor system's dispatcher(s) instead, which should lead to better use of our CPU (less total threads to deal with).

As with all Akka libraries, the HTTP module is very actively maintained, and there is great community support out there. As we are all in on Akka already with this app, getting rid of two third-party libraries and replacing them with something from Akka can also be considered a big win. Depending on too many third-party libraries from many different sources can be an impediment to upgrading Scala (due to binary incompatibility) when that need arises, and that's not a boat we want to be in.

All in all, this seems like a great decision and will be part of our ongoing refactor process in the upcoming chapters.

 

Summary


Hopefully, at this point, you have a good understanding of what the example is. This includes knowing how the code is structured and works as well as how to interact with it using its endpoints. You also know the big shortcomings with the app and how we plan to go about fixing them. Armed with that knowledge, you are officially ready to start our progressive refactor to building a better reactive application.

Over the next set of chapters, we will take a tour through Akka's different feature offerings above and beyond just actors. We will incorporate these features into the example app one by one in an effort to resolve the shortcomings called out in this chapter. At the end of our journey, the hope is that you will have a much deeper understanding of these newer features and be well on your way to mastering Akka.

About the Author

  • Christian Baxter

    Christian Baxter, from an early age, has always had an interest in understanding how things worked. Growing up, he loved challenges and liked to tinker with and fix things. This inquisitive nature was the driving force that eventually led him into computer programming. While his primary focus in college was life sciences, he always set aside time to study computers and to explore all aspects of computer programming. When he graduated from college during the height of the Internet boom, he taught himself the necessary skills to get a job as a programmer. He’s been happily programming ever since, working across diverse industries such as insurance, travel, recruiting, and advertising. He loves building out high-performance distributed systems using Scala on the Akka platform.

    Christian was a long time Java programmer before making the switch over to Scala in 2010. He was looking for new technologies to build out high throughput and asynchronous systems and loved what he saw from Scala and Akka. Since then, he's been a major advocate for Akka, getting multiple ad tech companies he’s worked for to adopt it as a means of building out reactive applications. He's also been an occasional contributor to the Akka codebase, identifying and helping to fix issues. When he’s not hacking away on Scala and Akka, you can usually find him answering questions on Stackoverflow as cmbaxter.

    Browse publications by this author

Latest Reviews

(3 reviews total)
I love the books of Packt because are specific, croncrete, actual and cutting edge.
Very good book. Looking something similar to it with Java examples
Buenos libros aunque recien los estoy leyendo

Recommended For You

Scala Reactive Programming

Build fault-tolerant, robust, and distributed applications in Scala

By Rambabu Posa
Scala Microservices

Design, build, and run Microservices using Scala elegantly

By Jatin Puri and 1 more