Akka is one of the most popular Actor Model frameworks that provide a complete toolkit and runtime for designing and building highly concurrent, distributed, and fault-tolerant, event-driven applications on the JVM. This chapter will walk you through the motivation and need for building an Akka toolkit.
As Java/Scala developers, we will see the usage of creating applications using the Akka Actor Model, which scales up and scales out seamlessly, and provides levels of concurrency, which is simply difficult to achieve with the standard Java libraries.
Before we delve into what Akka is, let us take a step back to understand how the concept of concurrent programming has evolved in the application development world. The applications have always been tied to the underlying hardware resource capacity. The whole concept of building large, scalable, distributed applications needs to be looked at from the perspective of the underlying hardware resources where the application runs and the language support provided for concurrent programming.
The advancement of the microprocessor architecture meant the CPU kept becoming faster and faster with doubling of the transistors every 18 months (Moore's law). But soon, the chip design hit the physical limits in terms of how many transistors could be squeezed on to the printed circuit board (PCB). Subsequently, we moved to multicore processor architecture that has two or more identical processors or processor cores physically close to each other, sharing the underlying bus interface and the cache.
These microprocessors having two or more cores effectively increased the processor's performance by the same factor as the number of cores, limited only by the amount of serial code (Amdahl's law).
The preceding diagram from wiki, http://en.wikipedia.org/wiki/Transistor_count shows how the transistor count was doubled initially over the period of 18 months (following the Moore's Law) and how the multiprocessor architecture for consumer machines has evolved over the last 6-7 years.
When writing large concurrent systems, the traditional model of shared state concurrency makes use of changing shared memory locations. The system uses multithreaded programming coupled with synchronization monitors to guard against potential deadlocks. The entire multithreading programming model is based on how to manage and control the concurrent access to the shared, mutable state.
Manipulating shared, mutable state via threads makes it hard at times to debug problems. Usage of locks may guarantee the correct behavior, but it is likely to lead to the effect of threads running into a deadlock problem, with each acquiring locks in a different order and waiting for each other, as shown in the following diagram:
Working with threads requires a much higher level of programming skills and it is very difficult to predict the behavior of the threads in a runtime environment.
Java provides shared memory threads with locks as the primary form of concurrency abstractions. However, shared memory threads are quite heavyweight and incur severe performance penalties from context-switching overheads.
A newer Java API around fork/join, based on work-stealing algorithms, makes the task easier, but it still takes a fair bit of expertise and tuning to write the application.
Java Platform, Enterprise Edition (JEE) was introduced as a platform to develop and run distributed multitier Java applications. The entire multitier architecture is based on the concept of breaking down the application into specialized layers that process the smaller pieces of logic. These multitier applications are deployed in containers (called application servers) provided by vendors, such as IBM or Oracle, which host and provide the infrastructure to run the application. The application server is tuned to run the application and utilize the underlying hardware.
In case of runtime failures, the entire request call fails. It is very difficult to retry any method execution or recovery from failures.
The application scalability is tagged to the underlying application container settings. An application cannot make use of different threading models to account for different workloads within the same application.
Using the container-based model to scale out the applications requires a large set of resources, and overheads of managing the application across the application server nodes are very high.
Container-based applications are bounded by the rules of the container's ability to scale up and scale out, resulting in suboptimal performance.
The JEE programming model of writing distributed applications is not the best fit for a scale-out application model.
Given that the processors are becoming more parallel, the applications are getting more distributed, and traditional JVM programming techniques are not helpful. So, there is a need for a different paradigm to solve the problem.
In 1973, Carl Hewitt, Peter Bishop, and Richard Steiger wrote a paperâA Universal Modular ACTOR Formalism for Artificial Intelligence, which introduced the concept of Actors. Subsequently, the Actor Model was implemented in the Erlang language by Joe Armstrong and Ericsson implemented the AXD 301 telecom switch that went onto achieve reliability of 99.9999999 percent (nine 9's).
The Actor Model takes a different approach to solving the problem of concurrency, by avoiding the issues caused by threads and locks. In the Actor Model, all objects are modeled as independent, computational entities that only respond to the messages received. There is no shared state between actors, as follows:
The immutable messages are used to communicate between actors. Actors do not share state, and if any information is shared, it is done via message only. Actors control the access to the state and nobody else can access the state. This means there is no shared, mutable state.
Each actor has a queue attached where the incoming messages are enqueued. Messages are picked from the queue and processed by the actor, one at a time. An actor can respond to the received message by sending immutable messages to other actors, creating a new set of actors, updating their own state, or designating the computational logic to be used when the next message arrives (behavior change).
Messages are passed between actors asynchronously. It means that the sender does not wait for the message to be received and can go back to its execution immediately. Any actor can send a message to another actor with no guarantee on the sequence of the message arrival and execution.
Communication between the sender and receiver is decoupled and asynchronous, allowing them to execute in different threads. By having invocation and execution in separate threads coupled with no shared state, allows actors to provide a concurrent and scalable model.
The Akka framework has taken the Actor Model concept to build an event-driven, middleware framework that allows building concurrent, scalable, distributed systems. Akka uses the Actor Model to raise the abstraction level that decouples the business logic from low-level constructs of threads, locks, and non-blocking I/O.
Concurrency: Akka Actor Model abstracts the concurrency handling and allows the programmer to focus on the business logic.
Scalability: Akka Actor Model's asynchronous message passing allows applications to scale up on multicore servers.
Fault tolerance: Akka borrows the concepts and techniques from Erlang to build a "Let It Crash" fault-tolerance model using supervisor hierarchies to allow applications to fail fast and recover from the failure as soon as possible.
Event-driven architecture: Asynchronous messaging makes Akka a perfect platform for building event-driven architectures.
Transaction support: Akka implements transactors that combine actors and software transactional memory (STM) into transactional actors. This allows composition of atomic message flows with automatic retry and rollback.
Location transparency: Akka treats remote and local process actors the same, providing a unified programming model for multicore and distributed computing needs.
Scala/Java APIs: Akka supports both Java and Scala APIs for building applications.
The Akka framework is envisaged as a toolkit and runtime for building highly concurrent, distributed, and fault-tolerant, event-driven applications on the JVM.
Akka is open source and available under the Apache License, Version 2 at http://akka.io.
Akka was originally created by Jonas BonÃ©r and is currently available as part of the open source Typesafe Stack.
Next, we will see all the key constructs provided by Akka that are used to build a concurrent, fault-tolerant, and scalable application.
Actor is an independent, concurrent computational entity that responds to messages. Before we jump into actor, we need to understand the role played by the actor in the overall scheme of things. Actor is the smallest unit in the grand scheme of things. Concurrent programs are split into separate entities that work on distinct subtasks. Each actor performs his quota of tasks (subtasks) and when all the actors have finished their individual subtasks, the bigger task gets completed.
Let's take an example of an IT project that needs to deliver a defined functionality to the business. The project is staffed with people who bring different skill sets to the table, mapped for the different phases of the project as follows:
The whole task of building something is divided into subtasks/activities that are handled by specialized actors adept in that subtask. The overall supervision is provided by another actorâproject manager or architect.
In the preceding example, the project needs to exist and it should provide the structure for the various actors (project manager, architect, developer, and so on) to start playing their roles. In the absence of the project, the actor roles have no meaning and existence. In Akka world, the project is equivalent to the actor system.
Actors can change their state and behavior based on the message passed. This allows them to respond to changes in the messages coming in. An actor has the constituents that are listed in the following sections.
The actor objects hold instance variables that have certain state values or can be pure computational entities (stateless). These state values held by the actor instance variable define the state of the actor. The state can be characterized by counters, listeners, or references to resources or state machine. The actor state is changed only as a response to a message. The whole premise of the actor is to prevent the actor state getting corrupted or locked via concurrent access to the state variables.
Akka implements actors as a reactive, event-driven, lightweight thread that shields and protects the actor's state. Actors provide the concurrent access to the state allowing us to write programs without worrying about concurrency and locking issues.
When the actors fail and are restarted, the actors' state is reinitialized to make sure that the actors behave in a consistent manner with a consistent state.
Behavior is nothing but the computation logic that needs to be executed in response to the message received. The actor behavior might include changing the actor state. The actor behavior itself can undergo a change as a reaction to the message. It means the actor can swap the existing behavior with a new behavior when a certain message comes in. The actor defaults to the original behavior in case of a restart, when encountering a failure:
An actor responds to messages. The connection wire between the sender sending a message and the receiver actor receiving the message is called the mailbox. Every actor is attached to exactly one mailbox. When the message is sent to the actor, the message gets enqueued in its mailbox, from where the message is dequeued for processing by the receiving actor. The order of arrival of the messages in the queue is determined in runtime based on the time order of the send operation. Messages from one sender actor to another definite receiver actor will be enqueued in the same order as they are sent:
Akka provides multiple mailbox implementations. The mailboxes can be bounded or unbounded. A bounded mailbox limits the number of messages that can be queued in the mailbox, meaning it has a defined or fixed capacity for holding the messages.
At times, applications may want to prioritize a certain message over the other. To handle such cases, Akka provides a priority mailbox where the messages are enqueued based on the assigned priority. Akka does not allow scanning of the mailbox. Messages are processed in the same order as they are enqueued in the mailbox.
Akka makes use of dispatchers to pass the messages from the queue to the actors for processing. Akka supports different types of dispatchers. We will cover more about dispatchers and mailboxes in Chapter 5, Dispatchers and Routers.
Every actor that is defined and created has an associated lifecycle. Akka providesoks such as
preStart that allow the actor's state and behavior to be initialized. When the actor is stopped, Akka disables the message queuing forctor before
PostStoked. In the
postStop hook, any persistence of the state or clean up of any hold-up resources can be done:
Akka follows the premise of the actor hierarchy where we have specialized actors that are adept in handling or performing an activity. To manage these specialized actors, we have supervisor actors that coordinate and manage their lifecycle. As the complexity of the problem grows, the hierarchy also expands to manage the complexity. This allows the system to be as simple or as complex as required based on the tasks that need to be performed:
The whole idea is to break down the task into smaller tasks to the point where the task is granular and structured enough to be performed by one actor. Each actor knows which kind of message it will process and how he reacts in terms of failure. So, if the actor does not know how to handle a particular message or an abnormal runtime behavior, the actor asks its supervisor for help. The recursive actor hierarchy allows the problem to be propagated upwards to the point where it can be handled. Remember, every actor in Akka has one and only one supervisor.
This actor hierarchy forms the basis of the Akka's "Let It Crash" fault-tolerance model. Akka's fault-tolerance model is built using the actor hierarchy and supervisor model. We will cover more details about supervision in Chapter 6, Supervision and Monitoring.
For a distributed application, all actor interactions need to be asynchronous and location transparent. Meaning, location of the actor (local or remote) has no impact on the application. Whether we are accessing an actor, or invoking or passing the message, everything remains the same.
To achieve this location transparency, the actors need to be identifiable and reachable. Under the hood, Akka uses configuration to indicate whether the actor is running locally or on a remote machine. Akka uses the actor hierarchy and combines it with the actor system address to make each actor identifiable and reachable.
Akka uses the same philosophy of the
World Wide Web (WWW) to identify and locate resources on the Web. WWW makes use of the uniform resource locator (URL) to identify and locate resources on the Web. The URL consists ofâ
scheme defines the protocol (HTTP or FTP),
domain defines the server name or the IP address,
port defines the port where the process listens for incoming requests, and
path specifies the resource to be fetched.
Akka uses the similar URL convention to locate the actors. In case of an Akka application, the default values are
akka://hostname:2552/ depending upon whether the application uses remote actors or not, to identify the application. To identify the resource within the application, the actor hierarchy is used to identify the location of the actor:
The actor hierarchy allows the unique path to be created to reach any actor within the actor system. This unique path coupled with the address creates a unique address that identifies and locates an actor.
Within the application, each actor is accessed using an
ActorRef class, which is based on the underlying actor path.
ActorRef allows us to transparently access the actors without knowing their locations. Meaning, the location of the actor is transparent for the application. The location transparency allows you to build applications without worrying how the actors communicate underneath.
To provide transaction capabilities to actors, Akka transactors combine actors with STM to form transactional actors. This allows actors to compose atomic message flows with automatic retry and rollback.
Working with threads and locks is hard and there is no guarantee that the application will not run into locking issues. To abstract the threading and locking hardships, STM, which is a concurrency control mechanism for managing access to shared memory in a concurrent environment, has gained a lot of acceptance.
STM is modeled on similar lines of database transaction handling. In the case of STM, the Java heap is the transactional data set with begin/commit and rollback constructs. As the objects hold the state in memory, the transaction only implements the characteristicsâatomicity, consistency, and isolation.
For actors to implement a shared state model and provide a consistent, stable view of the state across the calling components, Akka transactors provide the way. Akka transactors combine the Actor Model and STM to provide the best of both worlds allowing you to write transactional, asynchronous, event-based message flow applications and gives you composed atomic arbitrary, deep message flows. We will cover transactors in more details in the Chapter 7, Software Transactional Memory.
As we move ahead and delve deep into the constructs provided by the Akka framework, we need to make sure that we keep in mind the following concepts:
An actor is a computation unit with state, behavior, and its own mailbox
There are two types of actorsâuntyped and typed
Communication between actors can be asynchronous or synchronous
Message passing to the actors happens using dispatchers
Actors are organized in a hierarchy via the actor system
Actors are proxied via
Supervisor actors are used to build the fault-tolerance mechanism
Actor path follows the URL scheme, which enables location transparency
STM is used to provide transactional support to multiple actor state updates
Now that we have seen what Akka is and the key features of Akka, let's delve into the use cases where Akka fits in best. Any business use case that requires the application to scale up and scale out, be fault tolerant, or provide High Availability (HA), requires massive concurrency/parallelism, which is a prime target for use of the Akka Actor Model. The following are the use cases for Akka:
Transaction processing: This includes processing large data streams, where the incoming data is either time series or transactional data. The stream pumps in large amount of data that needs to be processed in parallel and concurrently. The output of the data processing might be used in real time or might be fed into analytical systems. Finance, banking, securities, trading, telecom, social media, analytics, and online gaming are some of the domain enterprises that deal with large data coming in from multiple sources, which needs to be processed, analyzed, and reported.
Service providers: Another area is where the application provides services to various other clients via variety of service means such as SOAP, REST, Cometd, or WebSockets. The application generally caters to a massive amount of stateless requests that need to be processed fast and concurrently.
Batch processing: Batch processing used across enterprise domains is another area where Akka shines very well. Dealing with large data, applying paradigms such as divide and conquer, map-reduce, master-worker, and grid computing allows massive data to be processed. The data might be coming in via real-time feeds, or it might be unstructured data (coming via logfiles) or data read from existing data stores.
Data mining/analytics/Business Intelligence: Most enterprises generate large amounts of dataâstructured as well as unstructured. Applications that mine this data from existing transactional stores or data warehouses can use Akka to process and analyze these massive sets of data.
Service gateways/hubs: Service gateways or hubs connect multiple systems or applications together, and provide mediation and transformation services. An Akka-based application can provide those scale-up and scale-out options along with high availability for applications in this space.
Apps requiring concurrency/parallelism: Any application that needs to process data in parallel or provide/support concurrency can make use of Akka. Akka provides a faster time to market for such applications, as writing and testing such applications is far easier and less error-prone compared to traditional thread-based concurrent applications. Akka JARs can be easily dropped into existing Java or Scala applications and the applications can start making use of the Actor Model.
At times, Akka needs to be used in conjunction with other frameworks or libraries to build the complete application. Some of the common frameworks that work very well with Akka are Play framework, ZeroMQ, Apache Camel, and Spring framework, among others. We will explore the usage of Play framework and ZeroMQ with Akka.
This completes the introduction to Akka, where we saw the evolution of the microprocessors, the problems with writing/using the Java concurrency models for distributed applications, and how Akka's Actor Model provides an answer to the two problems. We also learned the key constructs that define the Akka framework and sample use cases where Akka is a prime candidate for use.
In the next chapter, we will get started with Akka, we will go through the motions of installing the development environment and write our first Akka application.