Distributed Computing in Java 9

Quick Start to Distributed Computing

Distributed computing is the process of accomplishing a bigger task by splitting it into multiple subtasks, which are performed by multiple components located in a network of computers, termed a distributed system. These distributed systems communicate and coordinate their activities by exchanging information and/or the status of their individual processes. Having such distributed systems allows organizations to maintain comparatively smaller and cheaper computers in a network rather than one large server with bigger capacity.

In this chapter, we will cover the following topics:

Evolution of computing models
Parallel computing
Amdahl's law
Distributed computing
Parallel versus distributed computing
Design considerations for distributed systems
Java support

Let's begin our discussion by remembering the great Charles Babbage, considered to be the "father of the computer", who originated the concept of a programmable computer. An English mechanical engineer and polymath, he conceptualized and invented the first mechanical computer in the early 19th century. Alan Turing introduced the principle of the modern computer in 1936, modern digital computers were heralded to the world in the 1940s, and the Electronic Numerical Integrator and Computer (ENIAC) was among the earliest electronic general-purpose computers made. From there on, computers have evolved to be faster and cheaper at an astonishing rate, along with operating systems, programming languages, and so on. The fastest of these machines were called supercomputers and, years ago, used to occupy more than one big room. Today, we have multicore computers such as minicomputers and smartphones; a smartphone can be carried in a pocket and can do most of the jobs people need in day-to-day life.

While a computer may be regarded as executing one gigantic program stored in its main memory, some computers need the capacity to execute several programs concurrently. This is achieved through multitasking; that is, the computer switches rapidly between multiple executing programs so that they appear to run simultaneously.

Next-generation computers are designed to distribute their processing across numerous CPUs in a multiprocessing configuration. This technique was earlier available only in large, powerful computers, such as supercomputers, servers, and mainframe computers. Nowadays, such multiprocessor and multicore capabilities are widely available on personal computers and laptops.

Although such high-speed computers demonstrate impressive processing abilities, the next serious invention that transformed the world of processing was high-speed computer networking. This technique permitted an enormous number of computers to interact and established the next level of processing. Networked computers can be placed either within the same location, connected as a Local Area Network (LAN), or across continents, connected as a Wide Area Network (WAN).

Today, a new computer or smartphone is expected to have multiprocessor/multicore capacity at an affordable cost. Besides, the trend has been shifting from the CPU to the Graphics Processing Unit (GPU), also called the Visual Processing Unit (VPU), which can be installed in personal computers, mobile phones, workstations, embedded systems, and gaming consoles. Recent GPUs are very capable at computer graphics manipulation and image processing, and for such workloads they are more efficient than general-purpose CPUs because of their highly parallel structure.

The following diagram represents the evolution of computing models from mainframe to cloud, and how concerns such as availability SLA, scaling, hardware, HA type, software, and consumption have varied over time with technology.

Early computing was uniprocessor computing, performed on a single processor, which can be called centralized computing. Later, parallel computing, with more than one processor simultaneously executing a single program, helped middleware processing. Parallel processing was achieved through either a single computer with multiple CPUs or multiple network-connected computers (with the help of software).

Let us now learn in detail about parallel computing and how the trend moved toward distributed computing.

Parallel computing

A parallel system contains more than one processor, each with direct access to a shared memory that forms a common address space. Usually, a parallel system has a Uniform Memory Access (UMA) architecture, in which the latency of accessing any particular memory location is the same from every processor. The processors are placed in close proximity and are connected by an interconnection network. Conventionally, communication between the processors happens through read and write operations on the shared memory, although message passing is also possible (emulated on top of the shared memory). The hardware and software are tightly coupled, and usually the processors in such a network run the same operating system. In general, the processors are homogeneous and are housed in the same enclosure as the shared memory. A multistage switch or bus with a regular and symmetric design is used for greater efficiency.

The following diagram represents a UMA parallel system with multiple processors connecting to multiple memory units through an interconnection network.

A multicomputer parallel system is another type of parallel system, containing multiple processors that do not have direct access to a shared memory. The memories of the multiple processors may or may not form a common address space, and computers in this category usually do not have a common clock. The processors are placed close together and are generally tightly coupled, with homogeneous software and hardware. Such computers are also connected by an interconnection network. The processors can communicate through either a common address space or message passing. This is represented in the diagram below.

A multicomputer system in a Non-Uniform Memory Access (NUMA) architecture is usually configured with a common address space. In such NUMA architecture, accessing different memory locations in a shared memory across different processors shows different latency times.

Array processors exchange information by passing messages. They have a very small market because they are suited to closely synchronized data processing, in which the data is exchanged in lockstep, for applications such as digital signal processing and image processing. Such applications can also involve a large number of iterations over the data.

Compared to UMA and array processor architectures, NUMA and message-passing multicomputer systems are less suitable when shared data access and communication are very frequent. The primary benefit of parallel systems is better throughput, obtained by sharing the computational tasks between multiple processors. Tasks that can be partitioned easily into multiple subtasks and that need little communication to synchronize their execution are the most efficient to run on parallel systems. The subtasks can be executed as large vector or array operations through matrix computations, which are common in scientific applications. Though parallel computing was much appreciated in research and was beneficial on legacy architectures, parallel systems are no longer seen as efficient or economical, for the following reasons:

They need special configuration for compilers
The market for such applications that can attain efficiency through parallel processing is very small
The evolution of more powerful and efficient computers at lower costs made it less likely that organizations would choose parallel systems.

Amdahl's law

Amdahl's law is frequently used in parallel computing to forecast the improvement in speedup when the number of processors in a system is increased. Amdahl's law is named after the computer scientist Gene Amdahl, who presented it at the American Federation of Information Processing Societies (AFIPS) Spring Joint Computer Conference in 1967.

The standard formula for Amdahl's law is as follows:

$$S_{latency}(s) = \frac{1}{(1 - p) + \frac{p}{s}}$$

where:

S_latency is the theoretical speedup of the execution of the complete task.
s is the speedup of the part of the task that benefits from the improved system resources.
p is the proportion of the original execution time that the part benefiting from improved resources occupied.

Let's consider an example of a single task that can be further partitioned into four subtasks whose shares of the execution time are p1 = 0.11, p2 = 0.18, p3 = 0.23, and p4 = 0.48, respectively. The first subtask is not improved in speed, so s1 = 1. The second subtask is improved in speed by five times, so s2 = 5. The third subtask is improved in speed by 20 times, so s3 = 20. Finally, the fourth subtask is improved in speed by 1.6 times, so s4 = 1.6.

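By using Amdahl's law, the overall speedup is as follows:

$$S_{latency} = \frac{1}{\frac{p_1}{s_1} + \frac{p_2}{s_2} + \frac{p_3}{s_3} + \frac{p_4}{s_4}} = \frac{1}{\frac{0.11}{1} + \frac{0.18}{5} + \frac{0.23}{20} + \frac{0.48}{1.6}} = \frac{1}{0.4575} \approx 2.19$$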

Notice how the 5 times and 20 times speedups on the second and third parts, respectively, don't have much effect on the overall speedup when the fourth part (48% of the execution time) is sped up only 1.6 times.
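
The arithmetic of this four-part example can be checked with a few lines of Java. This is only a minimal sketch: the class and method names (AmdahlParts, overallSpeedup) are made up for illustration and are not part of any Java API.

```java
class AmdahlParts {

  // Overall speedup when part i took fraction p[i] of the original time
  // and is sped up by a factor s[i]: 1 / sum(p[i] / s[i]).
  static double overallSpeedup(double[] p, double[] s) {
    double newTime = 0.0;
    for (int i = 0; i < p.length; i++) {
      newTime += p[i] / s[i];   // time this part takes after the improvement
    }
    return 1.0 / newTime;
  }

  public static void main(String[] args) {
    double[] p = {0.11, 0.18, 0.23, 0.48};  // shares of execution time from the example
    double[] s = {1, 5, 20, 1.6};           // per-part speedups from the example
    System.out.printf("Overall speedup: %.2f%n", overallSpeedup(p, s));
  }
}
```

Changing the last entry of s shows how strongly the result depends on the part that occupies 48% of the time.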

The following formulas show that the theoretical speedup of the entire program improves as the number or capacity of resources in the system increases and that, regardless of the magnitude of the improvement, the speedup of the entire program is always limited by the part of the task that cannot benefit from the resource improvement:

$$S_{latency}(s) \le \frac{1}{1 - p}, \qquad \lim_{s \to \infty} S_{latency}(s) = \frac{1}{1 - p}$$

Consider a program that needs about 20 hours to complete on a single processor. A specific subtask of the program that takes an hour to execute cannot be executed in parallel, while the remaining 19 hours of processing (p = 0.95 of the total execution time) can be executed in parallel. In such a scenario, regardless of how many additional processors are dedicated to the parallel execution of the program, the execution time cannot be reduced below that minimum of 1 hour. The speedup is therefore limited to, at most, 20 times (calculated as 1/(1 − p) = 20). Hence, parallel computing pays off only for those programs that offer enough scope for being split into subtasks/parallel programs, as observed in the diagram below.
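
The 20-hour example can be explored numerically. The sketch below assumes, for illustration only, that the parallel part is sped up by exactly the number of processors n (so s = n in the formula); the class and method names are hypothetical.

```java
class AmdahlLimit {

  // Amdahl's law for a program whose parallel fraction p is spread over n processors,
  // assuming the parallel part scales perfectly (s = n).
  static double speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
  }

  public static void main(String[] args) {
    double p = 0.95;  // 19 of the 20 hours can run in parallel
    for (int n : new int[] {1, 2, 8, 64, 1024, 65536}) {
      System.out.printf("%6d processors -> speedup %.2f%n", n, speedup(p, n));
    }
    // The serial hour caps the speedup no matter how large n becomes.
    System.out.printf("Upper bound 1/(1-p) = %.2f%n", 1.0 / (1.0 - p));
  }
}
```

The printed values approach 20 but never reach it, which is the limit described above.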

However, Amdahl's law is applicable only to scenarios where the program is of a fixed size. In general, on larger problems (larger datasets), more computing resources tend to get used if they are available, and the time spent in the parallel part usually grows much faster than the time spent in the inherently serial parts.

Distributed computing

Distributed computing is the concurrent usage of more than one connected computer to solve a problem over a network connection. The computers that take part in distributed computing appear as a single machine to their users.

Distributing computation across multiple computers is a great approach when these computers interact with each other over the network to solve a bigger problem in reasonably less time. In many respects, this sounds like a generalization of the concepts of parallel computing that we discussed in the previous section. The purpose of distributed systems includes the ability to tackle a problem that is too big, or would take too long, for an individual computer to process.

Distributed computing, the latest trend, is performed on a distributed system, which is a group of computers that do not share a common physical clock or a shared memory and that interact by exchanging information over a communication (inter/intra) network, with each computer having its own memory and running its own operating system. Usually, the computers are semi-autonomous and loosely coupled, and they cooperate to address a problem collectively.

Examples of distributed systems include the Internet, an intranet, and a Network of Workstations (NOW), which is a group of networked personal workstations connected to server machines, represented in the diagram above. Modern-day internet connections include a home hub with multiple devices connected and operating on the network; the Google search engine and Amazon services are famous distributed systems. The rendering of three-dimensional animation movies from Pixar and DreamWorks is another popular example of distributed computing.

Given the number of frames to render for a full-length feature (30 frames per second over a 2-hour movie comes to 216,000 frames, which is a lot!), movie studios need to spread the full rendering job across many computers.

In the preceding image, we can observe a web application, another illustration of a distributed application where multiple users connect to the web application over the Internet/intranet. In this architecture, the web application is deployed in a web server, which interacts with a DB server for data persistence.

Other applications that require a distributed system configuration include instant messaging and video conferencing. The ability to solve such problems, along with improved performance, is the reason for choosing distributed systems.

The devices that can take part in distributed computing include server machines, workstations, and personal handheld devices.

Capabilities of distributed computing include integrating heterogeneous applications that are developed and run on different technologies and operating systems, multiple applications sharing common resources, a single instance service being reused by multiple clients, and having a common user interface for multiple applications.

Parallel versus distributed computing

While both distributed computing and parallel systems are widely available these days, the main difference between these two is that a parallel computing system consists of multiple processors that communicate with each other using a shared memory, whereas a distributed computing system contains multiple processors connected by a communication network.

In parallel computing systems, as the number of processors increases, with enough parallelism available in applications, such systems easily beat sequential systems in performance through the shared memory. In such systems, the processors can also contain their own locally allocated memory, which is not available to any other processors.

In distributed computing systems, multiple system processors communicate with each other using messages that are sent over the network. Such systems are increasingly available these days because computer processors, and the high-bandwidth links to connect them, are available at low prices.
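
The two communication styles can be contrasted in a small Java sketch. It is only an analogy running inside one JVM: threads stand in for processors, an AtomicInteger stands in for the shared memory, and a BlockingQueue stands in for the network link. The class and method names are invented for this illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class SharedVersusMessages {

  // Parallel style: every worker updates one counter that lives in shared memory.
  static int sumViaSharedMemory(int workers, int perWorker) throws InterruptedException {
    AtomicInteger shared = new AtomicInteger();
    Thread[] threads = new Thread[workers];
    for (int i = 0; i < workers; i++) {
      threads[i] = new Thread(() -> shared.addAndGet(perWorker));
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();  // wait until every worker has written its contribution
    }
    return shared.get();
  }

  // Distributed style: workers share no state and send their results as messages.
  // The queue plays the role of the network between the workers and the collector.
  static int sumViaMessages(int workers, int perWorker) throws InterruptedException {
    BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(workers);
    for (int i = 0; i < workers; i++) {
      new Thread(() -> {
        try {
          channel.put(perWorker);  // "send" the partial result
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }).start();
    }
    int total = 0;
    for (int i = 0; i < workers; i++) {
      total += channel.take();  // "receive" one message per worker
    }
    return total;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("Shared memory total: " + sumViaSharedMemory(4, 10));
    System.out.println("Message passing total: " + sumViaMessages(4, 10));
  }
}
```

In a real distributed system the queue would be replaced by sockets, a message broker, or remote calls, and the collector would also have to cope with messages that never arrive.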

The following reasons explain why a system should be built distributed, not just parallel:

Scalability: As distributed systems do not have the problems associated with shared memory, they scale better than parallel systems as the number of processors increases.
Reliability: The impact of the failure of any single subsystem or computer on the network defines the reliability of such a connected system. Distributed systems generally fare better in this area than parallel systems.
Data sharing: Data sharing provided by distributed systems is similar to the data sharing provided by distributed databases. Thus, multiple organizations can have distributed systems with integrated applications for data exchange.
Resource sharing: If there is an expensive, special-purpose resource or processor that cannot be dedicated to each processor in the system, such a resource can be easily shared across distributed systems.
Heterogeneity and modularity: A system should be flexible enough to accept a new heterogeneous processor, or to have one of its processors replaced or removed, without affecting the overall processing capability. Distributed systems are more flexible in this respect.
Geographic construction: The subsystems of an application may be inherently distributed geographically. Local processing may be forced by low communication bandwidth, particularly within a wireless network.
Economic: With the evolution of modern computers, high-bandwidth networks and workstations are available at low cost, which also favors distributed computing for economic reasons.

Design considerations for distributed systems

Following are some of the characteristics of distributed systems that should be considered in designing a project in a distributed environment:

No global clock: Being distributed across the world, distributed systems cannot be expected to have a common clock, which leads to intrinsic asynchrony between the processors performing the computing. Coordination in a distributed system usually depends on a shared idea of the time at which programs run or business state changes occur. However, with no global clock, there is a limit to the accuracy with which the computers in the network can synchronize their clocks to reflect the time at which an expected program execution happened. This limitation requires the systems in the network to communicate through messages instead of time-based events.
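
One well-known way to order events through messages rather than wall-clock time is a logical clock in the style of Lamport. The chapter does not prescribe this technique; the sketch below is offered only as an illustration of coordinating through messages, and the class and method names are assumptions made for the example.

```java
class LamportClock {
  private long time = 0;

  // A local event or a send: advance the clock and return the new timestamp.
  synchronized long tick() {
    time++;
    return time;
  }

  // A receive: move past the sender's timestamp, counting the receive as an event.
  synchronized long onReceive(long senderTime) {
    time = Math.max(time, senderTime) + 1;
    return time;
  }

  public static void main(String[] args) {
    LamportClock a = new LamportClock();  // clock of node A
    LamportClock b = new LamportClock();  // clock of node B, never synchronized with A
    a.tick();                             // some local work on A
    long sent = a.tick();                 // A sends a message carrying this timestamp
    long received = b.onReceive(sent);    // B receives it and adjusts its own clock
    System.out.println("sent at " + sent + ", received at " + received);
  }
}
```

The receive always gets a larger timestamp than the matching send, so cause and effect stay ordered even though the two nodes never compare physical clocks.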

Geographical distribution: The individual systems taking part in a distributed system are expected to be connected through a network, previously through a Wide-Area Network (WAN), and now with a Network Of Workstations/Cluster Of Workstations (NOW/COW). An in-house distributed system is expected to be configured within a LAN. NOW is becoming widespread in the market with its low-cost, high-speed, off-the-shelf processing capability. Popular NOW architectures include the Google search engine and Amazon.

No shared memory: A key feature of distributed computing and of the message-passing model of communication is the absence of shared memory, which also implies the absence of a common physical clock.

Independence and heterogeneity: The distributed system processors are loosely coupled so that they have their own individual capabilities in terms of speed and method of execution with versatile operating systems. They are not expected to be part of a dedicated system; however, they cooperate with one another by exposing the services and/or executing the tasks together as subtasks.

Fail-over mechanism: Computer systems fail often, and it is the designer's responsibility to define the expected behavior in the face of possible failures. Distributed systems can fail in their integration as well as in their individual subsystems. A fault in the network can isolate an individual computer or a group of computers in the distributed system; however, they might still be executing the programs they are expected to execute. In reality, the individual programs may not be able to detect such network failures or timeouts. Similarly, the failure of a particular computer, or a system being terminated abruptly by a program or system failure, may not immediately be known to the other systems/components in the network with which the failed computer usually communicates. The consequences of this characteristic of distributed systems have to be captured in the system design.
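
A common way to capture this in a design is to bound every remote call with a timeout and fall back to an alternative when no reply arrives. The following is a minimal sketch under stated assumptions: the remote service is simulated by a local Callable, and the names TimeoutFailover and callWithFallback are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutFailover {

  // Calls a (simulated) remote service and returns the fallback when no reply arrives in time.
  static String callWithFallback(Callable<String> remoteCall, long timeoutMillis, String fallback) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<String> reply = executor.submit(remoteCall);
    try {
      return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // No answer: the peer may be slow, crashed, or cut off by a network fault.
      // The caller cannot tell which, so it gives up and fails over.
      reply.cancel(true);
      return fallback;
    } catch (ExecutionException e) {
      return fallback;  // the call itself failed
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return fallback;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    Callable<String> healthy = () -> "reply from primary";
    Callable<String> unreachable = () -> {
      Thread.sleep(5_000);  // simulates a node that does not answer in time
      return "too late";
    };
    System.out.println(callWithFallback(healthy, 500, "reply from replica"));
    System.out.println(callWithFallback(unreachable, 200, "reply from replica"));
  }
}
```

The timeout is a guess about the peer's health, not a proof: a slow but healthy node looks the same as a failed one, which is exactly the ambiguity the design has to account for.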

Security concerns: Distributed systems being set up on a shared Internet are prone to more unauthorized attacks and vulnerabilities.

Distributed systems are becoming increasingly popular because they allow the pooling of resources, including CPU cycles, data storage, devices, and services, all of which are becoming increasingly economical. Distributed systems are more reliable as they allow replication of resources and services, which reduces service outages due to individual system failures. The cost, speed, and availability of the Internet make it a decent platform on which to maintain distributed systems.

Java support

From standalone applications to web applications to the sophisticated cloud integration of the enterprise, Java has been updating itself to accommodate various features that support the change. In particular, frameworks like Spring have come up with modules like Spring Boot, Batch, and Integration, which cover most of the cloud integration features. As a language, Java has great support for programs written using multithreaded distributed objects. In this model, an application contains numerous heavyweight processes that communicate using messaging or Remote Method Invocation (RMI). Each heavyweight process contains several lightweight processes, which are termed threads in Java. Threads can communicate through the shared memory. Such a software architecture reflects the hardware that is widely available.

The usual model of a distributed system is obtained by assuming that there is, at most, one thread per process or by ignoring the parallelism within one process. The purpose of keeping the model logically simple is that the distributed program becomes more object-oriented, because data in a remote object can be accessed only through explicit messaging or a remote procedure call (RPC).

The object-oriented model promotes reusability as well as design simplicity. Furthermore, a large shared data structure requires shared processing, which is possible through object orientation and by letting the process of execution be multithreaded. The programmer carries the responsibility of splitting the larger data structure across multiple heavyweight processes.

A programming language that supports concurrent programming should be able to specify the process structure and how several processes communicate with each other and synchronize. There are many ways a program can specify the process structure or create a new process. For example, UNIX processes are tree structured, with a unique process ID (pid) for each process. fork and wait are the system calls used to create and synchronize processes. The fork call creates a child process from a parent process, giving the child a copy of the parent's address space:

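#include <iostream>
#include <unistd.h>   // fork()

int main() {
  pid_t pid = fork();   // returns the child's pid in the parent and 0 in the child
  if (pid < 0) { std::cerr << "fork failed" << std::endl; return 1; }
  if (pid != 0) {
    std::cout << "This is a parent process" << std::endl;
  }
  else {
    std::cout << "This is a child process" << std::endl;
  }
  return 0;
}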
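
Java has no direct equivalent of fork; the closest standard facility is ProcessBuilder, which starts a separate program instead of copying the parent's address space. The sketch below uses the Process.pid() and ProcessHandle APIs added in Java 9. The class name ChildProcessSketch and the choice of running the JVM launcher with -version as the child are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.Optional;

class ChildProcessSketch {

  // Starts the command as a child process, waits for it, and returns its exit code.
  static int runChild(String... command) throws IOException, InterruptedException {
    Process child = new ProcessBuilder(command).inheritIO().start();
    System.out.println("Parent pid " + ProcessHandle.current().pid()
        + " started child pid " + child.pid());
    return child.waitFor();  // plays the role of wait in the UNIX example
  }

  public static void main(String[] args) throws Exception {
    // Path of the running JVM's launcher, if the platform exposes it.
    Optional<String> javaCommand = ProcessHandle.current().info().command();
    if (javaCommand.isPresent()) {
      int exitCode = runChild(javaCommand.get(), "-version");
      System.out.println("Child exited with code " + exitCode);
    } else {
      System.out.println("Launcher path not available on this platform");
    }
  }
}
```

Unlike fork, the child here shares no memory image with the parent; any data it needs has to be passed through arguments, streams, files, or the network.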

Java has a predefined class called Thread that enables concurrency through thread objects. A class that should be executed in a separate thread can extend the Thread class and override the run() method; calling the start() method launches that thread:

public class NewThread extends Thread {
  public void run() {
    System.out.println("New Thread executing!");
  }
  public static void main(String[] args) {
    Thread t1 = new NewThread();
    t1.start();
  }
}

Because a Java class can extend only one superclass, a class that already extends another class cannot also extend Thread. For such cases, Java supports execution in a new thread through the Runnable interface, as shown in the following example:

public class Animal {
  String name;
  public Animal(String name) {
    this.name = name;
  }
  public void setName(String name) {
    this.name = name;
  }
  public String getName() {
    return this.name;
  }
}
public class Mammal extends Animal implements Runnable {
  public Mammal(String name) {
    super(name);
  }
  public void run() {
    for (int i = 0; i < 100; i++) {
      System.out.println("The name of the Animal is : " + this.getName());
    }
  }
  public static void main(String[] args) {
    Animal firstAnimal = new Mammal("Tiger");
    Thread threadOne = new Thread((Runnable) firstAnimal);
    threadOne.start();
    Animal secondAnimal = new Mammal("Elephant");
    Thread threadTwo = new Thread((Runnable) secondAnimal);
    threadTwo.start();
  }
}

In the following example of Fibonacci numbers, a thread waits for other threads to complete their execution using the join mechanism. Threads can also carry a priority, which hints at the importance of one thread over another when they are scheduled:

public class Fib extends Thread
{
  private int x;
  public int answer;
  public Fib(int x) {
    this.x = x;
  }
  public void run() {
    if (x <= 2) {
      answer = 1;
    } else {
      try {
        Fib f1 = new Fib(x-1);
        Fib f2 = new Fib(x-2);
        f1.start();
        f2.start();
        f1.join();
        f2.join();
        answer = f1.answer + f2.answer;
      }
      catch(InterruptedException ex) { Thread.currentThread().interrupt(); } // keep the interrupt visible to callers
    }
  }
  public static void main(String[] args)  throws Exception
  {
    try {
      Fib f = new Fib( Integer.parseInt(args[0]) );
      f.start();
      f.join();
      System.out.println(f.answer);
    }
    catch(Exception ex) {
      System.err.println("usage: java Fib NUMBER");
    }
  }
}
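
The Fibonacci example does not set any priority, so here is a small sketch of how a priority can be attached to a thread with setPriority. The class and method names are invented for this illustration. Priorities are only a hint to the scheduler, so the order of the two printed lines is not guaranteed.

```java
class PrioritySketch {

  // Builds a named worker thread with the requested priority
  // (Thread.MIN_PRIORITY = 1 up to Thread.MAX_PRIORITY = 10).
  static Thread worker(String name, int priority) {
    Thread t = new Thread(() ->
        System.out.println(Thread.currentThread().getName()
            + " runs with priority " + Thread.currentThread().getPriority()), name);
    t.setPriority(priority);
    return t;
  }

  public static void main(String[] args) throws InterruptedException {
    Thread low = worker("background", Thread.MIN_PRIORITY);
    Thread high = worker("urgent", Thread.MAX_PRIORITY);
    low.start();
    high.start();
    low.join();   // wait for both threads, as in the Fibonacci example
    high.join();
  }
}
```

Correctness should never depend on priorities; ordering between threads still has to be enforced with join or other synchronization.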

The Callable interface, available since Java 5, has been annotated with @FunctionalInterface since Java 8. With the help of this feature, we can create Callable objects using lambda expressions as follows:

Callable<Integer> callableObject = () -> { return 5 + 9; };

The preceding expression is equivalent to the following code:

Callable<Integer> callableObject = new Callable<Integer>() {
  @Override
  public Integer call() throws Exception {
    return 5 + 9;
  }
};

Following is the complete example with the Callable and Future interfaces and lambda expressions for handling concurrent processing in Java 9:

package threads;

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class JavaCallableThreads {

  public static void main(String[] args) {
    final List<Integer> numbers = Arrays.asList(1,2,3,4,5);
    Callable<Integer> callableObject = () -> {
      int sum = numbers.stream().mapToInt(i -> i.intValue()).sum();
      return sum;
    };
    ExecutorService exService = Executors.newSingleThreadExecutor();
    Future<Integer> futureObj = exService.submit(callableObject);
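    Integer futureSum = 0;
    try {
      // Blocks until the callable has finished and returns its result
      futureSum = futureObj.get();
    } catch (InterruptedException e) {
      e.printStackTrace();
    } catch (ExecutionException e) {
      e.printStackTrace();
    } finally {
      // Stop the executor's worker thread so that the JVM can exit
      exService.shutdown();
    }
    System.out.println("Sum returned = " + futureSum);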
  }

}
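
The same interfaces also cover the chapter's central idea of splitting a bigger task into subtasks. The following sketch, which uses hypothetical names (ParallelSum, sum, chunkSize), divides the list into chunks, submits one Callable per chunk with invokeAll, and combines the partial results. It assumes chunkSize and threads are both greater than zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSum {

  // Splits the list into chunks, sums each chunk in its own task, then adds the partial sums.
  static int sum(List<Integer> numbers, int chunkSize, int threads)
      throws InterruptedException, ExecutionException {
    List<Callable<Integer>> tasks = new ArrayList<>();
    for (int start = 0; start < numbers.size(); start += chunkSize) {
      List<Integer> chunk = numbers.subList(start, Math.min(start + chunkSize, numbers.size()));
      tasks.add(() -> chunk.stream().mapToInt(Integer::intValue).sum());
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      int total = 0;
      // invokeAll runs every subtask and returns once all of them have completed
      for (Future<Integer> partial : pool.invokeAll(tasks)) {
        total += partial.get();
      }
      return total;
    } finally {
      pool.shutdown();  // release the worker threads
    }
  }

  public static void main(String[] args) throws Exception {
    List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    System.out.println("Sum returned = " + sum(numbers, 3, 4));
  }
}
```

For a list this small the thread overhead outweighs any gain; the pattern only pays off when each subtask carries substantial work, in line with the discussion of Amdahl's law earlier in the chapter.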

Modern Java enterprise applications have evolved through messaging (through message queues), web services, and microservices-based distributed applications, packaged with container technologies like Docker and deployed on cloud computing platforms and services like Red Hat OpenShift, Amazon Web Services (AWS), Google App Engine, and Kubernetes.

We will discuss the Java 9 support for such application development and deployment in detail in the coming chapters.