Reader small image

You're reading from  Practical Site Reliability Engineering

Product typeBook
Published inNov 2018
PublisherPackt
ISBN-139781788839563
Edition1st Edition
Right arrow
Authors (3):
Pethuru Raj Chelliah
Pethuru Raj Chelliah
author image
Pethuru Raj Chelliah

 Pethuru Raj Chelliah (PhD) works as the chief architect at the Site Reliability Engineering Center of Excellence, Reliance Jio Infocomm Ltd. (RJIL), Bangalore. Previously, he worked as a cloud infrastructure architect at the IBM Global Cloud Center of Excellence, IBM India, Bangalore, for four years. He also had an extended stint as a TOGAF-certified enterprise architecture consultant in Wipro Consulting services division and as a lead architect in the corporate research division of Robert Bosch, Bangalore. He has more than 17 years of IT industry experience.
Read more about Pethuru Raj Chelliah

Shreyash Naithani
Shreyash Naithani
author image
Shreyash Naithani

Shreyash Naithani is currently a site reliability engineer at Microsoft R&D. Prior to Microsoft, he worked with both start-ups and mid-level companies. He completed his PG Diploma from the Centre for Development of Advanced Computing, Bengaluru, India, and is a computer science graduate from Punjab Technical University, India. In a short span of time, he has had the opportunity to work as a DevOps engineer with Python/C#, and as a tools developer, site/service reliability engineer, and Unix system administrator. During his leisure time, he loves to travel and binge watch series.
Read more about Shreyash Naithani

Shailender Singh
Shailender Singh
author image
Shailender Singh

Shailender Singh is a principal site reliability engineer and a solution architect with around 11 year's IT experience who holds two master's degrees in IT and computer application. He has worked as a C developer on the Linux platform. He had exposure to almost all infrastructure technologies from hybrid to cloud-hosted environments. In the past, he has worked with companies including Mckinsey, HP, HCL, Revionics and Avalara and these days he tends to use AWS, K8s, Terraform, Packer, Jenkins, Ansible, and OpenShift.
Read more about Shailender Singh

View More author details
Right arrow

Reactive systems 


We have seen how reliable systems are being realized through the service mesh concept. This is another approach for bringing forth reliable software systems. A reactive system is a new concept based on the widely circulated reactive manifesto. There are reactive programming models and techniques to build viable reactive systems. As described previously, any software system is comprised of multiple modules. Also, multiple components and applications need to interact with each other reliably to accomplish certain complex business functionality. In a reactive system, the individual systems are intelligent. However, the key differentiator is the interaction between the individual parts. That is, the ability to operate individually yet act in concert to achieve the intended outcome clearly differentiates reactive systems from others. A reactive system architecture allows multiple individual applications to co-exist and coalesce as a single unit and react to its surroundings adaptively. This means that they are able to scale up or down based on user and data loads, load balance, and act intelligently to be extremely sensitive and royally responsive.

It is possible to write an application in a reactive style using the proven reactive programming processes, patterns, and platforms. However, for working together to achieve evolving business needs quickly, it needs a lot more. In short, it is not that easy making a system reactive. Reactive systems are generally designed and built according to the tenets of the highly popular Reactive Manifesto. This manifesto document clearly prescribes and promotes the architecture that is responsive, resilient, elastic, and message driven. Increasingly, microservices and message-based service interactions become the widely used standard for having flexible, elastic, resilient, and loosely coupled systems. These characteristics, without an iota of doubt, are the central and core concepts of reactive systems.

Reactive programming is a subset of asynchronous programming. This is an emerging paradigm where the availability of new information (events and messages) drives the processing logic forward. Traditionally, some action gets activated and accomplished using threads of execution based on control and data flows. 

This unique programming style intrinsically supports decomposing the problem into multiple discrete steps, and each step can be executed in an asynchronous and non-blocking fashion. Then, those steps can be composed to produce a composite workflow possibly unbounded in its inputs or outputs. Asynchronous processing means the processing of incoming messages or events happen sometime in the future. The event creators and message senders need not wait for the processing and the execution to get done to proceed with their responsibilities. This is generally called non-blocking execution. The threads of execution need not compete for a shared resource to get things done immediately. If the resource is not available immediately, then the threads need not wait for the unavailable resource and instead continue with other tasks at hand, using their respective resources. The point is that they can do their work without any stoppage while waiting for appropriate resources for a particular task at a particular point in time. In other words, they do not prevent the thread of execution from performing other work until the current work is done. They can perform other useful work while the resource is being occupied.

In the future, software applications have to be sensitive and responsive. The futuristic and people-centric applications, therefore, have to be capable of receiving events to be adaptive. Event capturing, storing, and processing are becoming important for enterprise, embedded, and cloud applications. Reactive programming is emerging as an important concept for producing event-driven software applications. There are simple as well as complex events. Events are primarily being streamed continuously, and hence the event-processing feature is known as streaming analytics these days. There are several streaming analytics platforms, such as Spark Streams, Kafka Streams, Apache Flink, Storm, and so on, for extricating actionable insights out of streams.

In the increasingly event-driven world, EDAs and programming models acquire more market and mind shares. And thus reactive programming is a grandiose initiative to provide a standard solution for asynchronous stream processing with non-blocking back pressure. The key benefits of reactive programming include the increased utilization of computing resources on multi-core and multi-processor hardware. There are several competent event-driven programming libraries, middleware solutions, enabling frameworks, and architectures to carefully capture, cleanse, and crunch millions of events per second. The popular libraries for facilitating event-driven programming include Akka Streams, Reactor, RxJava, and Vert.x. 

Reactive programming versus reactive systems: There is a huge difference between reactive programming and reactive systems. As indicated previously, reactive programming is primarily event-driven. Reactive systems, on the other hand, are message-driven and focus on creating resilient and elastic software systems. Messages are the prime form of communication and collaboration. Distributed systems coordinate by sending, receiving, and processing messages. Messages are inherently directed, whereas events are not. Messages have a clear direction and destination. Events are facts for others to observe and act upon with confidence and clarity. Messaging is typically asynchronous with the sender and the reader is decoupled. In a message-driven system, addressable recipients wait for messages to arrive. In an event-driven system, consumers are integrated with sources of events and event stores.

In a reactive system, especially one that uses reactive programming, both events and messages will be present. Messages are a great tool for communication, whereas events are the best bet for unambiguously representing facts. Messages ought to be transmitted across the network and form the basis for communication in distributed systems. Messaging is being used to bridge event-driven systems across the network. Event-driven programming is therefore a simple model in a distributed computing environment. That is not the case with messaging in distributed computing environments. Messaging has to do a lot of things because there are several constraints and challenges in distributed computing. That is, messaging has to tackle things such as partial failures, failure detection, dropped/duplicated/reordered messages, eventual consistency, and managing multiple concurrent realities. These differences in semantics and applicability have intense implications in the application design, including things such as resilience, elasticity, mobility, location transparency, and management complexities of distributed systems.

Reactive systems are highly reliable

Reactive systems fully comply with the reactive manifesto (resilient, responsive, elastic, and message-driven), which was contemplated and released by a group of IT product vendors. A variety of architectural design and decision principles are being formulated and firmed up for building most modernized and cognitive systems that are innately capable of fulfilling todays complicated yet sophisticated requirements. Messages are the most optimal unit of information exchange for reactive systems to function and facilitate. These messages create a kind of temporal boundary between application components. Messages enable application components to be decoupled in time (this allows for concurrency) and in space (this allows for distribution and mobility). This decoupling capability facilitates the much-needed isolation among various application services. Such a decoupling ultimately ensures the much-needed resiliency and elasticity, which are the most sought-after needs for producing reliable systems. 

Resilience is about the capability of responsiveness even under failure and is an inherent functional property of the system. Resilience is beyond fault-tolerance, which is all about graceful degradation. It is all about fully recovering from any failure. It is empowering systems to self-diagnose and self-heal. This property requires component isolation and containment of failures to avoid failures spreading to neighboring components. If errors and failure are allowed to cascade into other components, then the whole system is bound to fail.

So, the key to designing, developing, and deploying resilient and self-healing systems is to allow any type of failure to be proactively found and contained, encoded as messages, and sent to supervisor components. These can be monitored, measured, and managed from a safe distance. Here, being message-driven is the greatest enabler. Moving away from tightly coupled systems to loosely and lightly coupled systems is the way forward. With less dependency, the affected component can be singled out, and the spread of errors can be nipped in the bud.

The elasticity of reactive systems

Elasticity is about the capability of responsiveness under a load. Systems can be used by many users suddenly, or a lot of data can be pumped by hundreds of thousands of sensors and devices into the system. To tackle this unplanned rush of users and data, systems have to automatically scale up or out by adding additional resources (bare metal servers, virtual machines, and containers). The cloud environments are innately enabled to be auto-scaling based on varying resource needs. This capability makes systems to use their expensive resources in an optimized manner. When resource utilization goes up, the capital and operational costs of systems comes down sharply.

Systems need to be adaptive enough to perform auto-scaling, replication of state, and behavior, load-balancing, fail-over, and upgrades without any manual intervention, instruction, and interpretation. In short, designing, developing, and deploying reactive systems through messaging is the need of the hour.

Previous PageNext Page
You have been reading a chapter from
Practical Site Reliability Engineering
Published in: Nov 2018Publisher: PacktISBN-13: 9781788839563
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €14.99/month. Cancel anytime

Authors (3)

author image
Pethuru Raj Chelliah

 Pethuru Raj Chelliah (PhD) works as the chief architect at the Site Reliability Engineering Center of Excellence, Reliance Jio Infocomm Ltd. (RJIL), Bangalore. Previously, he worked as a cloud infrastructure architect at the IBM Global Cloud Center of Excellence, IBM India, Bangalore, for four years. He also had an extended stint as a TOGAF-certified enterprise architecture consultant in Wipro Consulting services division and as a lead architect in the corporate research division of Robert Bosch, Bangalore. He has more than 17 years of IT industry experience.
Read more about Pethuru Raj Chelliah

author image
Shreyash Naithani

Shreyash Naithani is currently a site reliability engineer at Microsoft R&D. Prior to Microsoft, he worked with both start-ups and mid-level companies. He completed his PG Diploma from the Centre for Development of Advanced Computing, Bengaluru, India, and is a computer science graduate from Punjab Technical University, India. In a short span of time, he has had the opportunity to work as a DevOps engineer with Python/C#, and as a tools developer, site/service reliability engineer, and Unix system administrator. During his leisure time, he loves to travel and binge watch series.
Read more about Shreyash Naithani

author image
Shailender Singh

Shailender Singh is a principal site reliability engineer and a solution architect with around 11 year's IT experience who holds two master's degrees in IT and computer application. He has worked as a C developer on the Linux platform. He had exposure to almost all infrastructure technologies from hybrid to cloud-hosted environments. In the past, he has worked with companies including Mckinsey, HP, HCL, Revionics and Avalara and these days he tends to use AWS, K8s, Terraform, Packer, Jenkins, Ansible, and OpenShift.
Read more about Shailender Singh