System reliability is defined here as the combination of system resiliency and elasticity. With the proliferation of web-scale, data-intensive, and process-intensive applications across industry verticals, application reliability has to be ensured to meet varying business expectations. Similarly, cloud environments are emerging as the one-stop IT solution for automating business processes and operations. All kinds of personal, professional, and social applications are being modernized and moved to cloud centers to reap the promised benefits of software-defined cloud infrastructures. Cloud reliability therefore also has to be ensured, and a range of technologies and tools are used to that end. The reliability of applications and IT infrastructures is thus essential for retaining customers' confidence in the various innovations and improvisations happening in the...
You're reading from Practical Site Reliability Engineering
Businesses across the globe demand reliability. IT reliability is the foundation of business reliability. IT experts have laid out a series of steps for arriving at reliable systems. There are architectural and design patterns, best practices, platform solutions, technologies and tools, methodologies, and so on, for producing reliable systems that are resilient and elastic. We discuss them in detail in the subsequent sections. Before that, let's look at the noteworthy advancements happening in the IT space.
Microservices architecture (MSA) is being viewed as the next-generation application architecture style and pattern. There are several proven techniques for faster software development through a host of agile programming methodologies, such as pair programming, extreme programming, Scrum, and so on. However, there is a gap when it comes to accelerating the design of enterprise-class applications. MSA is being presented as the new agile application design method. Furthermore, application development is also sped up by carefully partitioning legacy as well as modern applications into a number of easily implementable and manageable application components and services. That is, every software application gets segmented into a set of interactive microservices. Each microservice can be built independently. Applications can be quickly formed out of distributed microservices through composition (orchestration and choreography) platforms. In other words, the era of software development from the ground up...
Resilient microservices are the basis of reliable systems. To craft process-aware, business-centric, and composite applications, several microservices have to be composed together. When resilient and scalable microservices collaborate with one another, the result is a reliable software system.
Service resiliency is achieved through service mesh solutions, such as Istio, Linkerd, and Conduit. Forming service meshes is the way forward for ensuring the much-demanded service resiliency while services interact with one another. The growing maturity and stability of service mesh solutions goes a long way toward establishing resilient microservices, which, when composed together, form reliable systems. Thus, containerized microservices, container orchestration platforms such as Kubernetes, and the incorporation of service mesh solutions blend well to lay a robust and versatile foundation for producing...
As microservices become established as the next-generation application building block, microservices design has to draw on the available patterns, practices, and platforms. This section sheds some light on the best practices recommended by experienced software architects. There are also articles and blogs explaining the various best practices for the efficient design of microservices.
In short, with the wide adoption of microservices architecture and the steady growth of the tool ecosystem, the low-risk realization of modular, service-oriented, extensible, event-driven, cloud-hosted, process-centric, business-critical, insights-filled, scalable, and reliable applications is gaining momentum.
MSA is widely credited with bringing much-needed agility to application design, development, and deployment. However, there are a few challenges. Microservices can be weighed down due...
Here are a few popular asynchronous messaging patterns that enable the faster realization of event-driven and asynchronous messaging microservices. Let's refer to the following points:
- Event sourcing: Today, events are pervasive and occur in large numbers due to the proliferation of multi-faceted sensors, actuators, drones, robots, electronics, digitized elements, connected devices, factory machinery, social networking sites, integrated applications, decentralized microservices, distributed data sources, stores, and so on. Events from varied and geographically distributed sources get streamed into an event store, which is a database of events. This event store provides an API that lets consuming services subscribe to and use the events they are authorized to see. In that respect, the event store also operates as a message broker. Event sourcing persists the state of a business entity such as an order...
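The event store idea above can be sketched in a few lines of Python. This is only an in-memory illustration, assuming the usual reading of event sourcing in which an entity's state is kept as a sequence of state-changing events and rebuilt by replaying them. The names `Event`, `EventStore`, `replay_order`, and the event kinds are hypothetical and do not come from any particular product.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    entity_id: str                      # e.g. an order ID (hypothetical)
    kind: str                           # e.g. "OrderCreated" (hypothetical)
    data: dict = field(default_factory=dict)


class EventStore:
    """Append-only log of events that also notifies subscribers."""

    def __init__(self):
        self._log = []                          # ordered list of Event objects
        self._subscribers = defaultdict(list)   # event kind -> list of handlers

    def subscribe(self, kind, handler):
        # Consuming services register interest in one kind of event.
        self._subscribers[kind].append(handler)

    def append(self, event):
        # Persist first, then publish: the store doubles as a simple broker.
        self._log.append(event)
        for handler in self._subscribers[event.kind]:
            handler(event)

    def events_for(self, entity_id):
        return [e for e in self._log if e.entity_id == entity_id]


def replay_order(events):
    """Rebuild the current state of an order by replaying its events in order."""
    state = {"status": "none", "items": []}
    for e in events:
        if e.kind == "OrderCreated":
            state["status"] = "created"
        elif e.kind == "ItemAdded":
            state["items"].append(e.data["sku"])
        elif e.kind == "OrderShipped":
            state["status"] = "shipped"
    return state
```

A production event store would add durable storage, authorization, and delivery guarantees; the sketch only shows the append, subscribe, and replay steps.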
These are also event-driven applications. Instead of service orchestration, service choreography is usually preferred for building event-driven applications. As per the Reactive Manifesto, reactive applications have to be responsive, resilient, elastic, and message-driven. Reactive systems are expected to respond promptly to any kind of stimulus. This is the opposite of the traditional request and response (R and R) model, which is generally blocking. The reactive pattern turns out to be an excellent way to use the available resources more efficiently, and system responsiveness gets a strong boost. Instead of blocking and waiting for a computation to finish, the application starts handling other user requests asynchronously, making use of all the available resources and threads.
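The difference between blocking and non-blocking handling can be illustrated with Python's standard `asyncio` module. This is a minimal sketch, not a full reactive framework: `handle_request` and `serve` are hypothetical names, and `asyncio.sleep` stands in for a slow I/O call such as a database query.

```python
import asyncio
import time


async def handle_request(request_id, io_delay):
    # 'await' hands control back to the event loop while the simulated I/O
    # is pending, so other requests can make progress in the meantime.
    await asyncio.sleep(io_delay)
    return f"request {request_id} done"


async def serve(n_requests, io_delay):
    start = time.perf_counter()
    # All requests are started together instead of one after the other.
    results = await asyncio.gather(
        *(handle_request(i, io_delay) for i in range(n_requests))
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(serve(5, 0.1))
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

With a blocking model, five requests that each wait 0.1 seconds would take about 0.5 seconds in total; here the waits overlap, so the total stays close to the duration of a single wait.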
As indicated at the beginning of the chapter, to arrive at reliable systems, we need to have reliable applications and infrastructures. We have discussed the various ways and means of bringing forth reliable applications already. Now, we need to dig deeper and detail the best practices to be followed to craft and use reliable infrastructures.
To achieve higher availability through redundancy, the first tip is to architect software applications to be redundant. Redundancy is the duplication of a system or component to substantially increase its availability. If a system goes down for any reason, its duplicate takes over. That is why software applications are generally deployed in multiple regions, as indicated in the following diagram. Lately, applications are being constructed out of distributed and duplicated application components. Thus, if one component or service goes down, then its duplication...
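The effect of duplication can be made concrete with a small Python sketch. It rests on two stated assumptions: replicas fail independently (which real regions only approximate), and a failed replica signals its failure by raising `ConnectionError`. The function names are hypothetical.

```python
def combined_availability(single, replicas):
    """Availability of N independent replicas, where any one of them suffices.

    The system is down only if every replica is down at the same time,
    so availability = 1 - (1 - single) ** replicas.
    """
    return 1 - (1 - single) ** replicas


def call_with_failover(replicas, request):
    """Try each replica in turn and return the first successful response."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as err:
            # This replica (for example, one region) is down; try the next one.
            last_error = err
    raise RuntimeError("all replicas failed") from last_error
```

Under the independence assumption, two replicas that are each available 99% of the time give a combined availability of 99.99%. Correlated failures, such as a shared dependency going down, reduce that figure in practice.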
The widely quoted benefits of infrastructure as code (IaC) are repeatability and reproducibility. There are a number of components (server, network, security, storage, and so on) in a data center that need to be configured to deploy applications. In cloud environments, there are thousands of such components to be configured. If all of this is done manually, it takes a great deal of time and is error-prone, and configuration differences and drift can creep in. Humans aren't great at performing repetitive, manual tasks with 100% accuracy, but machines are very good at doing repetitive, routine tasks at scale and speed. If we produce a template and feed it to a machine, the machine can execute the template a thousand times without any errors. The template-centric approach to infrastructure provisioning, configuration, and application deployment is gaining wider traction and attention these days. Infrastructure optimization and management...
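The template-centric idea can be sketched in Python as a comparison between a desired state (the template) and the actual state of the environment. This is a toy model under stated assumptions: both states are plain dictionaries keyed by resource name, and `plan` and `apply_template` are hypothetical names, not the API of any real IaC tool.

```python
def plan(desired, actual):
    """List the changes needed to make the actual state match the template."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        # Resources that exist but whose configuration has drifted.
        "update": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]
        ),
        "delete": sorted(actual.keys() - desired.keys()),
    }


def apply_template(desired, actual):
    """Return the new actual state after applying the template.

    Applying the same template again changes nothing (idempotence), which is
    what makes the template safe to run many times. The 'actual' argument is
    ignored here because the toy model simply converges to the desired state.
    """
    return {name: dict(config) for name, config in desired.items()}


if __name__ == "__main__":
    desired = {"web-1": {"cpu": 2}, "web-2": {"cpu": 2}}
    actual = {"web-1": {"cpu": 4}, "db-1": {"cpu": 8}}   # drifted environment
    print(plan(desired, actual))
    actual = apply_template(desired, actual)
    print(plan(desired, actual))                          # nothing left to do
```

Real tools work against provider APIs and track state more carefully; the point of the sketch is only that drift becomes a computable difference once the configuration is expressed as data.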
As the role of IT in supporting business operations and people's tasks keeps growing, the complexity induced by the multiplicity and heterogeneity of IT systems is rising steadily. There have been a number of noteworthy advancements in IT, and these have resulted in a variety of business processes being optimized, simplified, and automated. Business agility is being achieved through IT agility mechanisms. Business deployment and service models have gone through a few transitions in the recent past as the cloud paradigm has matured and stabilized. Business transformation is directly enabled by IT transformation. With the faster adoption of digital technologies, digital transformation is becoming the new normal.
The goal of business reliability through IT reliability technologies and tools is to attain sustainable digital transformation. Reliable systems with resiliency and elasticity characteristics are...