Reader small image

You're reading from  Practical Site Reliability Engineering

Product typeBook
Published inNov 2018
PublisherPackt
ISBN-139781788839563
Edition1st Edition
Right arrow
Authors (3):
Pethuru Raj Chelliah
Pethuru Raj Chelliah
author image
Pethuru Raj Chelliah

 Pethuru Raj Chelliah (PhD) works as the chief architect at the Site Reliability Engineering Center of Excellence, Reliance Jio Infocomm Ltd. (RJIL), Bangalore. Previously, he worked as a cloud infrastructure architect at the IBM Global Cloud Center of Excellence, IBM India, Bangalore, for four years. He also had an extended stint as a TOGAF-certified enterprise architecture consultant in Wipro Consulting services division and as a lead architect in the corporate research division of Robert Bosch, Bangalore. He has more than 17 years of IT industry experience.
Read more about Pethuru Raj Chelliah

Shreyash Naithani
Shreyash Naithani
author image
Shreyash Naithani

Shreyash Naithani is currently a site reliability engineer at Microsoft R&D. Prior to Microsoft, he worked with both start-ups and mid-level companies. He completed his PG Diploma from the Centre for Development of Advanced Computing, Bengaluru, India, and is a computer science graduate from Punjab Technical University, India. In a short span of time, he has had the opportunity to work as a DevOps engineer with Python/C#, and as a tools developer, site/service reliability engineer, and Unix system administrator. During his leisure time, he loves to travel and binge watch series.
Read more about Shreyash Naithani

Shailender Singh
Shailender Singh
author image
Shailender Singh

Shailender Singh is a principal site reliability engineer and a solution architect with around 11 year's IT experience who holds two master's degrees in IT and computer application. He has worked as a C developer on the Linux platform. He had exposure to almost all infrastructure technologies from hybrid to cloud-hosted environments. In the past, he has worked with companies including Mckinsey, HP, HCL, Revionics and Avalara and these days he tends to use AWS, K8s, Terraform, Packer, Jenkins, Ansible, and OpenShift.
Read more about Shailender Singh

View More author details
Right arrow

Highly reliable IT infrastructures


So far in this chapter, we have concentrated on the application side to ensure IT reliability. But there is a crucial role to play by the underlying IT infrastructures. We have clusters, grids, and clouds to achieve high availability and scalability at the infrastructures level. Clouds are being touted as the best-in-class IT infrastructure for digital innovation, disruption, and transformation. For simplifying, streamlining, and speeding up the cloud setup and sustenance, there came a number of enabling tools for automating the repetitive and routine cloud scheduling, software deployment, cloud administration, and management. Automation and orchestration are being pronounced as the way forward for the ensuing cloud era. Most of the manual activities in running cloud centers are being precisely defined and automated through scripts and other methods. With the number of systems, databases, and middleware, network and storage professionals manning cloud environments has come down drastically. The total cost of ownership (TCO) of cloud IT is declining, whereas the return on investment (RoI) is increasing. The cost benefits of the cloud-enablement of conventional IT environments is greatly noticeable. Cloud computing is typically proclaimed as the marriage between mainframe and modern computing and is famous for attaining high performance while giving cost advantages. The customer-facing applications with varying loads are extremely fit for public cloud environments. Multi-cloud strategies are being worked out by worldwide enterprises to embrace this unique technology without any vendor lock-in. 

However, for attaining the much-needed reliability, there are miles to traverse. Automated tools, policies, and other knowledge bases, AI-inspired log and operational analytics, acquiring the capability of preventive, predictive, and prescriptive maintenance through machine and deep learning algorithms and models, scores of reusable resiliency patterns, and preemptive monitoring, are the way forward.

A resilient application is typically highly available, even in the midst of failures and faults. If there is any internal or external attack on a single application component/microservice, the application still functions and delivers its assigned functionality without any delay or stoppage. The failure can be identified and contained within the component so that the other components of the application aren't affected. Typically, multiple instances of the application components are being run in different and distributed containers or VMs so that one component loss does not matter much to the application. Also, application state and behavior information gets stored in separate systems. Any resilient application has to be designed and developed accordingly to survive in any kind of situation. Not only applications but also the underlying IT or cloud infrastructure modules have to be chosen and configured intelligently to support the unique resilient goals of software applications. The first and foremost thing is to fully leverage the distributed computing model. The deployment topology and architecture have to purposefully use the various resiliency design, integration, deployment, and patterns. Plus, the following tips and techniques ought to be employed as per the infrastructure scientists, experts, architects, and specialists:

  • Use the various network and security solutions, such as firewalls, load balancers, and application delivery controllers (ADCs). Network access control systems (NACLs) also contribute to security. These help out in intelligently investigating every request message to filter out any ambiguous and malevolent message at the source. Furthermore, load balancers continuously probe and monitor servers and distribute traffic to servers that are not fully loaded. Also, they can choose the best server to handle certain requests.  
  • Employ multiple servers at different and distributed cloud centers. That is, disaster recovery capability needs to be part of any IT solution.
  • Attach a robust and resilient storage solution for data recovery and stateful applications.
  • Utilize software infrastructure solutions, such as API gateways and management suites, service mesh solutions, additional abstraction layers, to ensure the systems resiliency. 
  • Leveraging the aspects of compartmentalization (virtualization and containerization) have to be incorporated to arrive at virtualized and containerized cloud environments, which intrinsically support the much-needed flexibility, extensibility, elasticity, infrastructure operations automation, distinct maneuverability, and versatility. The software-defined environments are more conducive and constructive for application and infrastructure resiliency. 
  • Focus on log, operational, performance, and scalability analytics to proactive and preemptively monitor, measure, and manage various infrastructure components (software, as well as hardware). 

Thus, reliable software applications and infrastructure combine well in rolling out reliable systems that, in turn, guarantee reliable business operations. 

In summary, we can say the following: 

  • Reliability = resiliency + elasticity
  • Automation and orchestration are the key requirements for a reliable IT infrastructure
  • IT reliability fulfilment—resiliency is to survive under attacks, failures, and faults, whereas elasticity is to auto-scale (vertical and horizontal scalability) under a load
  • IT infrastructure operational, log, and performance/scalability analytics through AI-inspired analytics platforms
  • Patterns, processes, platforms, and practices for having reliable IT Infrastructure
  • System and application monitoring are also significant

The emergence of serverless computing

Serverless computing allows for the building and running of applications and services without thinking about server machines, storage appliances, and arrays and networking solutions. Serverless applications don't require developers to provision, scale, and manage any IT resources to handle serverless applications. It is possible to build a serverless application for nearly any type of applications or backend services. The scalability aspect of serverless applications is being taken care of by cloud service and resource providers. Any spike in load is being closely monitored and acted upon quickly. The developer doesn't need to worry about the infrastructure portions. That is, developers just focus on the business logic and the IT capability being delegated to the cloud teams. This hugely reduced overhead empowers designers and developers to reclaim the time and energy wasted on the IT infrastructure plumping part. Developers typically can focus on other important requirements, such as resiliency and reliability.

The surging popularity of containers comes in handy when automating the relevant aspects to have scores of serverless applications. That is, a function is developed and deployed quickly without worrying about the provisioning, scheduling, configuration, monitoring, and management. Therefore, the new buzzword, FaaS, is gaining a lot of momentum these days. We are moving toward NoOps. That is, most of the cloud operations get automated through a host of technology solutions and tools, and this transition comes in handy for institutions, individuals, and innovators to deploy and deliver their software applications quickly.

On the cost front, users have to pay for the used capacity. Through the automated and dynamic provisioning of resources, the resource utilization goes up significantly. Also, the cost efficiency is fully realized and passed on to the cloud users and subscribers. 

Precisely speaking, serverless computing is another and additional abstraction toward automated computing and analytics. 

Previous PageNext Page
You have been reading a chapter from
Practical Site Reliability Engineering
Published in: Nov 2018Publisher: PacktISBN-13: 9781788839563
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €14.99/month. Cancel anytime

Authors (3)

author image
Pethuru Raj Chelliah

 Pethuru Raj Chelliah (PhD) works as the chief architect at the Site Reliability Engineering Center of Excellence, Reliance Jio Infocomm Ltd. (RJIL), Bangalore. Previously, he worked as a cloud infrastructure architect at the IBM Global Cloud Center of Excellence, IBM India, Bangalore, for four years. He also had an extended stint as a TOGAF-certified enterprise architecture consultant in Wipro Consulting services division and as a lead architect in the corporate research division of Robert Bosch, Bangalore. He has more than 17 years of IT industry experience.
Read more about Pethuru Raj Chelliah

author image
Shreyash Naithani

Shreyash Naithani is currently a site reliability engineer at Microsoft R&D. Prior to Microsoft, he worked with both start-ups and mid-level companies. He completed his PG Diploma from the Centre for Development of Advanced Computing, Bengaluru, India, and is a computer science graduate from Punjab Technical University, India. In a short span of time, he has had the opportunity to work as a DevOps engineer with Python/C#, and as a tools developer, site/service reliability engineer, and Unix system administrator. During his leisure time, he loves to travel and binge watch series.
Read more about Shreyash Naithani

author image
Shailender Singh

Shailender Singh is a principal site reliability engineer and a solution architect with around 11 year's IT experience who holds two master's degrees in IT and computer application. He has worked as a C developer on the Linux platform. He had exposure to almost all infrastructure technologies from hybrid to cloud-hosted environments. In the past, he has worked with companies including Mckinsey, HP, HCL, Revionics and Avalara and these days he tends to use AWS, K8s, Terraform, Packer, Jenkins, Ansible, and OpenShift.
Read more about Shailender Singh