You're reading from Practical Site Reliability Engineering

Product type: Book

Published in: Nov 2018

Publisher: Packt

ISBN-13: 9781788839563

Edition: 1st Edition

Tools: Kubernetes, Docker

Concepts: Configuration Management

Authors (3):

Pethuru Raj Chelliah

Shreyash Naithani

Shailender Singh

Chapter 11. Post-Production Activities for Ensuring and Enhancing IT Reliability

Business automation, augmentation, and acceleration get neatly accomplished through a variety of microservices-based software applications in conjunction with integrated platforms and optimized IT infrastructures. In short, IT is the best and biggest enabler of businesses across the globe. That is, business offerings and outputs are being deftly and decisively enabled by scores of distinct IT advancements. The evolving business expectations are being duly automated through a host of delectable developments in the IT space. These improvements elegantly empower business houses to deliver newer and premium business offerings fast. With intuitive, informative, and inspiring interfaces, software applications are being presented to their customers and consumers to be used in an easy and error-free fashion. Furthermore, this continuous empowerment in the IT space, in turn, facilitates accomplishing more with less,...

Modern IT infrastructure

Today, software-defined cloud centers are very popular and widely leveraged for business agility, affordability, and productivity. The cloud idea fulfils the infrastructure's automation, optimization, and utilization requirements. The rapid maturity and stability of virtualization have made programmable hardware a reality, which is why infrastructure as code is the buzzword in the IT industry these days. IT infrastructure monitoring, measurement, and management are seeing many advancements with the rise of the cloud paradigm. A variety of IT infrastructure operations are being automated and accelerated through a host of advanced and standardized tools. The simultaneous rise of the DevOps concept, along with a flurry of powerful cloud technologies and tools, has brought a great deal of strategic automation and optimization to the IT space. IT self-service, pay-per-usage, and elasticity have become the core IT capabilities.

Cloud service...

Monitoring clouds, clusters, and containers

Cloud centers are increasingly being containerized and managed; that is, well-entrenched containerized clouds are on the way. Forming and managing containerized clouds is simplified by a host of container orchestration and management tools. There are both open source and commercial-grade container-monitoring tools. Kubernetes is emerging as the leading container orchestration and management platform. Thus, by leveraging the aforementioned toolsets, the process of setting up and sustaining containerized clouds becomes faster, less risky, and rewarding.

The tool-assisted monitoring of cloud resources (both coarse-grained as well as fine-grained) and applications in production environments is crucial to scaling the applications and providing resilient services. In a Kubernetes cluster, application performance can be examined at many different levels: containers, pods, services, and clusters. Through a single pane of glass...
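To make the idea of examining performance at several levels concrete, here is a minimal sketch that rolls container-level CPU readings up to pod, service, and cluster totals. The `ContainerSample` record, its field names, and the sample values are assumptions made for illustration; a real monitoring tool would collect these readings from the cluster rather than from a hard-coded list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSample:
    """One CPU reading for a container (hypothetical record layout)."""
    service: str
    pod: str
    container: str
    cpu_millicores: int


def roll_up(samples):
    """Sum container CPU usage into pod, service, and cluster totals."""
    per_pod = defaultdict(int)
    per_service = defaultdict(int)
    cluster_total = 0
    for s in samples:
        per_pod[(s.service, s.pod)] += s.cpu_millicores
        per_service[s.service] += s.cpu_millicores
        cluster_total += s.cpu_millicores
    return dict(per_pod), dict(per_service), cluster_total


# Illustrative readings only; names and numbers are made up.
samples = [
    ContainerSample("checkout", "checkout-1", "app", 120),
    ContainerSample("checkout", "checkout-1", "sidecar", 30),
    ContainerSample("checkout", "checkout-2", "app", 200),
    ContainerSample("catalog", "catalog-1", "app", 80),
]
per_pod, per_service, cluster_total = roll_up(samples)
print(per_pod[("checkout", "checkout-1")])  # 150
print(per_service["checkout"])              # 350
print(cluster_total)                        # 430
```

Keeping the finest-grained readings and deriving the coarser views from them means the same data can answer questions at any of the four levels.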

Cloud infrastructure and application monitoring

The cloud idea has disrupted, innovated, and transformed the IT world. Yet, the various cloud infrastructures, resources, and applications ought to be minutely monitored and measured through automated tools. Automation is gathering momentum in the cloud era. Activities are increasingly being automated through pioneering algorithms and technologically powerful tools. A slew of flexibilities in the form of customization, configuration, and composition are being enabled through cloud automation tools. Many manual and semi-automated tasks are being fully automated through a series of advancements in the IT space. In this section, we are going to discuss infrastructure monitoring as a step toward infrastructure optimization and automation. There are processes, platforms, procedures, and products to enable cloud monitoring.

Enterprise-scale and mission-critical applications are being cloud-enabled to be deployed in various cloud environments...

The monitoring tool capabilities

The cloud paradigm brings the much-needed flexibility of assigning the resources needed to support demand from cloud users. Establishing and enforcing appropriate policies and rules is important for assigning cloud resources to business applications and IT services. However, the effectiveness of policy management depends on the visibility that organizations have into their cloud resources. Organizations need to have the capability to create, modify, monitor, and update the policies. In short, cloud monitoring tools need to have the previously mentioned cloud-specific features, functionalities, and facilities to realize all the cloud-sponsored benefits.

Organizations deploying cloud computing services trust third-party providers to fulfil the quality of service (QoS) attributes, and performance, as noted previously, is the key QoS parameter. The monitoring tool has to monitor not only the actual levels of performance, as experienced by business users, but...

Prognostic, predictive, and prescriptive analytics

Any operational environment needs data analytics and machine learning capabilities to be intelligent in its everyday actions and reactions. The most affected environments include IT environments (traditional data centers or recent cloud-enabled data centers (CeDCs)), manufacturing and assembly floors, plant operations, and maintenance, repair, and overhaul (MRO) facilities. Increasingly, a variety of important environments are being filled with scores of networked, embedded, resource-constrained as well as resource-intensive devices, toolsets, and microcontrollers. Hospitals have a growing array of medical instruments, and homes have a number of connected appliances, such as coffee makers, dishwashers, microwave ovens, and consumer electronics. Manufacturing floors have powerful equipment, machinery, and robots. Workshops, mechanical shops, and flight maintenance garages are becoming more sophisticated and smarter...

Log analytics

Every software and hardware system generates a lot of log data (big data), and it is essential to do real-time log analytics to quickly understand whether there is any deviation or deficiency. This extracted knowledge helps administrators to consider countermeasures in time. Log analytics, if done systematically, facilitates preventive, predictive, and prescriptive maintenance. Workloads, IT platforms, middleware, databases, and hardware solutions all create a lot of log data when they are working together to complete business functionalities. There are several log analytics tools on the market.
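A minimal sketch of the kind of deviation check described above: count ERROR entries per minute and flag the minutes that exceed a threshold. The log line layout (ISO timestamp, level, message), the function names, and the threshold value are all assumptions for illustration and are not taken from any particular log analytics tool.

```python
from collections import Counter
from datetime import datetime


def error_counts_per_minute(lines):
    """Count ERROR lines per minute.

    Assumes each line looks like '<ISO timestamp> <LEVEL> <message>'
    (a hypothetical layout); lines that do not parse are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) < 3 or parts[1] != "ERROR":
            continue
        try:
            stamp = datetime.fromisoformat(parts[0])
        except ValueError:
            continue  # not a timestamp we understand
        counts[stamp.replace(second=0, microsecond=0)] += 1
    return counts


def minutes_over_threshold(counts, threshold):
    """Return, in time order, the minutes whose error count exceeds threshold."""
    return sorted(minute for minute, n in counts.items() if n > threshold)


# Illustrative log lines only.
log_lines = [
    "2018-11-01T10:00:05 INFO service started",
    "2018-11-01T10:00:20 ERROR db timeout",
    "2018-11-01T10:01:02 ERROR db timeout",
    "2018-11-01T10:01:30 ERROR cache unavailable",
    "2018-11-01T10:01:59 ERROR db timeout",
]
counts = error_counts_per_minute(log_lines)
for minute in minutes_over_threshold(counts, threshold=2):
    print(minute.isoformat(), counts[minute])  # 2018-11-01T10:01:00 3
```

A fixed threshold is the simplest possible rule; a production system would more likely compare each window against a learned baseline.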

Logs play an important role in the IT industry. They are used for various purposes, such as IT operations, system and application monitoring, security and compliance, and much more. Having a centralized and standardized logging system makes life easier for software developers, who are often asked to troubleshoot the application, detect issues, enhance the...

IT operational analytics

We discussed log data and its analytics in the previous section. There are log-management tools and log analytics platforms to gain real-time information about all kinds of software and hardware systems. The insights they produce go a long way in stabilizing and strengthening various systems by proactively attending to system issues. There is also operational data for all kinds of systems under operation. The data from IT systems contains valuable insights into system usage, the user's experience, and behavior patterns. There are operational analytics platforms and engines, such as Splunk, for monitoring, searching, analyzing, visualizing, and acting on massive streams of real-time and historical machine data from any source, format, or location. Operational analytics helps with the following:

Extricating operational insights
Reducing IT costs and complexity
Improving employee productivity
Identifying...

IT performance and scalability analytics

There are typically big gaps between the theoretical and practical performance limits. The challenge is how to enable systems to attain their theoretical performance level under any circumstance. Performance can fall short of the required level for various reasons, including poor system design, bugs in software, limited network bandwidth, third-party dependencies, and I/O access. Middleware components such as adapters, connectors, and drivers also contribute to unexpected performance degradation. The system's performance has to be maintained under any load (user, message, and data). There are several metrics, such as requests per second (RPS) and transactions per second (TPS). Performance testing is one way of recognizing performance bottlenecks and adequately addressing them. The testing is performed in the pre-production phase.
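As a small worked example of the RPS metric, the sketch below derives the average and peak requests per second from a list of request arrival times. The function name `rps_stats` and the choice of one-second buckets over the span from the first to the last request are assumptions for illustration; load-testing tools may define the measurement window differently.

```python
import math
from collections import Counter


def rps_stats(request_times):
    """Return (average RPS, peak RPS) for arrival times given in seconds.

    Requests are grouped into one-second buckets; the average is taken
    over the span from the first to the last bucket, inclusive.
    """
    if not request_times:
        return 0.0, 0
    buckets = Counter(math.floor(t) for t in request_times)
    window_seconds = max(buckets) - min(buckets) + 1
    average = len(request_times) / window_seconds
    peak = max(buckets.values())
    return average, peak


# Illustrative arrival times (seconds since the start of the test).
average, peak = rps_stats([0.1, 0.5, 0.9, 1.2, 3.7, 3.8])
print(average)  # 1.5
print(peak)     # 3
```

The same calculation applies to TPS if the timestamps mark completed transactions instead of incoming requests.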

Now, the software is running on production servers, and the thing to do here is to continuously and...

IT security analytics

IT infrastructure security, application security, and data security (at rest, in transit, and in use) are the top three security challenges, and there are security solutions approaching the issues at different levels and layers. Access-control mechanisms, cryptography, hashing, digests, digital signatures, watermarking, and steganography are the well-known and widely used means of ensuring strong security. There is also security testing and ethical hacking for identifying security risk factors and eliminating them at an early stage. All kinds of security holes, vulnerabilities, and threats are meticulously unearthed in order to deploy defect-free, safety-critical, and secure software applications. During the post-production phase, security-related data is extracted from both software and hardware products to produce precise security insights that, in turn, go a long way in empowering security experts and architects...
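Of the mechanisms listed, hashing and message digests are the easiest to illustrate briefly. The following is a minimal sketch using Python's standard `hashlib` and `hmac` modules; the function names, the key, and the sample message are assumptions for illustration. Note that an HMAC is a keyed digest shared between two parties, not a digital signature, which would require asymmetric keys.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_tag(key: bytes, message: bytes) -> str:
    """Return an HMAC-SHA256 tag that binds the message to a shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(make_tag(key, message), tag)


# Hypothetical key and message, for illustration only.
key = b"example-shared-secret"
message = b"deploy build 42 to production"
tag = make_tag(key, message)

print(len(sha256_digest(message)))                      # 64
print(verify_tag(key, message, tag))                    # True
print(verify_tag(key, b"deploy build 43 to production", tag))  # False
```

A plain digest detects accidental changes, while the keyed tag also detects deliberate tampering by anyone who does not hold the key.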

The importance of root-cause analysis

The cost of service downtime is growing. There are reliable reports stating that the cost of downtime ranges from $72,000 to $100,000 per minute. Identifying the root cause generally takes hours; this duration is tracked as the mean time to identification (MTTI). For a complex situation, the process may run into days. The MTTI is lengthy for various reasons, and there are not many tools to shorten it. We need competent tools that add value by correlating the data from different IT tools, such as APM, ITSM, SIEM, and ITOM, through open API connectors. As microservices and their instances run on containers, IT teams need to manage millions of data points. This transition calls for highly advanced and automated tools. Pioneering AI algorithms will be commonly used to automate the precise identification of root causes.
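As a worked example of the MTTI figure, the sketch below averages the time between detecting an incident and identifying its root cause. Measuring from the detection time, the function name, and the incident records are all assumptions for illustration; organizations may start the clock at a different point.

```python
from datetime import datetime, timedelta


def mean_time_to_identification(incidents):
    """Average (identified - detected) over a list of incidents.

    Each incident is a (detected_at, root_cause_identified_at) pair.
    Returns None when there are no incidents.
    """
    durations = [identified - detected for detected, identified in incidents]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incident records: two resolved within hours, one over a day.
incidents = [
    (datetime(2018, 11, 1, 10, 0), datetime(2018, 11, 1, 12, 0)),  # 2 hours
    (datetime(2018, 11, 2, 9, 0), datetime(2018, 11, 2, 13, 0)),   # 4 hours
    (datetime(2018, 11, 3, 8, 0), datetime(2018, 11, 4, 14, 0)),   # 30 hours
]
print(mean_time_to_identification(incidents))  # 12:00:00
```

A single complex incident dominates the mean here, which is why some teams also track the median alongside it.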

Root-cause analysis is being touted as an important post-deployment activity for pinpointing bugs and their roots in any software application...

Summary

There are several activities being strategically planned and executed to enhance the resiliency, robustness, and versatility of enterprise, edge, and embedded IT. It is widely accepted that the domains of data analytics and machine learning are going to be the key differentiators for corporations in fulfilling the varying expectations of their customers, clients, and consumers. This chapter has described the various post-production data analytics to allow you to gain a deeper understanding of applications, middleware solutions, databases, and IT infrastructures to manage them effectively and efficiently. Machine-learning algorithms enable the formation of self-learning models to predict problems and prescribe viable solutions to surmount them. Thus, data analytics methods and ML algorithms come in handy in realizing resilient IT. The other important facets include static and dynamic code analyses to proactively identify bugs in software code to enhance application reliability...

Further Readings

The following are a few references:

Log Analytics by Matomo: https://piwik.org/log-analytics/
Log Analytics by AppDynamics: https://www.appdynamics.com/product/log-analytics/
The Fastest Way to Analyze Your Log Data: https://logentries.com/
Log analytics by Dynatrace: https://www.dynatrace.com/capabilities/log-analytics/
Autonomous Digital Intelligence: https://www.loomsystems.com/

Authors (3)

Pethuru Raj Chelliah

Pethuru Raj Chelliah (PhD) works as the chief architect at the Site Reliability Engineering Center of Excellence, Reliance Jio Infocomm Ltd. (RJIL), Bangalore. Previously, he worked as a cloud infrastructure architect at the IBM Global Cloud Center of Excellence, IBM India, Bangalore, for four years. He also had an extended stint as a TOGAF-certified enterprise architecture consultant in Wipro Consulting services division and as a lead architect in the corporate research division of Robert Bosch, Bangalore. He has more than 17 years of IT industry experience.

Shreyash Naithani

Shreyash Naithani is currently a site reliability engineer at Microsoft R&D. Prior to Microsoft, he worked with both start-ups and mid-level companies. He completed his PG Diploma from the Centre for Development of Advanced Computing, Bengaluru, India, and is a computer science graduate from Punjab Technical University, India. In a short span of time, he has had the opportunity to work as a DevOps engineer with Python/C#, and as a tools developer, site/service reliability engineer, and Unix system administrator. During his leisure time, he loves to travel and binge watch series.

Shailender Singh

Shailender Singh is a principal site reliability engineer and a solution architect with around 11 years of IT experience who holds two master's degrees in IT and computer application. He has worked as a C developer on the Linux platform. He has had exposure to almost all infrastructure technologies, from hybrid to cloud-hosted environments. In the past, he has worked with companies including McKinsey, HP, HCL, Revionics, and Avalara, and these days he tends to use AWS, K8s, Terraform, Packer, Jenkins, Ansible, and OpenShift.