You're reading from Enterprise DevOps for Architects

Product type: Book

Published in: Nov 2021

Reading Level: Beginner

Publisher: Packt

ISBN-13: 9781801812153

Edition: 1st Edition

Languages

Python

Tools

Ansible, Terraform

Concepts

DevOps

Author (1)

Jeroen Mulder

Chapter 5: Architecting Next-Level DevOps with SRE

In previous chapters, we discussed the ins and outs of DevOps. It's called DevOps for a reason, but in practice, the Dev is typically emphasized: creating agility by speeding up development. Site Reliability Engineering (SRE) puts a strong focus on Ops. How does Ops survive under the ever-increasing speed and number of products that Dev delivers? The answer is SRE teams that work with error budgets and reduce toil.

After completing this chapter, you will have learned the basic principles of SRE and how you can help an enterprise adopt and implement them. You will have a good understanding of how to define Key Performance Indicators (KPIs) for SRE and what benefits these will bring to the organization.

In this chapter, we're going to cover the following main topics:

Understanding the basic principles of SRE
Assessing the enterprise for SRE readiness
Architecting SRE using KPIs
Implementing SRE

Understanding the basic principles of SRE

In this section, we will briefly introduce SRE, which Google originally invented to keep operations from being completely swamped by all the new developments that it launched. There are many definitions of SRE, but in this book, we'll use the one used by Google itself: SRE is the thing that happens if you allow a software engineer to design operations.

Basically, Google addressed the gap between development and operations. Developers changed code because of demand, while operations tried to avoid services breaking because of these changes. In other words, there was always some sort of tension between dev and ops teams. We will talk about this more in this chapter.

Now, is SRE the next level of DevOps? The answer to that question is that SRE forms a bridge between Dev and Ops. A logical next question, in that case, would be: is a bridge necessary? In the next section, we will learn that putting developers and operations...

Assessing the enterprise for SRE readiness

In the previous section, we introduced SRE and discussed the basic principles, without the ambition of being comprehensive. Covering SRE as a whole would fill a book with well over 500 pages; we have merely given a quick overview of the most important parts. Now the question is: how do I know whether my company is ready for SRE? We will explore some criteria for SRE readiness in this section.

One of the common problems of companies implementing DevOps is that developers and operations are not really working together. They might sit in one team, but developers will still write code and throw it over the fence to operations when they think the code is done. The reason is that dev works with a different mindset than ops. Developers want change. They get their assignments from business demand to improve or build new applications. Operators, on the other hand, don't want that change. Their main interest is to have stable...

Architecting SRE using KPIs

Before we dive into the definition of KPIs, we need to get back to the basic principles of SRE. SRE teams focus on reliability, scalability, availability, performance, efficiency, and response. These are all measurable items, so we can transform them into KPIs. In this section, we will learn how to do that using Service-Level Objectives (SLOs), Service-Level Indicators (SLIs), and the error budget.
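To make "measurable" concrete, an availability indicator can be computed as the share of requests that succeeded in a given window. The sketch below is illustrative only: the `RequestCounts` type, the `availability_sli` function, and the sample numbers are assumptions and do not come from the book. The ratio of good events to total events is a common convention for an SLI, not the only possible definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestCounts:
    """Hypothetical request counters for one measurement window."""
    total: int
    successful: int


def availability_sli(counts: RequestCounts) -> float:
    """Return the fraction of requests that succeeded (0.0 to 1.0)."""
    if counts.total == 0:
        # Assumption: a window without traffic counts as fully available.
        return 1.0
    return counts.successful / counts.total


# Made-up numbers for illustration.
window = RequestCounts(total=200_000, successful=199_820)
print(f"Availability SLI: {availability_sli(window):.4%}")
```

The same pattern could be applied to the other focus areas, for example the share of requests answered within a latency threshold for performance.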

The main KPIs that we use in SRE are as follows:

SLOs: In SRE, an SLO defines how good a system should be. An SLO is much more precise than a Service-Level Agreement (SLA), which comprises a lot of different KPIs. You could also state that the SLA comprises a number of SLOs. However, an SLO is an agreement between the developers in the SRE team and the product owner of the service, whereas an SLA is an agreement between the service supplier and the end user.
The SLO is a target value. For example, the web frontend should be able to handle hundreds of requests per minute. Don't make it too complex...
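The relation between an SLO target and the error budget can be sketched in a few lines. The example below assumes the commonly used definition in which the error budget is the share of the window that the SLO leaves uncovered (1 minus the target). The function names, the 99.9% availability target, and the 30-day window are hypothetical choices made for illustration, not values from the book.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of unavailability that the SLO tolerates within the window."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be a fraction in (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Error budget left after the downtime observed so far (may go negative)."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


# Assumed example: a 99.9% availability SLO over 30 days,
# with 12.5 minutes of downtime already recorded.
budget = error_budget_minutes(0.999)
left = budget_remaining(0.999, downtime_minutes=12.5)
print(f"Budget: {budget:.1f} min, remaining: {left:.1f} min")
```

A negative remainder would indicate that the budget for the window has been spent, which teams often treat as a signal to prioritize reliability work over new releases.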

Implementing SRE

So far, we have learned what SRE is and what the key elements are. In this section, we will learn how to start with SRE. As with DevOps, the advice is to start small. From there, two major steps will help you to implement SRE in a controlled way:

Agree on the standards and practices: This can be for just one SRE team or for the entire enterprise if the ambition reaches that level. In some workbooks, this is called the kitchen sink approach, meaning that everything is SRE. This can be a viable approach for companies with a limited set of applications, but for enterprises, it might be wiser to work with an SRE team charter.
Let's work with a very common example that we will also use in the next chapters. Enterprises usually have product teams working on applications and a platform team that is responsible for the infrastructure. It's good practice to have an SRE team bridging between one product team and the platform team, setting out standards and practices...

Summary

This chapter covered the basics of SRE. The original workbook contains well over 500 pages, so it's almost impossible to summarize the methodology in just a few pages. Yet, after completing this chapter, you will have a good understanding of the founding principles of SRE, starting with the definition of SLOs to set requirements on how good a system should be. Subsequently, we measure the SLOs with indicators that tell us how good the system really is. We learned that by working with risk management, error budgets, and blameless post-mortems, site reliability engineers can help DevOps teams to improve systems and make them more reliable.

The conclusion of the chapter was that SRE is not very easy to implement in an enterprise. We discussed the first steps of the implementation and learned that if done right, SRE will lead to benefits. Businesses will gain from SRE because a lot of manual work can be reduced, creating room to improve products or develop new ones.

This concludes...

Questions

What is the term that SRE uses to label repetitive, manual work that should be reduced?
What do the terms TTD and TTR mean?
What do we do when we transfer risk?

Further reading

Multi-Cloud Architecture and Governance, by Jeroen Mulder, Packt Publishing, 2020
Practical Site Reliability Engineering, by Pethuru Raj Chelliah, Shreyash Naithani, and Shailender Singh, Packt Publishing, 2018
Do you have an SRE team yet? How to start and assess your journey: https://cloud.google.com/blog/products/devops-sre/how-to-start-and-assess-your-sre-journey


Author (1)

Jeroen Mulder

Jeroen Mulder is a certified enterprise and security architect, and he works with Fujitsu (Netherlands) as a Principal Business Consultant. Earlier, he was a Sr. Lead Architect, focusing on cloud and cloud native technology, at Fujitsu, and was later promoted to become the Head of Applications and Multi-Cloud Services. Jeroen is interested in cloud technology, architecture for cloud infrastructure, serverless and container technology, application development, and digital transformation using various DevOps methodologies and tools. He has authored “Multi-Cloud Architecture and Governance”, “Enterprise DevOps for Architects”, and “Transforming Healthcare with DevOps4Care”.