You're reading from  Enterprise DevOps for Architects

Product type: Book
Published in: Nov 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801812153
Edition: 1st Edition
Author: Jeroen Mulder

Jeroen Mulder is a certified enterprise and security architect who works with Fujitsu (Netherlands) as a Principal Business Consultant. Earlier, he was a Sr. Lead Architect at Fujitsu, focusing on cloud and cloud native technology, and was later promoted to Head of Applications and Multi-Cloud Services. Jeroen is interested in cloud technology, architecture for cloud infrastructure, serverless and container technology, application development, and digital transformation using various DevOps methodologies and tools. He has previously authored “Multi-Cloud Architecture and Governance”, “Enterprise DevOps for Architects”, and “Transforming Healthcare with DevOps4Care”.

Chapter 5: Architecting Next-Level DevOps with SRE

In previous chapters, we discussed the ins and outs of DevOps. It's called DevOps for a reason, but in practice, Dev is typically emphasized: creating agility by speeding up development. Site Reliability Engineering (SRE) puts a strong emphasis on Ops. How does Ops survive under the ever-increasing speed and number of products that Dev delivers? The answer is SRE teams, working with error budgets and reducing toil.

After completing this chapter, you will have learned the basic principles of SRE and how you can help an enterprise adopt and implement them. You will have a good understanding of how to define Key Performance Indicators (KPIs) for SRE and what benefits these will bring to the organization.

In this chapter, we're going to cover the following main topics:

  • Understanding the basic principles of SRE
  • Assessing the enterprise for SRE readiness
  • Architecting SRE using KPIs
  • Implementing SRE
  • ...

Understanding the basic principles of SRE

In this section, we will briefly introduce SRE, which Google originally invented to keep operations from being swamped by all the new developments that it launched. There are a lot of definitions of SRE, but in this book, we'll use Google's own: SRE is what happens when you allow a software engineer to design operations.

Basically, Google addressed the gap between development and operations. Developers changed code because of demand, while operations tried to avoid services breaking because of these changes. In other words, there was always some sort of tension between dev and ops teams. We will talk about this more in this chapter.

Now, is SRE the next-level DevOps? The answer to that question is: SRE forms a bridge between Dev and Ops. A logical, next question, in that case, would be: is a bridge necessary? In the next section, we will learn that putting developers and operations...

Assessing the enterprise for SRE readiness

In the previous section, we introduced SRE and discussed the basic principles, without the ambition of being comprehensive. Covering SRE as a whole would fill a book with well over 500 pages; we have merely given a quick overview of the most important parts. Now the question is: how do I know whether my company is ready for SRE? We will explore some criteria for SRE readiness in this section.

One of the common problems of companies implementing DevOps is that developers and operations are not really working together. They might sit in one team, but still there will be developers writing code and throwing it over the fence to operations when they think the code is done. The reason is that dev works with a different mindset than ops. Developers want to change. They get their assignments from business demand to improve or build new applications. Operators, on the other hand, don't want that change. Their main interest is to have stable...

Architecting SRE using KPIs

Before we dive into the definition of KPIs, we need to get back to the basic principles of SRE. SRE teams focus on reliability, scalability, availability, performance, efficiency, and response. These are all measurable items, so we can transform them into KPIs. In this section, we will learn how to do that using Service-Level Objectives (SLOs), Service-Level Indicators (SLIs), and the error budget.

The main KPIs that we use in SRE are as follows:

  • SLOs: In SRE, an SLO defines how good a system should be. An SLO is much more precise than a Service-Level Agreement (SLA), which comprises a lot of different KPIs; you could also state that an SLA comprises a number of SLOs. The parties differ as well: an SLO is an agreement between the developers in the SRE team and the product owner of the service, whereas an SLA is an agreement between the service supplier and the end user.

    The SLO is a target value. For example, the web frontend should be able to handle hundreds of requests per minute. Don't make it too complex...
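
The relationship between an SLO, an SLI, and the error budget can be sketched in a few lines of Python. This is a minimal illustration and not taken from the book: it assumes the common convention that the error budget is the share of events that is allowed to fail (1 minus the SLO target), and the names Slo, sli, and error_budget_remaining, as well as the request counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO: the fraction of events that must be good."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests should succeed

    def __post_init__(self) -> None:
        # A target of exactly 1.0 would leave no error budget at all.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be between 0 and 1 (exclusive)")


def sli(good_events: int, total_events: int) -> float:
    """SLI: the measured ratio of good events to all events."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_remaining(slo: Slo, good_events: int, total_events: int) -> float:
    """Fraction of the error budget that is left (negative when overspent).

    Assumption: error budget = (1 - SLO target) * total events.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1.0 - slo.target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Illustrative numbers only: 1,000,000 requests, 600 of them failed.
    frontend = Slo(name="web-frontend", target=0.999)
    measured = sli(good_events=999_400, total_events=1_000_000)
    remaining = error_budget_remaining(frontend, 999_400, 1_000_000)
    print(f"SLI: {measured:.4%}, error budget left: {remaining:.1%}")
```

In this sketch, the SLO is the target, the SLI is what was actually measured, and the error budget is how much failure is still tolerated before the target is missed. A negative result would indicate that the budget has been overspent.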

Implementing SRE

So far, we have learned what SRE is and what the key elements are. In this section, we will learn how to start with SRE. As with DevOps, the advice is to start small. There are two major steps that will help you to implement SRE in a controlled way:

  • Agree on the standards and practices: This can be for just one SRE team or for the entire enterprise if the ambition reaches that level. In some workbooks this is called kitchen sink, meaning that everything is SRE. This can be a viable approach for companies with a limited set of applications, but for enterprises, it might be wiser to work with an SRE team charter.

    Let's work with a very common example that we will also use in the next chapters. Enterprises usually have product teams working on applications and a platform team that is responsible for the infrastructure. It's good practice to have an SRE team bridging between one product team and the platform team, setting out standards and practices...

Summary

This chapter covered the basics of SRE. The original workbook contains well over 500 pages, so it's almost impossible to summarize the methodology in just a few pages. Yet, after completing this chapter, you will have a good understanding of the founding principles of SRE, starting with the definition of SLOs to set requirements on how good a system should be. Subsequently, we measure the SLOs with indicators that tell us how good the system really is. We learned that by working with risk management, error budgets, and blameless post-mortems, site reliability engineers can help DevOps teams to improve systems and make them more reliable.

The conclusion of the chapter was that SRE is not very easy to implement in an enterprise. We discussed the first steps of the implementation and learned that if done right, SRE will lead to benefits. Businesses will gain from SRE because a lot of manual work can be reduced, creating room to improve products or develop new ones.

This concludes...

Questions

  1. What is the term that SRE uses to label repetitive, manual work that should be reduced?
  2. What do the terms TTD and TTR mean?
  3. What do we do when we transfer risk?

Further reading
