Multi-Cloud Strategy for Cloud Architects - Second Edition

Product type: Book
Published: Apr 2023
Publisher: Packt
ISBN-13: 9781804616734
Pages: 470
Edition: 2nd Edition
Author: Jeroen Mulder

Table of Contents (23 chapters)

Preface
1. Introduction to Multi-Cloud
2. Collecting Business Requirements
3. Starting the Multi-Cloud Journey
4. Service Designs for Multi-Cloud
5. Managing the Enterprise Cloud Architecture
6. Controlling the Foundation Using Well-Architected Frameworks
7. Designing Applications for Multi-Cloud
8. Creating a Foundation for Data Platforms
9. Creating a Foundation for IoT
10. Managing Costs with FinOps
11. Maturing FinOps
12. Cost Modeling in the Cloud
13. Implementing DevSecOps
14. Defining Security Policies
15. Implementing Identity and Access Management
16. Defining Security Policies for Data
17. Implementing and Integrating Security Monitoring
18. Developing for Multi-Cloud with DevOps and DevSecOps
19. Introducing AIOps and GreenOps in Multi-Cloud
20. Conclusion: The Future of Multi-Cloud
21. Other Books You May Enjoy
22. Index

Conclusion: The Future of Multi-Cloud

This book has dealt with designing, implementing, and controlling a multi-cloud platform. We talked about five major clouds (Azure, AWS, GCP, Oracle Cloud, and Alibaba Cloud) and discussed strategies to get the best out of these clouds for our businesses. We discovered that building and managing in the cloud can be complex. Yet the cloud will definitely grow. We will look at the future of the cloud in this final chapter.

The cloud will grow and multi-cloud will grow. The biggest challenge is how organizations can stay in control of their applications in a multi-cloud setting since the cloud can become very complex. Maybe Google has the answer: Site Reliability Engineering (SRE). SRE incorporates aspects of software engineering and applies them to infrastructure and operations problems. We will also use this chapter to introduce the concept of SRE and its main principles.

In this chapter, we’re going to cover the...

The growth and adoption of multi-cloud

In recent years, multi-cloud has emerged as a popular approach for businesses to manage their cloud infrastructure. Let’s recap the definition of multi-cloud one more time: we speak about multi-cloud when we use two or more cloud service providers to host and run applications and services. As we look toward the near future, we can expect to see continued developments in multi-cloud as businesses seek to take advantage of its benefits while managing its risks. We’ll talk about managing risks later in this chapter when we explore the concept of SRE.

One of the primary reasons that businesses are looking more into multi-cloud is the need for flexibility and agility. Multi-cloud allows businesses to avoid vendor lock-in and take advantage of the unique features and capabilities offered by different cloud providers. This allows them to optimize their applications and services for specific use cases, such as high-performance computing...

Understanding the concept of SRE

Originally, SRE was meant for mission-critical systems, but overall, it can be used to drive the DevOps process in a more efficient way. The goal is to enable developers to deploy infrastructure quickly and without errors. To achieve this, the deployment is fully automated. In this way of working, operators will not be swamped with requests to constantly onboard and manage more systems.

The original description of SRE as invented by Google is well over 400 pages long. In the Further reading section, a good book is listed to give you a real deep dive into SRE. This chapter is merely an introduction.

Key terms in SRE are service-level indicators (SLIs), service-level objectives (SLOs), and the error budget, that is, the number of failures that lead to the unavailability of a system. These terms are explained in more detail in the next paragraphs.
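To make the relationship between these terms concrete, here is a minimal sketch in Python. It assumes a request-based availability SLO and the common formulation in which the error budget is the share of requests (1 minus the SLO) that may fail within a measurement window. The `Window` type, the function names, and all numbers are hypothetical and only serve as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts observed over one measurement window (hypothetical type)."""
    total_requests: int
    failed_requests: int


def measured_sli(window: Window) -> float:
    """SLI: the fraction of requests in the window that succeeded."""
    if window.total_requests == 0:
        return 1.0  # no traffic, so nothing failed
    good = window.total_requests - window.failed_requests
    return good / window.total_requests


def allowed_failures(slo: float, total_requests: int) -> int:
    """Error budget: how many requests may fail before the SLO is missed."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the range (0, 1]")
    # The small epsilon guards against float rounding just below a whole number.
    return int((1.0 - slo) * total_requests + 1e-9)


def budget_remaining(slo: float, window: Window) -> int:
    """Failures still available in this window; negative means the SLO is missed."""
    return allowed_failures(slo, window.total_requests) - window.failed_requests


# Illustrative numbers only: a 99.9% availability SLO over one million requests.
window = Window(total_requests=1_000_000, failed_requests=400)
print(f"SLI: {measured_sli(window):.4%}")
print(f"Error budget: {allowed_failures(0.999, window.total_requests)} failed requests")
print(f"Remaining: {budget_remaining(0.999, window)} failed requests")
```

In this sketch the SLO is the target, the SLI is what was actually measured, and the error budget is the room between the two that a team can spend on changes and incidents.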

SLI and SLO differ from SLA, the service-level agreement. The SLA is an agreement between the supplier of a service and the end user...

Working with risk analysis in SRE

The basis of SRE is that reliability is something that you can design as part of the architecture of applications and systems. In addition, reliability is a measurable quality, and that quality can be influenced by design decisions. Engineers can take measures to decrease the detection, response, and repair time, and they can develop systems in such a way that changes can be executed safely without causing any downtime. Architects can design fault-tolerant systems; engineers can develop them.

The major issue is that it all comes at a cost, and whether systems really need to be fault-tolerant is a business decision, based on a business case. In Chapter 1, Introduction to Multi-Cloud, we already learned that business cases are driven by risks. Let’s go over risk management one more time.

The basic rule is that risk = probability × impact. Enterprises use risk...
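The rule risk = probability × impact can be worked through with a few lines of Python. This is only a sketch: the risk names, probabilities, and impact figures below are invented for illustration, and the impact is assumed to be a cost estimate in an arbitrary currency.

```python
from typing import Dict, List, Tuple


def risk_score(probability: float, impact: float) -> float:
    """Apply the basic rule: risk = probability x impact."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if impact < 0:
        raise ValueError("impact must not be negative")
    return probability * impact


# Hypothetical risks: (probability per year, estimated impact as a cost).
risks: Dict[str, Tuple[float, float]] = {
    "regional outage": (0.02, 500_000.0),
    "failed deployment": (0.30, 20_000.0),
    "expired certificate": (0.10, 50_000.0),
}

# Rank the risks from highest to lowest score to decide where to act first.
ranked: List[str] = sorted(
    risks, key=lambda name: risk_score(*risks[name]), reverse=True
)

for name in ranked:
    print(f"{name}: {risk_score(*risks[name]):,.0f}")
```

With these made-up figures, the rare but expensive regional outage ranks above the frequent but cheap failed deployment, which shows why both factors have to be considered together.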

Applying monitoring principles in SRE

Reliability is a measurable quality. To measure the quality and reliability of systems, teams need real-time information on the status of these systems. As mentioned in the previous section, the time to detect (TTD) is a crucial driver in calculating risk and, subsequently, determining the SLO. Observability is therefore critical in SRE. However, SRE holds to the principle that monitoring needs to be as simple as possible. It uses the four golden signals:

  • Latency: The time that a system needs to return a response.
  • Traffic: The demand that is placed on the system, for example, the number of requests per second.
  • Errors: The number of requests placed on a system that fail completely or partially.
  • Saturation: The share of the maximum load that a system can handle that is currently in use.
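As a minimal sketch, the four golden signals can be derived from a batch of request records. The `Request` type, the field names, and the capacity figure are assumptions made for this example. Mean latency is used to keep the code short; in practice, teams often look at latency percentiles instead.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass(frozen=True)
class Request:
    """One observed request (hypothetical record format)."""
    latency_ms: float
    ok: bool


def golden_signals(
    requests: Sequence[Request], window_seconds: float, capacity_rps: float
) -> Dict[str, float]:
    """Summarize latency, traffic, errors, and saturation for one window."""
    if window_seconds <= 0 or capacity_rps <= 0:
        raise ValueError("window_seconds and capacity_rps must be positive")
    traffic_rps = len(requests) / window_seconds
    failed = sum(1 for r in requests if not r.ok)
    return {
        # Latency: average time to return a response.
        "latency_ms": mean(r.latency_ms for r in requests) if requests else 0.0,
        # Traffic: requests per second placed on the system.
        "traffic_rps": traffic_rps,
        # Errors: share of requests that failed.
        "error_rate": failed / len(requests) if requests else 0.0,
        # Saturation: observed traffic relative to the assumed maximum load.
        "saturation": traffic_rps / capacity_rps,
    }


# Illustrative data: four requests in a two-second window, capacity of 10 rps.
sample = [
    Request(100.0, True),
    Request(200.0, True),
    Request(300.0, False),
    Request(400.0, True),
]
signals = golden_signals(sample, window_seconds=2.0, capacity_rps=10.0)
for name, value in signals.items():
    print(f"{name}: {value}")
```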

Based on these signals, monitoring rules are defined. As the starting point in SRE is avoiding toil, that is, too much manual work for operations, the monitoring rules follow the same philosophy...

Applying principles of SRE to multi-cloud—building and operating distributed systems

This book exists because a majority of enterprises are moving systems to cloud environments or developing systems in them. Today’s enterprises are in a constant transformation mode. This also means a big change in operations. To put it simply, operations have to keep up with the speed of change, and traditional operations can’t handle this. We need SRE in the future of multi-cloud. SRE teams create reliable systems in cloud environments.

There are a couple of important rules for SRE to enable this:

  • Automate everything: Automation leads to consistency, but automation also enables scaling. This requires a very well-thought-out architecture. Automation enables issues to be fixed faster since they only have to be fixed in one place: the code. Automation makes sure that the proper code is distributed over all systems involved. With large distributed systems spanning various cloud platforms, this...
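The idea of fixing an issue in one place and letting automation distribute it can be sketched as follows. This is a toy model, not a real deployment tool: the environments are plain in-memory dictionaries, no cloud API is called, and the setting names are hypothetical.

```python
from typing import Dict

Config = Dict[str, object]

# The single source of truth: a fix is made here, in code, and nowhere else.
DESIRED_STATE: Config = {"tls_min_version": "1.2", "log_retention_days": 30}


def reconcile(desired: Config, actual: Config) -> Config:
    """Bring one environment in line with the desired state.

    Returns only the settings that had to change, so running it a second
    time on the same environment returns an empty dictionary.
    """
    changes = {
        key: value for key, value in desired.items() if actual.get(key) != value
    }
    actual.update(changes)
    return changes


# Stand-ins for environments on different cloud platforms, each with drift.
environments: Dict[str, Config] = {
    "azure": {"tls_min_version": "1.0", "log_retention_days": 30},
    "aws": {"tls_min_version": "1.2", "log_retention_days": 7},
    "gcp": {},
}

first_run = {name: reconcile(DESIRED_STATE, env) for name, env in environments.items()}
second_run = {name: reconcile(DESIRED_STATE, env) for name, env in environments.items()}

print("first run:", first_run)
print("second run:", second_run)
```

Because the same function is applied to every environment, the outcome is consistent across platforms, and repeating the run is safe since it only changes what has drifted.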

Summary

Systems are getting more complex for many reasons. Customers constantly demand more functionality in applications, and at the same time, systems need to be available 24/7 without interruption. Cloud platforms are very suitable to facilitate development at high speed, and thus we foresee cloud providers growing fast. In other words, the cloud will definitely grow. This comes with challenges for a lot of businesses. Throughout this book, we discovered that building and managing cloud environments can be complex.

The cloud will grow, and likely the complexity of the cloud will grow too. To ensure reliability, especially with systems that are truly multi-cloud and distributed across different platforms, we should adopt the principles of SRE. The most important principles of SRE have been discussed in this chapter. You should have an understanding of the methodology, based on determining the SLO, measuring the SLI, and working with error budgets.

We’ve learned that...

Questions

  1. Risk analysis is important in SRE. What are the five risk strategies, often referred to as PRACT?
  2. SRE mentions four golden signals in applying monitoring rules. Latency and traffic are two of them. Name the remaining two.
  3. SRE has a specific term for manual work that is often repetitive and should be avoided. What’s that term?
  4. Postmortem analysis is a key principle in SRE. True or false: Postmortem analysis is about finding the root cause and finding out who’s to blame for the error.

Further reading

For more information on SRE, you can refer to Practical Site Reliability Engineering by Pethuru Raj, Packt Publishing.

Join us on Discord!

Read this book alongside other users, cloud experts, authors, and like-minded professionals. Ask questions, provide solutions to other readers, chat with the authors via Ask Me Anything sessions, and much more.

Scan the QR code or visit the link to join the community now.

https://packt.link/cloudanddevops
