Introducing … the Cloud
Welcome, you absolute legend!
If you consider how many people don’t read books, and how few of those who do read to educate themselves rather than to entertain themselves, it is a small miracle indeed that you are reading this sentence.
The fact that you have chosen this book – hopefully because of a recommendation by a peer either directly or in an online community you frequent – means we were destined to meet, and it means you are already in the top 1% of people in our field who love learning and staying on top of their game.
Thank you for taking the time to read through these pages. Hopefully, we can provide some of the education you seek and add a little entertainment along the way. Now, let us get right to the point as your time is precious – and the cloud won’t adopt itself. Let’s learn together about everything that needs to happen for an organization (your organization) to successfully adopt the cloud.
Throughout this chapter, you will learn about the authors and why this book exists. We will help you identify gaps in your foundational cloud knowledge and guide you to resources that can help you better understand the cloud and, ultimately, this book.
In this chapter, we’re going to cover the following main topics:
- Who are you?
- Who are we?
- What is this book about?
- Are you ready?
- What are the cloud foundations?
- What is the Cloud Adoption Framework?
Who are you?
As any good editor or publisher will tell you, the success of any book hinges on the careful selection of the target audience. We’ve painstakingly chosen a goal for each and every section, recommendation, and suggestion outlined in this book, and we wrote it with three readers in mind. That’s where you come in: see which of the following sounds most like you, engage your critical thinking skills, and soak it all in. Piece of cake, right?
- An architect who is going to lead their organization through an incredible and exciting change: a digital transformation (warning: buzzwords).
Your job isn’t just to architect the platform, plan a landing zone, architect services that work together like clockwork, and ensure governance going forward – it is also to ensure everyone in the organization is pulling together toward the same goal: the cloud adoption and transformation of your business and even your industry.
You are new to this, and you are trying to find your way – and trying to avoid the common pitfalls of cloud adoption.
You will most likely read from start to finish, skipping only a few sections (for example, a section on migration if you and your organization have nothing to migrate). We’d love to hear from you!
- An architect who has been through the journey of cloud adoption and – there isn’t an easy way of saying this – things haven’t gone to plan. In fact, they have gone terribly wrong, and oh, boy (or girl, or however you want us to address you – hard in a book where we can’t hear your replies), do you have stories to tell. Excellent. If you don’t put yourself out there and give it your best, you won’t fail – but you won’t succeed either. So, kudos for your bravery.
You aren’t new to this; as some say, “this isn’t your first rodeo.” You have seen things that have worked and things that have failed. Sometimes you made things better and sometimes you were prevented from making meaningful change, but you must dust yourself off and get back in the saddle, partner.
You will most likely pick the most interesting chapter, start reading there, and jump back and forth, muttering agreement or disagreement as you read. Either way, we’d love to hear from you!
- A person in an organization that is going through the transformation process. You could be the CFO, a QA engineer, or in any other role; all of this applies to you, some parts more than others, of course. There may be an architect who is leading the transformation, and you either have huge respect for their vast technical cloud knowledge or the opposite: you see them struggling and want to help.
You are interested in learning but need guidance. What are the questions you should be asking? What are the areas you should be focusing on? Who are the people you should be talking to?
You will read this book and go online every once in a while to learn more about a particular topic. That is to be commended and encouraged.
Either way, your role in this organizational transformation will become a lot clearer after reading this book once, and by the time you go back and read it again to remind yourself of all the great ideas, you should have a chance to implement some of the suggestions. We’d love to hear from you throughout your journey!
If you fit into one of these three personas (heck, even if you do not!), please – we would very much like to hear from you. Tell us how we did, so we can improve. Tell us whether you feel all your questions were answered or you have more questions. Tell us where we fell short and what you would like to learn more about! We’d also love to hear about your own journey, your experiences, your successes, and the troubles you encountered along the way.
Who are we?
Between us, we are a pair of developers, cloud architects, entrepreneurs, consultants, and executives who have had the good fortune both to be in the right place at the right time and to have an enormous passion for this field.
They say if you love what you do, you will never work a day in your life (warning: cheesiness).
We’ve worked across industries; across verticals and horizontals; across industries that are emerging and industries that are heavily regulated; across continents, countries, counties, and cities; working from home or from an office, in open offices, and in our own private offices; and across technology stacks, hyperscale cloud providers, and on-premises environments. We’ve been a part of digital transformations before digital transformations were a thing.
We’ve survived wars, we’ve survived economic downturns, and we’ve survived pandemics. We’ve had experiences we wish we could forget, and we’ve had experiences that have shaped us and will be treasured for the rest of our lives. Sometimes we’ve helped others and almost always we’ve also learned from them. We’ve dealt with providers large and small, spoken at meetups as small as a dozen people, and at conferences in front of hundreds and thousands of people. We’ve been a part of many teams and led many teams.
And now, put your hands together for Darren…
I still remember the first time I witnessed someone programming. One evening, I watched someone copy BASIC commands from a textbook into a small computer – a Sharp PC (pocket computer), though the exact model escapes me. Looking over his shoulder, I was fascinated by this strange language he wrote that could control a machine and give it a whole life of its own. I thought that, with enough commands, in the correct order, we would magically create AI – I was 8 years old, and the scale of such things was beyond me. Still, this led to owning a computer, learning to code, and spending the majority of my time since then in front of a computer screen. But I regret nothing!
I stumbled across AWS sometime around 2008. Initially, I was dismissive of S3 – to me, it was just web hosting, but with better marketing. Then came EC2, but VMs were nothing new. Local web hosts had been offering these for some time, but Amazon could do it at a fraction of the cost. Interesting… But with SQS, things got really interesting. A message queue, the backbone of any enterprise application, could be created with a credit card and a few clicks. But would any serious organization trust a bookshop with even a tiny piece of their IT infrastructure?
Well, we all know what happened next. With more and more services from AWS, more players entered the space to copy and compete with Amazon. But few dared to imagine how the cloud would radically alter the IT landscape and I would wager fewer still could have predicted the effect it would have on the culture of IT departments and technology companies globally.
I feel incredibly lucky to not only have witnessed such rapid transformation in technology over the last three decades, but to actually deliver solutions that delight users – all built on the cloud created by the most innovative technology companies that have ever existed.
And once more, put your hands together for Sasa (read as Sasha)…
I taught myself how to code in a basement shelter in third or fourth grade during the Croatian War of Independence, with no electricity, reading four books on GW-BASIC. I coded my first game on paper before I ever saw a computer – I had to simulate random numbers by annoying people in the shelter to pick a number (by the way, one of the worst ways of generating randomness).
I started working as a developer and I still am a developer, even though nowadays people sometimes call me an architect.
I’ve worked in public and private sectors, telecoms, financial institutions, health care providers and insurers, defense, law enforcement, secret services, and governments. I’ve used Java and C#, Python and bash, SQL and NoSQL, and Windows and Linux (and Solaris). I was there when AWS launched what became my favorite of its services, Simple Queue Service (SQS), to which my best friend introduced me, and I was there when Azure launched my now all-time favorite service, SignalR (if you don’t know anything about it, drop this book now and go learn about it – everything in this book can wait; SignalR is just so useful as a service). I am also a massive fan of Google’s cloud efforts because I feel we cannot have just two providers dominate the market, as that will lead to trouble. (We are already beginning to see some of this troublesome behavior.)
I’ve been called on to approve deals worth over $50 million, and I’ve helped with the digital transformation efforts of Microsoft’s global, strategic top-500 customers. I have talked to C-level executives and developers and quality assurance engineers and project and product managers – and I have successfully (sometimes through many iterations) convinced them to follow a course of action that led them to get the full value out of their cloud investments.
And now, to answer the inevitable question that always comes up: AWS versus Azure? Azure wins for me, for two reasons. First, the Azure portal is amazing when compared to the AWS Management Console. Second, Microsoft account teams (both commercial and technical) will do anything to help you; so will AWS’ teams, but the sheer commercial and enterprise power of Microsoft is unrivaled if you are looking to partner commercially. That said, AWS has a better support organization, in my humble opinion. And Google should be doing a lot better than it is in the cloud wars (maybe they could use some help?). But you can definitely run your services in any of these three clouds, if you know what you are doing. Do you? Do you know?
The easiest way to contact me is at linkedin.com/in/sasakovacevic/.
Enough of the fancy words, enough of the introductions; it’s time to tell you exactly how this book will help you – today.
What is this book about?
Specifically, this book will help you with cloud adoption by describing exactly the following:
- What you need to know to be able to strategize, plan, govern, manage, and innovate your cloud adoption and your applications and services in the cloud (with examples focused on Azure, but applicable to any cloud or any multi-cloud environment).
- Who will be your friends in a constant struggle to stumble forward with agility and confidence, how to manage relationships and activities with your friends, and how to be the evangelist that sees every interaction and every touchpoint with anyone in the organization and outside of it as an opportunity to include them and bring them along on the cloud adoption journey. Repetition is key! Repetition is key!
- How to get things done in an iterative and agile way and with one eye (or ear, or finger) on the business needs and the other on the technical requirements.
- When the right time is to approach each topic, when to compromise, and when to take decisive action to achieve the business goals.
- The anti-patterns – things that may make sense initially but have been tried and proven not to work.
This book (if you are still following along) is a practical guide that will equip you with the tools to define and execute your cloud adoption strategy. But every organization is different, at a different stage of maturity, and with different ideas about what success looks like.
You will walk away from this book with knowledge, specific insight, and a practical plan (and a mindset as well) that will help you and everyone in your organization define and execute a cloud adoption strategy.
We will explore a wealth of past experiences that have enabled us to execute cloud migrations smoothly. We also want to highlight areas that are ripe for innovation.
Industry-specific considerations such as compliance and data security will be at the forefront. We won’t focus much on any specific technology or go in depth on how to use it; instead, we will try to set broad standards and focus on technology as an enabler of organizational transformation.
We will also investigate the organization’s transformation and how to achieve it, who you absolutely must bring along for the ride, who will go willingly, and who you will have to drag kicking and screaming into the cloud.
You will learn how to create a compelling strategy that gets buy-in across the organization, and which approaches work to win over and influence those with the most to lose (and how to have them look at the wins that are available to them).
No plan survives the first contact with an enemy on the battlefield, but forming a realistic plan is a must for you to be able to deliver and govern cloud adoption and the cloud itself over the long term (or at least for the next 2-3 years before you change organizations).
You will also learn how to recognize the right time to decommission existing practices, processes, and technology, how to go about it, and how to replace them with those appropriate for the cloud in 2025, 2030, and beyond (of course, being mindful that long-term plans are just a beacon in the fog of uncertainty).
The plans you make need to be general, broad, and adaptable to survive contact with the enemy (that is, market forces). General plans that cover a broad range of circumstances are better than specific, narrow plans. You will also need to understand which services are not going to last (in our opinion), which services are vaporware, and which technology trends are important for cloud adoption and your industry.
Finally, we will give you tips on navigating cloud adoption in heavily regulated industries such as finance, insurance, and defense.
By the way, a convention to be on the lookout for…
Throughout this book, you will come across sections such as this next one, where one of us (the authors) will interject with an opinion, an anecdote, an aside, or a tangent that briefly (or at length) describes something you might want to investigate after you have read, understood, and started implementing the wonderful things you’ve learned in this book – so be on the lookout for them.
Here is an example of an aside, answering one of the questions we get asked a lot.
Multi-cloud? Yes or no. Or, when?
Absolutely never, except if you are an organization with thousands of developers and have products that are not all interconnected; if you are an umbrella organization and have acquired companies that are already using different clouds and have customers in production; or if you are in a regulated market or a government entity with mandates for multi-cloud.
Regardless, if you have the choice, do not choose multi-cloud. You will have to double or triple your governance effort, and you limit the growth and cross-pollination of services and developers. Also, adopting one cloud is a huge pain already. Why adopt more than one if you really don’t have to?
If you must go with multi-cloud, then pick one primary cloud, do well with it (that is, adopt the hell out of it), and only then introduce the second one. Stay away from three. There be dragons.
But, but, but… what about vendor lock-in? That is not a thing, insofar as everything you do locks you in, so stop worrying about a future issue that may never come up, stop focusing on the lowest-common-denominator technology, and embrace – adopt – the cloud. If you must do multi-cloud, adopt one well and then introduce another.
And, again – in the next edition of this book, you could see your anecdotes here as well, so contact us if you have something to share. We’d love to learn from your experiences and share them with future readers.
I once had a client…
I just want to make sure that if you have been through this journey and had issues along the way, you understand that these things happen to the best of us.
Cloud adoption is hard on both the technology and business levels. Sustainable, governable, and painless cloud adoption is a rare exception – one that we want to help you replicate here.
So, I once had this client (one of many), a huge global financial institution that had attempted to adopt the cloud as best they knew how and had, unfortunately, failed spectacularly.
They failed so spectacularly that the regulator fined them and imposed governance processes so stringent that they had to completely scrap their effort. And they had applications, services, and customers live in production. But the mess was such that only a hard reset and a change at VP level could get them out of this crisis. Whole departments were disbanded, and the organization had to undergo a re-org, and then another one just for good measure, to be able to start again.
This time, in a much smarter (and a reasonably cautious) way, with buy-in from every level of the organization. And they are just now, after a year and a half, coming back with those applications, services, and customers – to production.
Some failures you accept and shy away from, and some you embrace and you do better – maybe with some outside help.
This is one of those good news/bad news situations.
The bad news: they wasted a lot of time, their competitors plowed ahead, and they suffered in the process. The good news: they understood the benefits of the cloud and were still very keen to try again, and they are now doing a lot better, having understood that cloud adoption is easy to do poorly and hard to do well – but well worth the effort.
If only there was a book they could have referred to in their time of need, or if only they had good people that read such a book and understood the complexities of a complete digital transformation. If only…
We’ve met you now and, hopefully, you now understand that this book was tailor-made for you.
Are you ready?
Change is hard. Changing an entire organization is even harder. It is made harder still when coupled with a technological paradigm shift as huge as cloud computing, which requires not only knowledge of software, networking, and cloud services but also knowledge of the high-level concepts that cloud computing brings to the forefront.
Let us share a hard truth with you: almost every organization in the world is now adopting the cloud or planning to adopt the cloud – and they are all trying to do it as a matter of course, as a thing you do and complete and get done, as a thing you do as you’ve done before. Doing that is not impossible, but the results from such a strategy are lackluster at best. Take heed and bear witness to the truths that lie herein (to quote the tales of the Horadrim from the computer game Diablo) – nothing short of revolutionary organizational change and an acceptance that we are not in Kansas anymore (to quote from The Wizard of Oz) is going to suffice.
You can either accept the need for this or you can try and fight it, do what you’ve always done, and stick with your traditions of IT change management. Hopefully, it is slowly dawning on you how seriously you and your organization must take this process. For nothing is at stake here, other than the very future of your organization.
The cloud architect is the one person in an organization that needs to understand all of this. Must. Understand. All.
They must be able to convey the importance of each topic to others in the organization and will need to work with all levels in the organization to bring about the change.
In cloud computing, the top priority is to achieve the business goals of the organization. All other matters take a back seat. Focus on what really matters. In this book, we assume that the business goals are broadly aligned with the following:
- Deliver digital services to customers (internal and external).
- Be quick but diligent about it (agility and compliance).
- Pay for agility and acceleration but don’t break the bank (focus on agility first, but then circle back to cost optimization regularly).
- Protect the brand and the reputation of the organization (security and sustainability).
- And lastly, have peace of mind and the time to learn new things (less firefighting, more innovation).
Your organization has decided to go all-in on the cloud. So, the buy-in, in principle at least, is there. Now, how does one transform the organization so that it can quickly and efficiently deliver on that promise in its day-to-day operation?
This book will address this challenge by showcasing the actual path to take from day zero (the decision) to strategy formation, planning, and execution, all the way down to day-to-day operation and long-term management. Anything short of a total organizational and technological transformation will miss the opportunities the cloud provides.
Agility: Business needs must be addressed yesterday, not in six months or two years. How does the adoption of the cloud help address this? What in the organization is preventing innovation, faster time-to-market, cost efficiency, and global scale? Not just one agile team, but repeated over and over again, at all levels and across all teams, in an orderly, organized, governed, and compliant way – without adding more bureaucracy, but rather by empowering all levels of the organization to be agile by default.
True agile adoption requires one to steer the organization not with slight and sporadic nudges but through focused radical course correction. Imagine a massive oil tanker attempting a 180-degree turn: the process is slow and appears to only make small incremental changes, but it is predictable and when it starts there is no stopping it.
This book will address this challenge by providing callouts, funny anecdotes, adoption stories from enterprise and start-up perspectives, information from running a SaaS platform in a regulated industry, ideal and cynical views, examples of what did work and what didn’t, patterns and anti-patterns, and so on.
I cannot stress this enough, and I’ve had this conversation with many CFOs, product and project managers, and even developers. To quote Donald Knuth, “premature optimization is the root of all evil.” This applies to culture as much as it does to code. Trying to optimize your practices while trying to achieve agility is lunacy – you don’t know yet what is important, or where the bottlenecks will occur.
Focusing on agility means focusing on the easiest path to production. If that means procuring more expensive services, either on a higher tier or at a larger scale, do it. You do not want to waste time arguing about the sizes of VMs or App Service plans, or whether to use Service Bus or Event Hubs. Repeat after me: It doesn’t matter.
Remember that you are trying to develop, deploy, and deliver a digital service to customers. How do you know if it is a success or a failure? You get it out there, into the hands of your customers (again – external or internal), and you gather telemetry and feedback.
Is the service performing its business function and bringing value? Now push on and deliver more. Once you are getting diminishing returns on the new features, go back and focus on optimizing the service. You may have paid for a few months of more expensive tiers and services, and some of them may not have been optimal, but they were being used and the business is better for it.
The only exception to this is security: you do not compromise on security – ever.
And if the service wasn’t successful, evolve it from telemetry and feedback – or, kill it with fire. Be ruthless. You must. Or the market will be ruthless.
One practical example of this was the delivery of a COVID-19 vaccination registration and scheduling form. We did not compromise on security, but we picked the easiest (fastest, most agile) services to be able to get the form into the hands of the customers as soon as possible.
We ensured elasticity, and when 5 million people accessed the form on day 1, it just worked. Then it took us two weeks to move it to more cost-efficient services and evolve the service further (for example, to be able to amend the scheduled slot). We paid more than necessary for a few weeks, but the service was live and in the hands of those who needed it.
We could have waited a month and done the optimization upfront and then rolled it out, but you can get a lot of people vaccinated in a month.
Another example might be a service for the world’s largest sneaker manufacturer. The decision was between optimizing for agility and deploying a service that would be ready for Black Friday and the Christmas season, and optimizing for cost and deploying the service in time for Valentine’s Day.
After stating it in those terms, which path do you think they chose? Which path would you choose?
Practicality: An architect needs to be in control of everything from innovation to workload deployment, scalability, agility, governance, and so on. It is literally impossible for one person to mind all these things, so to scale, an architect needs to influence the rest of the organization to get buy-in initially and continually, and to help create an organization that is then by default ready to address these challenges without the need to micromanage, argue, or struggle to deliver on all levels of the organization.
This book will address this challenge by providing the patterns and mechanisms (and sometimes just pure practical advice) for achieving this goal. This is a continuous process that can be overwhelming at first and, like any new process, painful, as it involves all levels of the organization; but with the proper strategy and planning, it can be done at scale.
With that in mind, this book aims to help you do the following:
- Understand cloud adoption and digital transformation in general, and what they mean in practice in the day-to-day running of a cloud platform. Learn from actual examples: an enterprise company, a start-up/greenfield site, and a less than successful cloud adoption.
- Plan the cloud adoption journey, help all levels of the organization to do so as well, and then execute on that plan.
- Innovate with the business goals in mind, then execute with cloud workloads that are automated and deployed in a predictable, safe, fast, and agile manner, without worrying that something will go wrong every time someone interacts with the cloud.
- Have an overview of the entirety of your cloud workloads: what they do, why they are there, how they interact with each other, and how to deal with any issue relating to them, from communicating required improvements internally to communicating with stakeholders internally and with customers externally when things inevitably go wrong.
- Understand the concepts of governance, security, privacy, reliability, operational excellence, cost optimization, and performance efficiency, along with a whole bunch of Azure services and how and when to use them. Organize teams across the organization, and join any of those teams to showcase your ideas or help them understand a difficult cloud concept.
Top of our list of assumptions is that your organization has, or is planning to have, a cloud-first strategy and that you have a significant role to play in it – hence we’ve written this book for you. We also assume you have some understanding, or maybe some experience, of concepts such as these:
- Infrastructure as a Service
- The major cloud providers: Microsoft, Amazon, and Google
We will attempt to bring everyone to the same level of knowledge, but in general, we assume that architects have general knowledge of (but may lack extensive experience with some or all of) software architecture, cloud architecture, and organizational change management concepts.
As we’ve said before, this part of the book is a level-set so that everyone is on the same page with the basic concepts, so if you are 100% sure you understand the following, you can skip this part:
- Basic cloud concepts
- Security and privacy implications
- Cloud services
- Cloud workload types
- Pricing and support options
If you feel less than 100% confident, at least skim this section. You can always come back and read it if required. Also, when you gift copies of this book to the people in your organization, they can quickly catch up on acronyms, concepts, and ideas here. If you are one of those people who were gifted this book and need to understand the cloud concepts, welcome! Someone in your organization cares about you enough to want you to educate yourself further and to be an active participant in your organization’s digital transformation journey.
Smash that like button!
I feel like I should now shout at you to comment, like, and subscribe as it really helps the channel out. But this is not YouTube, so it might be a bit more difficult for you to do. So, get in touch in other ways!
As we continue, we will focus on Azure services; however, these concepts, and the concepts in this book more broadly, apply to any hyperscale cloud provider, so if you primarily work with AWS or GCP day to day, you should have no trouble translating them to their respective services.
So why adopt the cloud, and why should we care about it?
Again, you really must think of it in terms of agility. The cloud is a way for us (all of us) to deliver value into the hands of our customers faster. It is also a way for us to deliver value that we just couldn’t before from our own data centers. Whether due to physical or economic constraints, it was not easy or quick to run an experiment or prototype an application there, let alone test it with a small subset of our customers and then scale it to the entire global market. In the cloud, hundreds of thousands of compute units are at our fingertips.
The cloud also brings ready-made services that we would otherwise have had to introduce into our architectures ourselves, planning for their development, deployment, testing, scaling, support, updates, monitoring, and so on.
Azure Service Bus
A service such as Azure Service Bus now gives us the flexibility to handle publish-subscribe events with a single template deployment that all our services can use, without us having to develop, deploy, and functionally test a messaging system ourselves.
It can be highly available and scale automatically, it has 24/7 support, and it updates itself, both with security fixes and with new features (it can even be made highly available across Azure availability zones by just picking the premium tier and configuring it). That is an awful lot of work we don’t have to do. You should be focusing on the features and services your organization cares about, not building your own Azure Service Bus.
Azure Service Bus is a comprehensive offering with many options and possible configurations. There is a wealth of information available on the Azure website on things such as pricing tiers, messaging design patterns, scalability, observability, and so on. For example, start with these links:
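To make the publish-subscribe idea concrete, here is a minimal in-memory sketch of what a topic with several subscriptions does: each subscription receives its own copy of every message, and the publisher doesn’t need to know who is listening. It is purely illustrative and calls no Azure APIs; the `Topic` class and the `orders`, `billing`, and `shipping` names are hypothetical. Everything it leaves out (durable storage, redelivery, dead-lettering, scaling, availability) is exactly the work a managed service such as Service Bus takes off your plate.

```python
from __future__ import annotations

from collections import deque


class Topic:
    """Hypothetical in-memory stand-in for a pub-sub topic (not the Service Bus API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # One independent FIFO queue per subscription.
        self.subscriptions: dict[str, deque] = {}

    def subscribe(self, subscription_name: str) -> None:
        # Creating the same subscription twice is a no-op.
        self.subscriptions.setdefault(subscription_name, deque())

    def publish(self, message: str) -> None:
        # Fan out: every existing subscription gets its own copy of the message.
        for queue in self.subscriptions.values():
            queue.append(message)

    def receive(self, subscription_name: str) -> str | None:
        # Return the oldest pending message for this subscription, or None if it is empty.
        queue = self.subscriptions[subscription_name]
        return queue.popleft() if queue else None


orders = Topic("orders")
orders.subscribe("billing")
orders.subscribe("shipping")
orders.publish("order-42 created")

print(orders.receive("billing"))   # order-42 created
print(orders.receive("shipping"))  # order-42 created
print(orders.receive("billing"))   # None
```

The publisher only ever talks to the topic, so adding a new consumer means adding a subscription rather than changing the code that sends the message.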
- Azure Service Bus: https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-messaging-overview
- Publish-Subscribe pattern: https://docs.microsoft.com/en-us/azure/architecture/patterns/publisher-subscriber
- Azure support: https://azure.microsoft.com/en-us/support/plans/
As another example, a service such as Azure Monitor brings with it a whole wealth of integrations (automatic, semi-automatic, and manual) that allow you to monitor your entire Azure estate from a single pane of glass (to use another buzz phrase). This means that for all Azure services, and for a whole bunch of the applications and services you are building, you get out-of-the-box monitoring and metrics without having to do anything other than configure the endpoints and start ingesting your telemetry.
The power of Azure Monitor (especially its Application Insights component) doesn’t end there: you can extend its default events with your own custom events, which Azure usually cannot infer on its own. For example, every time a player completes a level in your game, you could evaluate all the inputs, game paths, times, and scores, check them for cheating, and submit an event to Application Insights with the result of the evaluation. Later, you can investigate these events, automatically or manually, and further improve your anti-cheat methods.
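A minimal sketch of the anti-cheat flow described above, assuming hypothetical plausibility thresholds and a hypothetical event name. The `send_event` parameter is a placeholder for whichever Application Insights client your application uses; we deliberately don’t show a specific SDK call here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical plausibility thresholds per level, for illustration only.
MIN_PLAUSIBLE_SECONDS: Dict[int, float] = {1: 30.0, 2: 45.0}
MAX_PLAUSIBLE_SCORE: Dict[int, int] = {1: 10_000, 2: 20_000}


@dataclass
class LevelRun:
    player_id: str
    level: int
    seconds: float
    score: int


def evaluate_run(run: LevelRun) -> List[str]:
    """Return the reasons a run looks suspicious (empty list if it looks fine)."""
    reasons: List[str] = []
    if run.seconds < MIN_PLAUSIBLE_SECONDS.get(run.level, 0.0):
        reasons.append("too_fast")
    if run.score > MAX_PLAUSIBLE_SCORE.get(run.level, float("inf")):
        reasons.append("score_too_high")
    return reasons


def report_level_completed(
    run: LevelRun, send_event: Callable[[str, Dict[str, str]], None]
) -> None:
    """Evaluate a completed level and emit one custom event with the verdict."""
    reasons = evaluate_run(run)
    send_event(
        "LevelCompletedEvaluated",  # hypothetical custom event name
        {
            "playerId": run.player_id,
            "level": str(run.level),
            "suspicious": str(bool(reasons)),
            "reasons": ",".join(reasons),
        },
    )


# Usage: collect events in a list instead of sending them anywhere.
sent_events = []
report_level_completed(
    LevelRun("p1", 1, 12.0, 500), lambda name, props: sent_events.append((name, props))
)
```

Keeping the evaluation separate from the telemetry call makes the anti-cheat rules testable on their own, and lets you later query the `suspicious` property across all events.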
Different definitions of the cloud
Getting back to the concept of the cloud: you now understand why the cloud is so powerful, so let’s switch to what the cloud is. Ask 10 people to define the cloud and you will get at least 13 answers. Ask the same 10 people tomorrow and you will get 13 completely different answers. And at least 50% of all those answers are surely correct; they might even all be. The cloud means different things to different people. What does it mean for you?
CFOs might focus on the cost savings that PaaS services provide over IaaS and over traditional virtualization in their own data centers; bringing costs down means an opportunity to reinvest in more research.
CTOs might focus on a standard catalog of services used in a compliant and repeatable way, which makes it easy to onboard the future services the organization creates.
The head of engineering might focus on reusable components, technologies, and services – thus unlocking career progression opportunities for team members to move between different teams with ease.
A developer might focus on writing just the code they need, rather than also worrying about what infrastructure the code will need to run. They might also focus on how much easier it now is to debug in production compared to on-premises deployments in customers’ environments, which the developer had no direct insight into.
And all of these are true, so how can the cloud bring that about?
What is the cloud?
Picture a seemingly endless web of physical servers spread across the world. These servers, each with their own special tasks, come together to create the all-encompassing wonder we call the cloud. The cloud is compute, storage, memory, and common IT building blocks at your fingertips without the (traditional) headaches. It is also, for most purposes, “infinitely” scalable in those dimensions. (Alright, technically not infinite, but we rarely worry about having the resources available to scale typical business applications.)
The cloud also delivers global scale, massive bandwidth, and minimal latency through data centers located closer to your customers than you could ever be. Azure has more than 60 regions, each made up of one or more data centers, and tens of thousands of edge locations (through partners such as Akamai and others).
Could you build a service and offer it globally and cheaply before the cloud? Sure, you could. Would it be as cost-optimized? Hell no! Can you do so today in the cloud literally in hours? Yes, absolutely. You can and you should.
The cloud is a vast network of virtualized services, ready for you to cherry-pick the ones you need and capable of bursting and scaling as your services require. The cloud is glued together by an unimaginable length of wire and an unimaginable number of chips, and it is ultimately a testament to human ingenuity and to a vision of ever more powerful computers in the hands of every person and organization, enabling them, and you, to achieve more every day. I hope Microsoft forgives me for paraphrasing its corporate mission here: to empower every person and every organization on the planet to achieve more.
Figure 1.1 – Cloud meme
The cloud is, as that meme tells us, someone else’s computer. In fact, it is hundreds of thousands, even millions, of someone else’s computers. And it’s perfect that way: we can use them without having to maintain them.
I don’t own a car
I rent, either long-term (weeks and months) or short-term (hours or days). I get the benefit of a car: transportation. I don’t get the hassle of repairing, maintaining, taxing, and insuring it, or the worry about what happens if I scratch or crash it. And I get any car I want: small and cheap, large and useful, or fancy and expensive.
Is renting a car for everyone? Maybe not – self-driving cars may eventually bring about a mindset change for us all. Is this marginally more expensive on a per-use basis? Yes. Are the benefits worth it? Absolutely. Do I even like driving and am I even a good driver? No, to both. Am I terrible at parking? Yes. Did I get to drive cars I would never be able to afford (at least before everyone and their friend buys this book)? Yes.
Like the cloud, where you rent compute, memory, storage, and bandwidth, I rent cars.
And both renting computers in the cloud and renting cars are a future certainty: an inevitability coming soon for all of us.
Now that we are aligned on the cloud itself, let’s focus on what it means to architect for the cloud.
Architecting the cloud
Microsoft’s Azure Well-Architected Framework organizes architecture guidance into five pillars, which we’ll look at in turn:
- Operational excellence
- Reliability
- Performance efficiency
- Cost optimization
- Security
Under operational excellence, consider why the processes in your organization are set up as they are and what needs to change to achieve agility. You should also balance your teams’ freedom (their desire to do as they like and define their own processes) against following standard, defined processes.
Application performance monitoring (APM) tools must provide visibility into every aspect of application performance, from server health to user experience, giving teams deep insight into how their applications are operating in real time. Over time, APM should provide teams with data points on how changes are affecting their applications, positively or negatively, allowing them to take proactive measures, for example, to maintain an optimal level of efficiency or to pivot the functional direction of their application. This type of agility is core to operational excellence.
Infrastructure as code (IaC) and automation go hand in hand. Essentially, nothing you do should be unique, one of a kind, or manual. Every change and every action needs to go through continuous integration and continuous deployment (CI/CD) pipelines as a single unit: a completed task that is traceable from the idea to the line of code to the telemetry event. This needs to be repeatable, and applying the same change again must leave the system in the same state as applying it once (this property is known as idempotency).
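To make “data points on how changes are affecting their applications” concrete, here is a minimal sketch that compares 95th-percentile latency before and after a change. The 10% tolerance and the synthetic samples are hypothetical; a real APM tool does this (and much more) over live telemetry.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ReleaseComparison:
    p95_before_ms: float
    p95_after_ms: float
    relative_change: float
    regressed: bool


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile latency; needs at least two samples."""
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


def compare_release(
    before_ms: Sequence[float], after_ms: Sequence[float], tolerance: float = 0.10
) -> ReleaseComparison:
    """Flag a regression if p95 latency grew by more than `tolerance` (hypothetical 10%)."""
    before, after = p95(before_ms), p95(after_ms)
    change = (after - before) / before
    return ReleaseComparison(before, after, change, change > tolerance)


# Synthetic samples, not real measurements.
baseline = [float(ms) for ms in range(1, 101)]
slower = [ms * 1.5 for ms in baseline]
report = compare_release(baseline, slower)
```

Percentiles are used rather than averages because a slowdown that hits only some users can hide in the mean.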
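Idempotency is easiest to see in a desired-state reconciler, the model behind declarative IaC tools. The sketch below is our own simplified illustration (not how Bicep, ARM, or Terraform are implemented), and the resource names are hypothetical: running `apply` a second time with the same desired state changes nothing.

```python
from typing import Dict, List, Tuple

State = Dict[str, dict]  # resource name -> settings


def plan(desired: State, current: State) -> List[Tuple[str, str]]:
    """Compute the actions needed to move `current` to `desired`."""
    actions: List[Tuple[str, str]] = []
    for name, settings in desired.items():
        if name not in current:
            actions.append(("create", name))
        elif current[name] != settings:
            actions.append(("update", name))
    for name in current:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired: State, current: State) -> List[Tuple[str, str]]:
    """Converge `current` (mutated in place) to `desired`; return what changed."""
    actions = plan(desired, current)
    for verb, name in actions:
        if verb == "delete":
            del current[name]
        else:
            current[name] = dict(desired[name])
    return actions


# Hypothetical resources.
desired = {"sb-orders": {"sku": "Premium"}, "kv-app": {"sku": "standard"}}
live: State = {}
first_run = apply(desired, live)
second_run = apply(desired, live)  # idempotent: nothing left to do
```

Because the pipeline describes the end state rather than a sequence of steps, reruns are safe, and rolling back means applying the previous desired state.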
What this also gives you is – say it again – agility. You must be able to roll back any change that is causing disruption, performance dips, or instability.
Is that easy? No.
Is there a lot to do to prepare for that? Yes.
Can it be done iteratively, so we get better at operational excellence over time? Yes.
Is it worth the peace of mind achieved once it is in place? Yes.
The end goal is for you and everyone in your organization to be able to step away from work at any time, for vacations, for fun and adventure, or just to sleep, without worrying that something will happen that won’t be resolved automatically (at the very least by rolling back to what worked before). If your organization can deploy new code and new services on a Friday afternoon and no one cares or worries, you are there: you are living the dream of operational excellence. If you work in one of these organizations, we’d love to hear from you.
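The “at least roll back to what worked before” safety net can be sketched as a deploy step gated on a health signal. This is a hypothetical outline: `error_rate` stands in for whatever your monitoring reports after a bake period, the 1% threshold is an assumption, and the comments mark where a real pipeline would push artifacts.

```python
from typing import Callable, List


def deploy_with_rollback(
    history: List[str],
    new_version: str,
    error_rate: Callable[[str], float],
    max_error_rate: float = 0.01,
) -> str:
    """Deploy `new_version`; if it is unhealthy, roll back to the last known-good one.

    `history` lists successfully deployed versions (last = live). Returns the
    version that is live afterward.
    """
    previous = history[-1] if history else None
    # (Real pipeline: push new_version to the environment here.)
    if error_rate(new_version) <= max_error_rate:
        history.append(new_version)
        return new_version
    if previous is None:
        raise RuntimeError("new version is unhealthy and there is nothing to roll back to")
    # (Real pipeline: redeploy `previous`, the last known-good version.)
    return previous


observed = {"v1": 0.001, "v2": 0.20, "v3": 0.0}  # hypothetical error rates
history = ["v1"]
live_after_v2 = deploy_with_rollback(history, "v2", observed.get)
live_after_v3 = deploy_with_rollback(history, "v3", observed.get)
```

The bad release never enters the history, so the next rollback target is always a version that actually passed its health check.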
Have I seen any organization achieve all of this? No. Never. Some, though, are so very close.
And that is what it’s all about – doing better today than you did yesterday. And every good deployment, and equally every bad deployment, is an opportunity to learn. No one needs to accept the blame and no one needs to get fired – the solution is always that the process improves.
Yes, someone may still get fired, and even prosecuted, for deliberately malicious activity, but the solution is, and must always be, that the process improves, we improve, and we do better going forward.
I’ve had customers work with me to figure out what to do with their services if a DDoS attack is launched against them. Inevitably, someone suggests that, because absorbing an attack sometimes means throwing more infrastructure at the problem, we should just shut all the services down to save costs and wait until the attacker goes away.
To which my reply is always, let us consider the reason behind a DDoS attack and what the goal is. Pause here and think. What is the goal?
OK, so if the goal is to make your services inaccessible to others, what good does shutting them down do, except doing exactly what they wanted to achieve? For example, a DDoS attack against an Xbox service is designed to make gamers unable to, well, game. If you then turn off the service as a response, what have you achieved?
The key thing about reliability is that the services continue to function.
DDoS mitigation could very well be a book in its own right, so we won’t go into it here, but to give you a head start: Azure has a service that mitigates DDoS attacks, with one tier that is free and another that costs money. Turning it on is a really (really, really) good idea for public-facing endpoints. Microsoft also has teams ready to assist at a moment’s notice if an attack does happen and the automatic mechanisms don’t stop it, and in that case, you will receive priority service.
Before you invest time in high availability and resiliency from a redundancy perspective, ensure that is the actual business requirement. I’ve seen so many teams struggle to achieve unreasonably high availability, only to answer my question “What is the traffic to the service?” with “Nine queries a week on average.” Or, my question “What exactly does the service do?” with “PDF generator”. Unless your business is PDF generation, people can usually come back for their PDF or wait until it is processed and generated in a background thread and emailed to them.
I am already looking forward to all the feedback like “Well, actually, our PDF service is mission-critical.” All I am saying is think before you invest effort in reliability. Ask the business how critical the service is.
And another aside here: if all services are critical, then no service is critical. This has a slight possibility of being incorrect, but I’ve never seen it.
Another way to improve resiliency is for the services to fall back to less-intensive responses. For example, if the service returns the most sold items today and it needs to do massive parallel queries and correlate the values, it can fall back to a cached list from yesterday, which is just one fast query.
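A minimal sketch of that fallback, assuming a hypothetical `compute_live` function for the expensive query and a cached list from yesterday. If the live computation fails or exceeds the timeout, we serve the cached list and report that the response is degraded.

```python
import concurrent.futures
import time
from typing import Callable, List, Tuple

# Shared pool so a slow call doesn't block the caller past its timeout.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def top_sellers(
    compute_live: Callable[[], List[str]],
    cached_yesterday: List[str],
    timeout_s: float = 0.5,
) -> Tuple[List[str], bool]:
    """Return (items, degraded), falling back to yesterday's cached list when the
    live computation raises or takes longer than `timeout_s`."""
    future = _pool.submit(compute_live)
    try:
        return future.result(timeout=timeout_s), False
    except Exception:  # includes the timeout error raised by future.result
        # A call that is already running can't be interrupted; we just stop waiting.
        future.cancel()
        return cached_yesterday, True


def _slow_query() -> List[str]:
    time.sleep(0.3)  # stands in for massive parallel queries
    return ["live-item"]


def _failing_query() -> List[str]:
    raise RuntimeError("database unavailable")


yesterday = ["cached-a", "cached-b"]
fast = top_sellers(lambda: ["live-a", "live-b"], yesterday)
slow = top_sellers(_slow_query, yesterday, timeout_s=0.05)
failed = top_sellers(_failing_query, yesterday)
```

Returning the `degraded` flag lets the caller log or display that the data may be a day old, rather than hiding the fallback.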
Resiliency is another topic we could spend a lot of time on, but for now, just remember these concepts: single point of failure and graceful degradation. And one last thing: if there are issues with one service in your architecture, expect issues to cascade up and/or down the stack, and even after you have mitigated them, expect further issues over the next week or two, so be prepared and staffed. This rule of thumb, here for you free of charge (almost), will save you a lot of headaches.
The reason is that in the architectures we see today, interconnectedness is (unfortunately) baked in more than it should be, and it is often not easy to visualize all the dependencies. So maybe work on visualizing those as well, before issues happen.
Why are these issues more pronounced in the cloud, which is so powerful and useful? Well, more people and machines are connected to the internet, and they are using more and more services; this wasn’t such an issue in the 1990s, but it is today. The underpinning concept behind cloud computing is using commodity hardware at such a scale that small percentages matter. For example, a 1% annual failure rate on 2 disks means the disks will be fine almost all the time. But a 1% failure rate across 60 million disks means that 600,000 of them will fail this year. That is an issue. And while disks can fail at more than 1% per year, other components, such as chips, must also be considered.
Also, for our purposes, the cloud is public (as opposed to a private cloud), meaning it is a shared service. Though tenants are logically isolated, you may find yourself with noisy neighbors that impact your services. And you will face hackers, from the bedroom variety to the state-sponsored type, who sometimes target specific organizations but most often spray and pray, and you can pray that you don’t get caught in the crossfire.
Now that you are in the cloud, you also need to consider that updates to the underlying technology don’t always go well, and Microsoft, Amazon, and Google will, with some regularity, disrupt services in one region or even in all of them. No slight meant against their SRE teams; that is, again, just large numbers meeting small percentages. If a provider did one update a year, a 1% failure rate would be negligible, but with thousands of updates a day, that 1% adds up quickly. However, that is the whole idea behind the cloud: everything can and will fail, and you will learn to love and understand this, because that very fact brings about new, simpler ways to plan for high availability than if you were running your own data center. Not to mention that you and your organization are not above failure either.
What is the risk of losing a data center? I have seen risk logs with an entry for a scenario where a meteor crashes into the data center. But that chance is so remote that your cloud provider taking down a data center is much more likely.
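One cheap way to start visualizing dependencies is to keep them as data and ask, “If this fails, what goes down with it?” The sketch below walks the dependency graph in reverse; the service names and edges are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical services; each maps to the services it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "web-frontend": ["orders-api", "catalog-api"],
    "orders-api": ["service-bus", "orders-db"],
    "catalog-api": ["catalog-db", "redis-cache"],
    "billing-worker": ["service-bus"],
}


def impacted_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the graph: dependency -> services that use it.
    used_by: Dict[str, List[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, []).append(service)
    impacted: Set[str] = set()
    queue = deque([failed])
    while queue:  # breadth-first walk up the stack
        current = queue.popleft()
        for consumer in used_by.get(current, []):
            if consumer not in impacted:
                impacted.add(consumer)
                queue.append(consumer)
    return impacted


bus_outage = impacted_by("service-bus", DEPENDS_ON)
cache_outage = impacted_by("redis-cache", DEPENDS_ON)
```

The same data can feed a diagram tool, and keeping it in source control means it gets reviewed whenever a new dependency is introduced.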
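The disk numbers above work out as follows. This sketch assumes failures are independent, which real hardware only approximates.

```python
def expected_failures(units: int, annual_failure_rate: float) -> float:
    """Expected number of units failing in a year."""
    return units * annual_failure_rate


def prob_any_failure(units: int, annual_failure_rate: float) -> float:
    """Probability that at least one unit fails in a year (independent failures)."""
    return 1 - (1 - annual_failure_rate) ** units


small = (expected_failures(2, 0.01), prob_any_failure(2, 0.01))
large = (expected_failures(60_000_000, 0.01), prob_any_failure(60_000_000, 0.01))
```

With 2 disks, you expect 0.02 failures a year and have roughly a 2% chance of seeing any; with 60 million disks, you expect 600,000 failures, and a failure somewhere is a certainty.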
Now that you know that failures are not only expected but inevitable, you can design and architect your services around that – if the business requirements are there that demand it. Remember, people doing manual work make so many more mistakes compared to automated machine processes – hence automation is again your friend. Invest in your friend.
Performance efficiency is defined by Microsoft as the ability of the system to adapt to changes in load. And this again brings us back to… agility. How hard do you have to work for your service to go from supporting one customer to a billion customers?
Can you design and configure a service that does this automatically? Azure Active Directory, Azure Traffic Manager, and Azure Functions are examples of services that scale automatically.
Prefer PaaS and SaaS services over IaaS, prefer managed services over unmanaged ones, and prefer scaling out (horizontally) over scaling up (vertically). The same applies when demand drops: prefer scaling in over scaling down.
You should consider offloading processing to the background. If you can collect data now and process it later, or process it not in real time but as and when you have baseline capacity, performance will improve.
You should consider caching everything you can: first static resources, then dynamic resources that aren’t sensitive to real-time changes, then more still, such as database results, lookup tables, and so on. When should you stop caching? When everything is cached and you can cache no more. Each step improves performance. A great caching service is Azure Cache for Redis, but it is by no means the only one; another to consider is Azure CDN.
Have you considered your write and read paths, and are they stressing the environment? Try data partitioning, duplication, aggregation, denormalization, or normalization. All of these can help improve performance.
Are you using the most expensive service to store data? Azure SQL is great when you need to run queries, and run them often. But a 1 TB database holding the past 6 months of records, which keeps growing while all your users only search today’s events, is a waste.
Get used to moving data around. Use the right storage and the right compute resources at the right time. Moving data to another region may be costly, but moving it within a region may be completely free. Using the most appropriate storage can save you millions, and to facilitate this, many Azure services provide data management and offloading capabilities.
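Horizontal scaling boils down to choosing an instance count from the current load. The rule below is a hypothetical target-utilization sketch (the 70% target, the minimum of 2, and the maximum of 20 instances are our assumptions), not the Azure autoscale API.

```python
import math


def desired_instances(
    current_rps: float,
    rps_per_instance: float,
    min_instances: int = 2,
    max_instances: int = 20,
    target_utilization: float = 0.7,
) -> int:
    """Scale out or in so each instance runs at about `target_utilization`."""
    if rps_per_instance <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("invalid capacity settings")
    needed = math.ceil(current_rps / (rps_per_instance * target_utilization))
    return max(min_instances, min(max_instances, needed))


quiet = desired_instances(100, 100)
busy = desired_instances(300, 100)
spike = desired_instances(10_000, 100)
```

The floor keeps some redundancy even when traffic is quiet, and the ceiling caps spending during a spike.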
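Offloading to the background is often done with queue-based load leveling: the front end accepts work quickly and a worker drains the queue at its own pace. In Azure, the queue might be a Service Bus or Storage queue with a Functions worker; the in-process sketch below only shows the shape of the pattern, and its names are hypothetical.

```python
import queue
import threading
from typing import List, Optional

# The front end only enqueues; the background worker processes items
# at whatever pace the baseline capacity allows.
work: "queue.Queue[Optional[dict]]" = queue.Queue()
processed: List[dict] = []


def handle_request(payload: dict) -> str:
    """Front end: accept quickly and defer the heavy lifting."""
    work.put(payload)
    return "202 Accepted"


def worker() -> None:
    """Background worker: drain the queue until it sees the None sentinel."""
    while True:
        item = work.get()
        try:
            if item is None:
                return
            processed.append({**item, "status": "processed"})  # heavy work goes here
        finally:
            work.task_done()


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()
responses = [handle_request({"id": i}) for i in range(3)]
work.put(None)  # tell the worker to stop once it has drained the queue
consumer.join()
```

The caller gets an immediate acknowledgment; the actual processing happens later, when capacity is available.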
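The usual shape of “cache database results and lookup tables” is cache-aside with an expiry. Here is an in-memory sketch with a hypothetical 60-second TTL and an injectable clock so the expiry can be demonstrated; with Azure Cache for Redis, the same logic would store values with an expiry in Redis instead of a dictionary.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TTLCache:
    """Cache-aside: return a fresh cached value, otherwise load and store it."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._items.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader()  # cache miss or expired: go to the source
        self._items[key] = (now + self.ttl, value)
        return value


fake_now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
loads: List[int] = []


def load_lookup_table() -> int:
    loads.append(1)  # count how often we hit the "database"
    return len(loads)


first = cache.get_or_load("lookup", load_lookup_table)
second = cache.get_or_load("lookup", load_lookup_table)
fake_now[0] = 61.0
after_expiry = cache.get_or_load("lookup", load_lookup_table)
```

The TTL is the knob that trades freshness for load: the less real-time sensitive the data, the longer it can be.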
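Data partitioning spreads reads and writes by mapping each key to a partition. Services such as Azure Cosmos DB do this for you based on the partition key you choose; the sketch below only illustrates the idea with a stable hash (Python’s built-in `hash()` is randomized per process, so SHA-256 is used instead). The keys and the partition count are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition deterministically across processes and machines."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


keys = [f"customer-{i}" for i in range(1_000)]  # hypothetical partition keys
counts = Counter(partition_for(k, 8) for k in keys)
```

A key with high cardinality, such as a customer ID, spreads load evenly; a low-cardinality key, such as a country, tends to create hot partitions.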
Azure Cosmos DB has time-to-live (TTL) functionality, so if you know an item won’t be needed after a certain time, it can be expunged automatically, while you can still store a copy in a file elsewhere. Azure Blob Storage offers access tiers such as Hot, Cool, and Archive, and lifecycle management policies can move blobs between tiers automatically. If a file no longer needs to be readily accessible, move it to a lower tier, and you will pay a lot less.
And remember, there is an egress cost! Whenever you are about to move data, always ask, “What about egress costs?”
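The saving from tiering is easy to estimate. In this sketch, the thresholds and relative prices are made up for illustration; real prices vary by region and also include access and retrieval charges that this ignores. In practice, a Blob Storage lifecycle management policy applies rules like `choose_tier` for you.

```python
from typing import Dict, List, Tuple

# Relative, made-up prices per GB-month, for illustration only.
RELATIVE_PRICE_PER_GB: Dict[str, float] = {"Hot": 1.0, "Cool": 0.5, "Archive": 0.1}


def choose_tier(days_since_last_access: int, cool_after: int = 30, archive_after: int = 180) -> str:
    """Pick an access tier from how long ago a blob was last used (hypothetical thresholds)."""
    if days_since_last_access >= archive_after:
        return "Archive"
    if days_since_last_access >= cool_after:
        return "Cool"
    return "Hot"


def monthly_cost(blobs: List[Tuple[float, int]], tiered: bool) -> float:
    """Relative monthly cost of (size_gb, days_since_last_access) blobs."""
    total = 0.0
    for size_gb, idle_days in blobs:
        tier = choose_tier(idle_days) if tiered else "Hot"
        total += size_gb * RELATIVE_PRICE_PER_GB[tier]
    return total


blobs = [(100.0, 1), (400.0, 45), (500.0, 365)]  # synthetic example
all_hot = monthly_cost(blobs, tiered=False)
with_tiering = monthly_cost(blobs, tiered=True)
```

With these made-up numbers, tiering cuts the relative cost from 1,000 to 350 units; most of the data is rarely touched, which is typical of logs and historical records.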
Security, as defined by Microsoft, follows the zero trust model in protecting your applications and data from threats – including from components within your controlled network. There are so many ways to protect your workload.
Azure services that help include the following:
- Azure DDoS Protection, which protects against denial-of-service attacks
- Azure Front Door geo-filtering, which limits the traffic you accept to specific regions or countries
- Azure Web Application Firewall, which controls access by inspecting traffic on a per-request basis
- IP allowlisting (whitelisting), which limits exposure to accepted IP addresses only
- VNET integration of backend services, which restricts access from the public internet
- Azure Sentinel, a cloud-native security information and event management (SIEM) service
A lot of these don’t require you to manage them day to day – you set and forget them. For example, with VNET integration, once you’ve enabled it and written some automated tests to ensure it works every time, you are done.
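As a small example of “set it and then test it,” here is the check behind IP allowlisting, written so it can be covered by automated tests. The ranges are IETF documentation ranges used as hypothetical stand-ins; in Azure, the enforcement itself would live in your firewall or access-restriction rules, not in application code.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_allowlist(cidrs: Iterable[str]) -> List[Network]:
    """Parse CIDR strings; raises ValueError on malformed entries."""
    return [ipaddress.ip_network(c) for c in cidrs]


def is_allowed(client_ip: str, allowlist: List[Network]) -> bool:
    """True if the client address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowlist)  # IPv4 vs IPv6 mismatch is simply False


# Documentation ranges standing in for your real partner or office IPs.
allowlist = build_allowlist(["203.0.113.0/24", "198.51.100.10/32"])
```

Tests like the ones for this function can run in the same pipeline that deploys the rules, so a misconfigured range is caught before it reaches production.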
- Microsoft Azure Well-Architected Framework:
- Microsoft Azure Well-Architected Review:
AWS and GCP offer similar guidance as well. These are specific to each hyperscale cloud provider and to each service and concept as it pertains to them, so while the general concepts are similar, the actual guidance may differ based on service definitions and implementations.
Cloud security and data privacy
Security is a shared responsibility between your entire organization and your cloud provider. This is especially true because security operates on many levels, from the physical security of the data centers to the security of your passwords and the other cryptographic secrets your services need to operate.
You need to protect your – as well as your customers’ – data, services, applications, and the underlying infrastructure.
Services such as Microsoft Defender for Cloud are your friend and will give you plenty to concern yourself with, from ports open to the public to automatic security insights such as traffic anomalies (for example, machine A has started communicating with machine E when it never has before).
You will also need to understand the patterns around the use of Azure Key Vault and how to successfully use Key Vault in your IaC scripts and in your applications and services.
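A common application-side pattern is to fetch secrets at runtime through a small wrapper, so that nothing sensitive lives in code or configuration. The sketch below is our own (the class and secret names are hypothetical), and it uses a fake client so it runs anywhere.

```python
from types import SimpleNamespace
from typing import Any, Dict, Protocol


class SecretSource(Protocol):
    """Anything with get_secret(name) returning an object with a `.value`."""

    def get_secret(self, name: str) -> Any: ...


class SecretProvider:
    """Fetch secrets at runtime and cache them in memory.

    This sketch never expires entries; in real use, refresh them so that
    rotated secrets are picked up.
    """

    def __init__(self, client: SecretSource) -> None:
        self._client = client
        self._cache: Dict[str, str] = {}

    def get(self, name: str) -> str:
        if name not in self._cache:
            self._cache[name] = self._client.get_secret(name).value
        return self._cache[name]


class FakeVault:
    """Test double standing in for a real Key Vault client."""

    def __init__(self) -> None:
        self.calls = 0

    def get_secret(self, name: str) -> SimpleNamespace:
        self.calls += 1
        return SimpleNamespace(value=f"value-of-{name}")


vault = FakeVault()
secrets = SecretProvider(vault)
password = secrets.get("sql-password")  # hypothetical secret name
password_again = secrets.get("sql-password")
```

Against Azure Key Vault, you could pass `SecretClient(vault_url="https://<your-vault>.vault.azure.net", credential=DefaultAzureCredential())` from the `azure-keyvault-secrets` and `azure-identity` packages; its `get_secret(name)` returns a secret object whose `.value` holds the secret, so it fits the same interface.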
Then there are services that protect the public perimeter, such as Azure DDoS Protection, Azure Front Door, Azure Web Application Firewall, and so on. And each service has its own security recommendations, best practices, and guidance on how best to protect it from internal and external threats.
Sometimes though, you will just need to guarantee that data hasn’t been tampered with, so we slowly start moving from security to compliance. Azure confidential ledger (ACL) is one such service that ensures that your data is stored and hosted in a trusted execution environment. The scope around these is fascinating and the science and patterns are really showcasing what is possible today with technology – not just possible but guaranteed.
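Why can a ledger guarantee that data hasn’t been tampered with? The core idea is a hash chain: each entry’s hash covers the previous entry’s hash, so changing any historical record breaks every hash after it. The sketch below shows only that concept; Azure confidential ledger adds trusted execution environments and much more, and this is not its implementation.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Add an entry whose hash commits to the entire history before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "hash": _entry_hash(prev, payload)})


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash; any edited entry makes verification fail."""
    prev = GENESIS
    for entry in chain:
        if entry["hash"] != _entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


ledger: List[Dict[str, Any]] = []
append(ledger, {"txn": 1, "amount": 100})
append(ledger, {"txn": 2, "amount": 250})
append(ledger, {"txn": 3, "amount": 75})
intact = verify(ledger)
ledger[1]["payload"]["amount"] = 999  # simulate tampering with history
tampered = verify(ledger)
```

On its own, a hash chain only detects tampering; a managed ledger also protects the chain itself, so that an attacker cannot simply recompute all the hashes.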
At Microsoft, there are teams whose job is to ensure that services and the platform comply with legal and regulatory standards around the world. You name it, they have it. AWS and GCP are close behind.
Again, a reminder: implementing recommendations from any or all of these does not by itself make you compliant or secure. Shared responsibility means you still must do your due diligence and work to satisfy the requirements of compliance frameworks. Both theory and practice must be satisfied.
As mentioned, we’ve focused on Azure in this book as a primary hyperscale cloud provider, but here are three great pages (one from GCP and two from Azure) that give an overview and compare services and offerings so you can easily understand similar services across these providers:
- AWS, Azure, GCP service comparison: https://cloud.google.com/free/docs/aws-azure-gcp-service-comparison
- Azure for GCP Professionals: https://docs.microsoft.com/en-us/azure/architecture/gcp-professional/
Figure 1.2 – Azure for GCP Professionals screenshot
- Azure for AWS Professionals: https://docs.microsoft.com/en-us/azure/architecture/aws-professional/
Figure 1.3 – Azure for AWS Professionals screenshot
Getting to grips with one cloud platform may seem like a daunting task. If so, you probably think that learning about all three is an impossibility. Rest assured that each cloud has many similarities and the skills you acquire now will stand you in good stead if you ever need to use another cloud in the future. Hopefully, these articles have enlightened you a little and shown just how similar the major cloud platforms really are.
Cloud workload types
A workload is a collection of assets that are deployed together to support a technology service or a business process – or both. Specifically, we are talking about things such as database migration, cloud-native applications, and so on.
When talking about cloud adoption, we are looking for an inventory of things that we will be deploying to the cloud, either directly or via migration.
You need to work across the organization with all the stakeholders to identify and understand workloads, prioritize them, and map their interdependencies, so that you can properly plan, parallelizing or serializing workloads according to their needs and dependencies.
You and the stakeholders will need to identify, explain, and document each workload in terms of its name, description, motivations, sponsors, business units, the parts of the organization it belongs to, and so on. You can then identify metrics of success for each workload and its impact on the business, on data, and on applications.
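Once dependencies are mapped, deciding what can run in parallel and what must be serialized is a topological sort. The sketch below groups hypothetical workloads into migration waves using Python’s standard `graphlib` module (Python 3.9+): everything in a wave can move in parallel, and each wave waits for the previous ones.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def migration_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group workloads into waves; a workload's dependencies are in earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical workloads: each maps to the workloads it depends on.
dependencies = {
    "web-app": {"orders-db", "identity"},
    "reporting": {"orders-db"},
    "orders-db": set(),
    "identity": set(),
}
waves = migration_waves(dependencies)
```

A `CycleError` is useful information in itself: circular dependencies usually mean those workloads must move together or be decoupled first.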
Then you can approach the technical details such as the adoption approach and pattern, criticality, and resulting SLAs, data classification, sources, and applicable regions. This will enable you to assign who will lead each workload and who can align and manage the assets and personnel required.
The highest priority must go to the decision between migrating as is (commonly known as lift and shift) and taking a more modern, cloud-native approach. Any error here will cause delays, and because of dependencies, the timeline slip may escalate quickly. With enterprise customers, there may be thousands of workloads to execute, so make sure the organization takes this step very seriously and handles it meticulously.
One common thing that happens is that a lot of responsibility gets assigned to a very small team who may not have all the information and must hunt for it across the organization while trying to prioritize and plan the workloads and dependencies. This usually results in poor decisions. While it might be tempting to go for modernization, where migration is concerned, it is best to lift and shift first, followed quickly by an optimization phase. Business reasons for the migration are usually tied to contractual obligations (for example, a data center contract), and modernization rarely goes swimmingly for teams new to the cloud when there is a looming deadline.
On the topic of business cases for each workload, do remember to compare apples to apples, which means comparing the total cost of ownership (TCO) of all options. This rarely gets done properly, if at all, especially when it is done internally without support from a cloud provider or consultants.
Ensuring cost consciousness is another activity that gets overlooked. You need to plan before you start moving workloads around: who will be responsible, and how will costs be monitored? Overprovisioning is common. And remember: ensuring cost-optimized workloads is not a one-time activity. You are now signing up for continuous improvement over the lifetime of the workload; otherwise, you risk costs spiraling out of control, and once they do, they are even harder to understand and rein in.
As mentioned quite a few times now, agility, not cost, must be your primary goal. Having said that, letting costs spiral out of control is wasteful, so periodically (at least quarterly) every team should invest some time in optimizing costs. And if you are a great architect, you (or your team) will want to join in, or initiate things, and help find synergies across teams that they may have missed.
Counter-intuitively, cloud providers and their account teams should be, and usually are, incentivized to help you optimize costs, so check in with them regularly if they don’t proactively reach out to you. The reason is simple: the happier you are with your cloud costs, the more you can do for the same amount of money, so the more you will do in the future. It really is that simple.
OK, so you want to optimize your costs. What do you look at first?
The easiest place to start is with two things: individual services and high availability requirements. Individual services are updated all the time: they add new cost tiers (for example, the Azure Storage archive tier), add serverless options where you pay only for actual usage on a per-request basis (for example, the Azure Cosmos DB serverless option), and move features from higher tiers into lower ones, giving you the ability to trade off cost against capacity, performance, and features.
The next best thing: thanks to overzealous reliability requirements at the start of any project, you can usually go back, architect around those requirements or remove them completely, and save considerably. For example, consider a calculation service deployed 14 times: you started with one Azure region pair (two regions), then replicated that setup to every other region pair you use, and with seven pairs you now run 14 deployments. Is this really required, or could it be just one deployment per region pair, with the fallback being any of the other six? Beware of data residency requirements here, though; they may make the original setup a valid requirement after all.
Multi-regional failover is relatively easy and is often overlooked. With just a few DNS changes and a few Azure Traffic Manager settings, you can increase reliability significantly and quickly with little effort.
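Azure Traffic Manager’s Priority routing method sends traffic to the highest-priority endpoint (the lowest priority number) that its health probes report as healthy. The sketch below mimics only that selection logic; the endpoint names are hypothetical, and in practice you configure this in Traffic Manager rather than writing it yourself.

```python
from typing import Dict, List, Optional, Tuple


def pick_endpoint(endpoints: List[Tuple[int, str]], healthy: Dict[str, bool]) -> Optional[str]:
    """Priority routing: the healthy endpoint with the lowest priority number wins."""
    for _, name in sorted(endpoints):
        if healthy.get(name, False):  # unknown health is treated as unhealthy
            return name
    return None  # nothing healthy: a full outage


endpoints = [(1, "primary-westeurope"), (2, "secondary-northeurope")]  # hypothetical
normal = pick_endpoint(endpoints, {"primary-westeurope": True, "secondary-northeurope": True})
failover = pick_endpoint(endpoints, {"primary-westeurope": False, "secondary-northeurope": True})
outage = pick_endpoint(endpoints, {})
```

Because routing happens at the DNS level, clients follow the failover automatically once cached DNS records expire, which is why keeping the DNS TTL low matters here.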
Other things you can do require a bit more effort, such as moving from one database type to another (for example, Azure SQL to Azure Cosmos DB), switching between comparable services, optimizing APIs to have them be less chatty, deploying to Linux machines instead of Windows, and so on.
Sometimes you can get amazing results. For example, my favorite service in Azure is Azure SignalR Service, which is used to add real-time functionality to your apps. But if you think about it, real-time functionality is similar to querying a database directly, and if you have many identical queries, you may be able to use SignalR to execute the query once and push the same response to thousands of clients. It is like caching, except that clients don’t even have to query the cache: the response is pushed to them before they ever make a request to the cache or the database.
Azure has a pricing calculator on its website, which you can use to get an overall estimate, but for cost optimization, it doesn’t really help beyond showing you the reservation options. For example, if you have a steady baseline usage of some services (such as Azure VMs, Azure Cosmos DB, or Azure SQL), you can reserve and prepay for that capacity and get significant discounts, over 50% in some cases.
You will also get recommendations from the AI behind Azure Advisor, and while those are almost always worth acting upon, quarterly reviews are still a necessity.
As for paid support, there are multiple options available. If you are playing around in a sandbox environment and don’t really need support, you can get help making things work from Stack Overflow and various blogs. However, only official support can diagnose certain technical issues. And of course, in production, you will likely need a quick response time and help during service outages.
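The push-based idea above relies on SignalR connections, which we can’t reproduce in a self-contained example. A closely related technique that captures the “execute once, answer many” part is request coalescing (often called single flight): concurrent identical requests share one execution. The sketch below is that swapped-in technique, not the SignalR API, and the key and query are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple


class SingleFlight:
    """Coalesce concurrent identical requests into one execution."""

    def __init__(self) -> None:
        self._in_flight: Dict[str, "asyncio.Task[Any]"] = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[Any]]) -> Any:
        task = self._in_flight.get(key)
        if task is None:
            # The first caller for this key starts the real work...
            task = asyncio.ensure_future(fn())
            self._in_flight[key] = task
            # ...and the entry is dropped once it finishes (this is not a cache).
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        # Every concurrent caller awaits the same task and shares its result.
        return await task


async def main() -> Tuple[int, List[Any]]:
    calls = 0

    async def top_sellers_query() -> List[str]:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.05)  # stands in for a slow database query
        return ["item-a", "item-b"]

    flight = SingleFlight()
    results = await asyncio.gather(
        *(flight.do("top-sellers", top_sellers_query) for _ in range(1_000))
    )
    return calls, results


query_count, results = asyncio.run(main())
```

One design caveat: cancelling any single waiter cancels the shared task for everyone; wrapping the awaited task in `asyncio.shield` avoids that if it matters for your workload.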
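Reservations only pay off for truly steady usage. Because reserved capacity is paid for every hour whether it is used or not, a reservation with discount d beats pay-as-you-go only when you would use more than 1 − d of the reserved hours. The rate and discount below are hypothetical; check the actual reservation terms for each service.

```python
HOURS_IN_YEAR = 8_760


def break_even_utilization(discount: float) -> float:
    """Minimum share of reserved hours you must use for a reservation to win."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def annual_cost(hourly_rate: float, discount: float, hours_used: float, reserved: bool) -> float:
    """Reserved: paid for every hour of the year. Pay-as-you-go: only hours used."""
    if reserved:
        return hourly_rate * (1 - discount) * HOURS_IN_YEAR
    return hourly_rate * hours_used


# Hypothetical rate of 1.0 per hour with a 50% reservation discount.
steady = (annual_cost(1.0, 0.5, 8_760, reserved=True), annual_cost(1.0, 0.5, 8_760, reserved=False))
bursty = (annual_cost(1.0, 0.5, 2_000, reserved=True), annual_cost(1.0, 0.5, 2_000, reserved=False))
```

For the always-on workload, the reservation halves the cost (4,380 versus 8,760); for the bursty one, it more than doubles it (4,380 versus 2,000). Reserve the baseline and leave the peaks on pay-as-you-go.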
The support options in Azure are as follows (https://azure.microsoft.com/en-us/support/plans/):
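| Plan | Description | Technical support response |
|---|---|---|
| Basic | Included for all customers, provides self-help resources | |
| Developer | Access to Technical Support via email | |
| Standard | For production environments | 24/7, 8-hour response |
| Professional Direct | Includes proactive guidance | 24/7, 1-hour response |
| Unified/Premier | For support across the Microsoft suite of products, including Azure | 24/7, 1-hour response |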
Table 1.1 – Compare Azure support plans
What is the Cloud Adoption Framework?
All the hyperscale cloud providers – all three of them (Azure, AWS, and GCP) – know that to get the most value out of your investment, you must adopt the cloud and the cloud concepts properly – otherwise, you will invest less in the future.
One struggle the account teams in these hyperscale cloud providers have (and yes, that includes the account team that works with you) is your speed of adoption, which is limited (and therefore impacts their KPIs and their promotions and bonuses) by your struggles with getting things done quickly, at scale, and with definitive and recognizable business benefits.
So, your and your organization’s lack of proper cloud adoption is not only making it difficult for your organization to avail of the benefits of the cloud, but it may also in fact limit the cloud adoption by your internal teams. And of course, think of your cloud account team and their bonuses. While this is just a bit facetious, it really is not a win for anyone. Luckily, the only win case here is for everyone to win by adopting the cloud in the right way. Doing what this book advises is good for everyone, including the broader consumer market, and is the only way for your organization to stop struggling and start enjoying cloud adoption.
What exactly is the benefit of this book over comprehensive resources that are available freely online? Great question. What you will get from those guides is a lot of insight into the specifics of each cloud, but what you won’t get is the years of experience working with clients, helping you avoid pitfalls and letting you know what and how to prioritize your way out of these. Also, all these lack any humor whatsoever. And sometimes they are just plain wrong – they lack any insight into your organization. Every organization has its quirks, its legacy issues, and its future plans, and so what we are doing in this book is guiding you on a path where you can confidently pick and choose (cherry-pick, if you will) what will and won’t work for your organization.
Should you read all about every cloud service, or just focus on the subset your organization is adopting? Another great question, you absolute legend, but one you already know the answer to. Yes. Read (or at least skim) all the available documentation. You will learn a lot, and if you read it more than once, you might even remember some of it. You will also learn the subtle, nuanced differences between the providers, as well as what their priorities are and who their target audience seems to be. You might be surprised.
One thing you must resist, though, is the temptation to adopt everything you read online; hence this book. Otherwise, you will sacrifice agility for premature optimization. The providers’ own account teams will also try to take you on a journey of adopting their recommendations fully, exactly as written. This is 100% wrong. Calling it right here. Yes, you absolutely should work with them and tap into their wealth of knowledge, but on your own terms, once you fully understand the effect each recommendation will have on one thing: your organization’s agility (that is, your organization’s ability to deliver business value).
So which hyperscale cloud provider is best?
Amazing question. So original. No, really, I get asked it all the time, since I have experience with all three clouds. So here is my definitive answer – just an opinion, though, so think carefully before writing me a nasty “Well, actually…” note! It’s a short opinion, so it misses a lot of nuance, but you are not here for nuanced opinions – no one ever is.
Azure is best for two target audiences: enterprise companies and everyone who hates the AWS console. Enterprise companies cannot find a better partner out there than Microsoft. If you are using Office 365, have legacy enterprise software, own data centers, or need commercial support selling your software and services (or any combination of these), no company other than Microsoft will serve and support you better. And Azure portal blades are the best thing since sliced bread. The AWS console is holding its customers back – literally. Start-ups should look elsewhere unless they are on the Microsoft stack, in which case, pick Azure. However, you will be on your own – Microsoft will throw you a bone sometimes (such as through the Microsoft for Startups program), but it is up to you to get things done. Once you start scaling your customer base and profits, welcome – you are now an enterprise company. Talk to Microsoft again.
AWS is best if you are a start-up focused on business value rather than geeking out over technology. If technology is a means to a business end, AWS is for you. It has easily the best marketplace and easily the best support (hello, chat), and it is the easiest path to take if you are not all in on the Microsoft stack. All the services you need are there. Just pick them and scale.
GCP is for the technology geeks and the start-ups with a deep affinity for the way Google services work. If technology is your business, GCP is for you. This is the true home of any SRE. And if you are in the advertising space, GCP is your valued partner. Do not buy into any early access or new, market-making service, though, as Google is famous for killing or abandoning services. If all you do is AI, GCP is for you as well. If AI is just one valuable piece of your overall business, you are better off with AWS or Azure.
Two final thoughts: one, you won’t go wrong picking any of these if you are a capable individual and a robust and knowledgeable organization, so don’t stress it too much; two, none of this matters anyway, as your organization’s CEO will pick a hyperscale cloud provider, throw $50 or $500 million at them and commit your organization to them for the next 5 years (and beyond) and you will have to just deal with it. So there!
By now, you should have a good sense of what we hope to achieve with this book. You probably breezed through the cloud foundations, though you likely still have many unanswered questions about cloud adoption. That is expected at this stage, and we hope to answer them over the course of the book.
In the next chapter, we set the scene for every successful cloud adoption – strategy.