
Optimizing Microsoft Azure Workloads

By Rithin Skaria
  1. Free Chapter
    Chapter 1: Planning Workloads with the Well-Architected Framework
About this book
It’s easy to learn and deploy resources in Microsoft Azure without worrying about resource optimization. However, for production or mission-critical workloads, it’s crucial to follow best practices for resource deployment to attain security, reliability, operational excellence, and performance. Beyond these aspects, you need to account for cost, as it is the leading driver of almost every organization’s cloud transformation. In this book, you’ll learn to leverage the Microsoft Well-Architected Framework to optimize your workloads in Azure. This framework is a set of recommended practices developed by Microsoft based on five aligned pillars: cost optimization, performance, reliability, operational excellence, and security. You’ll explore each of these pillars and discover how to perform an assessment to determine the quality of your existing workloads. Throughout the book, you’ll uncover different design patterns and procedures related to each of the Well-Architected Framework pillars. By the end of this book, you’ll be well equipped to collect and assess data from an Azure environment and perform the necessary optimization of your Azure workloads.
Publication date: August 2023
Publisher: Packt
Pages: 240
ISBN: 9781837632923

 

Planning Workloads with the Well-Architected Framework

Microsoft has developed different frameworks for Azure; the prominent ones are the Cloud Adoption Framework (CAF) and the Well-Architected Framework (WAF). Other frameworks are subsets of these prominent ones. In this book, we will cover the WAF and its five pillars.

Important note

Do not confuse this with the Azure Web Application Firewall, which is also commonly abbreviated as WAF. Any reference to WAF in this book means the Well-Architected Framework.

To give you a quick introduction, the WAF is a set of best practices and guidelines developed by Microsoft for optimizing your workloads in Azure. As described in the opening paragraph, this framework has five pillars, and optimization is aligned with them. We won’t take a deep dive into these pillars at this point; we will cover all aspects of the five pillars as we progress. Further, we will cover the elements of the WAF. When we discuss the elements, we will touch on cloud design patterns. This is a lengthy topic, and if it is new to you, it’s recommended that you refer to the Cloud Design Patterns documentation (https://docs.microsoft.com/en-us/azure/architecture/patterns/). Pattern names will come up when we discuss design principles, but as patterns are out of scope for this book, we will not take a deep dive into them.

In this chapter, we will learn why there is a need for the WAF and explore its pillars and elements.

 

Why the WAF?

Microsoft Azure has incredible documentation that can help any beginner deploy their first workload in Azure. With the help of this well-planned documentation and the accompanying tutorials, deployment is not a tedious task. The question, then, is: are these workloads optimized and running in the best shape?

When it comes to optimizing, some considerations include the following:

  • What is the cost of running this workload?
  • What is the business continuity (BC) and disaster recovery (DR) strategy?
  • Are the workloads secured from common internet attacks?
  • Are there any performance issues during peak hours?

These are some common considerations related to optimization; nonetheless, they vary from workload to workload. We need to understand the best practices and guidelines for each of our workloads, and if it’s a complex solution, finding the best practices for each service can be a daunting task. This is where the Microsoft Azure WAF comes into the picture.

Quoting Microsoft’s documentation: “The Azure Well-Architected Framework is a set of guiding tenets that can be used to improve the quality of a workload.”

While some organizations have already completed their cloud adoption journey, others are still in transition or in the early stages. As the documentation states, this framework is a clear recipe for improving the quality of the mission-critical workloads we migrate to the cloud. Incorporating the best practices outlined by Microsoft will produce a high-standard, durable, and cost-effective cloud architecture.

Now that we know the outcome of leveraging the WAF, let’s look at its pillars. The framework comprises five interconnected pillars of architectural excellence, as follows:

  • Cost optimization
  • Operational excellence
  • Performance efficiency
  • Reliability
  • Security

The assessment of the workload will be aligned with these pillars, and the pillars are interconnected. Let’s take an example to understand what interconnected means.

Consider a web application running on a virtual machine (VM) scale set. We can improve performance by enabling autoscaling, so that the number of instances increases automatically whenever there is a performance bottleneck. At the same time, autoscaling means we use the extra compute power only when we need it; we pay for the extra instances at times of need, not 24x7.

As you can see in this scenario, both performance and cost optimization are achieved by enabling autoscaling. Similarly, we can connect these pillars and improve the quality of the workload. Nonetheless, there will be trade-offs as well—for example, trying to improve reliability will increase the cost; we will discuss this later in this book.
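To make the interplay between the performance and cost pillars concrete, here is a minimal sketch of a metric-based scale decision. The thresholds, bounds, and function name are illustrative assumptions for this book's scenario, not Azure's actual autoscale engine.

```python
# Hypothetical sketch of a metric-driven autoscale rule; thresholds and
# instance bounds are assumptions, not real Azure autoscale defaults.

def desired_instance_count(current: int, avg_cpu: float,
                           minimum: int = 2, maximum: int = 10) -> int:
    """Scale out when CPU is high, scale in when it is low,
    always staying within the configured instance bounds."""
    if avg_cpu > 75:          # performance: add capacity under load
        target = current + 1
    elif avg_cpu < 25:        # cost: release capacity when idle
        target = current - 1
    else:
        target = current
    return max(minimum, min(maximum, target))

# Peak hours: CPU is hot, so the scale set grows...
print(desired_instance_count(current=2, avg_cpu=90))   # 3
# ...and off-peak it shrinks back, so we stop paying for idle instances.
print(desired_instance_count(current=3, avg_cpu=10))   # 2
```

The same rule serves both pillars: the scale-out branch addresses performance, while the scale-in branch and the `maximum` cap address cost.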

Let’s take a closer look at these pillars in the next section.

 

What are the pillars of the WAF?

As you read in the previous section, Microsoft has organized its optimization guidance around five pillars of architectural excellence. Even though each pillar has a dedicated chapter, let’s first cover some key concepts related to each one.

The following figure shows the five pillars of the WAF:

Figure 1.1 – The five pillars of the WAF

We will start with the first pillar, cost optimization.

Cost optimization

One of the main reasons organizations adopt the cloud is cost-effectiveness. The total cost of ownership (TCO) is much lower in the cloud, as the customer doesn’t need to purchase physical servers or set up data centers, and thanks to the agility of the cloud, they can deploy, scale, and decommission resources as required. With the help of the Azure TCO calculator (https://azure.microsoft.com/en-us/pricing/tco/calculator/), customers can estimate cost savings before migrating to Azure.

The journey doesn’t end once they have migrated. Migrations mostly follow the lift-and-shift strategy, where workloads are deployed at a size similar to on-premises. The challenge is that on-premises, there is no running cost per VM or server: the customer makes a capital investment to purchase the servers, and the ongoing costs are limited to licensing, maintenance, electricity, cooling, and labor. In Azure, billing is pay-as-you-go: for n hours of use, you pay n times the per-hour cost, and the price of a server varies with size and location. If servers were wrongly sized on-premises, migration replicates that mistake in the cloud, and with servers running underutilized, you pay extra every hour, every day, and every month. This is why cost optimization is needed after migration.
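The pay-as-you-go arithmetic above can be sketched in a few lines. The per-hour rates below are invented for illustration; actual Azure prices vary by VM size and region.

```python
# Illustrative arithmetic only: the per-hour prices are made up,
# not real Azure rates (actual prices vary by size and location).

HOURS_PER_MONTH = 730  # common billing approximation of a month

def monthly_cost(per_hour_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    # Pay-as-you-go: n hours of use costs n times the per-hour rate.
    return round(per_hour_rate * hours, 2)

oversized = monthly_cost(0.40)    # lifted-and-shifted at on-premises size
right_sized = monthly_cost(0.10)  # resized after a cost review
print(oversized, right_sized, round(oversized - right_sized, 2))
```

Even with these toy numbers, right-sizing a single always-on VM saves a multiple of its cost every month, which is exactly the waste that post-migration cost optimization targets.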

It’s recommended that organizations conduct cost reviews every quarter to understand anomalies, plan budgets, and forecast usage. Cost optimization helps us find underutilized and idle resources, often referred to as waste, and eliminate them. Eliminating this waste improves the cost profile of your workloads and results in savings. In Chapter 3, Implementing Cost Optimization, we will assess a demo Azure environment and see how to develop a remediation plan. Once we identify the weak points in our infrastructure, we can resize resources, eliminate them, or enforce policies for cost optimization.
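A cost review of this kind boils down to filtering resources by utilization. The following is a toy sketch: the VM names, metrics, and the 5% idle threshold are assumptions for illustration, not output from any real Azure tool.

```python
# Toy sketch of flagging "waste" in a cost review; names, metrics,
# and the idle threshold are invented for illustration.

vms = [
    {"name": "web-01",   "avg_cpu_percent": 62.0},
    {"name": "batch-02", "avg_cpu_percent": 3.1},   # idle most of the month
    {"name": "db-01",    "avg_cpu_percent": 41.5},
]

IDLE_THRESHOLD = 5.0  # percent average CPU over the review period

waste = [vm["name"] for vm in vms if vm["avg_cpu_percent"] < IDLE_THRESHOLD]
print(waste)  # candidates to resize, deallocate, or delete
```

In practice, tools such as Azure Advisor surface these candidates for you, but the underlying idea is the same: compare observed utilization against a threshold over a review period.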

Operational excellence

Operational excellence covers the operations and procedures required to run a production application. When deploying our applications, we need a reliable, predictable, and repeatable deployment process. In Azure, we can automate deployment, which eliminates human error. Bug fixes can be shipped quickly if we have a fast and reliable deployment pipeline. Most importantly, whenever there is an issue post-deployment, we can roll back to the last known good configuration.

In Chapter 4, Achieving Operational Excellence, we will learn about the key topics related to operational excellence. For now, let’s just name them: application design, monitoring, application performance management, code deployment, infrastructure provisioning, and testing.

Operational excellence mainly concentrates on DevOps patterns for application deployment and processes related to deployment. This includes guidance on application design and the build process, as well as automating deployments using DevOps principles.

Performance efficiency

As we saw with cost optimization, we scale workloads to meet demand with the help of autoscaling; this ability to scale is what the performance efficiency pillar covers. In Azure, we can define the minimum number of instances adequate to run our application during non-peak hours. For peak hours, we can define an autoscaling policy by which the number of instances is increased, controlled either by a metric (CPU, memory, and so on) or by a schedule. We can also cap the maximum number of instances to stop scaling beyond a certain point and control billing. This autoscaling scenario was simply not possible before the cloud: administrators used to provision oversized instances that could handle both peak and non-peak hours. With Azure, this has changed, and because Azure collects all metrics out of the box, we can easily identify bottlenecks.

Proper planning is required to define the scaling requirements, and how scaling is defined in Azure varies from resource to resource. Some resource tiers don’t offer autoscaling, so you must scale manually, while others support neither automatic nor manual scaling. Note that performance efficiency is not only about autoscaling; it also covers data performance, content delivery, caching, and background jobs. In short, this pillar deals with the overall performance efficiency of our application.
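Alongside the metric-driven rule, a schedule-controlled policy raises the instance floor during known busy hours. The hours and instance counts below are illustrative assumptions, sketched in plain Python rather than Azure's autoscale configuration format.

```python
# Sketch of schedule-based scaling: a recurring profile that raises the
# minimum instance count during peak hours. Hours and counts are
# illustrative assumptions, not a real Azure autoscale profile.
from datetime import time

PEAK_START, PEAK_END = time(9, 0), time(18, 0)

def scheduled_minimum(now: time, off_peak_min: int = 2, peak_min: int = 5) -> int:
    """Return the instance floor for the given time of day."""
    return peak_min if PEAK_START <= now < PEAK_END else off_peak_min

print(scheduled_minimum(time(11, 30)))  # 5 during business hours
print(scheduled_minimum(time(23, 0)))   # 2 overnight
```

In Azure, the equivalent is an autoscale profile with a recurrence; the point here is simply that the floor itself can vary with the clock, independent of any metric.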

In Chapter 5, Improving Applications with Performance Efficiency, we will take a deep dive into performance patterns, practices, and performance checklists.

Reliability

Reliability means consistent performance; in this context, it means the application keeps operating even when failures occur. When we build and deploy applications in Azure, we need to make sure they are reliable. On-premises, we use different redundancy techniques to keep applications and data available even when there is a failure. For example, a Redundant Array of Independent Disks (RAID) replicates data across multiple disks to increase data reliability.

In Azure, or any other cloud, the first thing we must accept is that failures can happen; the cloud is not completely failproof. Keeping this in mind, we need to design our applications for reliability by making use of the different cloud features available. Incorporating these techniques into the design avoids a single point of failure (SPOF).

The level of reliability required is often driven by the service-level agreement (SLA) demanded by the application or end users. For example, a single VM with a premium disk offers 99.9% uptime, but if a failure happens on the host server in the Azure data center, your VM will face downtime. Here, we can leverage availability sets or availability zones, which let you deploy multiple VMs across fault domains/update domains or zones. By doing so, the SLA can be increased to 99.95% for availability sets and 99.99% for availability zones. Keep in mind that to get this SLA, you need at least two VMs deployed across the availability sets or zones. Earlier, we read that the pillars of the WAF are interconnected and work hand in hand; in this case, however, increasing reliability means deploying multiple instances of your application, which means your costs will increase. Sometimes, as in this scenario, there will be trade-offs.
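The SLA percentages above translate into concrete downtime budgets. The figures come from the section itself; the conversion below is standard back-of-the-envelope arithmetic using a 30-day month.

```python
# Convert the SLA percentages discussed above into allowed downtime.
# Uses a 30-day month for simplicity.

MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200

def allowed_downtime_minutes(sla_percent: float) -> float:
    return round(MINUTES_PER_MONTH * (1 - sla_percent / 100), 1)

for label, sla in [("Single VM (premium disk)", 99.9),
                   ("Availability set", 99.95),
                   ("Availability zones", 99.99)]:
    print(f"{label}: {sla}% -> {allowed_downtime_minutes(sla)} min/month")
```

Moving from a single VM to availability zones shrinks the monthly downtime budget from roughly 43 minutes to about 4, which is why the extra instances, and the extra cost, are often worth it.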

Security

Security in the public cloud was, and always will be, a concern for enterprise customers because of its complexity and the way attackers keep coming up with new types of attacks. Coping with these attacks is always a challenge, and finding the right skills to mitigate them is not easy for organizations. In Azure, we follow the shared responsibility model, which defines the responsibilities of Microsoft and its customers based on the technology used. With Infrastructure-as-a-Service (IaaS) solutions such as VMs, more responsibility lies with the customer, while Microsoft is responsible for the security of the underlying infrastructure. The level of responsibility shifts further to Microsoft if you choose a Platform-as-a-Service (PaaS) solution.

It’s very important to leverage the different security options provided by Azure to improve the security of our workloads. In the security pillar, we will assess the workloads and make sure they align with the security best practices outlined by Microsoft. In Chapter 7, Leveraging the Security Pillar, we will take a holistic look at security and at how to build secure applications.

 

Exploring the elements of the WAF

Cost optimization, operational excellence, performance efficiency, reliability, and security are the five pillars of the WAF. The elements of the WAF are different from the pillars: if we place the WAF at the center, there are six supporting elements around it. These elements supply the pillars with the principles and datasets required for the assessment.

As you know, the WAF is a set of best practices developed by Microsoft, categorized into five interconnected pillars. Now, the question is: where exactly are these best practices recorded? In other words, the practices must exist before they can be categorized into pillars. This is where the elements come into the picture; they act as a foundation for the pillars.

As per Microsoft’s documentation, the supporting elements for the WAF are the following:

  • Azure Well-Architected Review
  • Azure Advisor
  • Documentation
  • Partners, support, and service offers
  • Reference architecture
  • Design principles

Let’s look at each of these elements in turn, starting with the Azure Well-Architected Review.

Azure Well-Architected Review

An assessment of the workload is required to create a remediation plan. The Well-Architected Review is a set of questions prepared by Microsoft to understand the processes and practices in your environment, with a separate questionnaire for each pillar of the WAF. For example, the cost optimization questionnaire contains questions about Azure Reserved Instances, tagging, Azure Hybrid Benefit, and so on, while the operational excellence questionnaire covers DevOps practices and approaches. Each question offers a range of possible answers, from recommended to non-recommended methods. Customers answer based on their environment, and the system generates a plan with recommendations that can be implemented to align the environment with the WAF.

The review can be taken by anyone from the Microsoft Assessments portal (https://docs.microsoft.com/en-us/assessments/?mode=home). In the portal, you must select Azure Well-Architected Review, as shown in the following screenshot:

Figure 1.2 – Accessing Microsoft Assessments

Once you select Azure Well-Architected Review, a popup asks whether you want to create a new assessment or create a milestone. Choose New Assessment for a new assessment, or Create a milestone for an existing one. We won’t conduct an assessment at this point; each pillar of the WAF has its own dedicated chapter, and we will perform the assessment there.

With that, we will move on to the next element of the framework, which is Azure Advisor.

Azure Advisor

If you have worked with Microsoft Azure, you will know that Azure Advisor is a personalized cloud consultant developed by Microsoft. Azure Advisor generates recommendations that you can leverage to improve the quality of your workloads. Looking at Figure 1.3, we can see that the recommendations are grouped into categories whose names match the pillars of the WAF:

Figure 1.3 – Azure Advisor

With the help of Azure Advisor, you can do the following:

  • Get best practices and recommendations aligned to the pillars of the WAF
  • Enhance the cost optimization, performance, reliability, and operational excellence of workloads using actionable recommendations, thus improving the quality of the workloads
  • Postpone recommendations if you don’t want to act immediately

Advisor computes a score based on the number of actionable recommendations, called the Advisor Score. A score lower than 100% means there are outstanding recommendations, and remediating them improves the score. As you can see in Figure 1.3, the total Advisor Score for the environment is 81%, with the Score by category values shown on the right.

The good thing about Azure Advisor is that recommendations are generated as soon as you start using the subscription. You don’t have to deploy agents, make additional configuration changes, or pay to use the Advisor service. Recommendations are generated by machine learning (ML) algorithms based on usage, and they are refreshed periodically. Advisor can be accessed from the Azure portal, and it has a rich REST API if you prefer to retrieve the recommendations programmatically and build your own dashboard.
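To give a feel for the programmatic route, here is a sketch of the Azure Resource Manager request for listing Advisor recommendations. The endpoint shape follows Azure's resource-provider convention; treat the `api-version` value as an assumption to verify against the current REST API reference before use.

```python
# Sketch of building the ARM request URL for Azure Advisor
# recommendations. Verify the api-version against current docs;
# it is an assumption here.

def advisor_recommendations_url(subscription_id: str,
                                api_version: str = "2020-01-01") -> str:
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        "/providers/Microsoft.Advisor/recommendations"
        f"?api-version={api_version}"
    )

url = advisor_recommendations_url("00000000-0000-0000-0000-000000000000")
print(url)
# A real call would send this GET with an Azure AD bearer token, e.g.:
# requests.get(url, headers={"Authorization": f"Bearer {token}"})
```

The response is a paged JSON list of recommendations, each tagged with one of the pillar-aligned categories shown in Figure 1.3, which makes it straightforward to feed a custom dashboard.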

In the coming chapters, we will be relying a lot on Azure Advisor for collecting recommendations for each of the pillars.

Now that we have covered the second element of the WAF, let’s move on to the next one.

Documentation

Microsoft’s documentation does an excellent job of helping people who are new to Azure. All documentation related to the WAF is available at https://docs.microsoft.com/en-us/azure/architecture/framework/. As a matter of fact, this book is a demystified version of that documentation, with additional examples and real-world scenarios.

The WAF documentation is thorough and refined, but for a beginner, the amount of information can be overwhelming. This book distills the key insights and essentials from the documentation, providing you with everything you need to get started. The following screenshot shows the documentation for the framework:

Figure 1.4 – WAF documentation

As you can see in the preceding screenshot, the contents are organized by pillar, and the documentation concludes with steps to implement the recommendations. You could call it the Holy Bible of the WAF: everything related to the WAF is found in this documentation, and we strongly recommend bookmarking the link to stay updated.

All documentation for Azure is available at https://docs.microsoft.com/en-us/azure/?product=popular. The documentation covers how to get started, the CAF, and the WAF, and includes learning modules and product manuals for every Azure service. Apart from the documentation, this site offers sample code, tutorials, and more. Regardless of the language you write your code in, Azure documentation provides SDK guides for Python, .NET, JavaScript, Java, and Go. On top of that, documentation is also available for scripting languages such as PowerShell, the Azure CLI, and infrastructure as code (IaC) solutions such as Bicep, ARM templates, and Terraform.

Partners, support, and service offers

Deploying complex solutions by adhering to the best practices can be challenging for new customers. This is where we can rely on Microsoft partners. The Microsoft Partner Network (MPN) is massive, and you can leverage Azure partners for technical assistance and support to empower your organization. You can find Azure partners and Azure Expert Managed Service Providers (MSPs) at https://azure.microsoft.com/en-us/partners/. MSPs can aid with automation, cloud operations, and service optimization. You can also seek assistance for migration, deployment, and consultation. Based on the service you are working with and the region you belong to, you can find a partner with the required skills closer to you.

Once the partner deploys the solution, there will be break-fix issues that you need assistance with. Microsoft Support can help you with any break-fix scenarios. For example, if one of your VMs is unavailable or a storage account is inaccessible, you can open a support request. Billing and subscription support is free of cost and does not require you to purchase any support plans. However, for technical assistance, you need to purchase a support plan. A quick comparison of these plans is shown in the following table:

                            Basic                 Developer                               Standard               ProDirect
Price                       Free                  $29/month                               $100/month             $1,000/month
Scope                       All Azure customers   Trial and non-production environments   Production workloads   Mission-critical workloads
Billing support             Yes                   Yes                                     Yes                    Yes
Number of support requests  Unlimited             Unlimited                               Unlimited              Unlimited
Technical support           No                    Yes                                     Yes                    Yes
24/7 support                N/A                   Business hours via email only           Yes (email/phone)      Yes (email/phone)

Table 1.1 – Comparison of Azure support plans

A full comparison is available at https://azure.microsoft.com/en-us/support/plans/. The Developer plan can only open Severity C cases with Microsoft Support; to open Severity B or Severity A cases, you must have a Standard or ProDirect plan. Severity C has an SLA of 8 business hours and is recommended for issues with minimal business impact, Severity B is for moderate impact with an SLA of 4 hours, and Severity A, reserved for critical business impact where production is down, has an SLA of 1 hour. The ProDirect plan offers extra perks, such as training, a dedicated ProDirect manager, and operations support. It also has a Support API that customers can use to create support cases programmatically; for example, if a VM is down, by combining Azure alerts and action groups, we can call the Support API to create a request automatically.
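The severity-to-SLA mapping above can be captured as a simple lookup. This sketch ignores the business-hours versus clock-hours nuance for Severity C and is only a summary of the figures in the text.

```python
# Lookup of the initial-response SLAs described above. Business-hours
# vs. clock-hours nuances are deliberately ignored in this sketch.

RESPONSE_SLA_HOURS = {
    "A": 1,   # critical business impact, production down
    "B": 4,   # moderate business impact
    "C": 8,   # minimal business impact (business hours)
}

def initial_response_hours(severity: str) -> int:
    return RESPONSE_SLA_HOURS[severity.upper()]

print(initial_response_hours("a"))  # 1
```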

In addition to these plans, there is a Unified/Premier contract that is above the ProDirect plan and is ideal for customers who want to cover Azure, Microsoft 365, and Dynamics 365. Microsoft support is available in English, Spanish, French, German, Italian, Portuguese, traditional Chinese, Korean, and Japanese to support global customers. Keep in mind that the plans cannot be transferred from one customer to another. Based on your requirement, you can purchase a plan and you will be charged every month.

Service offers deal with the different subscription types available to customers. There are different types of Azure subscriptions with different billing models; a complete list of available offers is at https://azure.microsoft.com/en-in/support/legal/offer-details/. For organizations, the most common options are Enterprise Agreement (EA), Cloud Solution Provider (CSP), and Pay-As-You-Go; these are commercial subscriptions. Organizations deploy their workloads in these subscriptions and are charged based on consumption; how they are charged depends on the offer type. For example, EA customers make an upfront payment and draw down credits for Azure; any charges above the credit are invoiced as an overage. Both Pay-As-You-Go and CSP customers get monthly invoices; in CSP, the invoice is generated by the partner, whereas in Pay-As-You-Go, it comes directly from Microsoft.
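The EA credit-and-overage model described above reduces to simple arithmetic. The figures in this sketch are invented for illustration; real EA commitments and rates are negotiated per agreement.

```python
# Toy illustration of the EA billing model: prepaid credit is consumed
# first, and anything beyond it is invoiced as overage. Figures are
# invented for the example.

def ea_overage(prepaid_credit: float, consumption: float) -> float:
    """Amount invoiced as overage after the credit is exhausted."""
    return max(0.0, consumption - prepaid_credit)

print(ea_overage(100_000.0, 87_500.0))   # 0.0 -> fully covered by credit
print(ea_overage(100_000.0, 112_000.0))  # 12000.0 invoiced as overage
```

Pay-As-You-Go and CSP have no prepaid credit, so in this sketch their invoice would simply equal consumption.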

There are other types of subscriptions used for development, testing, and learning purposes, such as Visual Studio subscriptions, Azure Pass, Azure for Students, the Free Trial, and so on. However, these are credit-based subscriptions, and they are not backed by SLAs. Hence, they cannot be used for hosting production workloads.

The next element we are going to cover is reference architecture.

Reference architecture

If you write code, you might have come across a scenario where you cannot resolve an error yourself and you find the solution on Stack Overflow or some other forum. Reference architecture serves the same purpose, whereby Microsoft provides guidance on how an architecture should be implemented. With the help of reference architecture, we can design scalable, secure, reliable, and optimized applications by following a defined methodology.

Reference architecture is part of the application architecture fundamentals. The application architecture fundamentals comprise a series of steps in which we decide on the architecture style, the technology, the application architecture, and—finally—alignment with the WAF. This will be used for developing the architecture, design, and implementation. The following diagram shows the series of steps:

Figure 1.5 – Application architecture fundamentals


In the preceding diagram, you can see that the first choice is the architectural style, and this is the most fundamental thing we must decide on. For example, we could take a three-tier application approach or go for microservices architecture.

Once that’s decided, then the next decision is about the services involved. Let’s say your application is a three-tier application and has a web frontend. This frontend can be deployed in Azure Virtual Machines, Azure App Service, Azure Container Instances, or even Azure Kubernetes Service (AKS). Similarly, for the data store, we can decide whether we need to go for a relational or non-relational database. Based on your requirements, you can select from a variety of database services offered by Microsoft Azure. Likewise, we can also choose the service that will host the mid-tier.

After selecting the technology, we need to choose the application architecture. This is the stage at which we combine the style and services selected in the previous stages into a concrete design for the system. Microsoft has several design principles and reference architectures that can be leveraged at this stage. We will cover the design principles in the next section.

The reference architectures can be accessed at https://docs.microsoft.com/en-us/azure/architecture/browse/?filter=reference-architecture, and this is a good starting point for the architecture of your solution. You might not find an exact match for your requirements; nevertheless, these architectures can be tweaked as required. Since these architectures are developed by Microsoft with the WAF pillars in mind, you can deploy with confidence, as the solutions are scalable, secure, and reliable. The following screenshot shows the portal for viewing reference architectures:

Figure 1.6 – Browsing reference architectures


The portal offers filtering on the type of product and categories. From hundreds of reference diagrams, you can filter and find the one that matches your requirements. For example, a simple search for 3d video rendering returns two reference architectures, as shown in the following screenshot:

Figure 1.7 – Filtering reference architectures


Clicking on a reference architecture takes you to a complete explanation of the architecture components, data flow, potential use cases, considerations, and best practices aligned with the WAF. The best part is the Deploy to Azure button, which lets you deploy the solution directly to Azure. The advantage is that the architecture is already aligned with the WAF, so you don't have to spend time assessing the solution again.

With that, let’s move on to the last element of the WAF—design principles.

Design principles

In Figure 1.5, we saw that reference diagrams and design principles are part of the third stage of application architecture fundamentals. In the previous section, we saw how we can use the reference architecture, and now we will see how to leverage the design principles. There are 11 design principles you should incorporate into your design discussions. Let’s understand each of the design principles.

Design for self-healing

As with on-premises, failures can happen in the cloud as well. We need to acknowledge this fact; the cloud is not a silver bullet for all the issues that you faced on-premises, but it does offer massive advantages compared to on-premises infrastructure. The bottom line is that failures can happen, hardware can fail, and network outages can occur. While designing our mission-critical workloads, we need to anticipate these failures and design for self-healing. We can take a three-branched approach to tackling failure:

  • Track and detect failures
  • Respond to failures using monitoring systems
  • Log and monitor failures to build insights and telemetry

The way you respond to failures will depend entirely on your services and availability requirements. For example, you might have a database and would like to fail over to a secondary region during a primary region outage. Setting up this replication will sync your data to a secondary region and fail over whenever the primary region fails to serve the application. Keep in mind that replicating data to another region can be more expensive than keeping a database in a single region.

Regional outages are generally uncommon, but while designing for self-healing, you should also consider this scenario. Your focus should be on handling hardware failures, network outages, and so on, because they are very common and can affect the uptime of your application. Microsoft provides recommendations on how to design for self-healing—these are called design patterns. The recommended patterns are presented here:

  • Circuit breaker
  • Bulkhead
  • Load leveling
  • Failover
  • Retry
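To give a flavor of these patterns, here is a minimal sketch of the retry pattern with exponential backoff; the flaky operation, exception type, and delay values are purely illustrative and not from the source:

```python
import time

def retry(operation, max_attempts=4, base_delay=0.1, transient=(TimeoutError,)):
    """Retry pattern sketch: re-invoke a flaky operation with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except transient:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(base_delay * 2 ** (attempt - 1))  # 0.1s, 0.2s, 0.4s, ...

# Hypothetical flaky call: fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient network glitch")
    return "ok"

print(retry(flaky))  # succeeds on the third attempt and prints "ok"
```

A real implementation would typically cap the total wait time and add jitter to the delays so that many clients don't retry in lockstep.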

As mentioned at the beginning of this chapter, design patterns are not within the scope of this book. Again, thanks to Microsoft, all patterns are listed at https://docs.microsoft.com/en-us/azure/architecture/patterns/. Let’s move on to the next design principle.

Make all things redundant

SPOFs in architecture can be eliminated by having redundancy. Earlier, we discussed RAID storage in the Reliability subsection of the What are the pillars of the WAF? section, where multiple disks are used to improve data redundancy. Azure has different redundancy options based on the service that you are using. Here are some of the recommendations:

  • Understand the business requirements: Redundancy is directly proportional to complexity and cost, and not every solution requires you to set up redundancy. If your business demands a higher level of redundancy, be prepared for the cost implications and complexity, and the demand should be justifiable. If not, you will end up with a higher cost than you budgeted for.
  • Use a load balancer: A single VM is a SPOF and is not recommended for hosting mission-critical workloads. Instead, you need to deploy multiple VMs and place them behind a load balancer. On top of that, you can consider deploying the VMs across multiple availability zones for improved SLAs and availability. Once the VMs are behind the load balancer, with the help of health probes we can verify if the VM is available or not before routing the user request to the backend VM.
  • Database replication: PaaS solutions such as Azure SQL Database and Cosmos DB have out-of-the-box replication within the same region. In addition to that, you can replicate the data to another region with the help of the geo-replication feature. If the primary region goes down, the database can fail over to the secondary region for any read or write requests.
  • Database partitioning: With the help of database partitioning, we can improve the scalability as well as the availability of the data. If one shard goes down, only a subset of total transactions will be affected; meanwhile, other shards are still reachable.
  • Multi-region deployment: Regional outages are uncommon; however, we need to account for regional failure as well, based on the application requirements. Deploying the infrastructure to multiple regions can help improve application availability during regional outages. With the help of Azure Traffic Manager and its priority routing, we can fail over to the secondary region if the health probe fails.
  • Coordinate failover: As we discussed in the previous point, we can fail over the frontend using Azure Traffic Manager; however, we need to make sure that the database transactions are synchronized to the secondary region and are ready to fail over. We need to make sure that when the frontend fails over to the secondary region, the database failover is also coordinated. Depending on the data store that you are using, the failover process may vary.
  • Plan for manual failback: With the help of Traffic Manager, we can perform automatic failover using health probes, but don't opt for automatic failback. When the primary region recovers from an outage, not all services may be up and running yet. For example, let's say the frontend service in the primary region is back online while the database is still recovering. Automatic failback would see that the frontend is up and start the failback even though the database has not recovered. Hence, it's recommended to go with manual failback so that we can verify that all services are back online and check data consistency to resolve any database conflicts.
  • Plan redundancy for Traffic Manager: We rely on Azure Traffic Manager for routing traffic in case of regional failure; having said that, the Traffic Manager service can also face downtime. Make sure that you review the SLA of the Traffic Manager service, and if you require more redundancy, consider adding other traffic management solutions as a contingency plan. In case of Traffic Manager failure, we can route the request to the other traffic management solution by repointing our DNS records.
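The priority routing described in these recommendations can be sketched as a simple decision: prefer the primary region while its health probe passes, and fail over otherwise. The region names and probe results below are hypothetical:

```python
def pick_region(health):
    """Priority-routing sketch: return the first region whose health probe passed.

    `health` maps a region name to a bool, standing in for a real health probe."""
    for region in ("primary", "secondary"):
        if health.get(region):
            return region
    raise RuntimeError("no healthy region available")

print(pick_region({"primary": True, "secondary": True}))   # primary
print(pick_region({"primary": False, "secondary": True}))  # secondary
```

Note that this only routes the frontend; as the coordinate-failover point above stresses, the data tier must be failed over in step with it.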

With that, let’s learn about the next design principle—minimize coordination.

Minimize coordination

This principle applies to services such as Storage, SQL Database, and Cosmos DB, where we reduce coordination between application services to achieve scalability. The key concepts of this design principle are mostly aligned with data concepts that are not in the scope of this book. The following recommendations are provided by Microsoft for this design principle:

  • Consider using the Compensating Transaction pattern
  • Use domain events to synchronize state
  • Use Command and Query Responsibility Segregation (CQRS) and event-sourcing patterns
  • Partition data
  • Design idempotent operations
  • Consider using async parallel processing
  • Use parallel distributed algorithms
  • Improve coordination using leader election

An in-depth explanation of these recommendations is available at https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/minimize-coordination.

Design to scale out

On-premises, one of the main issues is capacity constraints. Traditional data centers had capacity issues; when it comes to the cloud, the advantage is that it offers elastic scaling. In simpler terms, we can provision workloads as required without the need to pre-provision or buy capacity. Speaking of scaling, there are two types, as follows:

  • Vertical scaling: Changing the CPU, memory, and other specifications of the resource; this is more of a resizing operation. This type of scaling causes the service to reboot. Increasing the size is called scaling up, and reducing the size is called scaling down.
  • Horizontal scaling: This is where autoscaling comes into context. In horizontal scaling, the number of instances is increased or decreased based on the demand. As there is no change to the initial instance, rebooting is not required, and this process of increasing or decreasing can be automated. Increasing the number of instances is called scaling out, and decreasing the number of instances is called scaling in.

Now that we know the types of scaling, as the name suggests, we need to design for scaling out so that the instances are automatically increased based on the demand. The following recommendations are provided for this design principle:

  • Disable session affinity: Load balancers have a feature where we can enable session stickiness or session affinity. If we enable this feature, requests from the same client are routed to the same backend server. If there is heavy traffic from a user, the load will not be distributed due to the stickiness, and a single server needs to handle that. Hence, consider avoiding session affinity.
  • Find performance bottlenecks: Scaling out is not a silver bullet for all performance issues; sometimes, performance bottlenecks are due to the application code itself. Adding more servers won’t solve these problems, so you should consider debugging or optimizing the code. Secondly, if there is a database performance issue, adding more frontend servers won’t help. You need to troubleshoot the database and understand the issue before choosing to scale out.
  • Identify scaling requirements: As mentioned in the previous point, different parts or tiers of your application require different scaling requirements. For example, the way the frontend needs to be scaled is not the same way as a database scales. Identify the requirements and set up scaling as required for each application component.
  • Offload heavy tasks: Consider moving tasks that require a lot of CPU or I/O to background jobs where possible. By doing this, the servers handling user requests will not be overloaded.
  • Use native scaling features: Autoscaling is supported by most Azure compute resources. The scaling can be triggered with the help of metrics or based on a schedule. It’s recommended that you set up autoscaling using metrics (CPU, memory, network, and so on) if the load is unpredictable. On the other hand, if the load is predictable, you can set up the scaling based on a schedule.
  • Scale aggressively for mission-critical workloads: Set up autoscaling aggressively for mission-critical workloads, as we need to add more instances quickly when demand increases. It's recommended that you start scaling a bit earlier than the tipping point to stay ahead of the demand.
  • Design for scaling in: Just as we scale out, we should design for scaling in. While scaling out, we are increasing the number of instances based on demand; once the demand is gone, we need to deallocate the extra instances that are added during the scaling event. If we don’t set up scale-in, the additional instances will keep on running and will incur additional charges.
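The metric-based scale-out/scale-in logic described above can be sketched as a threshold rule. The thresholds and instance bounds below are illustrative; a real autoscale rule (for example, in Azure Monitor) would also apply cooldown periods between scaling events:

```python
def desired_instances(current, cpu_percent, scale_out_at=70, scale_in_at=30,
                      min_instances=2, max_instances=10):
    """Autoscale sketch: add an instance above the scale-out threshold,
    remove one below the scale-in threshold, and respect the configured bounds."""
    if cpu_percent > scale_out_at:
        return min(current + 1, max_instances)
    if cpu_percent < scale_in_at:
        return max(current - 1, min_instances)
    return current

print(desired_instances(3, 85))  # 4 (scale out under load)
print(desired_instances(3, 20))  # 2 (scale in when demand drops)
print(desired_instances(3, 50))  # 3 (steady state)
```

The gap between the two thresholds (70% and 30% here) is deliberate: if scale-out and scale-in triggered at the same value, the instance count would flap.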

Now that you are familiar with the scale-out design, let’s shift the focus to the next item on the list.

Partition around limits

In Azure, we have limits for each resource. Some of the limits are hard limits, while others are soft limits. If the limit is a soft limit, we can reach out to Microsoft Support and increase the limit as required. When it comes to scaling, there is also a limit imposed by Microsoft for every resource. If your system is growing tremendously, you will eventually reach the upper limit of the resource. These limits include the number of compute cores, database size, storage throughput, query throughput, network throughput, and so on. In order to efficiently overcome the limits, we need to use partitioning. Earlier, we discussed how we can use data partitioning to improve the scalability and availability of data. Similarly, we can use partitioning to work around resource limits.

There are numerous reasons a system can be partitioned to avoid limits, such as the following:

  • To avoid limits on database size, number of concurrent sessions, or data I/O of databases
  • To avoid limits on the number of messages or the number of concurrent connections of a storage queue or message bus
  • To avoid limits on the number of instances supported on an App Service plan

In the case of databases, we can partition vertically, horizontally, or functionally. Just to give you an idea, let’s have a closer look at this:

  • In vertical partitioning, frequently accessed fields are stored in one partition, while less frequent ones are in a different partition. For example, customer names are stored in one partition that is frequently accessed by the application while their emails are stored in a different partition as they are not frequently accessed.
  • Horizontal partitioning is basically sharding, where each partition holds a subset of the total data. For example, the names of all cities starting with A-N are stored in one partition, while those starting with O-Z are stored in another partition.
  • As the name suggests, functional partitioning is where the data is partitioned based on the context or type of data. For example, one partition stores the stock-keeping unit (SKU) of the products while the other one stores customer information.
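The horizontal (sharding) case can be sketched with a routing function that matches the A-N / O-Z split described above; the shard names and city records are illustrative, and an in-memory dict stands in for the actual partitions:

```python
def shard_for_city(city):
    """Sharding sketch: route a city record to a partition by its first letter."""
    first = city.strip().upper()[0]
    return "shard-A-N" if "A" <= first <= "N" else "shard-O-Z"

# Distribute a few hypothetical records across the two shards.
shards = {"shard-A-N": [], "shard-O-Z": []}
for city in ["Amsterdam", "Oslo", "Nairobi", "Zurich"]:
    shards[shard_for_city(city)].append(city)

print(shards)  # Amsterdam and Nairobi land in one shard, Oslo and Zurich in the other
```

A first-letter split like this is easy to reason about but can produce uneven shards; production systems typically shard on a hash of the key to balance load.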

A full list of recommendations is available here: https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/partition. The next design principle we are going to cover is design for operations.

Design for operations

With cloud transformation, the regular IT chores of managing hardware and a data center are long gone. IT is no longer responsible for data center management, as it is handled by the cloud provider. Having said that, the IT or operations team is still responsible for deploying, managing, and administering the resources deployed in the cloud. Some key areas that the operations team should handle include the following:

  • Deployment: The provisioning of resources is considered deployment, and this is one of the key responsibilities of the operations team. It's recommended that you use an IaC solution for the deployment of services. Using these tools helps reduce human error and makes replicating an environment easy, as templates are reusable and repeatable.
  • Monitoring: Once the solution is deployed, it's very important that the operations team monitor it for failures, performance bottlenecks, and availability. A monitoring system can detect anomalies and notify administrators before they turn into bigger problems. The operations team needs to set up log collection from all services; the collected logs should be stored for insights and analysis.
  • Incident response: As mentioned earlier, we need to acknowledge the fact that failures can happen in the cloud, and if it’s a platform issue, the operations team needs to raise a ticket with Microsoft Support. Internally, the operations team can use an IT service management (ITSM) solution to create incidents and assign them to different teams for resolution or investigation.
  • Escalation: If the initial analysis is not yielding any results, there should be processes in place to escalate the issue to the stakeholders and find a resolution. The operations team can have different tiers within the organization that handle different issues; further, they can collaborate with Microsoft Support for issues that require engineering intervention and bug fixes.
  • Security auditing: Auditing is very important to make sure that the environment is secure. With the help of security information and event management (SIEM) solutions, we can collect data from different data sources and analyze it. The operations team can collaborate with external auditors if they lack the necessary skills to perform security auditing. For example, consider using Microsoft Defender for Cloud and acting on its recommendations. In addition to that, we can use Microsoft Sentinel to collect data from different sources for analysis and investigation.

A list of recommendations shared by Microsoft can be reviewed at https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/design-for-operations. With that, we will move on to the next design principle.

Use PaaS services

Unlike on-premises, the cloud offers different service models, such as IaaS, PaaS, and Software-as-a-Service (SaaS). Here, we will discuss IaaS and PaaS, as SaaS is a complete solution managed by the cloud provider, where the end customer doesn't manage the code.

In IaaS, the cloud provider takes care of the infrastructure (physical servers, network, storage, hypervisor, and so on) and the customer can create a VM on top of this hardware. Microsoft is not responsible for maintaining the VM OS; it will be the duty of the customer to update, patch, and maintain the OS and code of the application. In contrast, in PaaS, the cloud provider provides a hosting environment where the infrastructure, OS, and framework are managed by Microsoft. The only thing that the customer needs to do is push their code to the PaaS service, and it’s up and running. Developers can be more productive and write their code without the need to worry about the underlying hardware or its maintenance.

The design principle recommends using PaaS services instead of IaaS whenever possible. IaaS is only recommended if you require more control over the infrastructure, but if you simply require a reliable environment and ease of management, then PaaS is right for you. Table 1.2 shows some of the IaaS replacements for popular caches, queues, databases, and web solutions in Azure:

Instead of running the following (IaaS), consider deploying the PaaS equivalent:

  • Active Directory → Azure AD
  • RabbitMQ → Azure Service Bus
  • SQL Server → SQL Database
  • Hadoop → Azure HDInsight
  • PostgreSQL/MySQL → Azure Database for PostgreSQL/Azure Database for MySQL
  • IIS/Apache/NGINX → Azure App Service
  • MongoDB/Cassandra/Gremlin → Cosmos DB
  • Redis → Azure Cache for Redis
  • File Share → Azure File Share/Azure NetApp Files
  • Elasticsearch → Azure Cognitive Search

Table 1.2 – IaaS-to-PaaS considerations

This is not a complete list; there are different ways by which you can replace VMs (IaaS) with platform-managed services. Speaking of services, let’s discuss identity services, which are the subject of the next design principle.

Use a platform-managed identity solution

This is often considered a subsection of the previous design principle; however, there are some additional key points that we need to cover as part of the identity solution. Every cloud application needs user identities. For this reason, Microsoft recommends using an Identity-as-a-Service (IDaaS) solution rather than developing your own identity solution. In Azure, we can use Azure AD or Azure AD B2C as an identity solution for managing users, groups, and authentication.

The following recommendations are shared by Microsoft for this design principle:

  • If you are planning to use your own identity solution, you must have a database to store the credentials. You need to make sure that credentials are never stored in clear text; in fact, encrypting them is not sufficient on its own either. A better option is to perform cryptographic hashing with salting before persisting the data in the database. The advantage is that even if the database is compromised, the data is not easily retrievable. In the past few years, databases storing credentials have been targets for attack, and no matter how strong your hashing algorithm is, maintaining your own credential database is always a liability. To mitigate this, you can use an IDaaS, where credential management is done by the provider in a secure manner. In other words, it's the responsibility of the IDaaS provider to maintain and secure the database. You might be wondering how safe it is to outsource credentials to another provider. The short answer is that they have invested the time and resources to build the IDaaS platform, and if something happens, they are responsible for it.
  • Use modern authentication and authorization protocols. When designing applications, use OAuth2, SAML, OpenID Connect (OIDC), and so on. Don't go for legacy methods, which are prone to attacks such as SQL injection. Modern IDaaS systems such as Azure AD use these modern protocols for authentication and authorization.
  • IDaaS offers a plethora of additional security features compared with traditional home-grown identity systems. For example, Azure AD offers passwordless login, single sign-on (SSO), multi-factor authentication (MFA), conditional access (CA), just-in-time (JIT) access, privileged identity management (PIM), identity governance, access reviews, and so on. It’s going to be a very complex, time-consuming, and resource-consuming task if you are planning to include these features in your own identity system. Above all, the maintenance required for these add-ons is going to be high. If we are using an IDaaS solution, these are provided out of the box.
  • The reliability and performance of the identity solution are also a challenge when opting for your own identity solution. What if the infrastructure hosting your identity solution goes down? How many concurrent sign-ins and token issuances can it handle? These questions need to be addressed, as they point to the reliability and performance of the identity solution. Azure AD offers SLAs for its Basic and Premium tiers, covering both sign-in and token issuance. Microsoft will make sure that uptime is maintained, but in the case of a home-grown identity solution, you must set up redundant infrastructure to keep uptime high. Setting up redundant infrastructure is expensive and hard to maintain. Speaking of performance, Azure AD can handle millions of authentication requests without fail. Unlike your own identity solution, IDaaS is designed to withstand enormous volumes of traffic.
  • Attacks are evolving and they are getting more sophisticated, so you need to ensure that your identity solution is also evolving and can resist these attacks. Periodic penetration testing, vetting of employees and vendors with access to the system, and tight control need to be implemented. This process is going to be expensive and time-consuming. In the case of Azure AD, Microsoft conducts periodic penetration testing by both internal and external security professionals. These reports are available publicly. If required, you can raise a request for performing penetration testing on your Azure AD tenant.
  • Make complete use of features offered by the identity provider (IdP). These features are designed to protect your identities and applications. Instead of developing your own features, rely on native features, which are easy to set up and configure.
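If you do end up rolling your own credential store (the guidance above recommends an IDaaS instead), the hashing-and-salting advice might look like this minimal sketch using only Python's standard library; the password, salt size, and iteration count are illustrative:

```python
import hashlib
import hmac
import secrets

def hash_password(password, salt=None, iterations=200_000):
    """Salted PBKDF2 hash: store salt, iterations, and digest — never clear text."""
    salt = salt or secrets.token_bytes(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest

def verify_password(password, salt, iterations, digest):
    """Recompute the hash and compare in constant time to resist timing attacks."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)

salt, iters, digest = hash_password("s3cret!")
print(verify_password("s3cret!", salt, iters, digest))  # True
print(verify_password("wrong", salt, iters, digest))    # False
```

Even with correct hashing, the point of the bullet above stands: the ongoing liability of protecting this database is exactly what an IDaaS provider takes off your hands.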

With that, we will discuss the next design principle.

Use the best data store for your application

Most organizations use relational SQL databases for persisting application data. These databases are good for transactions that involve relational data. Keep the following considerations in mind if your preferred option is a relational database:

  • Expensive joins are required for queries
  • Data normalization and restructuring are required for schema on write
  • Performance can be affected due to lock contention

The recommendation is not to use a relational database for every scenario. There are other alternatives, such as the following:

  • Key/value stores
  • Document databases
  • Search engine databases
  • Time-series databases
  • Column-family databases
  • Graph databases

Choose one based on the type of data that your application handles. For example, if your application handles rain-sensor data, which is basically a time series, then you should go for a time-series database rather than using a relational database. Similarly, if you want to have a product catalog for your e-commerce application, each product will have its own specification. The specifications of a smartphone include brand, processor, memory, and storage, while the specifications of a hair dryer are completely different. Here, we need to store the details of each product as a document, and these will be retrieved when the user clicks on the item. For these kinds of scenarios, you should use a document database. In Azure, this type of product catalog can be stored in Azure Cosmos DB.
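The product-catalog scenario can be sketched with schema-free documents; here an in-memory dict stands in for the document database (such as Cosmos DB), and all field names and values are illustrative:

```python
# Heterogeneous product documents: each product carries only the fields that
# make sense for it, with no shared schema across the catalog.
catalog = {
    "phone-001": {"type": "smartphone", "brand": "Contoso",
                  "memoryGB": 8, "storageGB": 256},
    "dryer-042": {"type": "hair dryer", "brand": "Fabrikam",
                  "wattage": 1800, "speeds": 2},
}

def get_product(product_id):
    """Retrieve one product document by its key — the lookup a detail page makes."""
    return catalog[product_id]

print(get_product("phone-001")["memoryGB"])  # 8
```

In a relational database, these two products would force either a sparse table full of NULL columns or a separate table per product type; a document store sidesteps both.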

To conclude, a relational database is not meant for every scenario; consider using alternatives depending on the data that your application wants to store.

We have two more design principles to be covered before we wrap up, so let’s move on to the next one.

Design for evolution

According to Charles Darwin’s theory of evolution, species change over time, give rise to new species, and share a common ancestor. The theory also covers natural selection, which causes a population to adapt to its environment. Keeping this theory in mind, when you design applications, design for evolution. This design principle talks about the transformation from a monolithic to a microservices architecture. This transformation is an evolution that eliminates tight coupling between application components, which makes a system inflexible and brittle.

Microservices architecture decouples the application components so that they are loosely coupled. If components are tightly coupled, changes in one component create repercussions in another, which makes it very difficult to introduce new changes into the system. To avoid this, we can consider a microservices architecture, where we can make changes to one service without affecting the others.

A list of recommendations for this design principle is available at https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/design-for-evolution.

Now, we are going to discuss the last design principle. Let’s dive right in!

Build for business needs

All the principles we discussed so far are driven by a common factor: business requirements. For example, when we discussed the Make all things redundant design principle, we explored different recommendations for setting up redundant infrastructure. But what if the workload that I have is a proof of concept (POC) or development workload? Do I need to have redundant VMs for a development workload? As you can imagine, development workloads don’t require redundant VMs unless this is demanded by the key factor—business requirements. It might seem apparent, but everything boils down to business requirements.

Leverage the following recommendations to build solutions to meet business needs:

  • Define business objectives that include metrics reflecting the characteristics of your architecture. These numbers include recovery time objective (RTO), recovery point objective (RPO), and maximum tolerable outage (MTO). For instance, a low-RTO business requirement needs quick, automatic failover to the DR region. On the other hand, you don’t have to set up higher redundancy if the business requirement allows a higher RTO.
  • Define SLAs and service-level objectives (SLOs) for your application; this will help in choosing the right architecture. For example, if the SLA requirement is 99.9%, we can go for a single VM; however, if the requirement is 99.95%, then you must deploy two or more VMs in an availability set.
  • Leverage domain-driven design (DDD), whereby we model the application based on the use cases.
  • Differentiate workloads based on the requirements for scalability, availability, data consistency, and DR. This will help you plan the strategy for each workload efficiently.
  • Plan for growth; as your business grows, your user base and traffic will grow. You need to make sure your application also evolves to handle the new users and traffic. As we discussed in the Design for evolution section, think about decoupling your application components so that your application changes can be easily introduced without disrupting other dependencies.
  • On-premises, hardware cost is paid upfront as a capital expenditure. The cloud, by contrast, is an operational expenditure: you pay for the resources you consume. This requires a shift in mindset. On-premises, a VM left running for 60 days incurs no additional hardware cost beyond electricity and maintenance, because the hardware was already paid for; in the cloud, you are billed for all 60 days the VM was running. In short, delete resources you no longer need to avoid unexpected costs.
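
The SLA recommendation above can be made concrete with the standard serial/parallel availability formulas used to estimate a composite SLA. This is a hedged sketch: the function names are my own, and Azure’s published SLAs (99.9% for a single VM with premium storage, 99.95% for an availability set, 99.99% for availability zones) are contractual guarantees rather than outputs of this arithmetic.

```python
# Sketch: composing availability figures when comparing architecture
# options against an SLA target. Function names are illustrative.

def serial(*availabilities: float) -> float:
    """Composite availability of components that must ALL be up.
    Multiplying availabilities means the chain is weaker than any link."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def parallel(availability: float, instances: int) -> float:
    """Theoretical availability of N redundant instances, assuming
    independent failures: the system is up if ANY instance is up."""
    return 1.0 - (1.0 - availability) ** instances

# A 99.9% web tier in front of a 99.99% database, both required:
composite = serial(0.999, 0.9999)
print(f"Serial composite:  {composite:.4%}")  # below either SLA alone

# Theoretical effect of running two redundant 99.9% VMs:
redundant = parallel(0.999, 2)
print(f"Two redundant VMs: {redundant:.4%}")
```

The takeaway matches the bullet: chaining services drags the composite SLA down, while adding redundant instances pushes it up, which is why a stricter SLA requirement forces a move from a single VM to availability sets or zones.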

That wraps up the last design principle; with it, we have completed the elements of the WAF.

 

Summary

In this chapter, we started with an introduction to the WAF, and we discussed the five pillars of the WAF. The pillars are cost optimization, operational excellence, performance efficiency, reliability, and security. We briefly covered the concepts and principles of these pillars. Adopting the best practices and recommendations provided by these pillars of the WAF will help you to improve the quality of your Azure workloads.

Then, we discussed the elements of the WAF; the recommendations and best practices of the WAF are derived from these elements. In simple terms, the elements act as the data source for the WAF. There are six elements of the WAF: the Azure Well-Architected Review; Azure Advisor; documentation; partners, support, and service offers; reference architectures; and design principles. Understanding these elements will help you learn the best practices that are used to build the WAF. Design patterns and some recommendations for design principles are not included in this chapter as they are beyond the scope of this book; nevertheless, you can always refer to the shared links to learn more.

As mentioned in the introduction of this chapter, there are multiple frameworks for the cloud. In the next chapter, we will look at the difference between the CAF and the WAF. Readers often get confused between these frameworks, so let’s take a deep dive into the CAF versus the WAF.

 

Further reading

For a deep dive into the topics covered in this chapter, you can refer to the following resources:
