2. Azure solution availability, scalability, and monitoring
Architectural concerns, such as high availability and scalability, are some of the highest-priority items for any architect. This is true across many projects and solutions, but these concerns become even more important when deploying applications to the cloud because of the complexity involved. Most of the time, the complexity does not come from the application, but from the number of similar resources to choose from on the cloud. The other complication is the constant arrival of new features, which can make an architect's earlier decisions look almost redundant in hindsight.
In this chapter, we will look at an architect's perspective in terms of deploying highly available and scalable applications on Azure.
Azure is a mature platform that provides a number of options for implementing high availability and scalability at multiple levels...
High availability
High availability forms one of the core non-functional technical requirements for any business-critical service and its deployment. High availability refers to the ability of a service or application to remain operational on a continuous basis; it does so by meeting or surpassing its promised service level agreement (SLA). Users are promised a certain SLA based on the service type. The service should be available for consumption based on its SLA. For example, an SLA can define 99% availability for an application for the entire year. This means that it should be available for consumption by users for 361.35 days. If it fails to remain available for this period, that constitutes a breach of the SLA. Most mission-critical applications define their high-availability SLA as 99.999% for a year. This means the application should be up, running, and available throughout the year, and it can be down and unavailable for only about 5.26 minutes...
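The arithmetic behind an availability SLA can be sketched in a few lines of Python. This is a minimal illustration, assuming a 365-day year; the function name `allowed_downtime` is hypothetical and not part of any Azure SDK.

```python
from datetime import timedelta


def allowed_downtime(sla_percent: float,
                     period: timedelta = timedelta(days=365)) -> timedelta:
    """Return the maximum downtime an availability SLA permits over a period.

    Assumption: the SLA is measured over the whole period (365 days by default).
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    # The unavailable fraction is whatever the SLA does not cover.
    return period * ((100 - sla_percent) / 100)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99, 99.999):
        print(f"{sla}% -> {allowed_downtime(sla)}")
```

A 99% SLA leaves a downtime budget of 3.65 days in a year (365 minus 361.35), while 99.999% leaves a little over five minutes.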
Azure high availability
Achieving high availability and meeting high SLA requirements is tough. Azure provides lots of features that enable high availability for applications, from the host and guest OS to applications using its PaaS. Architects can use these features to get high availability in their applications using configuration instead of building these features from scratch or depending on third-party tools.
In this section, we will look at the features and capabilities provided by Azure to make applications highly available. Before we get into the architectural and configuration details, it is important to understand concepts related to Azure's high availability.
Concepts
The fundamental concepts provided by Azure to attain high availability are as follows:
- Availability sets
- Fault domains
- Update domains
- Availability zones
As you know, it's very important that we design solutions to be highly available...
Architectural considerations for high availability
Azure provides high availability through various means and at various levels. High availability can be at the datacenter level, the region level, or even across Azure. In this section, we will go through some of the architectures for high availability.
High availability within Azure regions
Figure 2.6 shows a high-availability deployment within a single Azure region. High availability is designed at the individual resource level. In this architecture, there are multiple VMs at each tier, connected through either an application gateway or a load balancer, and each tier is associated with its own availability set. The VMs within a tier are placed on separate fault and update domains. While the web servers are connected to application gateways, the rest of the tiers, such as the application and database tiers, have internal load balancers...
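The effect of spreading a tier's VMs across fault and update domains can be illustrated with a toy model. This is only a sketch: the round-robin assignment, the domain counts, and the names `Placement`, `place_vms`, and `survivors` are assumptions made for illustration. Actual placement is decided by the Azure platform, not by user code.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Placement:
    vm: str
    fault_domain: int
    update_domain: int


def place_vms(vm_names: Sequence[str],
              fault_domains: int = 2,
              update_domains: int = 5) -> List[Placement]:
    """Assign VMs round-robin across domains (illustrative model only)."""
    return [
        Placement(name, index % fault_domains, index % update_domains)
        for index, name in enumerate(vm_names)
    ]


def survivors(placements: Sequence[Placement],
              failed_fault_domain: int) -> List[str]:
    """Return the VMs still running after one fault domain fails."""
    return [p.vm for p in placements if p.fault_domain != failed_fault_domain]


if __name__ == "__main__":
    tier = place_vms(["web-0", "web-1", "web-2", "web-3"])
    for placement in tier:
        print(placement)
    print("Still serving after fault domain 0 fails:", survivors(tier, 0))
```

Because no single domain holds every VM of a tier, losing one rack or rebooting one update domain still leaves instances behind the load balancer.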
Scalability
Running applications and systems that are available to users for consumption is important for architects of any business-critical application. However, there is another equally important application feature that is one of the top priorities for architects, and this is the scalability of the application.
Imagine a situation in which an application is deployed and obtains great performance and availability with a few users, but both availability and performance decrease as the number of users begins to increase. There are times when an application performs well under a normal load, but suffers a drop in performance with an increase in the number of users. This can happen if there is a sudden increase in the number of users and the environment is not built for such a large number of users.
To accommodate such spikes in the number of users, you might provision the hardware and bandwidth for handling spikes. The challenge with this is that the additional...
VM scale sets
VM scale sets (VMSSes) are Azure compute resources that you can use to deploy and manage a set of identical VMs. With all VMs configured in the same way, scale sets are designed to support true autoscaling, and no pre-provisioning of VMs is required. They help in provisioning multiple identical VMs that are connected to each other through a virtual network and subnet.
A VMSS consists of multiple VMs, but they are managed at the VMSS level. All VMs are part of this unit, and any changes are made to the unit, which, in turn, applies them to its VMs using a predetermined algorithm:

Figure 2.12: A VM scale set
This enables these VMs to be load balanced using an Azure load balancer or an application gateway. The VMs could be either Windows or Linux VMs. They can run automated scripts using a PowerShell extension and they can be managed centrally using a state configuration. They can be monitored as a unit, or individually...
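The autoscaling that scale sets are designed to support can be pictured as a threshold rule evaluated against a metric. The sketch below is a simplified model, not the Azure autoscale API: the `AutoscaleRule` fields, the CPU thresholds, and the instance limits are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscaleRule:
    # All values are illustrative assumptions, not Azure defaults.
    scale_out_cpu: float = 75.0   # add instances above this average CPU %
    scale_in_cpu: float = 25.0    # remove instances below this average CPU %
    min_instances: int = 2
    max_instances: int = 10
    step: int = 1                 # instances added or removed per decision


def desired_capacity(current: int,
                     avg_cpu_percent: float,
                     rule: AutoscaleRule = AutoscaleRule()) -> int:
    """Return the instance count the rule asks for, clamped to its limits."""
    if avg_cpu_percent > rule.scale_out_cpu:
        target = current + rule.step
    elif avg_cpu_percent < rule.scale_in_cpu:
        target = current - rule.step
    else:
        target = current
    return max(rule.min_instances, min(rule.max_instances, target))


if __name__ == "__main__":
    for cpu in (10.0, 50.0, 90.0):
        print(f"avg CPU {cpu}% with 4 instances -> {desired_capacity(4, cpu)}")
```

Keeping a gap between the scale-out and scale-in thresholds is a deliberate choice in this sketch: it avoids adding and removing instances repeatedly when the load hovers around a single value.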
Upgrades and maintenance
After a VMSS and applications are deployed, they need to be actively maintained. Planned maintenance should be conducted periodically to ensure that both the environment and application are up to date with the latest features, from a security and resilience point of view.
Upgrades can be associated with applications, the guest VM instance, or the image itself. Upgrades can be quite complex because they should happen without affecting the availability, scalability, and performance of environments and applications. It is therefore important that a VMSS supports rolling upgrade methods, in which updates take place one instance at a time, and provides the capabilities needed for these advanced scenarios.
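The idea of a rolling upgrade can be sketched as a loop that upgrades a small batch of instances, checks their health, and only then moves on. This is a generic illustration under stated assumptions: `upgrade` and `is_healthy` are hypothetical callbacks supplied by the caller, and nothing here calls Azure.

```python
from typing import Callable, Iterator, List, Sequence


def rolling_batches(instances: Sequence[str],
                    batch_size: int = 1) -> Iterator[List[str]]:
    """Yield instances in consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(instances), batch_size):
        yield list(instances[start:start + batch_size])


def rolling_upgrade(instances: Sequence[str],
                    upgrade: Callable[[str], None],
                    is_healthy: Callable[[str], bool],
                    batch_size: int = 1) -> List[str]:
    """Upgrade batch by batch, halting as soon as an instance is unhealthy.

    Returns the instances that were upgraded and passed the health check.
    """
    upgraded: List[str] = []
    for batch in rolling_batches(instances, batch_size):
        for instance in batch:
            upgrade(instance)
        for instance in batch:
            if not is_healthy(instance):
                raise RuntimeError(
                    f"{instance} is unhealthy after upgrade; rollout halted")
        upgraded.extend(batch)
    return upgraded


if __name__ == "__main__":
    done = rolling_upgrade(
        ["vm-0", "vm-1", "vm-2"],
        upgrade=lambda vm: print(f"upgrading {vm}"),
        is_healthy=lambda vm: True,
    )
    print("upgraded:", done)
```

With a batch size of one, at most one instance is out of service at any moment, so the remaining instances keep serving traffic throughout the rollout.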
There is a utility provided by the Azure team to manage updates for VMSSes. It's a Python-based utility that can be downloaded from https://github.com/gbowerman/vmssdashboard. It makes REST API calls to Azure to manage scale...
Monitoring
Monitoring is an important architectural concern that should be part of any solution, whether big or small, mission-critical or not, cloud-based or not. It should not be neglected.
Monitoring refers to the act of keeping track of solutions and capturing various telemetry information, processing it, identifying the information that qualifies for alerts based on rules, and raising them. Generally, an agent is deployed within the environment and monitors it, sending telemetry information to a centralized server, where the rest of the processing of generating alerts and notifying stakeholders takes place.
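The rule-based step of this pipeline, in which telemetry is matched against rules to decide what qualifies as an alert, can be sketched as follows. The `Telemetry` and `AlertRule` types, the metric name, and the threshold are hypothetical and stand in for whatever a real monitoring service provides.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Telemetry:
    source: str    # the resource that emitted the record
    metric: str    # hypothetical metric name, such as "cpu_percent"
    value: float


@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str
    condition: Callable[[float], bool]  # True when the value should alert


def evaluate(records: Iterable[Telemetry],
             rules: Iterable[AlertRule]) -> List[str]:
    """Return one alert message for every record that satisfies a rule."""
    rule_list = list(rules)
    alerts: List[str] = []
    for record in records:
        for rule in rule_list:
            if rule.metric == record.metric and rule.condition(record.value):
                alerts.append(
                    f"{rule.name}: {record.source} "
                    f"{record.metric}={record.value}")
    return alerts


if __name__ == "__main__":
    rules = [AlertRule("HighCPU", "cpu_percent", lambda v: v > 90)]
    records = [
        Telemetry("vm-0", "cpu_percent", 95.0),
        Telemetry("vm-1", "cpu_percent", 40.0),
    ]
    for alert in evaluate(records, rules):
        print(alert)
```

In a real deployment, the agent would send the records to a centralized service, and the alerts would be routed to stakeholders rather than printed.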
Monitoring supports both proactive and reactive measures for a solution. It is also the first step toward auditing a solution. Without the ability to monitor log records, it is difficult to audit a system from various perspectives, such as security, performance, and availability.
Monitoring helps us identify availability, performance, and scalability...
Summary
High availability and scalability are crucially important architectural concerns. Almost every application and every architect tries to implement high availability. Azure is a mature platform that understands the need for these architectural concerns in applications and provides resources to implement them at multiple levels. These architectural concerns are not an afterthought, and they should be part of the application development life cycle, starting from the planning phase itself.
Monitoring is an important architectural aspect of any solution. It is also the first step toward being able to audit an application properly. It enables operations to manage a solution, both reactively and proactively. It provides the necessary records for troubleshooting and fixing the issues that might arise from platforms and applications. There are many resources in Azure that are specific to implementing monitoring for Azure, other clouds, and on-premises datacenters. Application...