
You're reading from Microsoft Azure Fundamentals Certification and Beyond - Second Edition

Product type: Book
Published in: Jan 2024
Publisher: Packt
ISBN-13: 9781837630592
Edition: 2nd Edition

Author: Steve Miles
Steve Miles works in a technology leadership role for the cloud practice of a multi-billion turnover IT distributor based in the UK and Ireland. He is a Microsoft Azure MVP (Most Valuable Professional), MCT (Microsoft Certified Trainer) and Microsoft technologies author. Steve has more than 25 years of experience in hosted datacenter services, hybrid, and multi-cloud platforms. In his free time, Steve also can be found tinkering on cars.

Cloud Computing Operations Model

Cloud computing is “elastic,” “scalable,” “agile,” “fault-tolerant,” highly “available,” and helps with “disaster recovery.” These operational model characteristics in cloud computing add value and benefit an organization’s operational model.

These inherent and defining characteristics allow a workload deployed into a cloud computing environment to become highly available and to scale up and down (vertically) and in and out (horizontally) in line with demand. This elasticity provides the agility for a highly effective operations and economics model that flexes with the changing demands of a business.

Switching to a consumption-based model, in which you pay only for the resources you use, lets you optimize running hours and right-size resources in line with demand and changing requirements. Spending can then be monitored without the over-commitment of a traditional computing cost model.

Figure 2.3 outlines the computing resources demand model and shows the implications of actual demand against implemented resources based on predicted demand:

Figure 2.3 – Cloud computing resource demand model (a graph plotting implemented resources against demand, with Created resources on the Y-axis and Time on the X-axis)

Figure 2.3 also shows the traditional computing mindset from the previous section. This mindset means over-provisioning resources to meet predicted demand, which leaves many resources underutilized. When actual demand exceeds the predicted demand, no additional resources are available because there is no burst capacity or ability to scale to meet it. To compound matters, by the time extra resources have been implemented, the demand has often dropped off and the resources are no longer needed.

With the cloud computing mindset, resource utilization can be tracked and resources right-sized to demand, so you can avoid over-provisioning and paying for more resources than you need.
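To make the demand model in Figure 2.3 more concrete, the following minimal Python sketch compares a fixed allocation sized to a predicted peak with an allocation that tracks demand, counting idle capacity (over-provisioning) and unmet demand. The demand series, the predicted peak of 10 units, and the `provisioning_gaps` helper are all hypothetical, and the elastic allocation is idealized: real scaling reacts with some delay.

```python
# Hypothetical demand (capacity units needed) over eight time periods
demand = [2, 3, 5, 9, 12, 7, 4, 3]

# Traditional model: fixed capacity sized to a *predicted* peak of 10 units
fixed = [10] * len(demand)

# Cloud model (idealized): capacity tracks demand in every period
elastic = list(demand)


def provisioning_gaps(demand, provisioned):
    """Return (idle_units, unmet_units) summed over all periods."""
    idle = sum(max(p - d, 0) for d, p in zip(demand, provisioned))
    unmet = sum(max(d - p, 0) for d, p in zip(demand, provisioned))
    return idle, unmet


print("fixed:", provisioning_gaps(demand, fixed))
print("elastic:", provisioning_gaps(demand, elastic))
```

The fixed allocation pays for 37 idle units across the periods and still falls 2 units short at the peak, which is the combination of waste and shortfall described above.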

With this knowledge of the cloud computing operations model, you will look more closely at the characteristics of cloud computing that deliver benefits and value over the traditional computing model.

Operational Benefits of Cloud Computing

This section will look at the operational benefits cloud computing can add to an organization compared to those provided in a traditional computing model. Cloud computing platforms primarily provide the following operational benefits over traditional computing models:

  • Scalability
  • Elasticity
  • Agility
  • High availability (and geo-distribution)
  • Disaster recovery
  • Cost model

These operational benefits may be inherent, built-in platform functions provided as part of the service, as is typically the case with PaaS, Function as a Service (FaaS), and SaaS. Where these benefits are not included automatically, in some cases they just need to be enabled.

Alternatively, these operational benefits may need to be designed into the solution, with an individual set of resources implemented to enable each characteristic.

For example, IaaS virtual machines will not provide scale, elasticity, high availability, and disaster recovery unless these are designed into the solution and the resources that provide each of these characteristics are implemented.

The key takeaway is that cloud platform providers will generally provide these functions and characteristics. You may layer on additional functionality as your needs dictate.

Of course, not everything is perfect with the cloud computing model. Here are some challenges that can be overcome but must be considered and provided for:

  • Network dependency: reliability, stability, quality, and performance
  • Confidentiality, Integrity, Availability (CIA) of users, apps, and data
  • Access control and operational governance
  • Cost control

With the operational benefits and challenges considered in this section, it is time to look at the benefits of cloud computing in more detail.

What Is Scalability?

Scalability refers to the ability to increase (or decrease) resources in response to demand, usually automatically, triggered by a time schedule or by a resource metric reaching a threshold. The following two concepts are related to the scalability of computing resources:

  • Scaling up (vertical scaling): This means capacity is increased within the resource, such as increasing the processor or memory by resizing a virtual machine; the opposite is “scaling down,” where resource capacity is decreased.
  • Scaling out (horizontal scaling): This means adding resource instances, such as additional virtual machines or compute nodes/scale units; the opposite is “scaling in,” where resource instances are de-allocated.
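Metric-triggered horizontal scaling can be sketched as a simple rule evaluation. The following Python example is a minimal sketch, similar in spirit to the metric rules and instance limits you configure for autoscaling, but the `ScaleRule` class, the CPU thresholds, and the instance bounds are hypothetical values chosen for illustration, not an Azure API.

```python
from dataclasses import dataclass


@dataclass
class ScaleRule:
    """Hypothetical scale-out/scale-in rule based on average CPU utilization."""
    scale_out_above: float = 70.0  # average CPU % that triggers scale out
    scale_in_below: float = 30.0   # average CPU % that triggers scale in
    min_instances: int = 2
    max_instances: int = 10


def desired_instances(current: int, avg_cpu: float, rule: ScaleRule) -> int:
    """Return the instance count after one horizontal scaling evaluation."""
    if avg_cpu > rule.scale_out_above:
        target = current + 1  # scale out: add an instance
    elif avg_cpu < rule.scale_in_below:
        target = current - 1  # scale in: remove an instance
    else:
        target = current      # within the band: no change
    # Never go outside the configured minimum and maximum
    return max(rule.min_instances, min(rule.max_instances, target))


rule = ScaleRule()
print(desired_instances(3, 85.0, rule))   # 4: scaled out
print(desired_instances(3, 20.0, rule))   # 2: scaled in
print(desired_instances(2, 10.0, rule))   # 2: already at the minimum
print(desired_instances(10, 95.0, rule))  # 10: already at the maximum
```

The gap between the scale-out and scale-in thresholds keeps the rule from flapping between instance counts when utilization hovers around a single value.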

Scalability should not be confused with fault tolerance, which moves a workload automatically to another resource or system when it detects a failure or unhealthy state.

What Is Elasticity?

Elasticity refers to the ability to automatically adjust the resources needed: to burst and scale to meet any peak in demand, and then return to a normal operating baseline.

What Is Agility?

Agility means deploying and configuring resources effectively and efficiently in a short space of time to meet any change in requirements or operational needs.

What Is High Availability?

High availability and geo-distribution mean deploying resources so that they operate within the required or mandated Service-Level Agreement (SLA) for those resources. An SLA sets out the level of service that a customer can expect from their service provider. This agreement will set out terms such as availability metrics, service availability, responsibilities, claims, and credit processes, as well as the vocabulary and terminology that will be used to express these aspects of the agreement.

The SLA is a guaranteed measure of uptime, which is the amount of time services are online, available, and operational. The following points describe the concept of availability in the context of computing and systems:

  • Availability is the percentage of time a resource is available to service requests.
  • Service availability is expressed as the uptime percentage over time, for example, 99.9%.
  • Availability depends on resilient systems, meaning systems that can recover from failures and continue to function.
  • Increasing availability often results in an increase in costs due to the complexity of the solutions required to deliver the level of availability.
  • Failover is another critical factor in availability. It means that one system takes over from another when a resource fails and becomes unavailable; failover is part of both an availability and a disaster recovery strategy.

Microsoft defines an SLA as follows:

Microsoft’s commitments to uptime and connectivity, meaning the amount of time the services are online, available, and operational.

Microsoft provides each service with an individual SLA that details what is covered by the agreement and any exceptions. If a service does not meet its guarantees, a percentage of the monthly fee is eligible to be credited.
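Many Azure SLAs define a monthly uptime percentage as the maximum available minutes in the month minus the minutes of downtime, divided by the maximum available minutes, and then map that percentage onto service credit tiers. The following Python sketch shows that arithmetic. The function names are hypothetical, and the credit tiers are invented purely for illustration; they do not belong to any real service, so always check the SLA for the service you use.

```python
def monthly_uptime_percentage(max_available_minutes: float, downtime_minutes: float) -> float:
    """Uptime % = (Maximum Available Minutes - Downtime) / Maximum Available Minutes x 100."""
    return (max_available_minutes - downtime_minutes) / max_available_minutes * 100


# HYPOTHETICAL credit tiers, for illustration only:
# (uptime below this %, credit as % of the monthly fee). Real tiers differ per service.
CREDIT_TIERS = [(95.0, 100), (99.0, 25), (99.9, 10)]


def service_credit_percent(uptime_pct: float) -> int:
    """Return the credit for the most severe tier breached, or 0 if the SLA was met."""
    for threshold, credit in CREDIT_TIERS:  # ordered from most to least severe
        if uptime_pct < threshold:
            return credit
    return 0


minutes_in_30_day_month = 30 * 24 * 60  # 43,200 minutes
uptime = monthly_uptime_percentage(minutes_in_30_day_month, downtime_minutes=90)
print(f"{uptime:.3f}% uptime -> {service_credit_percent(uptime)}% credit")
```

With 90 minutes of downtime in a 30-day month, uptime falls to roughly 99.79%, which breaches the hypothetical 99.9% tier and earns a 10% credit.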

While SLAs are usually expressed in terms of availability and uptime, customers and consumers of a service will want to know what that means in the real world and what impact a breach could have on them. The metric that often matters most is therefore downtime: for a given SLA, how long is the service permitted to be unavailable from the service provider? You should scrutinize any SLA to determine whether that level of downtime is acceptable.

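Service availability is often described by the number of nines in the SLA (three nines is 99.9% and five nines is 99.999%). Microsoft SLAs are expressed on a monthly basis, so over a 30-day month, 99.9% allows 43.2 minutes of downtime. The exact figure varies with the number of days in the month, which is why published figures, such as those in Table 2.1, can differ slightly.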

Table 2.1 illustrates examples of SLA commitments and downtime permitted per month as part of an SLA:

SLA of a Service | Permitted Downtime Per Month
99.9%            | 43m 28s
99.95%           | 21m 44s
99.99%           | 4m 21s
99.999%          | 26s

Table 2.1 – The SLA for a service indicating the acceptable level of downtime per month
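Permitted downtime is simple arithmetic: the total time in the month multiplied by the fraction of time the SLA allows the service to be down. The following Python sketch assumes a 30-day month by default and lets you pass a different month length; the function names are hypothetical. The figures in Table 2.1 appear to be based on a slightly longer average month, so they come out a little higher than the 30-day results.

```python
def permitted_downtime_seconds(sla_percent: float, days_in_month: int = 30) -> float:
    """Seconds of downtime allowed in a month before the SLA is breached."""
    total_seconds = days_in_month * 24 * 60 * 60
    return total_seconds * (1 - sla_percent / 100)


def format_duration(seconds: float) -> str:
    """Format a duration, rounded to the nearest second, as 'Xm YYs'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs:02d}s"


for sla in (99.9, 99.95, 99.99, 99.999):
    print(f"{sla}%: 30 days = {format_duration(permitted_downtime_seconds(sla))}, "
          f"31 days = {format_duration(permitted_downtime_seconds(sla, 31))}")
```

For a 30-day month, 99.9% gives 43m 12s, matching the 43.2 minutes quoted earlier; a 31-day month allows 44m 38s.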

Observe that the SLAs in Table 2.1 range from 99.9% to 99.999%, which covers the levels you will most commonly encounter for Azure services; you should not generally expect a service to guarantee 100% availability.

You should also be aware of the concept of a composite SLA: when you combine services (such as virtual machines and the underlying services they depend on, such as storage and networking components), the overall SLA is lower than that of any individual service. This is because each service that you add increases the probability of failure and increases complexity.
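When a solution needs every one of its services to be available, the composite SLA is commonly estimated by multiplying the individual SLAs. The following Python sketch assumes the services fail independently; the `composite_sla` name and the example figures are illustrative.

```python
from math import prod


def composite_sla(*slas_percent: float) -> float:
    """Composite SLA (%) when a solution needs ALL of its services to be available.

    Assumes the services fail independently, so their availabilities multiply.
    """
    return prod(s / 100 for s in slas_percent) * 100


# Illustrative figures: a web tier at 99.95% that depends on a database at 99.99%
combined = composite_sla(99.95, 99.99)
print(f"{combined:.4f}%")  # lower than either individual SLA
```

The result, about 99.94%, is below both 99.95% and 99.99%, and every additional dependency lowers it further.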

The following actions will “positively” impact and “increase” your SLA:

  • Using services that provide an SLA (or improve the service SLA), such as Entra ID Premium editions and Premium SSD managed disks
  • Adding redundant resources, such as deploying resources to additional regions
  • Adding availability solutions, such as using availability sets and availability zones
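Redundancy raises availability because the solution is down only when every redundant deployment is down at the same time. The following Python sketch estimates this, assuming the deployments fail independently; the function name and the 99.9% figures are illustrative.

```python
def redundant_availability(*availabilities_percent: float) -> float:
    """Availability (%) when ANY ONE of several independent deployments can serve requests.

    The solution is down only if every deployment is down at the same time.
    """
    p_all_down = 1.0
    for a in availabilities_percent:
        p_all_down *= 1 - a / 100  # probability that this deployment is unavailable
    return (1 - p_all_down) * 100


# Two identical deployments in separate regions, each assumed to be 99.9% available
print(f"{redundant_availability(99.9, 99.9):.4f}%")
```

In practice, the component that routes traffic between regions has its own SLA, which would be combined serially with this figure, and correlated failures break the independence assumption, so treat the result as an upper bound.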

The following actions will “negatively” impact and “decrease” your SLA:

  • Adding multiple services due to the nature of composite SLAs
  • Choosing non-SLA-backed services or free services

The following actions will have no impact on your SLA:

  • Adding multiple tenancies
  • Adding multiple subscriptions
  • Adding multiple admin accounts

The Azure status page (https://packt.link/DdVgV) provides a global overview of the service health across all regions; this should be the first place you visit, should you suspect there is a wider issue affecting the availability of services globally. From the status page, you can click through to Azure Service Health in the Azure portal, which provides a personalized view of the availability of the services that are being used within your Azure subscriptions.

Service credits are paid by a service provider, through a claims process, when it does not meet the guarantees of the agreed service level; each service has its own defined SLA. You should evaluate all your services to ensure that, where required, you always have an SLA-backed service, because “free services” that carry no SLA can have an operational impact.

If you suspect that your services have been affected and that Microsoft has not met its SLA, it is your responsibility to take action and pursue a credit; you must submit a claim to receive a service credit. For most services, the claim must be submitted by the end of the month following the month in which the service was impacted. If your services are provided through the Microsoft Cloud Solution Provider (CSP) channel, your CSP partner will pursue the claim on your behalf and pass on the service credits accordingly.

Note

You can find more details about Microsoft SLAs for Azure services and the composite SLA at https://packt.link/X5G0B and https://packt.link/qzOnf.

What Is Disaster Recovery?

Disaster recovery is based upon a set of practices or measures to ensure that, when a system fails, it can be restored to operation by failing over to a replicated instance in another region.

A “disaster recovery strategy” will be determined by the required Recovery Time Objective (RTO) and Recovery Point Objective (RPO). Replication technologies allow for much shorter RTOs and RPOs than can be achieved with backups. The following are the crucial elements in creating comprehensive disaster recovery plans:

  • RTO: This refers to the maximum duration of acceptable downtime for the system.
  • RPO: This refers to the maximum amount of data loss that is acceptable to a system, usually expressed as a period of time.

This is represented in Figure 2.4:

Figure 2.4 – RTO and RPO (a timeline contrasting the RPO and RTO concepts)
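The following Python sketch checks two protection approaches against an RTO and an RPO. It assumes that the worst-case data loss equals the interval between copies (a failure just before the next copy is taken) and that recovery time is a single known figure; the function name, objectives, intervals, and restore times are all hypothetical.

```python
from datetime import timedelta


def meets_objectives(copy_interval: timedelta, restore_time: timedelta,
                     rto: timedelta, rpo: timedelta) -> dict:
    """Check a protection approach against the RTO and RPO.

    Worst-case data loss is assumed to equal the interval between copies.
    """
    return {
        "rpo_met": copy_interval <= rpo,  # can we lose at most the RPO's worth of data?
        "rto_met": restore_time <= rto,   # can we be back online within the RTO?
    }


rto = timedelta(hours=1)
rpo = timedelta(minutes=15)

# Hypothetical figures for two approaches
nightly_backup = meets_objectives(timedelta(hours=24), timedelta(hours=4), rto, rpo)
replication = meets_objectives(timedelta(seconds=30), timedelta(minutes=20), rto, rpo)
print("nightly backup:", nightly_backup)
print("replication:", replication)
```

With these assumed figures, only the replication-based approach satisfies both objectives, which reflects why replication allows much shorter RTOs and RPOs than backups alone.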

Having grasped the operational benefits, read on to compare disaster recovery to high availability and backup concepts.

Comparing Disaster Recovery, High Availability, and Backup

High availability and disaster recovery can be classified as system protection, whereas backup can be classified as data protection. The following concepts help in building robust and resilient systems in cloud computing environments like Microsoft Azure:

  • High availability: When systems fail and are not available, you can run a second instance in the same Azure region.
  • Disaster recovery: When systems fail and are not available, you can run a second instance in another Azure region.
  • Backup: When data is corrupted, deleted, lost, or irretrievable (perhaps due to ransomware), you can restore the instance from another copy of the system.

Figure 2.5 outlines the three preceding points of high availability, disaster recovery, and backup:

Figure 2.5 – Comparing backup, high availability, and disaster recovery (contrasting the concepts and solutions for protecting systems and protecting data)

High availability, disaster recovery, and backup should not be an “either-or” decision in a strategy for business continuity; any strategy should include “all three” as they serve different purposes.

Fault tolerance is a means of providing high availability in systems. It is similar to Auto Scale in that it responds automatically to a trigger; however, the trigger for fault tolerance is a failed health check on a system, as opposed to a system under load from demand, and the response is to move the workload to another, healthy system.
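Health-check-driven failover can be sketched as choosing the first healthy endpoint in priority order. The following Python example is a minimal sketch; the `select_endpoint` function, the endpoint names, and the health results are hypothetical, and in a real deployment the health information would come from a health probe rather than a dictionary.

```python
from typing import Callable, Optional, Sequence


def select_endpoint(endpoints: Sequence[str],
                    is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy endpoint in priority order, or None if all fail."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Hypothetical health-check results: the primary has failed its check
health = {"primary": False, "secondary": True}
print(select_endpoint(["primary", "secondary"], lambda e: health[e]))
```

Note that the trigger here is purely a health signal: the load on the primary is irrelevant, which is the distinction drawn above between fault tolerance and scaling.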

Challenges of Implementing Business Continuity

Cost, complexity, and compliance are the biggest challenges for business continuity. As a result, systems are often not covered by disaster recovery or protected by backups, which undermines your ability to comply with regulatory requirements or internal mandatory policies.

While you may be familiar with the traditional causes of a disaster or business disruption, a threat to business operations can also come from a “global pandemic.” Mitigation and planning for a pandemic have rarely been included in a disaster recovery or business continuity strategy.

While not a disaster or outage, a “pandemic” certainly causes a significant business disruption that almost nobody can foresee. It is reasonable to say that those who had already adopted some form of cloud services and a remote working strategy before the COVID-19 pandemic were probably better prepared than others.

Figure 2.6 shows that when you adopt a cloud computing model, your cost model changes; you may have reduced complexity, and your compliance levels may increase:

Figure 2.6 – Challenges to implementing business continuity: cost, complexity, and compliance

Adopting a cloud strategy utilizing Microsoft Azure can address many of these challenges. The challenges often center on cost, and the changing cost model that the cloud provides can be both a benefit and a driver for adoption.

An additional benefit is that there is no need to purchase and maintain the resources required for a secondary site. With Microsoft Azure as the secondary site, you pay only for what you use, in a consumption-based model.

In this section, you have learned about the cloud computing operations model, including the demand model and operational benefits, and you have compared disaster recovery, high availability, and backup. The following section will cover the economics of cloud computing, the consumption model, and the cost-expenditure model.
