Chapter 1: Google Cloud Platform Infrastructure
To learn about Google Cloud Platform's infrastructure, you must have a good understanding of what cloud computing is and the cloud service models that are available, such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Moreover, since Google Cloud Platform is a public cloud provider, we will provide a brief explanation of the differences between public, private, and hybrid cloud services.
Google Cloud Platform's physical architecture will be described, including its regions and zones, as well as the logical architecture that organizes organizations, folders, projects, and resources.
A deep explanation of what a Google Compute Engine instance is, and how you can use one for your workload, will be provided in the second part of this chapter.
After introducing a few of the Google Cloud Platform (GCP) services, such as Cloud DNS, Cloud CDN, and Cloud Load Balancer, we will provide an overview of the DevOps culture, as applied to Kubernetes and the Google Cloud implementation of Kubernetes – Google Kubernetes Engine (GKE).
In this chapter, we are going to cover the following main topics:
- Introducing cloud computing and virtualization
- Introducing GCP
- Getting started with GCP
- Understanding virtual machines in the cloud
- Exploring containers in the cloud
Introducing cloud computing and virtualization
This section introduces the concepts of cloud computing and virtualization, which are fundamental to understanding how GCP works. We will go through the basic elements of cloud computing and virtualization that are required to dive into the chapter.
What is cloud computing?
Whether you are a fresh entry to the cloud or not, we can consider cloud computing a model that enables ubiquitous, on-demand network access to a shared pool of configurable computing resources. These resources can be servers, storage, networks, applications, and services. A great advantage that cloud computing brings to users is that you can rapidly provision and de-provision computing resources with minimal management effort.
Cloud computing models can be oriented to private customers like you or to enterprises and public organizations. Many popular internet services have been introduced over the years. Think about Dropbox, Google Photos, Apple iCloud, and so on, which let you store your files or images in a private space that can be accessed anywhere, anytime. Additionally, Amazon Web Services, Microsoft Azure, and Google Cloud brought cloud services to the market to help enterprises and organizations scale their IT infrastructures and applications globally.
The cloud computing model is based on several important pillars:
- Data center: This refers to a large building with independent power and cooling systems that hosts a large number of servers, storage, and networking devices.
- Virtualization: This is an enabling technology that allows physical resources to be shared among multiple users while keeping each user's resources private.
- Programmability: Every cloud resource (compute, storage, network, and so on) is software-driven. This means that no human interaction is needed to request, deploy, or release a resource, enabling a self-service model.
- Global network: This refers to the global private physical network that interconnects all the data centers around the world.
Consumers can rent these services from cloud providers on-demand in a self-service manner. This model allows cloud users to pay only for the resources they reserve and consume, thus reducing Capital Expenditure (CAPEX) and time-to-market.
More specifically, cloud computing is built on five fundamental attributes:
- On-demand self-service: Cloud users can request cloud computing services with a self-service model when they need them. This can be achieved with automated processes without any human interaction.
- Broad network access: Cloud users can access their resources anytime, anywhere, through a broadband connection. This lets cloud users interact with remote resources as if they were on-premises.
- Resource pooling: Cloud users can access a wide, almost infinite pool of resources without worrying about its size and location.
- Rapid elasticity: Cloud users can rapidly scale their resources elastically based on their actual workload needs. This allows cloud users to increase resource utilization and reduce costs.
- PAYG (Pay As You Go) model: Cloud users only pay for what they reserve or use. This allows them to greatly reduce CAPEX, increase agility, and reduce time-to-market.
There are three distinct kinds of cloud services that a user can choose from:
- Infrastructure as a Service (IaaS): Cloud users can rent the entire IT infrastructure, including virtual machines, storage, network, and the operating system. With this type of service, the user has full access to and control over the virtual infrastructure and is responsible for it. The cloud provider is responsible for the physical architecture and virtualization infrastructure.
- Platform as a Service (PaaS): This type of service is ideal for developers who want an on-demand environment for developing, testing, and delivering applications. Here, developers can quickly deploy their applications without worrying about the underlying infrastructure. There is no need to manage servers, storage, and networking (which is the responsibility of the cloud provider) since the focus is on the application.
- Software as a Service (SaaS): Cloud providers can lease applications to users, who can use them without worrying about managing any software or hardware platforms.
The following diagram shows a comparison between these three cloud services:
Your cloud infrastructure can be deployed in two ways:
- On-premises: This deployment refers to resources that are deployed on a private data center that belongs to a single organization.
- On a public cloud: This deployment refers to resources that are deployed in third-party data centers owned by the cloud provider. These resources will be running in a virtual private space in a multi-tenant scenario or sole-tenant scenario (https://cloud.google.com/compute/docs/nodes/sole-tenant-nodes) with dedicated hardware.
It is quite common that cloud users need to interconnect services that are running on-premises and services that have been deployed on the public cloud. Thus, it is particularly important to create hybrid cloud services that span both private and public cloud infrastructure. GCP offers many services to build public cloud infrastructure and interconnect them to those running on-premises.
Now that you have learned what cloud computing is, let's introduce virtualization.
What is virtualization?
Sometimes, in the Information Technology (IT) industry, there is a need to abstract hardware components into software components. Virtualization is the technology that does this. Today, virtualization is used on servers to abstract hardware components (CPU, RAM, and disk) into virtual systems that require them to run. These virtual systems are commonly referred to as virtual machines, and the software that abstracts the hardware components is called a hypervisor. By using virtualization, IT administrators can consolidate their physical assets into multiple virtual machines running on one or a few physical servers. Hypervisors let you run multiple virtual machines with different requirements in terms of hardware and operating system. Moreover, the hypervisor isolates operating systems and their running applications from the underlying physical hardware. They run independently of each other.
The following diagram shows the architecture for virtualization:
As we can see, the hypervisor virtualizes the hardware and provides each operating system with an abstraction of it. The operating systems can only see the virtualized hardware that has been provisioned in the hypervisor. This allows you to maximize the hardware resource utilization and permits you to have different operating systems and their applications on the same physical server.
Virtualization brings several benefits compared to physical devices:
- Partitioning: Virtualization allows you to partition virtual resources (vCPU, vRAM, and vDISK) to give to the virtual machine. This improves physical resource utilization.
- Isolation: Virtual machines are isolated from each other, thus improving security. Moreover, they can run operating systems and applications that can't exist on the same physical server.
- Encapsulation: Virtual machines can be backed up, duplicated, and migrated to other virtualized servers.
Now that we have introduced cloud computing and virtualization, let's introduce GCP.
This section will provide an overview of GCP and its services. Additionally, we will look at the Google Cloud global network infrastructure, which includes regions and zones. Finally, we will describe the concepts of projects, billing, and quotas in GCP.
GCP's global infrastructure – regions and zones
Over the years, Google has invested billions of dollars in building its private network, which today carries about 40% of the world's internet traffic every day. Customers who decide to deploy their cloud services on GCP benefit from very high throughput and low latency. Google offers connection to its cloud services from over 140 network edge locations (https://cloud.google.com/vpc/docs/edge-locations), as well as via private and public internet exchange locations (https://peeringdb.com/api/net/433). Thanks to Google's edge caching network sites, which are distributed all around the globe (https://cloud.google.com/cdn/docs/locations), latency can be reduced, allowing customers to interact with their cloud services in near real time. In the following diagram, you can see where Google's network has its presence in terms of regions and points of presence (PoPs):
As you can see, GCP data centers are organized into regions and zones around the globe and are interconnected with Google's physical private network. Regions are independent geographic areas that include three or more zones. For example, the us-central1 region includes the us-central1-a, us-central1-b, us-central1-c, and us-central1-f zones. In GCP projects, there are also global resources, such as global static external IP addresses, that are not tied to any region or zone.
To design a robust and failure-tolerant cloud infrastructure, it is important to deploy resources across zones or even regions. This prevents an infrastructure outage from affecting all your resources simultaneously. Thus, it is particularly important to know which of the following categories your resources belong to:
- Zonal resource: This is a resource that is specific to a zone, such as a virtual machine instance.
- Regional resource: This is a resource that is specific to a region and spans over multiple zones, such as a static IP address.
- Global resource: This is a location-independent resource, such as a virtual machine instance image.
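The scope difference becomes explicit in the gcloud CLI, where regional resources require a --region flag and global ones a --global flag. The following is an illustrative sketch; the resource names and locations are placeholders:

```shell
# A regional external IP address lives in exactly one region.
gcloud compute addresses create my-regional-ip --region=us-central1

# A global external IP address is a global resource (used by global load balancers).
gcloud compute addresses create my-global-ip --global

# Zonal resources, such as VM instances, are always tied to a zone.
gcloud compute instances list --zones=us-central1-a
```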
Choosing a region and a zone where your resources should be deployed, as well as where data should be stored, is an especially important design task. There are several reasons you should consider this:
- High availability: Distributing your resources across multiple zones and regions will help mitigate outages. Google has designed zones to minimize the risk of correlated failures caused by power, cooling, or networking outages. In the case of a zone outage, it is very easy to migrate to another zone to keep your service running. Similarly, you can mitigate the impact of a region outage by running backup services in another region, as well as using load balancing services.
- Decreased network latency: When latency is a crucial topic in your application, it is very important to choose the zone or region closest to your point of service. For example, if the end users of a service are located mostly in the west part of Europe, your service should be placed in that region or zone.
At the time of writing, there are 24 available regions and 73 zones. Recently, Google announced that five new regions will be available soon in Warsaw (Poland), Melbourne (Australia), Toronto (Canada), Delhi (India), and Doha (Qatar). The full list of available regions can be queried from Cloud Shell, as shown in the following screenshot. Cloud Shell is a ready-to-use command-line interface that's available in GCP that allows the user to interact with all GCP products:
The full list of available zones can also be queried from Cloud Shell, which is available in GCP, as shown in the following screenshot:
Each zone supports several CPU platforms, such as Sandy Bridge, Ivy Bridge, Haswell, Broadwell, Skylake, and Cascade Lake. This is an important aspect to know when you decide to deploy your virtual machine instance in one particular zone. You need to make sure that the zone you choose supports the instance that you are willing to deploy. To find out which CPU platforms a zone supports, you can use Cloud Shell, as shown in the following screenshot:
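For reference, the underlying Cloud Shell commands are roughly as follows. This assumes an authenticated session, and the output will change over time as Google adds regions and zones:

```shell
# List all available regions and zones.
gcloud compute regions list
gcloud compute zones list

# Show which CPU platforms a specific zone supports.
gcloud compute zones describe us-central1-a \
    --format="value(availableCpuPlatforms)"
```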
When selecting zones, keep the following tips in mind:
- Communication within and across regions will have different costs: Generally, communication within regions will always be cheaper and faster than communication across different regions.
- Apply multi-zone redundancy to critical systems: To mitigate the effects of unexpected failure on your instances, you should duplicate critical assets in multiple zones and regions.
Now, let's look at projects, billing, and quotas.
Projects, billing, and quotas
When cloud users request a resource or service in GCP, they need to have a project to track resources and quota usage. GCP projects are the basis for enabling and using GCP services. Inside a GCP project, users must enable billing to monitor, maintain, and address the costs of the GCP services running on the project itself.
Moreover, projects are separate compartments, and they are isolated from each other. GCP resources belong to exactly one project and they cannot be shared across projects, except for shared VPC networks, which can be shared with other projects. In addition, GCP projects can have different owners and users with several rights, such as project editor or project viewer. They are managed hierarchically using the Google Cloud resource manager, which will be described shortly.
GCP projects have three identifying attributes that uniquely distinguish them globally. These are as follows:
- Project ID: This is a permanent, unchangeable identifier that is unique across GCP globally. GCP generates one at project creation time, but you can choose your own unique ID if needed. The project ID is a human-readable string that can be used as a seed for uniquely naming other GCP resources, such as Google Cloud Storage bucket names.
- Project name: This is a nickname that you can assign to your project for your convenience. It does not need to be unique, and it can be changed over time.
- Project number: This is a permanent, unchangeable number that is unique across GCP globally. This number is generated by GCP and it cannot be chosen.
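These three attributes can be inspected from the command line. The following is a sketch in which my-demo-project is a placeholder project ID:

```shell
# Create a project, choosing our own project ID and a friendly name.
gcloud projects create my-demo-project --name="My Demo Project"

# Inspect the project: the output includes projectId, name, and projectNumber.
gcloud projects describe my-demo-project
```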
Projects can belong to a GCP organization in business scenarios, or they can exist without an organization, as happens with an individual private project. However, you can always migrate a standalone project into a GCP organization later.
Projects must belong to a billing account, which is used as a reference for paying for Google Cloud resources. This billing account is linked to a payment profile, which contains the payment methods to which costs are charged. As shown in the following diagram, one billing account can have multiple projects assigned:
The cloud billing account is responsible for tracking all the costs that are incurred using the GCP resources for all the projects attached to it. In practice, cloud billing has the following key features:
- Cost reporting: This can monitor, share, and print monthly costs and keep track of the cost trends of your resource spending, as shown in the following screenshot:
- Cost breakdown: This shows how many discounts your base usage cost will benefit from in a month. This is shown as a waterfall chart, starting from the base cost and subtracting discounts progressively until you see the final costs, as shown here:
- Budget and alerts: This is very important for setting budgets for your projects to avoid surprises at the end of the month. Here, you can decide the upper limit for a monthly expense and generate alerts for billing administrators to control costs once the trigger is reached. The following screenshot shows an example of a budget of 100 euros with the actual monthly expenses and three thresholds that trigger emails:
Resources in projects can be limited with quotas. Google Cloud uses two categories of quotas:
- Rate quotas: These limit the number of API requests that can be made to a GCP resource within a time interval, such as a minute or a day, after which the resource is not available.
- Allocation quotas: These limit the number of GCP resources that can exist in the project at any given time. If this limit is reached, a resource must be released before you can request a new one.
Projects can have different quotas for the same services. This may depend on various aspects; for example, the quota administrator may reduce the quota for certain resources to equalize the number of services among all projects in one organization.
To find out what the quota is for the service you want to use in GCP, you can search for it on the Cloud IAM Quotas page. This page lists all the quotas assigned to your project, and you can request different quota limits if needed. As shown in the following screenshot, you can display the actual usage of CPU quotas in all project regions:
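Quota limits and usage can also be queried from Cloud Shell rather than the Console; a rough sketch follows (the region is an example):

```shell
# Show all quota metrics, limits, and current usage for one region.
gcloud compute regions describe us-central1 --format="yaml(quotas)"

# Project-wide quotas (for example, networks and snapshots) are part
# of the project description.
gcloud compute project-info describe --format="yaml(quotas)"
```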
In this section, you learned about the physical architecture of GCP. However, to start using it, you must understand how Google architects the resources that are available to users. This will be described in the next section.
Getting started with GCP
In this section, we are going to describe how resources are organized inside GCP and how to interact with them. This is important, especially when the projects and their resources belong to large enterprises. Moreover, this section describes what tools users can use to interact with GCP.
GCP resource hierarchy
The cloud resource hierarchy has two main functions inside GCP:
- To manage a GCP project life cycle hierarchically inside one organization.
- To apply organization and Identity and Access Management (IAM) policies for project and resource access control.
The best way to understand the GCP resource hierarchy is to look at it from the bottom up. Resources are grouped into projects, which may belong to a single folder or organization node. Thus, the resource hierarchy consists of four elements, as shown in the following diagram:
Let's understand what each of the four elements is, as follows:
- Organization node: This is the root node for your organization, and it centralizes the project's management in a single structure. The organization node is associated with a Google Workspace or Cloud Identity account, which is mandatory.
- Folders: This is an additional grouping method that wraps projects and other folders hierarchically to improve separation and policy administration. You can apply an access control policy to the folder or even delegate rights to all the sub-folders and projects that are included.
- Projects: This is the fundamental grouping method for containing GCP resources and enabling billing. They are isolated from each other.
- Resources: These are GCP services that users can deploy.
With the resource hierarchy, it is easy to apply access control at various levels of your organization. Google uses IAM to assign granular access to a specific Google resource. IAM administrators can control who can do what on which resources. IAM policies can be applied at the organization level, folder level, or project level. Note that when multiple IAM policies are applied at various levels, the effective policy for a resource is the union of the policy set on the resource itself and the policies inherited from its ancestors.
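As a sketch of how the same kind of IAM binding is applied at different levels (the project ID, folder number, and member are placeholders):

```shell
# Grant a user the viewer role at the project level.
gcloud projects add-iam-policy-binding my-demo-project \
    --member="user:alice@example.com" --role="roles/viewer"

# The same binding can be set at the folder (or organization) level,
# and it is inherited by every project and resource underneath.
gcloud resource-manager folders add-iam-policy-binding 123456789012 \
    --member="user:alice@example.com" --role="roles/viewer"

# Inspect the policy set directly on the project.
gcloud projects get-iam-policy my-demo-project
```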
Interacting with GCP
There are five ways of interacting with GCP:
- Cloud Platform Console: This is a web user interface that allows you to use all GCP resources and services graphically.
- Cloud Shell and Cloud SDK: This is a command-line interface that allows you to use all GCP resources.
- RESTful API: This is an API that can be accessed via RESTful calls and allows you to access and use GCP resources and services.
- API client libraries: These are open libraries that are available in various programming languages and allow you to access GCP resources.
- Infrastructure as Code (IaC): Open source IaC tools such as Terraform or Google Deployment Manager can be used to deploy and manage IaaS and PaaS resources on GCP (https://cloud.google.com/docs/terraform).
The first two operating modes are more appropriate for cloud architects and administrators who prefer direct interaction with GCP. The RESTful API and the client libraries are usually chosen by programmers and developers who build applications that use GCP services, while IaC tools suit teams that manage infrastructure declaratively. In this book, we will focus more on the Console and Cloud Shell to explain GCP features.
The following screenshot shows the main components of the Console:
Let's explore what's labeled in the preceding screenshot:
- The navigation menu lets you access all the GCP services and resources (1).
- The combo menu lets you select the project you want to work with (2).
- The search bar lets you search for resources and more within the project (3).
- The Cloud Shell button lets you start the Cloud Shell (4).
- The Project Info card lets you control the project settings (5).
- The Resources card lets you monitor the active resources (6).
- The Billing card lets you monitor the cost and its estimation (7).
Cloud Shell is the preferred interaction method for administrators who want to use the command-line interface. Cloud Shell also has a graphical editor that you can use to develop and debug code. The following screenshot shows Cloud Shell:
Cloud Shell Editor is shown in the following screenshot:
Cloud Shell comes with the Cloud SDK preinstalled, which allows administrators to interact with all GCP resources.
gcloud, gsutil, and bq are the most important SDK tools; you will use them to manage Compute Engine instances, Cloud Storage, and BigQuery, respectively.
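To give a flavor of the three tools, here are typical commands (the bucket and dataset names are placeholders):

```shell
# gcloud manages most GCP resources, such as Compute Engine instances.
gcloud compute instances list

# gsutil manages Cloud Storage buckets and objects.
gsutil ls gs://my-example-bucket

# bq manages BigQuery datasets, tables, and queries.
bq ls my_example_dataset
```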
In this section, you learned about the logical architecture of GCP. In the next section, you will understand how virtual machines work in Google Cloud.
Understanding virtual machines in the cloud
In this section, you will learn about Compute Engine in GCP and its major features. This includes the virtual machine types that are available in GCP, disk options, and encryption solutions. Moreover, this section will introduce Virtual Private Cloud and its main characteristics. Finally, we will look at Load Balancing, DNS, and CDN in GCP.
Google Compute Engine
IaaS in GCP is implemented with Compute Engine. Compute Engine allows users to run virtual machines in the cloud. The use cases for Compute Engine are as follows:
- Legacy monolithic applications
- Custom databases
- Microsoft Windows applications
Compute Engine is a zonal service: when you deploy an instance, you must specify the instance name, the region, and the zone that the instance will run in. Note that the instance name must be unique within the zone. GCP allows administrators to deploy Compute Engine VMs with the same name, so long as they stay in different zones. We will discuss this in more detail when we look at internal DNS.
There are four virtual machine family types that you can choose from:
- General-purpose: This category is for running generic workloads such as websites or customized databases.
- Compute-optimized: This category is for running specific heavy CPU workloads such as high-performance computing (HPC) or single-threaded applications.
- Memory-optimized: This category is for running specific heavy in-memory workloads such as large in-memory databases or in-memory analytics applications.
- GPU: This category is for running intensive workloads such as machine learning, graphics applications, or blockchain.
In the general-purpose category, you can choose between four different machine types, as illustrated in the following diagram:
To choose the appropriate machine type for your workload, let's have a look at the following table:
Each of the previous machine types can have different configurations in terms of vCPUs and memory. Here, you can select between predefined and custom machine types. Predefined machine types let you choose a Compute Engine instance that has a predefined amount of vCPUs and RAM. On the other hand, the custom machine type allows you to select the vCPUs and RAM that are required for your workload. There are additional options as well: you can run a shared-core virtual machine, which timeshares a physical core to save money, or you can choose an instance with a different balance of vCPUs and memory.
We can summarize all the machine type configurations with the following diagram:
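As an illustration, the following gcloud sketch creates one predefined and one custom machine type; the names, zone, and sizes are arbitrary examples:

```shell
# Predefined machine type: fixed vCPU/memory combination (n1-standard-2).
gcloud compute instances create predefined-vm \
    --zone=us-central1-a --machine-type=n1-standard-2

# Custom machine type: pick your own vCPU/memory balance.
gcloud compute instances create custom-vm \
    --zone=us-central1-a --custom-cpu=4 --custom-memory=10GB
```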
Another important aspect of Compute Engine is its boot disk. Each virtual machine instance requires a boot disk to run properly. In the boot disk, the operating system is installed, as well as the main partition. The boot disk is a permanent storage disk and it can be built from several types of images. GCP offers pre-built public images for both Linux and Windows operating systems. Some of them are license free such as CentOS, Ubuntu, and Debian. Others are premium images, and they incur license fees.
Boot disks can be divided into three types:
- Standard persistent disk: This is a magnetic hard disk drive (HDD) that can have up to 7,500 IOPS in reading and 15,000 IOPS in writing operations.
- Balanced persistent disk: This is the entry-level solid-state drive (SSD) and can have up to 80,000 IOPS in both reading and writing operations.
- SSD persistent disk: This is the second-level SSD and can have up to 100,000 IOPS in both reading and writing operations.
Boot disks are the primary disks for a Compute Engine instance. Additionally, you can attach more disks to your virtual machine if you need extra space or for extremely high performance. For the latter, you can add a local SSD as a secondary block storage disk. They are physically attached to the server that hosts your Compute Engine instance and can have up to 0.9/2.4 million IOPS in reading and 0.8/1.2 million IOPS in writing (with SCSI and NVMe technology, respectively).
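The disk types above map to the pd-standard, pd-balanced, and pd-ssd types in the gcloud CLI. Attaching an extra disk might look like the following sketch (names and sizes are placeholders):

```shell
# Create a secondary SSD persistent disk.
gcloud compute disks create data-disk \
    --zone=us-central1-a --size=200GB --type=pd-ssd

# Attach it to an existing instance.
gcloud compute instances attach-disk my-vm \
    --zone=us-central1-a --disk=data-disk

# Local SSDs, by contrast, are requested at instance creation time.
gcloud compute instances create fast-vm \
    --zone=us-central1-a --local-ssd=interface=nvme
```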
Security is a particularly important feature when you design your Compute Engine instance. For this reason, Google lets you choose from three different encryption solutions that apply to all the persistent disks of your virtual machine, as follows:
- Google-managed key: This is the default encryption and it is enabled by default on all persistent disks. The encryption key is managed by Google and users do not have to worry about anything.
- Customer-managed key: With this encryption method, the data is encrypted with a user key that is periodically rotated via Google's Key Management Service (KMS). This GCP-managed service allows users to have their own encryption keys and manage them inside the user's project.
- Customer-supplied key: With this encryption method, the data is encrypted with a user-supplied key, which is stored and managed outside of the GCP user's project.
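For instance, a customer-managed (CMEK) disk references a Cloud KMS key at creation time. In this sketch, the project, key ring, and key names are placeholders:

```shell
# Encrypt a new disk with a customer-managed key stored in Cloud KMS.
gcloud compute disks create encrypted-disk \
    --zone=us-central1-a --size=100GB \
    --kms-key=projects/my-demo-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key
```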
In this section, you learned what options you have when you decide to run virtual machines on GCP. In the next section, you will be introduced to Virtual Private Cloud in GCP.
Virtual Private Cloud
Virtual Private Cloud (VPC) is a virtualized private network and data center. Compute resources, storage, and load balancers live within a VPC inside the Google Cloud production network that belongs to a specific project. VPC is powered by Andromeda (https://www.usenix.org/system/files/conference/nsdi18/nsdi18-dalton.pdf), the GCP network virtualization stack, which is fully distributed to avoid having a single point of failure. A VPC network provides the following:
- Connectivity services for interconnected Compute Engine, Google Kubernetes Engine (GKE) clusters, and an App Engine flexible environment
- Connectivity services to private on-premises networks using cloud VPN tunnels or cloud interconnect services
- Traffic distribution from Google Cloud load balancers and the backends running inside GCP
GCP projects can have multiple VPC networks that can contain several subnets for each region they cover, as shown in the following diagram:
VPC networks logically separate resources, even when they are in the same project. As shown in Figure 1.17, Compute Engine instances that stay in the same network can communicate with each other without using the public internet, even if they are in different regions. Conversely, instances in the same region but in different networks cannot communicate with each other unless the traffic passes through the public internet or over VPC peering between the different VPC networks.
VPC networks have the following properties:
- Networks are global resources and do not belong to any region or zone. They are contained in one specific project and may span across all the available GCP regions.
- Networks use subnets to logically group GCP resources in one region. Therefore, subnets are regional resources, and they span across zones.
- Networks do not have any IP addresses assigned, while subnets do and they belong to private IPv4 ranges.
- Routes and firewall rules are global resources, and they are associated with one network.
- Firewall rules control the traffic to and from a virtual machine.
- Networks only support IPv4 unicast traffic. Multicast, broadcast, or IPv6 traffic within the network are not supported.
There are two ways to create a VPC network within a GCP project:
- Auto mode: The network has one subnet for each available region and default predefined firewall rules. The IP address range has a fixed /20 for each subnet that's created.
- Custom mode: The network has no default preconfigured subnets and the user has full control over defining all the network resources, such as subnet IP ranges, firewalls, and routing configuration.
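The two creation modes map directly to the --subnet-mode flag. A sketch follows; the network and subnet names and the IP range are examples:

```shell
# Auto mode: one /20 subnet per region is created automatically.
gcloud compute networks create auto-net --subnet-mode=auto

# Custom mode: no subnets are created; you define them yourself per region.
gcloud compute networks create custom-net --subnet-mode=custom
gcloud compute networks subnets create frontend-subnet \
    --network=custom-net --region=us-central1 --range=10.10.0.0/24
```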
In this section, you learned about the basics of VPC and its main components. In the next section, you will get an overview of Load Balancing, DNS, and CDN in GCP.
Overview of Load Balancing, DNS, and CDN
In GCP, load balancers can help distribute traffic across multiple Compute Engine instances. This reduces the risk of having performance issues on the backend application and improves reliability. Moreover, Google Cloud Load Balancing services are engineered on fully distributed and scalable infrastructure that uses software-defined networking to direct traffic to VPC networks. This helps avoid a single point of failure and allows traffic at scale.
The Google Cloud Load Balancer architecture can be represented as follows:
Google Cloud load balancers provide the following features:
- A single external IP address as the frontend
- Global or regional load balancing to reach your application from the internet
- Internal load balancing
- Layer 4 to Layer 7 load balancing
When it comes to DNS, it is important to clarify which Google Cloud products are needed and when. Google offers three services that deal with DNS:
- Internal DNS
- Cloud DNS
- Cloud Domain
Internal DNS allows Compute Engine instances in the same VPC network to communicate via internal DNS names. Internal records for the virtual machines are created in the DNS zone for the .internal domain. These records are automatically created, updated, and removed by GCP, depending on the virtual machine's life cycle. Moreover, internal DNS resolves only the virtual machine's primary internal IP address; it cannot be used to resolve its public IP address, nor its secondary private IP addresses. Additionally, Google recommends using zonal DNS to improve reliability. The fully qualified domain name (FQDN) for a Compute Engine instance using zonal DNS has the following format: INSTANCE_NAME.ZONE.c.PROJECT_ID.internal.
Note that to avoid conflict with FQDNs, two Compute Engine instances running in the same zone must have unique instance names.
Cloud DNS is a Google-managed service that allows us to publish domain names to the global DNS in a reliable, cost-effective, and scalable way. Cloud DNS provides both public and private zones and lets the user publish DNS records without worrying about managing its DNS server. Public zones are visible globally, while private zones are visible to one or more specified VPC networks.
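As a brief sketch of how Cloud DNS is used in practice, the following gcloud commands create a public managed zone and add an A record to it (the zone name, domain, and IP address are illustrative):

```shell
# Create a public managed zone for an illustrative domain
gcloud dns managed-zones create example-zone \
    --dns-name="example.com." \
    --description="Example public zone"

# Add an A record pointing at an illustrative frontend IP address
gcloud dns record-sets create www.example.com. \
    --zone=example-zone --type=A --ttl=300 --rrdatas=203.0.113.10
```

Creating a private zone is the same flow with the addition of a visibility flag restricting the zone to specific VPC networks.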
Cloud Domains allows users to register and configure domains in Google Cloud. Through Cloud Domains, users can purchase domains and attach them to their applications. The domain is associated with a specific project, and its price is charged to the same Cloud Billing account as the user's project. Therefore, Cloud Domains can be used to search for available domains, buy them, and manage their registration.
Google Cloud offers a managed service to implement a content delivery network to serve content closer to users. The name of this service is Cloud CDN. Through Google's global edge network, Cloud CDN can accelerate websites and applications and improve the user experience. Cloud CDN requires an HTTP(S) load balancer that provides the frontend IP address that receives users' requests and forwards them to the backends. In GCP, there are distinct types of backends that can work with Cloud CDN:
- Managed instance groups: Groups of Compute Engine instances running in a region with autoscaling
- Zonal Network Endpoint Groups (NEGs): A group of IP addresses of running virtual machines or containers in one specific zone or a group of IP addresses and ports of running services
- Serverless NEGs: A group of serverless services such as Cloud Run, Cloud Functions, or App Engine sharing the same URL
- Internet NEGs: A group of IP addresses running outside GCP
- Cloud Storage buckets: GCP object storage that's used to store any type of file at scale
All these backend types are called origin servers when we consider the Cloud CDN architecture. The following diagram shows how responses from origin servers flow through an HTTP(S) load balancer and are delivered to the final users via Cloud CDN:
In this section, you learned about the basics of global load balancer services, DNS, and CDN in GCP. In the next section, you will explore DevOps, containers, and Kubernetes.
Exploring containers in the cloud
In recent years, digital transformation has changed the way business is done. Mobility, the Internet of Things, and cloud computing require agility, simplicity, and speed to meet market demands. However, traditional businesses and enterprises maintain separation between departments, especially between those responsible for developing new features and those responsible for maintaining application stability. DevOps methodologies break down this separation and create a circular environment between development and operational processes. The DevOps goal is to deliver services faster and on demand, and this can be achieved when development and operations teams work together without any barriers.
DevOps concepts and microservice architectures
The DevOps culture introduces important guidelines, also called CALMS, that should be adopted at every level:
- Culture: Trust, collaboration, respect, and common goals are the main pillars of DevOps culture.
- Automation: Everything should be automated, from building to application delivery.
- Lean: Always optimize processes and reduce waste as much as possible.
- Measurement: Measure everything for continuous improvement.
- Sharing: Share everything, from ideas to common problems.
DevOps culture starts with increasing velocity in software development and deployment. This Agile approach allows us to reduce the time between the application's design and deployment. Thus, DevOps culture promotes the continuous integration, continuous delivery, and continuous deployment model (often referred to as CI/CD) against the traditional waterfall model, as shown in the following diagram:
Continuous integration is the process of constantly merging new code into the code base. This allows software engineers and developers to increase velocity in new feature integrations. Also, automated testing can be inserted early in the process so that it is easier to catch problems and bugs. Continuous delivery is the process of staging code for review and inspection before release. Here, there is manual control over the deployment phase of a new feature. On the other hand, continuous deployment leverages automation to deploy new features in production once code has been committed and tested.
To support the CI/CD model and adopt DevOps methodology, software engineers have moved from monolith to microservices application design. A microservice is a small piece of software that is independently developed, tested, and deployed as part of a larger application. Moreover, a microservice is stateless and loosely coupled with independent technology and programming languages from other microservices. Large applications built as collections of microservices that work together have the following benefits:
- High horizontal scalability: Microservices can be created as workload increases.
- High modularity: Microservices can be reused to build modular applications.
- High fault tolerance: Microservices can be restarted quickly in case of crashes. Workloads can also be distributed across multiple identical microservices to improve reliability.
- High integration with the CI/CD model: Microservices can fit the CI/CD model because they can be quickly and easily tested and deployed in production.
The best way to follow the microservices approach is to leverage virtualization technology, or better, the containerization methodology. In the next section, we will show how containers compare to virtual machines and highlight the main differences that make them ideal for microservices implementation.
Containerization versus virtualization
Since we introduced virtual machines at the beginning of this chapter, it is time to understand what a container is and how it differs from virtual machines. Containers are portable software packages that are independent of the infrastructure that they run in. They wrap one application and all its dependencies that are needed for execution.
Containers fit very well into the microservice architecture because they are modular and they are easy to change and scale.
The main differences between containers and virtual machines are shown in the following diagram:
The major features of containers, compared to virtual machines, are as follows:
- Faster deployment: Deploying a container requires seconds rather than minutes.
- Less overhead: Containers do not include an operating system; virtual machines do.
- Faster migration: Migrating one container from one host to another takes seconds instead of minutes.
- Faster restart: Restarting one container takes seconds rather than minutes.
Usually, containers apply when users want to run multiple instances of the same application. Containers share a single operating system kernel, and they are logically separated in terms of the runtime environment, filesystem, and others. Virtual machines are logically separated operating systems running on the same general-purpose hardware. Both virtual machines and containers need to run on software that allows for virtualization. For virtual machines, the hypervisor is responsible for virtualizing the hardware to let multiple operating systems run on the same machine. For containers, a container engine is responsible for virtualizing the operating system (binaries, libraries, filesystem, and so on) to let multiple applications run on the same OS.
It is clear from Figure 1.21 that containers have less overhead than virtual machines. They do not need to load the operating system when the workload requires new applications. Applications can be started in seconds and their isolation is maintained as it would be with virtual machines. In addition, application agility is improved as applications can be created or destroyed dynamically when the workload requires it. Moreover, containers reduce the number of resources that would be needed to deploy a new application. It has been well demonstrated that running a new containerized application consumes far fewer resources than one running on a virtual machine. This is because containers do not need to load an OS that includes dozens of processes in the idle state.
One of the most popular platforms for developing, packaging, and deploying containers is Docker. It also includes Docker Engine, which is supported on several operating systems. With Docker, users can build container images and manage their distribution. Docker has several key concepts:
- Portability: Docker applications can be packaged in images. These can be built on a user's laptop and shift unchanged to production.
- Version control: Each image is versioned with a tag that is assigned during the building process.
- Immutability: Once a Docker image is built, it cannot be changed. A restarted container is a new instance created from the image, different from the previous one.
- Distribution: Docker images can be maintained in repositories called registries. Images can be pushed to the registry when new images are available. They can be pulled to deploy new containers in production.
Using Docker, applications can be packed into containers using Dockerfiles, which describe how to build application images from source code. This process is consistent across different platforms and environments, thus greatly increasing portability. The main instructions contained in a Dockerfile are represented in the following diagram:
The FROM instruction tells Docker Engine which base image the containerized application starts from. It is the first statement of every Dockerfile, and it allows users to build images on top of existing ones. The COPY instruction copies the application code and its library files into the container image. The RUN instruction executes commands while the image is being built. The WORKDIR instruction changes the working directory inside the container. The EXPOSE instruction tells us which port the container will use to provide its services. Finally, the ENTRYPOINT instruction starts the application when the container is launched.
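A minimal Dockerfile that exercises these instructions might look like the following; the base image, port, and file names are illustrative assumptions for a small Python web application:

```dockerfile
# Start from an official Python base image (illustrative choice)
FROM python:3.11-slim

# Change the working directory inside the container
WORKDIR /app

# Copy the dependency list and install the libraries at build time
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code into the image
COPY . .

# Document the port the application listens on
EXPOSE 8080

# Start the application when the container is launched
ENTRYPOINT ["python", "app.py"]
```

Note that the instructions run top to bottom at build time, except for ENTRYPOINT, which only takes effect when a container is started from the resulting image.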
Note that the EXPOSE instruction does not actually publish the port; it works as a type of documentation. To publish the port when running the container, the user should pass the -p flag to docker run to publish and map one or more ports.
Once the Dockerfile is ready, you can build the container image using the docker build command. The code and the library requirement files must be included in the build context. Additionally, it is good practice to tag the images that have been built to identify the application version.
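As a sketch, building a tagged image from the Dockerfile in the current directory and then running it with a published port could look like this (the image name, tag, and port are illustrative):

```shell
# Build the image from the Dockerfile in the current directory;
# the trailing dot sets the build context, and -t tags the image
docker build -t my-app:1.0.0 .

# Run the container, publishing container port 8080 on host port 8080
docker run -d -p 8080:8080 my-app:1.0.0
```

The tag after the colon identifies the application version, which makes rollbacks and registry distribution straightforward.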
Container orchestration with Google Kubernetes Engine
So far, we have learned that containerization helps adopt DevOps culture and minimize the gap between application development and deployment. However, when large and complex applications are composed of dozens of microservices, it becomes extremely difficult to coordinate and orchestrate them. It is important to know where containers are running, whether they are healthy, and how to scale when the workload increases. All these functions cannot be done manually; they need a dedicated system that automatically orchestrates all the tasks. Here is where Kubernetes comes in.
Kubernetes (K8s for short) is an open source orchestration tool (formerly an internal Google tool) that can automatically deploy, scale, and failover containerized applications. It supports declarative configuration, so administrators describe the desired state of the infrastructure and K8s will do everything it can to reach it. In other words, Kubernetes maintains the state of the infrastructure as written in configuration files (also known as manifest files).
The main Kubernetes features can be listed as follows:
- Supports both stateless and stateful applications: On K8s, you can run applications that do not persist user session data, such as web servers, as well as applications that store data persistently.
- Auto-scaling: K8s can scale containerized applications in and out based on resource utilization. This happens automatically and is controlled by the cluster itself. The administrators can declare autoscaling thresholds in the deployment manifest files.
- Portable: Administrators are free to move their workloads between on-premises clusters and public cloud providers with minimal effort.
K8s is composed of a cluster of several nodes. The node that's responsible for controlling the entire cluster is called the master node. At least one of these is needed to run the cluster. Here, Kubernetes stores the information regarding the objects and their desired states. The most common Kubernetes objects are as follows:
- Pod: This object is a logical structure that the container will run in.
- Deployment: This object describes how one application should be deployed into the K8s cluster. Here, the administrator can decide what container image to use for its application, the desired number of Pods running, and how to auto-scale.
- Service: This object describes how the application that's been deployed can be reached from other applications.
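To make these objects concrete, a minimal pair of manifest files for a Deployment and its Service might look as follows; the names, container image, replica count, and port numbers are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
spec:
  replicas: 3            # desired number of Pods
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello       # Pod label that the Service selector matches
    spec:
      containers:
      - name: hello
        image: gcr.io/example-project/hello-app:1.0   # illustrative image
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hello-service
spec:
  type: ClusterIP        # stable virtual IP inside the cluster
  selector:
    app: hello           # binds the Service to Pods with this label
  ports:
  - port: 80
    targetPort: 8080
```

Applying these files with kubectl apply asks the cluster to converge on the declared state: three running Pods reachable through one stable Service address.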
In Kubernetes, worker nodes are responsible for running containers. Containers cannot run on the Kubernetes cluster in their native format. They need to be wrapped into a logical structure known as a Pod. Kubernetes manages Pods, not containers. These Pods provide storage and networking functions for containers running within the Pod. They have one IP address that is used by containers to expose their services. It is good practice to have one container running in a Pod. Additionally, Pods can specify a set of volumes, which can be used as a storage system for containers. Pods can be grouped into namespaces. This provides environment isolation and increases cluster utilization.
The Kubernetes architecture is shown in the following diagram:
In GCP, administrators can run managed Kubernetes clusters with Google Kubernetes Engine (GKE). GKE allows users to deploy Kubernetes clusters in minutes without worrying about installation problems. It has the following features:
- Node autoscaling: GKE can auto-scale worker nodes to support variable workloads.
- Load balancing: GKE can benefit from Google Load Balancing solutions for its workloads.
- Node pools: GKE can have one or more worker node pools with different Compute Engine instances.
- Automatic repair and upgrades: GKE can monitor and maintain healthy Compute Engine worker nodes and apply automatic updates.
- Cluster logging and monitoring: Google Cloud Operations lets administrators have full control over the state of the Kubernetes cluster and its running workloads.
- Regional cluster: GKE can run K8s clusters across multiple zones of one region. This allows you to have highly available K8s clusters with redundant masters, and multiple worker nodes spread between zones.
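As a hedged sketch of how these features come together, creating a regional GKE cluster with node autoscaling from the command line could look like this (the cluster name, region, and node counts are illustrative):

```shell
# Create a regional cluster with one node per zone and autoscaling
gcloud container clusters create my-cluster \
    --region us-central1 \
    --num-nodes 1 \
    --enable-autoscaling --min-nodes 1 --max-nodes 3

# Fetch credentials so kubectl can talk to the new cluster
gcloud container clusters get-credentials my-cluster --region us-central1
```

Because --region is used instead of --zone, the control plane and worker nodes are spread across the region's zones, giving the highly available setup described above.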
When it comes to networking with Kubernetes and GKE, it is important to remember the following definitions:
- Node IP: This is the IP address that a worker node gets when it starts. In GKE, this IP address is assigned based on the VPC subnet that the cluster is running in. This address is used to allow communication between the master node and the worker node of the K8s cluster.
- Pod IP: This is the IP address that's assigned to the Pod. This address is ephemeral and lives for as long as the Pod runs. By default, GKE allocates a /14 secondary IP address range for the entire set of Pods running in the cluster. More specifically, GKE allocates a /24 secondary IP address range to each worker node in the cluster.
- Cluster IP: This is the IP address that's given to a service. This address is stable for as long as the service is present on the cluster. By default, GKE allocates a secondary block of IP addresses to run all the services in the cluster.
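As a quick sanity check on these defaults, the following Python snippet uses the standard ipaddress module to show how many /24 per-node blocks fit inside a /14 Pod range, and how many addresses each node's block contains (the 10.4.0.0/14 range itself is an illustrative assumption):

```python
import ipaddress

# Illustrative /14 secondary range for Pods (GKE's default size)
pod_range = ipaddress.ip_network("10.4.0.0/14")

# Each worker node receives one /24 slice of this range
per_node_blocks = list(pod_range.subnets(new_prefix=24))
print(len(per_node_blocks))              # 1024 possible /24 node blocks

# Each /24 block holds 256 addresses, comfortably above the
# default maximum of 110 Pods scheduled per node
print(per_node_blocks[0].num_addresses)  # 256
```

This is why the per-node /24, not the cluster-wide /14, is usually the figure to watch when planning the maximum node count of a cluster.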
The following diagram provides a better understanding of GKE IP addressing:
Since Pods maintain a separate IP address space from worker nodes, they can communicate with each other in the same cluster without using any kind of network address translation. This is because GKE automatically configures the VPC subnet with an alias IP, which is an authorized secondary subnet in the region where the cluster is deployed.
In Kubernetes, Pods are ephemeral, and they might have a short life. K8s may create new Pods in case of a change in the deployment or may restart Pods in case of crashes or errors. Moreover, when load balancing is needed across multiple Pods, it is crucial to have load balancing services to direct traffic to Pods. Here, the Kubernetes Service comes in handy because it allocates a static IP address that refers to a collection of Pods. The link between the Service and Pods is based on Pod labels and the Service selector. This last parameter allows Service objects to bind one static IP address to a group of Pods.
When Services are created, the ClusterIP is allocated statically, and it can be reached from any other application running within the cluster. However, most of the time, traffic comes from outside the cluster, so this cannot reach Services running inside it. GKE provides four types of load balancers that address this problem, as follows:
- External TCP/UDP load balancer: This is a layer 4 load balancer that manages traffic coming from both outside the cluster and outside the VPC.
- External HTTP(S) load balancer: This is a layer 7 load balancer that uses a dedicated URL forwarding rule to route the traffic to the application. This is also called Ingress.
- Internal TCP/UDP load balancer: This is a layer 4 load balancer that manages traffic coming from outside the cluster but internally to the VPC.
- Internal HTTP(S) load balancer: This is a layer 7 load balancer that uses a dedicated URL forwarding rule to route the intra-VPC traffic to the application. This is also called Ingress and it is applied to internal traffic.
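As an illustrative sketch of the second type, an external HTTP(S) load balancer is requested in GKE by creating an Ingress object; the Ingress and Service names below are assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-ingress
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hello-service   # illustrative backend Service name
            port:
              number: 80
```

When this manifest is applied, GKE provisions a Google Cloud HTTP(S) load balancer and routes matching URL paths to the named Service's Pods.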
In this section, you learned about the basics of Kubernetes and its implementation in GCP, Google Kubernetes Engine. Since GKE is based on clusters of Compute Engine VMs, networking is a crucial part of making your Pods and Services run as you need them to.
Summary
In this chapter, you learned what cloud computing and virtualization are. You also learned what cloud service models are available for your business and their advantages. Moreover, since GCP is a public cloud provider, you have been given a brief explanation of the differences between public, private, and hybrid cloud services.
GCP's physical architecture was described, as well as its regions and zones. We also looked at its logical architecture and specified the various organizations, folders, projects, and resources.
A deep explanation of what a Google Compute Engine instance is, and how you can use one for your workload, was provided in the second part of this chapter.
After introducing a few of the GCP services, such as Cloud DNS, Cloud CDN, and Cloud Load Balancing, we looked at the DevOps culture and its implementation with Kubernetes and Google Kubernetes Engine (GKE).
In the next chapter, you will learn how to design, plan, and prototype a GCP network.
Further reading
- GCP locations and services:
- GCP landscape and concepts:
- GCP IaaS and PaaS services:
- GCP networking products overview:
- Resources for the certification:
- Network Engineer learning path:
- Network Engineer sample questions:
- Network Engineer labs: