AWS Certified Solutions Architect – Professional Exam Guide (SAP-C02): Gain the practical skills, knowledge, and confidence to ace the AWS (SAP-C02) exam on your first attempt

By Patrick Sard, Yohan Wadia

Designing Networks for Complex Organizations

Networking is a key aspect in meeting the security and compliance requirements of an organization. It determines whether and how resources in your Amazon Web Services (AWS) environment can be accessed from anywhere in your organization and beyond.

This chapter will cover the services on AWS that can be used to design hybrid networks, allowing an organization to reach AWS resources from its on-premises environments and vice versa. You will learn how to connect to AWS services without going through the internet and will also look at network communication across multiple AWS accounts.

The following topics will be covered in this chapter:

  • Establishing virtual private network (VPN) connections
  • Introducing AWS Direct Connect (DX)
  • Introducing AWS Storage Gateway
  • Leveraging virtual private cloud (VPC) endpoints
  • Introducing AWS Transit Gateway

Establishing VPN Connections

The first option when it comes to securing connectivity between an enterprise’s on-premises infrastructure and its AWS environment is to establish a VPN connection. AWS offers several alternatives to achieve that. The following sections detail each of them.

AWS Managed VPN

The first one is AWS Managed VPN, or Site-to-Site VPN. This is a fully managed service that provides an Internet Protocol Security (IPsec) VPN connection over the internet from your on-premises network equipment to AWS-managed network equipment attached to your AWS VPC.

The VPN concentrator end on the AWS side can be either a virtual private gateway (VGW) attached to a single VPC, as illustrated in the following diagram, or a transit gateway (TGW) attached to multiple VPCs (see Figure 2.2). The other end connecting to your on-premises equipment is called a customer gateway (CGW):

Figure 2.1: VPN connection between single VPC and on-premises equipment

The architecture you choose depends on your AWS environment network topology. Figure 2.2 shows the TGW option:

Figure 2.2: VPN connection between TGW and on-premises equipment

Complex organizations usually end up managing multiple VPCs that require inter-VPC communication, connectivity to the internet, and/or connectivity to their on-premises infrastructure. They then often leverage the TGW service to have a clean hub-and-spoke network model (more on this in the section dedicated to TGWs at the end of this chapter).

It is worth noting that AWS Managed VPN also provides redundancy and automatic failover; it is therefore highly recommended to connect your VGW or TGW to two separate CGWs on your end. By doing so, you establish two separate VPN connections, and if one of your on-premises devices fails, all traffic will be automatically redirected to the second VPN connection (see Figure 2.3). This allows you to handle failover gracefully in the following cases:

  • In case of an unexpected failure of your on-premises router sitting behind your CGW
  • When you need to perform maintenance on your network equipment and must take one of two VPN connections offline for the duration of the maintenance operation

This is illustrated in the following diagram:

Figure 2.3: VPN connection redundancy for failover

AWS Managed VPN offers both dynamic and static routing options. Dynamic routing leverages Border Gateway Protocol (BGP) to pass routing information between the VGW on AWS and your on-premises CGW. It allows you to specify routing priorities, policies, and weights in your BGP advertisements and to influence the network path between your networks and AWS. It is worth noting that when using BGP, both the IPsec and BGP connections must be terminated on the same CGW device(s). Both the BGP-advertised and static route information tell the gateways on each side which tunnels are available to re-route traffic in case of failure. That said, BGP brings more robustness to the table thanks to the live detection checks it performs, so using BGP-capable devices will make your life easier when dealing with failover from the primary to the secondary VPN connection.
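
If you script your infrastructure rather than use the console, the pieces just described can be created with a few API calls. The following minimal boto3 (Python) sketch is illustrative only: the VPC ID, the on-premises public IP, and both ASNs are placeholders you would replace with your own values.

    # Illustrative sketch: a Site-to-Site VPN with dynamic (BGP) routing.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # Virtual private gateway (VGW) attached to the VPC (AWS side).
    vgw = ec2.create_vpn_gateway(Type="ipsec.1", AmazonSideAsn=64512)["VpnGateway"]
    ec2.attach_vpn_gateway(VpcId="vpc-0123456789abcdef0",
                           VpnGatewayId=vgw["VpnGatewayId"])

    # Customer gateway (CGW) describing the on-premises device (public IP + BGP ASN).
    cgw = ec2.create_customer_gateway(
        Type="ipsec.1", PublicIp="203.0.113.10", BgpAsn=65000
    )["CustomerGateway"]

    # The VPN connection itself; StaticRoutesOnly=False enables BGP dynamic routing.
    vpn = ec2.create_vpn_connection(
        Type="ipsec.1",
        CustomerGatewayId=cgw["CustomerGatewayId"],
        VpnGatewayId=vgw["VpnGatewayId"],
        Options={"StaticRoutesOnly": False},
    )
    print(vpn["VpnConnection"]["VpnConnectionId"])

For the redundant setup discussed above, you would create a second CGW (pointing at your second on-premises device) and a second VPN connection against the same VGW or TGW.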

AWS Managed VPN is a great approach when you need to connect one on-premises location with your AWS environment, but what about situations where you need to interconnect several remote offices together and with your AWS environment?

AWS VPN CloudHub

AWS VPN CloudHub is a hub-and-spoke VPN solution to securely connect multiple branch offices to one another and to a VPC on AWS. It leverages the AWS Managed VPN service, but instead of creating CGWs for a single on-premises location, you create as many CGWs as you have remote branches/offices that need a VPN connection and connect all of them to the same VGW on AWS. The result is a simple, low-cost hub-and-spoke VPN setup that can be used to communicate securely from one branch/office to another and between your branches/offices and your AWS environment.

The following diagram illustrates this:

Figure 2.4: Hub-and-spoke VPN

Important Note

The remote sites must not have overlapping IP ranges.

Redundancy and failover mechanisms follow the same principle as for AWS Managed VPN. For greater reliability, it is recommended to use multiple CGW devices on your on-premises locations.

It is worth noting that the AWS VPN CloudHub construct is compatible with AWS DX, which will be covered in the next section. For instance, in the hub-and-spoke model represented in the previous diagram, one of your on-premises environments could connect to AWS using an AWS DX connection while the other two on-premises locations use a VPN connection over the internet.

Now that you’ve seen which managed services AWS provides to establish a VPN connection, you can consider cases where an organization may prefer or need to bring its own VPN software solution.

Software VPN

An additional alternative consists of connecting your on-premises network equipment to a software VPN appliance running inside a VPC on AWS. This is the right option if, for some reason, you want or need to manage both ends of the VPN connection. You can select between several partner solutions or open-source solutions that provide VPN software appliances that can run on Amazon Elastic Compute Cloud (EC2) instances.

The major difference between this option and AWS Managed VPN is that in this case, you must manage the software appliances entirely, including updates and patching at operating system (OS) and software levels. Another essential point to note is that a software VPN appliance deployed on an Amazon EC2 instance is, per se, a single point of failure (SPOF). Thus, reliability is an extra complexity that you must deal with, whereas it is handled for you by AWS, on the AWS end of the connection, when using the Managed VPN solution.

This concludes the section on VPN connections, but as you will see now, a VPN is not the only way to establish a private connection between your on-premises infrastructure and your AWS environment.

Introducing AWS DX

Using a VPN connection when you get started makes a lot of sense. It can be up and running in no time and will likely cause no big change in your network topology.

However, it is not always the best option. For cases where the unreliability of internet connectivity becomes a business risk, AWS DX offers the right alternative, providing low-latency and consistent-bandwidth connectivity between your on-premises infrastructure and AWS.

In a nutshell, a DX connection ties one end of the connection to your on-premises router and the other end to a virtual interface (VIF) on AWS. There are three different types of VIFs: public VIFs, private VIFs, and transit VIFs. Public VIFs are used to connect to AWS services’ public endpoints. Private VIFs are used to connect to your own AWS environments within a VPC. Transit VIFs allow you to terminate the connection on a TGW.

Various Flavors of AWS DX

You can use AWS DX provided that one of the following applies:

  • Your network is co-located with an existing AWS DX location; see https://packt.link/Awm60 for a current list of these.
  • You leverage an AWS DX partner, a member of the AWS Partner Network (APN); see https://packt.link/6OyGq for a current list of these.
  • You work with an independent service provider to connect to AWS DX.

There exist three types of DX connections. That said, only the first two types listed in the following sections are recommended when you require a consistent connection capacity, which is ultimately the main reason to set up a DX connection.

Dedicated Connection

This type of connection, available as 1 gigabit per second (Gb/s), 10 Gb/s, or 100 Gb/s ports, consists of a dedicated link assigned to a single customer. Dedicated connections can be combined to further increase your bandwidth by using link aggregation groups (LAGs). Link-speed availability can vary per DX location, so it is best to consult the list of DX locations in the AWS documentation at the link mentioned previously.

Dedicated connections support up to 50 private or public VIFs and 1 transit VIF.

Hosted Connection

This type of connection, available from 50 megabits per second (Mb/s) to 10 Gb/s, consists of a connection provided by an AWS DX partner and is made available on a link shared with other customers. AWS makes sure that the sum of all hosted connections’ capacities per link does not exceed the network link’s actual capacity.

With up to 500 Mb/s capacity, hosted connections support one private or public VIF. Hosted connections of 1 Gb/s or more support one private, public, or transit VIF.

If you require more than one VIF, either obtain multiple hosted connections or use a dedicated connection.

Hosted VIF

Some AWS DX partners provide hosted VIFs, which consist of a VIF made available to you in your AWS environment while the underlying DX connection is managed in a separate account by the provider. However, it is worth noting that AWS does not limit the traffic capacity on hosted VIFs. The underlying DX connection capacity can therefore be oversubscribed, which could result in traffic congestion; it is thus not a recommended option when you are looking for consistent capacity to connect your on-premises and AWS environments.

AWS DX Connectivity Overview

The following diagram shows an overview of end-to-end (E2E) connectivity when setting up an AWS DX link between your on-premises and AWS environments:

Figure 2.5: Public and private VIFs

In the case of a private VIF, the VIF can be attached either to a VGW in a VPC in the same region as your DX connection or to a DX gateway (DX GW). An AWS DX GW is a globally available resource on AWS that can be accessed from any region. Its role is to help connect multiple VPCs, possibly in multiple AWS regions, through AWS DX.

It is important to note that a single dedicated DX connection can support up to 50 public or private VIFs. When using private VIFs, you can either connect those VIFs directly to your VPCs or use a DX GW in between. Because each DX GW can connect to up to 10 VGWs (so, 10 VPCs) on the other end, using a DX GW allows you not only to connect up to 500 VPCs through a single DX connection, but also to have those VPCs in multiple AWS regions.
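
Assuming you already have a dedicated connection and a VGW in place, the wiring of a DX GW with a private VIF could be scripted roughly as in the following boto3 sketch. This is an illustration only: the connection ID, VLAN, ASNs, and the VGW ID are placeholders, and parameter names should be checked against the current AWS Direct Connect API documentation.

    # Illustrative sketch: a DX gateway, a private VIF, and a VGW association.
    import boto3

    dx = boto3.client("directconnect", region_name="eu-west-1")

    # A DX gateway is a global resource, not tied to a single region.
    dxgw = dx.create_direct_connect_gateway(
        directConnectGatewayName="corp-dxgw", amazonSideAsn=64512
    )["directConnectGateway"]

    # Private VIF on the existing dedicated connection, terminated on the DX gateway.
    dx.create_private_virtual_interface(
        connectionId="dxcon-ffabc123",          # placeholder dedicated connection
        newPrivateVirtualInterface={
            "virtualInterfaceName": "corp-private-vif",
            "vlan": 101,
            "asn": 65000,                       # your on-premises BGP ASN
            "directConnectGatewayId": dxgw["directConnectGatewayId"],
        },
    )

    # Associate a VGW (and hence its VPC) with the DX gateway; repeat per VPC/VGW.
    dx.create_direct_connect_gateway_association(
        directConnectGatewayId=dxgw["directConnectGatewayId"],
        virtualGatewayId="vgw-0a1b2c3d4e5f67890",
    )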

Additionally—and this will be the focus of a later section in this chapter—you can also leverage an AWS TGW to simplify routing in cases where you have a large number of VPCs (in the 100s or 1,000s). A single TGW can support up to 5,000 (VPC) attachments today.

Large and complex organizations typically have an AWS environment spanning more than one AWS region, whether this is because they operate in multiple geographies or to follow some regulatory recommendations, or for disaster recovery (DR) purposes.

The following diagram summarizes the various options available:

Figure 2.6: DX options summary

Such complex organizations adopt either a private VIF to a DX GW (Option 2), a transit VIF to a DX GW (Option 3), or sometimes a combination of the two, essentially because an AWS DX GW and a TGW make their life so much easier. A VPN connection over a public VIF (Option 4) can be used to enforce E2E encryption as an extra security measure when MACsec (IEEE 802.1AE Media Access Control (MAC) security standard) encryption over DX is not available at your preferred DX location.

Now, you may be wondering when to use IPsec encryption and when to use MACsec encryption over DX. The first consideration is connection speed. MACsec encryption is available at speeds (10 Gb/s and 100 Gb/s) that cannot be reached with a single VPN connection (maximum 1.25 Gb/s). So, if you require encryption on links of 10 Gb/s or more, then MACsec, if it is available at your DX location, is a much easier solution for encryption. Alternatively, you could think of aggregating multiple IPsec VPN connections to work around the throughput limit, but that increases the operational complexity.

The second consideration is technology. IPsec encryption is an E2E connectivity encryption mechanism that works at layer 3 of the Open Systems Interconnection (OSI) model (that is, IP). MACsec encryption, on the other hand, is a hop-by-hop encryption mechanism at layer 2 of the OSI model (that is, MAC). In this case, every network hop is responsible for encrypting the data frames until the next hop, and so on. The two mechanisms operate at different layers and are not mutually exclusive; you can use either of the two or both simultaneously. MACsec encryption brings an additional protection layer to your security arsenal.

Additional Considerations for Resiliency

As a best practice, it is recommended to have at least two separate connections at two different DX locations. In this case, you end up with two DX connections. This will provide resiliency against connectivity failure due to a device failure, a network cable cut, or an entire location failure.

To achieve maximum resiliency, use at least two separate connections terminating on distinct devices in at least two DX locations attached to two different regions. In this case, you end up with at least four DX connections and are protected not just against a single device failure, a network cable cut, or an entire location failure, but also against an entire geography failure.

Either as an alternative to additional DX connections or as an additional resiliency protection measure, you can also create a VPN connection as a backup connectivity option.

Now that you know how to set up hybrid network infrastructure, you are ready to learn how to create a hybrid storage infrastructure between your on-premises locations and your AWS environment.

Cost Factor

On top of the already mentioned reasons to opt for a DX connection, such as network bandwidth consistency and throughput, the cost is obviously an important aspect not to be ignored or discarded too quickly.

For occasional usage and low data volume transmission between your on-premises environment and AWS, in many cases, a VPN connection is good enough, and this is what organizations typically begin with when they start using AWS. After all, all organizations already have broadband internet access nowadays, so setting up a simple IPsec connection is usually straightforward. However, when additional requirements come in, such as improving network consistency and reliability or benefiting from a higher network throughput, organizations start looking into AWS DX.

And beyond technical requirements, the overall cost of the solution should also be estimated. For AWS Managed VPN, you pay for the number of hours the connection is active (the rate varies per AWS region) and for the volume of data that you transfer from AWS to your on-premises environments, also known as Data Transfer Out (DTO). For AWS DX, you pay for the port hours a DX connection is up (the rate varies per AWS region and connection capacity) and for the volume of data that you transfer from AWS to your on-premises environments. Data transferred into AWS incurs no cost.

DTO costs for data sent over a VPN connection are the same as for data sent over the internet from your AWS environment and vary per AWS region. DTO costs for data sent over a DX connection vary per combination of AWS region and AWS DX location. The closer those two are to each other, the lower the DTO costs. For instance, DTO costs are lower when transferring data from any AWS region in Europe to any DX location in Europe than from any AWS region in Asia to any DX location in Europe (and vice versa, by the way). That said, DTO costs for traffic sent over DX are always lower than DTO costs for traffic sent over the internet (VPN or not), and sometimes even an order of magnitude lower.

Thus, besides mere technical requirements, in situations when large volumes of data (terabytes (TB) or beyond) need to be transferred from your AWS environments to your on-premises environment(s), it can become significantly more beneficial financially to leverage a DX connection.

Introducing AWS Storage Gateway

AWS Storage Gateway is a service that provides a series of solutions to expand your storage infrastructure into the AWS cloud for purposes such as data migration, file shares, backup, and archiving. It uses standard protocols to access AWS storage services such as Amazon Simple Storage Service (S3), Amazon S3 Glacier, Amazon Elastic Block Store (EBS) snapshots, and Amazon FSx.

There are three different flavors of Storage Gateway as listed here:

  • File Gateway
  • Volume Gateway
  • Tape Gateway

The following section dives into the details of each.

File Gateway

File Gateway is nowadays further split into two distinct types: S3 File Gateway and FSx File Gateway.

S3 File Gateway

Initially the only available type of file gateway when AWS Storage Gateway launched, S3 File Gateway allows you to store files on S3 and access them transparently from your on-premises environment through the Network File System (NFS) and Server Message Block (SMB) protocols. S3 File Gateway does a one-to-one mapping of your files to S3 objects and stores the file metadata (for example, Portable Operating System Interface (POSIX) file access control lists (ACLs)) in the S3 object metadata. The files are written synchronously to the file gateway local cache before being copied over to S3 asynchronously.

Concretely, S3 File Gateway comes either as a preset hardware appliance or as a software appliance that you deploy in your on-premises environment. The software appliance consists of a virtual machine (VM) that can run either on VMware Elastic Sky X (ESX), Microsoft Hyper-V, or a Linux kernel-based VM (KVM) hypervisor (but also on Amazon EC2 instances, should you need to).

See the following diagram for an illustration of how S3 File Gateway works:

Figure 2.7: Amazon S3 File Gateway

Once deployed and configured, your servers on-premises can use it like any other file share through the NFS and SMB protocols. Multiple factors can influence the performance of your gateway, but the key ones are CPU, local disk size, and network capacity.

The CPU resources and network capacity available to the appliance will directly influence the amount of data the gateway can process in parallel. The local disk size assigned to the file gateway determines the cache size (on the hardware appliance, this is obviously constrained by the amount of physical storage available, so it is best to think it through before ordering the appliance). The cache should be sized to provide enough capacity for your most frequently accessed files so that they benefit from low-latency access. On the software appliance, you can always add more cache capacity (additional storage volumes) later if you realize that your cache is undersized.

In terms of security, it remains your responsibility to control and manage access to the S3 bucket(s) sitting behind the gateway and to follow best practices. Therefore, remember to set up the right permissions (Identity and Access Management (IAM) role identity-based policies and/or S3 bucket policies) accordingly, following a least-privilege approach.
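
As a rough illustration of how the pieces fit together, the boto3 sketch below creates an NFS file share on an already activated S3 File Gateway. The gateway ARN, the IAM role, the bucket name, and the client CIDR range are all hypothetical placeholders; the role is the one you would scope down to that bucket only.

    # Illustrative sketch: an NFS file share on an activated S3 File Gateway.
    import uuid
    import boto3

    sgw = boto3.client("storagegateway", region_name="eu-west-1")

    sgw.create_nfs_file_share(
        ClientToken=str(uuid.uuid4()),  # idempotency token
        GatewayARN="arn:aws:storagegateway:eu-west-1:111122223333:gateway/sgw-ABCD1234",
        Role="arn:aws:iam::111122223333:role/S3FileGatewayAccess",  # hypothetical role
        LocationARN="arn:aws:s3:::my-file-share-bucket",             # hypothetical bucket
        ClientList=["10.0.0.0/16"],  # on-premises clients allowed to mount the share
    )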

Because the files are ultimately stored as objects on S3, you also have the freedom to use the rich set of capabilities Amazon S3 provides to manage their life cycle, such as life cycle policies, versioning, cross-region replication rules, and so on.

Finally, remember to back up your file gateway storage. AWS Backup integrates with AWS Storage Gateway, storing gateway backups on Amazon S3 as EBS snapshots that can later be restored either on-premises or on AWS.

FSx File Gateway

Amazon FSx File Gateway is a recent addition to the AWS Storage Gateway family that provides access to Amazon FSx for Windows File Server file shares on AWS from your on-premises environment. The idea is very similar to S3 File Gateway: you access the data on AWS through either a physical hardware appliance or a software appliance that you deploy on-premises on VMware ESX, Microsoft Hyper-V, or a Linux KVM hypervisor (but also on Amazon EC2 instances, should you need to).

There are a few major differences from the S3 File Gateway service as outlined below:

  • Your files, managed through FSx File Gateway, will be available through the SMB protocol only (you cannot use NFS).
  • You need to have previously deployed an Amazon FSx for Windows File Server filesystem in your AWS environment.
  • You must have access via VPN or DX from your on-premises environment to that Amazon FSx for Windows File Server filesystem on your AWS environment.

For the above reasons, the use cases for each gateway type are also slightly different.

You would use Amazon S3 File Gateway when you want to access files you have stored on S3 from on-premises or want to make files you store on-premises available on S3 for further processing on AWS. In this case, you can then leverage all the services AWS provides to run all sorts of data analytics, including machine learning capabilities, to analyze the data on S3.

You would rather use Amazon FSx File Gateway when you want to move on-premises network file shares accessed through the SMB protocol to the cloud and keep accessing them seamlessly from your on-premises environment. Think of cases such as user home directories, team file shares, and so on.

See the following diagram for an illustration of how Amazon FSx File Gateway works:

Figure 2.8: FSx File Gateway

Amazon FSx File Gateway is integrated with AWS Backup, so you can also manage and automate backups centrally, as with Amazon S3 File Gateway. Additionally, you can activate Microsoft snapshotting technology and Windows Shadow Copy on your Amazon FSx for Windows File Server filesystem to allow users to easily view and restore files and folders on your file shares from a snapshot.

Volume Gateway

Volume Gateway allows you to create storage volumes on S3 that offer a block storage interface accessible from your on-premises environment through the standard Internet Small Computer Systems Interface (iSCSI) protocol.

Concretely, Volume Gateway comes either as a preset hardware appliance or as a software appliance that you deploy in your on-premises environment. The software appliance consists of a VM that can run either on VMware ESX, Microsoft Hyper-V, or a Linux KVM hypervisor (but also on Amazon EC2 instances, should you need to).

You have the choice between two operation modes for Volume Gateway: you either cache a portion of the data (cached volumes) or keep a full copy of the volumes (stored volumes) locally on the gateway.

With cached volumes, as illustrated in Figure 2.9, you can reduce the amount of storage you need on-premises by limiting it to store the most frequently accessed data. In this scenario, Volume Gateway stores all your data on storage volumes on Amazon S3 and retains only the most recently accessed data on your local cache storage on-premises for low-latency access. You can additionally take incremental backups, also known as snapshots, of your storage volumes in Amazon S3. These snapshots are also stored in Amazon S3 as Amazon EBS snapshots. If you need to recover your data after an incident, these snapshots can be restored to a storage volume on your gateway.

Alternatively, for cases such as application migration to the cloud or DR in the cloud, you can create a new Amazon EBS volume from one of your EBS snapshots (provided the snapshot is not larger than 16 tebibytes (TiB)) and then attach it to an Amazon EC2 instance.

See the following diagram for an illustration of how Volume Gateway works with cached volumes:

Figure 2.9: Volume Gateway (cached volumes)

With stored volumes, as illustrated in Figure 2.10, you retain your data entirely on-premises for low-latency access. In this case, Volume Gateway makes use of your local storage for storing your entire set of data and creates a backup copy of your volumes to Amazon S3 to provide durable offsite backup. The backup copy is performed asynchronously through Amazon EBS snapshots on Amazon S3:

Figure 2.10: Volume Gateway (stored volumes)

Volume Gateway can serve multiple use cases, such as the following:

  • Hybrid cloud storage for file services (expandable cloud storage for on-premises file servers)
  • Backup and DR (offsite durable storage with DR capability in the cloud)
  • Application data migration (application ready to start in the cloud with a copy of the data)

Now, you may be wondering how to choose between cached volumes and stored volumes. Well, they serve slightly different use cases. On the one hand, cached volumes give you the opportunity to keep your most frequently accessed data on-premises for low-latency access, while storing everything else—that is, cold(er) data—on Amazon S3. Thus, they let you keep the storage hardware you need on-premises to a minimum. They are a great solution when only a limited portion of your overall data is frequently accessed and when reducing your on-premises storage footprint and related costs is important to you. Maybe you need to expand your overall storage capacity but don’t want to do so on-premises. Occasional longer data access times must also be acceptable in this case (when the requested data is not in the local cache).

On the other hand, stored volumes keep your entire dataset on-premises in local storage for low-latency access. They are particularly well adapted to cases where longer data access times cannot be tolerated and where the focus is not so much on reducing your on-premises storage infrastructure footprint or costs as it is on improving the durability of your data and providing an additional option for DR in the cloud.

Tape Gateway

Tape Gateway offers a virtual tape library (VTL) service backed by storage on Amazon S3 and accessible on-premises through the standard iSCSI protocol.

Concretely, Tape Gateway comes either as a preset hardware appliance or as a software appliance that you deploy in your on-premises environment. The software appliance consists of a VM that can run either on VMware ESX, Microsoft Hyper-V, or a Linux KVM hypervisor (but also on Amazon EC2 instances, should you need to).

As illustrated in the following diagram, Tape Gateway provides a VTL infrastructure that scales seamlessly, without the burden of having to operate or maintain the tape infrastructure on-premises. It integrates with the most popular backup solutions on the market, so chances are high that you can keep using your existing backup application. Now, the major difference from your previous physical tape solution or VTL solution is that Tape Gateway will store your virtual tapes in the cloud on Amazon S3. When your backup application sends data to the tape gateway, the data is first stored locally on the gateway and then copied over to the virtual tapes on Amazon S3 asynchronously:

Figure 2.11: Tape gateway

Just as with any VTL solution, Tape Gateway provides the concepts of a tape drive and a media changer. Both the tape drive and the media changer are available to your backup application as iSCSI devices.

Tape Gateway also offers the possibility to archive your tapes. When your backup application instructs Tape Gateway to archive a tape, the tape is moved to a lower-cost storage tier using Amazon S3 Glacier or Amazon S3 Glacier Deep Archive.

Additional Considerations

To wrap up what was just covered, AWS Storage Gateway offers three different types of gateways to enable a hybrid storage architecture across your on-premises infrastructure and your AWS environment. You leverage each of these three types depending on the use case at stake—File Gateway when setting up a hybrid file server infrastructure, Volume Gateway when expanding your block storage infrastructure to the cloud, and Tape Gateway for replacing your physical tape infrastructure with virtual tapes on AWS.

The following section will take you through a few additional considerations to better plan the actual implementation of such a hybrid storage infrastructure.

Resiliency

The gateway appliance, whether hardware or software, is by default a SPOF. So, what are your options to deal with any type of failure, for instance, if a component crashes or at least stops responding, whether it is due to the appliance, the hypervisor, the network, and so on?

In the case of a software appliance that you deploy on VMware ESXi, you have an option to enable high availability (HA) using VMware HA. AWS Storage Gateway provides a series of application health checks that VMware HA can interpret to automatically recover your storage gateway when the health-check thresholds you specify are breached. That will cater to most failure cases.

This option is most useful when organizations cannot tolerate a long interruption of service or any data loss.

Quotas

As with any other AWS service, AWS Storage Gateway is bound by certain quotas. These quotas can be soft or hard limits constraining the service. Different quotas apply depending on the flavor of storage gateway that you implement. Here is an indication of the main quotas for each different type, but remember to check the AWS documentation to have the latest and most up-to-date figures:

  • File Gateway quotas concern the maximum number of file shares per gateway (10), the maximum size of an individual file in the share (5 TB), and the maximum path length (1,024 TiB). Note that one file share maps exactly to one Amazon S3 bucket. Adding more file shares will add more S3 buckets to your AWS environment, so you also need to make sure you will not exceed your Amazon S3 quotas.
  • Volume Gateway quotas concern the maximum size of a volume (32 TiB for cached volumes; 16 TiB for stored volumes), the maximum number of volumes per gateway (32), and the maximum total size of all volumes per gateway (1,024 TiB for cached volumes; 512 TiB for stored volumes).
  • Tape Gateway quotas concern the minimum and maximum sizes of a virtual tape (100 gibibytes (GiB) to 5 TiB), the maximum number of virtual tapes per virtual tape library (1,500), and the total size of all tapes in a library (1 pebibyte (PiB)).

This concludes the first half of this chapter, which focused on the creation of a hybrid infrastructure across on-premises infrastructure and AWS. In the second half of this chapter, you will investigate how to enhance communication first between your private environment on AWS and AWS services or third-party services offered on AWS, and secondly, within the realm of your AWS environment.

The following sections will describe how you can improve communication between your private environment on AWS and AWS services or third-party services offered on AWS.

Leveraging VPC Endpoints

AWS offers a highly available and scalable technology called AWS PrivateLink. AWS PrivateLink enables you to privately connect any of your VPCs either to the supported AWS services or to VPC endpoint services (that is, services powered by AWS PrivateLink that are hosted in other AWS accounts, whether by you or by a third party). For example, many of the services that AWS partners offer on AWS Marketplace support AWS PrivateLink nowadays.

Using AWS PrivateLink, you can then avoid exposing the traffic between your VPC and the target service on AWS to the internet; the E2E communication does not leave the AWS network.

Now, how does this work?

To use AWS PrivateLink, you simply create a VPC endpoint that will serve as an entry point to reach the destination service. This is illustrated in Figure 2.12:

Figure 2.12: VPC endpoint

As illustrated in the preceding diagram, a VPC endpoint does not require a public IP address, an internet gateway, a peering link, a VPN, or a DX connection to be able to reach the destination service using AWS PrivateLink. The traffic always stays within the boundaries of the AWS network.

VPC endpoints are highly available and scalable virtual devices that you create in your AWS environment. There are currently three types of endpoints, as outlined here:

  1. Interface endpoints
  2. Gateway Load Balancer (GWLB) endpoints
  3. Gateway endpoints

The following sections discuss each of these in detail.

Interface Endpoints

Interface endpoints, powered by AWS PrivateLink, are entry points for the traffic targeting a supported AWS service or a VPC endpoint service.

Concretely, an interface endpoint consists of an Elastic Network Interface (ENI) with a private IP address taken from the address range associated with the subnet in which it is created.

It is recommended to enable private Domain Name System (DNS) resolution (the default option) when you create an interface endpoint, as this makes it easier to reach the supported service. Specifically, it allows you to use the default DNS name of the service and still go through the interface endpoint, leveraging private connectivity. Doing so spares your applications from having to become aware of and use the endpoint-specific DNS name; instead, they can keep using the default (public) DNS name of the supported service. The following diagram illustrates this:

Figure 2.13: VPC interface endpoints and DNS names

You can enforce security best practices with interface endpoints in several ways.

First, you can associate security groups with interface endpoints to control which resources can use your endpoints. Secondly, you can associate IAM resource-based policies—called endpoint policies—with your interface endpoints to control which principals (users or roles) are allowed to use the endpoint, and under which conditions.
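
To make this concrete, the following minimal boto3 sketch creates an interface endpoint for an AWS service (SQS is used here purely as an example) with private DNS enabled and a security group attached. The VPC, subnet, and security group IDs are placeholders.

    # Illustrative sketch: an interface endpoint with private DNS and a security group.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    endpoint = ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId="vpc-0123456789abcdef0",
        ServiceName="com.amazonaws.eu-west-1.sqs",
        SubnetIds=["subnet-0aaa1112223334445", "subnet-0bbb2223334445556"],  # one per AZ
        SecurityGroupIds=["sg-0a1b2c3d4e5f67890"],
        PrivateDnsEnabled=True,  # keep using the service's default DNS name
    )["VpcEndpoint"]
    print(endpoint["VpcEndpointId"])

An endpoint policy could additionally be passed through the PolicyDocument parameter to restrict which principals and actions are allowed through the endpoint.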

Furthermore, interface endpoints can also be used in a hybrid cloud scenario where they can be accessed from your on-premises environment. The following current limitations are worth noting:

  • An interface endpoint can only be created in one subnet per Availability Zone (AZ).
  • Not all AWS services support interface endpoints: the list keeps growing on a regular basis, so it is recommended to check the AWS documentation for the latest updates.

An interface endpoint is the principal type of VPC endpoint you will come across but, as previously mentioned, it is not the only one. The following sections present the other two types, starting with the newest one—GWLB endpoints.

GWLB Endpoints

GWLB endpoints are a new type of endpoint, recently added following the introduction of the GWLB service. GWLB provides inline traffic analysis for when you want to use specific virtual appliances for security inspection on AWS.

GWLB endpoints, powered by AWS PrivateLink, provide private connectivity to your gateway load balancers. A GWLB endpoint effectively consists of an ENI with a private IP address taken from the address range associated with the subnet in which it is created. To make use of this type of endpoint, you need to make sure to add the necessary routes in your subnet and gateway route tables to direct the traffic through the GWLB endpoint.
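
For illustration, a GWLB endpoint and the corresponding route could be created along the lines of the following boto3 sketch. The endpoint service name (the one exposed by your Gateway Load Balancer), the subnet, the route table, and the protected CIDR range are placeholders.

    # Illustrative sketch: a GWLB endpoint plus a route sending traffic through it.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    gwlbe = ec2.create_vpc_endpoint(
        VpcEndpointType="GatewayLoadBalancer",
        VpcId="vpc-0123456789abcdef0",
        ServiceName="com.amazonaws.vpce.eu-west-1.vpce-svc-0123456789abcdef0",
        SubnetIds=["subnet-0aaa1112223334445"],
    )["VpcEndpoint"]

    # Route traffic destined for the protected subnet through the GWLB endpoint.
    ec2.create_route(
        RouteTableId="rtb-0123456789abcdef0",
        DestinationCidrBlock="10.0.1.0/24",
        VpcEndpointId=gwlbe["VpcEndpointId"],
    )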

See an example of this in Figure 2.14:

Figure 2.14: GWLB endpoint

The current limitation worth noting is that, at the time of writing, this type of endpoint supports neither endpoint policies nor security groups.

Gateway Endpoints

A gateway endpoint is the first type of endpoint that launched on AWS, and it has been supporting connectivity to only two AWS services ever since: Amazon S3 and Amazon DynamoDB.

A gateway endpoint is a routable object that you must add to your VPC or subnet route table to be able to leverage it, like an internet or NAT gateway on AWS. On top of that, you can specify custom access permissions for your gateway endpoint by attaching endpoint policies to it.

See an example of this in Figure 2.15:

Figure 2.15: VPC gateway endpoint

You can attach several gateway endpoints to any VPC. You will need separate gateway endpoints, one for each service (S3 or DynamoDB) that you want to access, and if you require different access permissions for different groups of resources, you may even have different gateway endpoints for the same service within the same VPC. If you use multiple endpoints for the same service in the same VPC, you will need to set different routes to use each of these endpoints in different route tables (for each service, you can only have a single route in every route table).
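
As a minimal illustration, the boto3 sketch below creates a gateway endpoint for Amazon S3, attaches it to specific route tables, and restricts it with an endpoint policy. The VPC, route table IDs, and the bucket name in the policy are placeholders.

    # Illustrative sketch: a gateway endpoint for S3 with an endpoint policy.
    import json
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-app-bucket/*",  # hypothetical bucket
        }],
    }

    ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId="vpc-0123456789abcdef0",
        ServiceName="com.amazonaws.eu-west-1.s3",
        RouteTableIds=["rtb-0aaa1112223334445", "rtb-0bbb2223334445556"],
        PolicyDocument=json.dumps(policy),
    )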

The following current limitations are worth noting:

  • Cross-region is not supported. Gateway endpoints can only be used to reach AWS services in the same region as the VPC where they are set up.
  • Endpoint connections do not extend beyond the boundaries of a VPC. You cannot leverage the gateway endpoint defined in your VPC to access a service behind that endpoint from another VPC or from your on-premises environment, whatever network topology you may have (VPC peering, Transit Gateway, VPN, DX, and so on).

Before moving on to the next section, consider the two key aspects: resiliency and cost.

Additional Considerations

There are a few points to pay attention to when using private endpoints. You want to be mindful of resiliency and cost aspects.

AZs

Services offered by third-party providers, whether in your own organization or beyond, may not always be available in each AZ within a given AWS region.

An interface endpoint is mapped to an AZ upon creation. Therefore, it is important, especially for third-party services, to validate in which AZs they are available and to use AZ identifiers (IDs) to identify AZs uniquely and consistently across accounts. Remember the difference between the following:

  • An AZ name (for example, eu-west-1a) that does not necessarily map to the same AZ in two different AWS accounts
  • An AZ ID (for example, euw1-az1) that always refers to the same AZ across all AWS accounts

So first, you must use AZ IDs to make sure that you deploy endpoints in the right AZs where the service is also available. Secondly, it is recommended as a best practice to always deploy endpoints in at least two AZs for HA purposes.
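
A quick way to map AZ names to AZ IDs in a given account is shown in the following boto3 sketch; the region is a placeholder.

    # Illustrative sketch: resolving AZ names to AZ IDs for the current account.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    for az in ec2.describe_availability_zones()["AvailabilityZones"]:
        # ZoneId (for example, euw1-az1) is consistent across accounts; ZoneName is not.
        print(az["ZoneName"], "->", az["ZoneId"])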

Pricing

Gateway endpoints are provided at no charge, other than the cost generated for using the service and transferring data.

Endpoints powered by AWS PrivateLink—that is, interface endpoints and GWLB endpoints—are priced against two dimensions: the time the endpoint exists (per hour, for each AZ where the endpoint is deployed) and the amount of data that goes through it (per GB).

For enterprises, it becomes cost-efficient to centralize interface endpoints—for example, in a VPC within a central shared services or network services account—and to share them with the rest of the organization. This not only allows better control over connectivity aspects, but by avoiding duplicated interface endpoints (multiplied by the number of VPCs in use), it also lets you save on costs, especially if the number of accounts in your organization grows significantly over time.

You are now ready to investigate yet another service that can help you optimize your organization’s network infrastructure, AWS Transit Gateway.

Introducing AWS Transit Gateway

AWS Transit Gateway is a central hub construct that interconnects multiple VPCs on AWS and your on-premises networks.

Its purpose is to do the following:

  • Avoid ending up with a spaghetti network topology, which is likely to happen if you start peering all your VPCs with one another.
  • Share common network functions across multiple VPCs such as internet and on-premises connectivity (either via VPN or AWS DX), VPC endpoints, and DNS endpoints.
  • Keep those essential network functions separate from the rest of your AWS environment and in a central place managed by your network experts.

AWS Transit Gateway Overview

AWS Transit Gateway is a regional network construct, so in the case where you need to operate in more than one AWS region, you would end up with (at least) one TGW in each region. If you need to establish connectivity between VPCs in different regions, you have the option to create a cross-region peering connection between two TGWs.

TGWs are highly available by design, so you do not need to rely on more than one TGW for the resiliency of the network transit hub. That said, when you attach a VPC to a TGW, you need to specify on which subnet(s) in which AZ(s) you want that attachment to be effective. So, although the TGW is highly available, it is a best practice to specify subnets in more than one AZ when attaching a VPC, to make the VPC attachment itself highly available. Note that resources deployed in a subnet within a specific AZ can only reach a TGW if there exists a TGW attachment to a subnet within the same AZ. In other words, even if you specify a route in a subnet’s route table to reach the TGW, if there is no TGW attachment to a subnet in the same AZ, then the TGW will not be reachable from that subnet. So, it is key to tie one subnet in each AZ to a TGW attachment wherever your resources need access to the TGW.

It is usually recommended to use a separate subnet for that purpose in each AZ, with a small Classless Inter-Domain Routing (CIDR) range (for example, a /28), so that you keep more IP addresses for your own resources. This also allows you to have distinct network ACLs for the subnets where you deploy your resources and the subnets associated with the TGW, and you can use separate route tables for those two types of subnets.
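
The following minimal boto3 sketch shows a TGW and a VPC attachment spanning two small, dedicated subnets in different AZs. The VPC and subnet IDs are placeholders.

    # Illustrative sketch: a TGW and a multi-AZ VPC attachment.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    tgw = ec2.create_transit_gateway(
        Description="central network hub",
        Options={"DefaultRouteTableAssociation": "enable",
                 "DefaultRouteTablePropagation": "enable"},
    )["TransitGateway"]

    # Attach the VPC via one small dedicated subnet per AZ, so every AZ that hosts
    # resources can reach the TGW.
    ec2.create_transit_gateway_vpc_attachment(
        TransitGatewayId=tgw["TransitGatewayId"],
        VpcId="vpc-0123456789abcdef0",
        SubnetIds=["subnet-0aaa1112223334445", "subnet-0bbb2223334445556"],
    )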

For organizations that intend to use stateful network appliances on their AWS environment, a specific mode called appliance mode can be enabled on the TGW.

The idea is to enable appliance mode on the VPC attachment corresponding to the VPC where the appliance is deployed. It then has the effect of routing ingress and egress traffic through the same AZ in that VPC (for the sake of statefulness), which is not guaranteed otherwise.
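
Appliance mode is a flag on the VPC attachment itself, as in the following boto3 sketch (the attachment ID is a placeholder).

    # Illustrative sketch: enabling appliance mode on the attachment of the VPC
    # that hosts a stateful inspection appliance.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")
    ec2.modify_transit_gateway_vpc_attachment(
        TransitGatewayAttachmentId="tgw-attach-0123456789abcdef0",
        Options={"ApplianceModeSupport": "enable"},
    )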

Another important consideration for complex organizations that may have an AWS environment spread across multiple AWS regions is that you will not be charged extra for additional TGWs. Indeed, TGW usage is priced along two dimensions: per VPC attachment and per volume of traffic (GB) going through the TGW. So, unless you decide to attach some VPCs to more than one TGW, these costs will stay the same. TGW peering does not affect the costs either since there is no extra cost for peering, and the TGW traffic costs are not accounted for twice but only at one of two peered TGWs (typically at the sending TGW). The only additional costs in the case of cross-region peering between two TGWs would be inter-region data transfer charges.

Routing with AWS Transit Gateway

AWS Transit Gateway supports both dynamic and static routing. By default, the network elements (VPCs; VPN or DX connections; peered TGWs) attached to a TGW are associated with its default route table, unless otherwise specified. You naturally have the choice to organize routing as you please by creating additional routing tables and then associating each network element attached to the TGW with the routing table of your liking.

The routes that are defined in those routing tables can be defined statically or dynamically. When you attach a network element to a TGW, you specify whether you want the routes coming from that element to be automatically propagated to the TGW route table associated with that element. If you prefer not to, you must specify routing statically to and from the TGW.

Routes can be propagated automatically both from your on-premises networks connected to the TGW via VPN or DX and from your VPCs attached to the TGW. In the first case, routes are advertised back and forth using BGP between the TGW and your on-premises network equipment on the other end of the VPN or DX connection. In the case of VPCs, the routes are propagated from the VPCs to the TGW but not back to the VPCs from the TGW. You then need to update your VPCs’ route table, creating static routes for your VPCs to communicate with the TGW.
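
Putting this together, the boto3 sketch below creates a custom TGW route table, associates a VPC attachment with it, adds a static route towards a VPN/DX attachment, and adds the static return route in the VPC's own route table. All IDs and CIDR ranges are placeholders.

    # Illustrative sketch: TGW routing with a custom route table and static routes.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    tgw_rt = ec2.create_transit_gateway_route_table(
        TransitGatewayId="tgw-0123456789abcdef0"
    )["TransitGatewayRouteTable"]

    # Associate a VPC attachment with the custom route table.
    ec2.associate_transit_gateway_route_table(
        TransitGatewayRouteTableId=tgw_rt["TransitGatewayRouteTableId"],
        TransitGatewayAttachmentId="tgw-attach-0aaa1112223334445",
    )

    # Static route sending on-premises-bound traffic to the VPN/DX attachment.
    ec2.create_transit_gateway_route(
        DestinationCidrBlock="10.100.0.0/16",  # on-premises range (placeholder)
        TransitGatewayRouteTableId=tgw_rt["TransitGatewayRouteTableId"],
        TransitGatewayAttachmentId="tgw-attach-0bbb2223334445556",
    )

    # Routes are not propagated back into the VPC: add them to the VPC route table.
    ec2.create_route(
        RouteTableId="rtb-0123456789abcdef0",
        DestinationCidrBlock="10.100.0.0/16",
        TransitGatewayId="tgw-0123456789abcdef0",
    )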

One more thing worth mentioning on routing is that Transit Gateway cannot handle VPC attachments when some of the attached VPCs have overlapping IP address ranges. Thus, when you want to attach a set of VPCs (or on-premises networks) that may have overlapping IP addresses to a TGW, you need to deal with the overlap first. Going into the details of exactly how to do this is beyond the scope of this chapter, but make sure to find a solution to that problem before attempting to connect these networks to a TGW. Multiple solutions exist, such as network address translation (NAT), leveraging IP version 6 (IPv6) instead of IP version 4 (IPv4) addresses, or leveraging a third-party solution to do the magic for you (typically through NATing).

Summary

This chapter started with a discussion of the various options available and how complex organizations can best communicate in a hybrid cloud setup between their on-premises network and their AWS environment. You also looked at specific AWS cloud storage solutions that enable hybrid cloud solutions and extend the capabilities of your on-premises infrastructure.

You then reviewed how complex organizations can make the most of the connectivity options offered by AWS to improve their network security, reliability, and performance postures.

All the network constructs that you have reviewed in this chapter constitute the core components that complex organizations will inevitably leverage when laying out their AWS environment network topology.

The next chapter will focus on how you can best organize and structure your resources within your AWS environment, covering (among other things) topics such as AWS Organizations, service control policies (SCPs), and AWS Control Tower.

