AWS Certified Database – Specialty (DBS-C01) Certification Guide

Product type Book
Published in May 2022
Publisher Packt
ISBN-13 9781803243108
Pages 472 pages
Edition 1st Edition
Author Kate Gawron

Chapter 11: Database Task Automation

Automation is the practice of creating scripts, code, or programs that allow operational and development activities to be carried out automatically with minimal user involvement. Automation can be as simple as a script you schedule to run at fixed intervals to inspect a database, or it can be a complete package that deploys and configures an entire application stack within AWS. There is an IT field called DevOps (a blend of development and operations) that specializes in using automation techniques to reduce failure, improve deployment speed and accuracy, and create systems that can fix themselves if something goes wrong. For the Database Specialty exam, we won't need to know advanced DevOps skills and tools, but questions on automation techniques that are specific to databases will be asked, so it's important to understand AWS automation techniques at a high level. By the end of this chapter, you will be able to confidently use CloudFormation...

Technical requirements

For this chapter, you will require an AWS account with root access. Not everything we will do in this chapter is available in the free tier, so following the hands-on sections may cost you a small amount. You will also require AWS command-line interface (CLI) access. The AWS guide at https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html explains the steps you must follow, but I will summarize them here:

  1. Create an AWS account if you have not already done so.
  2. Download the latest version of the AWS CLI from https://docs.aws.amazon.com/cli/latest/userguide/welcome-versions.html#welcome-versions-v2.
  3. Create an admin user at https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started_create-admin-group.html#getting-started_create-admin-group-cli.
  4. Create an access key for your administration user: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html.
  5. Run the aws configure command...

Overview of automation techniques

One of the fundamental benefits of cloud technologies is the ability to use code to describe and build your infrastructure. This is called Infrastructure as Code (IaC). You can use IaC techniques on-premises as well, but you will often be limited by physical restrictions, such as running out of storage within your storage arrays or running out of physical CPU cores on the hosts that coordinate your virtual machines (hypervisors, for example). While the same physical restrictions can impact a cloud deployment, a capacity outage on a cloud platform is extremely rare. Using IaC on-premises is also often complex because many on-premises technologies do not offer a command-line interface, a programming language, or application programming interfaces (APIs).

IaC allows you to create code that can be run multiple times to create exact copies of the same infrastructure, which is extremely useful when you're creating test and development environments. You can use code...
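As a rough illustration of that repeatability, the following Python sketch describes an environment as data and builds two copies from the same function. The function name, the instance class, and the storage size are all assumptions made for this example; none of them come from the book's code.

```python
import json


def database_environment(name: str) -> dict:
    """Describe one environment as data. Every value here is illustrative."""
    return {
        "name": name,
        "database": {
            "engine": "mysql",
            "instance_class": "db.t3.micro",  # assumed size, for the example only
            "storage_gb": 20,
        },
        "network": {"port": 3306},
    }


def without_name(definition: dict) -> dict:
    """Return the definition minus its name so two environments can be compared."""
    return {key: value for key, value in definition.items() if key != "name"}


dev = database_environment("dev")
test = database_environment("test")

# Because both definitions come from the same code, they differ only by name.
same_shape = without_name(dev) == without_name(test)

print(json.dumps(dev, indent=2))
print("dev and test match apart from name:", same_shape)
```

The point of the sketch is only that a definition produced by code can be regenerated on demand, which is what makes test and development copies cheap to create.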

Understanding AWS automation

AWS offers a wide range of automation tools that you can use to achieve different things. Some of the tools specialize in working with application functionality, while some are used with containers. Containers are self-contained modules in which an application can be deployed, along with all the dependencies needed to run it, such as a Java runtime environment. Containers are not covered within the Database Specialty exam, but there is a link about this in the Further reading section if you'd like to know more.

First, let's look at a tool we have used previously in this book – the AWS command-line interface (AWS CLI).

AWS command-line interface (AWS CLI)

The AWS CLI is a command-line tool you can download from AWS. It runs on Windows, macOS, and most Linux distributions. Once downloaded, installed, and configured, the AWS CLI allows you to interact with AWS services using text-based commands. The CLI is very powerful and can be...
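To give a feel for scripting the CLI, here is a minimal Python sketch that assembles an `aws rds describe-db-instances` command and, optionally, runs it. The helper names and the region value are assumptions for illustration, and the live call only works once the AWS CLI has been installed and configured with credentials.

```python
import json
import subprocess
from typing import List


def describe_db_instances_command(region: str) -> List[str]:
    """Build the CLI call that lists RDS instances with their status."""
    return [
        "aws", "rds", "describe-db-instances",
        "--region", region,
        # JMESPath expression that keeps only the identifier and status fields.
        "--query", "DBInstances[].{id:DBInstanceIdentifier,status:DBInstanceStatus}",
        "--output", "json",
    ]


def list_db_instances(region: str) -> List[dict]:
    """Run the command; requires a configured AWS CLI and valid credentials."""
    completed = subprocess.run(
        describe_db_instances_command(region),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with an error
    )
    return json.loads(completed.stdout)


command = describe_db_instances_command("eu-west-1")  # region is an example value
print(" ".join(command))
```

A script like this could be scheduled to run at fixed intervals, which is the simplest form of automation described at the start of this chapter.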

Creating infrastructure using CloudFormation

Now, let's create a CloudFormation template that will create a full database stack for us. The template we are going to make and then launch will create and configure the following:

  • An RDS MySQL instance
  • A parameter group for the database
  • Security group rules to let anyone access the database on port 3306

To do this, we are going to use a template from within this book's GitHub repository that can be modified if required. This template contains variables called parameters, which allow us to pass values to the CloudFormation service at runtime. This allows us to reuse the same template and create multiple databases.
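The book's own template is not reproduced here. As a hedged sketch of what a template with parameters can look like, the following Python builds a comparable JSON template (CloudFormation accepts JSON as well as YAML) containing a MySQL instance, a parameter group, and a security group that opens port 3306 to any address, mirroring the lab. The logical names, parameter names, instance class, parameter group family, and `max_connections` value are all assumptions for this example, and the DB subnet group is an extra resource added so that the instance can be placed in the chosen subnets.

```python
import json


def build_template() -> dict:
    """Return an illustrative CloudFormation template as a Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative MySQL stack (not the book's Chapter11.yaml)",
        # Parameters are supplied when the stack is created, so one template
        # can be reused for many databases.
        "Parameters": {
            "VpcId": {"Type": "AWS::EC2::VPC::Id"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
            "DBUsername": {"Type": "String", "Default": "admin"},
            "DBPassword": {"Type": "String", "NoEcho": True, "MinLength": 8},
        },
        "Resources": {
            "DBSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "MySQL access on port 3306",
                    "VpcId": {"Ref": "VpcId"},
                    "SecurityGroupIngress": [{
                        "IpProtocol": "tcp",
                        "FromPort": 3306,
                        "ToPort": 3306,
                        "CidrIp": "0.0.0.0/0",  # anyone, as in the lab
                    }],
                },
            },
            "DBSubnetGroup": {
                "Type": "AWS::RDS::DBSubnetGroup",
                "Properties": {
                    "DBSubnetGroupDescription": "Subnets for the database",
                    "SubnetIds": {"Ref": "SubnetIds"},
                },
            },
            "DBParameterGroup": {
                "Type": "AWS::RDS::DBParameterGroup",
                "Properties": {
                    "Description": "Parameter group for the database",
                    "Family": "mysql8.0",  # assumed engine family
                    "Parameters": {"max_connections": "100"},  # example setting
                },
            },
            "Database": {
                "Type": "AWS::RDS::DBInstance",
                "Properties": {
                    "Engine": "mysql",
                    "DBInstanceClass": "db.t3.micro",  # assumed size
                    "AllocatedStorage": "20",
                    "MasterUsername": {"Ref": "DBUsername"},
                    "MasterUserPassword": {"Ref": "DBPassword"},
                    "DBParameterGroupName": {"Ref": "DBParameterGroup"},
                    "DBSubnetGroupName": {"Ref": "DBSubnetGroup"},
                    "VPCSecurityGroups": [
                        {"Fn::GetAtt": ["DBSecurityGroup", "GroupId"]}
                    ],
                },
            },
        },
    }


template = build_template()
template_json = json.dumps(template, indent=2)
print(template_json)
```

Writing `template_json` to a file would give something that could be passed to CloudFormation in place of a YAML file, although the real Chapter11.yaml may differ in every detail.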

Before you begin, download the Chapter11.yaml file from GitHub. You will also need to know which VPC to deploy in and which subnets to use. If you have more than one VPC, you will need to ensure you choose the correct ones when creating the stack. If you do not have a VPC with at least...
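The stack can also be launched from the CLI rather than the Console. The sketch below builds an `aws cloudformation create-stack` command from a stack name, a template file, and a set of parameter values. The parameter keys (`VpcId`, `DBName`), the stack names, and the VPC ID are hypothetical, so check the downloaded template for the keys it actually defines. Values that contain commas, such as a list of subnets, need extra escaping and are left out of this example.

```python
from typing import Dict, List


def create_stack_command(stack_name: str, template_path: str,
                         parameters: Dict[str, str]) -> List[str]:
    """Build a create-stack call that passes parameter values at runtime."""
    command = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-body", f"file://{template_path}",
        "--parameters",
    ]
    # Each parameter uses the CLI shorthand ParameterKey=...,ParameterValue=...
    for key, value in parameters.items():
        command.append(f"ParameterKey={key},ParameterValue={value}")
    return command


# The same template reused for two databases; only the parameters change.
dev_command = create_stack_command(
    "dbcert-dev", "Chapter11.yaml",
    {"VpcId": "vpc-0123456789abcdef0", "DBName": "devdb"},  # hypothetical values
)
test_command = create_stack_command(
    "dbcert-test", "Chapter11.yaml",
    {"VpcId": "vpc-0123456789abcdef0", "DBName": "testdb"},  # hypothetical values
)

print(" ".join(dev_command))
print(" ".join(test_command))
```

Either list can be handed to `subprocess.run` once the AWS CLI is configured.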

AWS Glue

AWS Glue is a fully managed, serverless data integration and ETL service. It can extract, manipulate, and transform data from a wide range of sources, allowing you to create accurate data models that can be imported into a database, loaded into an analytics platform, or used for machine learning models.

AWS Glue can be controlled using both the Console and CLI commands to allow you to configure automated data handling and data loading into your databases.

There are three components that AWS Glue uses:

  • AWS Glue Data Catalog: This is a central repository that holds information about your data. It acts as an index to your schema and data stores, which helps control your ETL jobs.
  • Job Scheduling System: This is a highly customizable scheduler. It handles not only time-based scheduling but also event-driven scheduling, and it includes options to watch for new files or new data to be processed.
  • ETL Engine: AWS Glue's ETL engine is the...

Amazon Athena

Amazon Athena is a serverless data querying service. It is designed to allow you to run queries against data stored within an Amazon S3 bucket without needing to import it into a database first. Athena is built on Presto, an open source distributed SQL query engine, and supports common SQL syntax such as joins and WHERE clauses. Athena can connect to data within an S3 bucket on its own, or it can use a schema that's been created by AWS Glue. If you do not use AWS Glue, then Athena cannot use indexes or partitions to help speed up your queries, so Athena without Glue is only suitable for smaller datasets.
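As a small sketch of what submitting such a query can look like from code, the following Python prepares the request for the Athena `start_query_execution` API using boto3. The database name, the `flights` table, its `origin` and `dest` columns, the results bucket, and the region are all hypothetical. boto3 is a third-party package and is imported only inside the function that makes the live call.

```python
from typing import Dict

# Hypothetical table and columns, used only to show a WHERE clause and grouping.
SAMPLE_SQL = """
SELECT origin, COUNT(*) AS departures
FROM flights
WHERE dest = 'SEA'
GROUP BY origin
ORDER BY departures DESC
LIMIT 10
""".strip()


def athena_query_request(sql: str, database: str, output_location: str) -> Dict:
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to an S3 location that you choose.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request: Dict, region: str) -> str:
    """Submit the query; needs boto3 installed and AWS credentials configured."""
    import boto3  # third-party; imported here so the sketch loads without it

    client = boto3.client("athena", region_name=region)
    return client.start_query_execution(**request)["QueryExecutionId"]


request = athena_query_request(
    SAMPLE_SQL,
    database="dbcert",  # hypothetical database name
    output_location="s3://example-athena-results/",  # hypothetical bucket
)
print(request["QueryString"])
```

Calling `run_query(request, "eu-west-1")` would return a query execution ID, which can then be used to check the status and fetch the results.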

Athena offers a lot of benefits around querying data without you having to import it into a database first, but it also has some restrictions that you'll need to know for the Database Specialty exam. Let's look at some of the benefits and limitations of using Amazon Athena:

  • The following are the benefits:
    • Uses SQL: You can use SQL syntax to run the queries. This...

Querying data within an S3 bucket using AWS Glue and Amazon Athena

In this hands-on lab, we are going to use some public sample flight data that is stored within a public S3 bucket to create an AWS Glue table. Then, we are going to run queries against that AWS Glue table to find out some flight information. Let's get started:

  1. Log in to the AWS Console and navigate to AWS Glue.
  2. Click on Crawlers from the main left-hand menu and then click Add crawler.
  3. Enter DBCertCrawler for Crawler name and click Next.
  4. Leave all the defaults on the Specify crawler source type page as-is:

Figure 11.8 – Specify crawler source type

  5. On the next page, leave the data source as S3 and click Add connection. Complete the popup by using the following details:
    1. Name: DBCertFlight
    2. Include path: s3://athena-examples/flight/

The following screenshot shows how the form should be completed:

Figure 11.9 – Add a data store...
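The Console steps so far can also be approximated in code. This sketch prepares a request for the AWS Glue `create_crawler` API through boto3, using the crawler name and S3 path from the lab. The IAM role name and the Glue database name are assumptions, since the crawler needs both and neither appears in the steps above. boto3 is third-party and is only imported when the live call is made.

```python
from typing import Dict


def crawler_request(name: str, role: str, database: str, s3_path: str) -> Dict:
    """Build the keyword arguments for Glue's create_crawler call."""
    return {
        "Name": name,
        "Role": role,              # IAM role that Glue assumes to read the data
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request: Dict, region: str) -> None:
    """Create the crawler and run it once; needs boto3 and AWS credentials."""
    import boto3  # third-party; imported here so the sketch loads without it

    glue = boto3.client("glue", region_name=region)
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


request = crawler_request(
    name="DBCertCrawler",
    role="AWSGlueServiceRole-DBCert",  # hypothetical role name
    database="dbcert",                 # hypothetical database name
    s3_path="s3://athena-examples/flight/",
)
print(request)
```

Once a crawler like this has run, the tables it creates in the Data Catalog are what Athena queries against.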

Summary

In this chapter, we learned about three different tools that are commonly used with AWS to automate infrastructure creation and administration: the AWS CLI, CloudFormation, and the CDK. Then, we learned how to automate loading and handling data from S3 using AWS Glue and Amazon Athena.

Regarding automation, we learned how to create a CloudFormation stack using YAML or JSON templates and how to launch those stacks using both the AWS Console and the AWS CLI. We learned how we can use parameters within our stacks to allow the same code to be reused to create a controlled and automated method to create databases.

We finished this chapter by learning how to create an ETL job using AWS Glue and how to use Amazon Athena to query the data that's held within S3 without having to import it into a database first.

In the next chapter, we are going to learn about database security. We came across a few different database security tools and features earlier in...

Cheat sheet

This cheat sheet summarizes the key points from this chapter:

  • You can use a variety of tools to automate your AWS processes by using the AWS CLI, CloudFormation, and the CDK, depending on the use case.
  • The AWS CLI is well suited for running creation tasks or for obtaining the status and information about your AWS infrastructure and services.
  • CloudFormation is used to create stacks. Stacks are groups of AWS components that should be deployed together. They can be used to create a full application stack containing a VPC, security groups, EC2 servers, RDS databases, and almost all other AWS services.
  • CloudFormation offers termination protection to stop someone from accidentally deleting a stack and its components.
  • AWS Glue is used to create a metadata schema of a wide variety of data sources, such as CSV files within S3 or Amazon Redshift tables. It can also be used to create more complex ETL jobs by adding data transformation rules.
  • AWS Glue supports...

Review

Now, let's practice a few exam-style questions:

  1. Amazon Athena is being used by a large company to query data that's being held in S3 buckets in the eu-central-1 and eu-west-1 regions. The company wants to use Athena in eu-west-1 to query data from Amazon S3 in both regions. The solution must be as low-cost as possible.

What is the best solution?

  1. Enable S3 cross-region replication from eu-central-1 to eu-west-1. Run the AWS Glue crawler in eu-west-1 to create the AWS Glue Data Catalog and run Athena queries.
  2. Use AWS DMS to migrate the AWS Glue Data Catalog from eu-central-1 to eu-west-1. Run Athena queries in eu-west-1.
  3. Update the AWS Glue resource policy's IAM permissions to provide the eu-central-1 AWS Glue Data Catalog with access to eu-west-1. Once the catalog in eu-west-1 has access to the catalog in eu-central-1, run Athena queries in eu-west-1.
  4. Run the AWS Glue crawler in eu-west-1 to catalog the datasets in all regions...

Further reading

For more information on the topics that were covered in this chapter, please refer to the following resources:
