AWS Certified Database – Specialty (DBS-C01) Certification Guide


Product type Book
Published in May 2022
Publisher Packt
ISBN-13 9781803243108
Pages 472 pages
Edition 1st Edition
Author: Kate Gawron

Table of Contents (24 chapters)

Preface
Part 1: Introduction to Databases on AWS
Chapter 1: AWS Certified Database – Specialty Overview
Chapter 2: Understanding Database Fundamentals
Chapter 3: Understanding AWS Infrastructure
Part 2: Workload-Specific Database Design
Chapter 4: Relational Database Service
Chapter 5: Amazon Aurora
Chapter 6: Amazon DynamoDB
Chapter 7: Redshift and DocumentDB
Chapter 8: Neptune, Quantum Ledger Database, and Timestream
Chapter 9: Amazon ElastiCache
Part 3: Deployment and Migration and Database Security
Chapter 10: The AWS Schema Conversion Tool and AWS Database Migration Service
Chapter 11: Database Task Automation
Chapter 12: AWS Database Security
Part 4: Monitoring and Optimization
Chapter 13: CloudWatch and Logging
Chapter 14: Backup and Restore
Chapter 15: Troubleshooting Tools and Techniques
Part 5: Assessment
Chapter 16: Exam Practice
Chapter 17: Answers
Other Books You May Enjoy

Chapter 6: Amazon DynamoDB

In this chapter, we are going to look at the first of the NoSQL databases that AWS offers: DynamoDB. DynamoDB is a major topic in the AWS Certified Database – Specialty exam, and for many Database Administrators (DBAs) who come from a relational database background, it can be one of the most difficult services to understand, given how differently it works from a SQL database.

This chapter includes hands-on labs in which we will deploy, configure, and explore a DynamoDB table, and we will spend some time learning how to interact with it using code. Note that DynamoDB does not need to be deployed in a Virtual Private Cloud (VPC).

In this chapter, we are going to cover the following main topics:

  • Overview of DynamoDB
  • Querying and scanning a DynamoDB table
  • Working with DynamoDB records
  • Understanding consistency modes
  • Understanding high availability and backups
  • Understanding DynamoDB advanced features
  • Maintaining and monitoring a DynamoDB table
  • Understanding DynamoDB pricing and limits
  • Deploying and querying a DynamoDB table

Technical requirements

You will require an AWS account with root access. Everything we will do in this chapter is available within the Free Tier, which means you can run all the code examples without spending any money, as long as your account was opened within the last 12 months. You will also require a command-line interface (CLI) with AWS access. The AWS CLI Configuration Guide, found at https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html, explains the steps required, but I will summarize them here:

  1. Open an AWS account if you have not already done so.
  2. Download the AWS CLI latest version from here: https://docs.aws.amazon.com/cli/latest/userguide/welcome-versions.html#welcome-versions-v2.
  3. Create an access key for your administration user here: https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started_create-admin-group.html#getting-started_create-admin-group-cli.
  4. Run the aws configure command to set up a profile for your...
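Running the aws configure command prompts for four values and stores them in two plain-text files under ~/.aws. The following is an illustrative sketch only; the key values are placeholders, and the region shown assumes the Ohio region used later in this chapter:

```
# ~/.aws/credentials
[default]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>

# ~/.aws/config
[default]
region = us-east-2
output = json
```

With these files in place, the aws commands in this chapter will run against the default profile.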

Overview of DynamoDB

Amazon DynamoDB is a fully managed NoSQL and serverless database service that supports key-value and document data structures. It is a proprietary database engine only offered by AWS. You may recall from Chapter 2, Understanding Database Fundamentals, that a NoSQL database is a database designed to store semi-structured or non-structured data without a concrete schema. DynamoDB is a key-value database, meaning that all data is stored with a key that acts as an identifier for the data, and the values, which are the attributes. A serverless database is one for which you do not need to define the compute requirements. When you provision an RDS instance, you need to calculate the number of CPUs and amount of memory you will need. When you provision DynamoDB, you do not need to do so and you can opt to run in on-demand mode, where AWS will manage your table capacity for you. DynamoDB uses the amount of data that your application reads and writes to work out your charges...

Querying and scanning a DynamoDB table

DynamoDB has two different methods of retrieving data:

  • Query
  • Scan

DynamoDB is designed to be queried only by the key attributes. This means that if you wanted to query an attribute that wasn't a part of the key, then you would need to scan the entire table. This is fine for small tables, but as they grow in size the performance of the queries will rapidly decline. If you are from a SQL database background you can think of this in similar terms to a query being run against a table without an index. In addition to performance concerns, in DynamoDB, the more data you access in a table the more it costs, so queries that involve scanning the entire table can become costly. A query method can only be used if you are querying against the partition key, and a scan method is used if you are not using the partition key.
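The difference between the two methods can be sketched with a small local model. This is not the DynamoDB API, just an illustration of the two access patterns; the table contents and attribute names here are hypothetical:

```python
# A local sketch (not the real DynamoDB API) illustrating why key-based
# queries are cheap and full-table scans are not.
table = {
    "KateG": {"PlayerID": "KateG", "TopScore": 98, "Game": "Meteor Blasters"},
    "JoeB": {"PlayerID": "JoeB", "TopScore": 42, "Game": "Meteor Blasters"},
}

def query(player_id):
    """Like a DynamoDB Query: a direct lookup by partition key."""
    item = table.get(player_id)
    return [item] if item else []

def scan(attribute, value):
    """Like a DynamoDB Scan: reads every item, then filters."""
    return [item for item in table.values() if item.get(attribute) == value]

print(query("KateG"))        # one item examined, regardless of table size
print(scan("TopScore", 42))  # every item examined - cost grows with the table
```

The query touches one item no matter how large the table grows; the scan reads (and bills for) every item before filtering.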

Let's take a look at an example using our high score table. If we want to get the high score for a player...

Working with DynamoDB records

DynamoDB uses an application programming interface (API) to control how you access it. We've used the AWS API before in previous chapters to create and work with other databases, but unlike an RDS instance, you can also use the API to run queries and to create and modify data within a DynamoDB table.

DynamoDB has seven main API methods for data manipulation and retrieval:

  • PutItem
  • GetItem
  • UpdateItem
  • DeleteItem
  • ExecuteStatement
  • Query
  • Scan

PutItem

The PutItem API allows you to load records into the database. You can use the following syntax:

aws dynamodb put-item \
    --table-name GameScores \
    --item '{
      "PlayerID": {"S": "KateG"}
    }'

The first line tells DynamoDB what action you will be taking: put-item. The next line identifies the table you will be writing...
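The {"S": "KateG"} structure is DynamoDB's type-descriptor format: every attribute value is tagged with its type ("S" for string, "N" for number, "BOOL" for Boolean, and so on). The following hand-rolled serializer is a minimal sketch of that format for illustration only; the AWS SDKs ship real serializers for this (for example, boto3's TypeSerializer):

```python
# A minimal sketch of DynamoDB's type-descriptor JSON format, where every
# attribute value is wrapped in a map tagging its type. Illustrative only.
def to_dynamodb_json(item):
    def serialize(value):
        if isinstance(value, bool):       # bool must be checked before int
            return {"BOOL": value}
        if isinstance(value, str):
            return {"S": value}
        if isinstance(value, (int, float)):
            return {"N": str(value)}      # numbers travel as strings
        raise TypeError(f"unsupported type: {type(value)!r}")
    return {name: serialize(value) for name, value in item.items()}

print(to_dynamodb_json({"PlayerID": "KateG", "TopScore": 98}))
# → {'PlayerID': {'S': 'KateG'}, 'TopScore': {'N': '98'}}
```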

Understanding consistency modes

DynamoDB offers two different data consistency modes to handle different use cases:

  • Eventually consistent reads – This is the default.
  • Strongly consistent reads

Let's start by learning about eventually consistent reads.

DynamoDB is typically used for cases where data consistency is not critical to the application. For example, website session data can be lost without major impact: the user may have to log in again or add items back into their shopping cart. Unlike a banking transaction, which must fully succeed and be consistent without exception, session data is classified as transient. As a result, DynamoDB defaults to what is called an eventually consistent read, meaning that a read request might not return data that has recently been updated. This is due to how DynamoDB stores its data; it does not wait for the write request to be written to each storage location before...
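The behavior can be illustrated with a toy replica model. This is not how DynamoDB is implemented internally; it is just a minimal sketch of a write landing on one replica and propagating to the others later:

```python
# A toy model of eventual consistency: a write lands on one replica first
# and propagates asynchronously, so a read from a lagging replica can
# return stale (or missing) data. Illustrative only.
class ToyReplicatedStore:
    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, key, value):
        # The first replica acknowledges the write immediately.
        self.replicas[0][key] = value

    def replicate(self):
        # In a real system this happens in the background, after a delay.
        for replica in self.replicas[1:]:
            replica.update(self.replicas[0])

    def read(self, key, consistent=False):
        # A strongly consistent read goes to the up-to-date replica; an
        # eventually consistent read may be served by a lagging one.
        replica = self.replicas[0] if consistent else self.replicas[-1]
        return replica.get(key)

store = ToyReplicatedStore()
store.write("KateG", 98)
print(store.read("KateG", consistent=True))  # 98
print(store.read("KateG"))                   # None - not yet propagated
store.replicate()
print(store.read("KateG"))                   # 98 once propagation completes
```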

Understanding high availability and backups

Like other databases, DynamoDB will often become a critical part of your application, and your data needs to be resilient and recoverable to meet your application service level agreements. DynamoDB offers two methods to improve resilience and reliability:

  • Global tables
  • Backups

Let's start by looking at global tables.

Global tables

As DynamoDB is serverless and doesn't run in a VPC, the options you have for making it highly available are different from other AWS services. You cannot have a multi-AZ deployment here. DynamoDB offers a service called global tables to overcome this. Global tables allow you to configure a multi-region and active-active database deployment. DynamoDB will create an exact replica of your database across all the regions you specify, allowing you to create a highly available database system. If a table fails or becomes unavailable in one region, the traffic will automatically be routed...

Understanding DynamoDB advanced features

DynamoDB has several additional features that can be used to help support auditing and compliance requirements. Often, companies have a requirement to audit all changes made to database tables, especially the ones containing personally identifiable information (PII) such as customer names and addresses. These are the three tools that can be used:

  • DynamoDB Streams
  • CloudTrail
  • Time to live (TTL)

Additionally, for some very large (multi-TB) datasets or applications needing extremely fast microsecond response times, they can consider using DynamoDB Accelerator (DAX).

Let's start by looking at DynamoDB Streams and how it can be used for auditing purposes and to trigger other actions.

DynamoDB Streams

DynamoDB Streams is a time-ordered sequence of events affecting the items in your DynamoDB table. This includes inserts, deletes, and updates. The changes are written to a log, which can then be read and used by other...
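A common pattern is to attach an AWS Lambda function to the stream and build an audit log from the change records. The handler below is a sketch in that shape; the event layout follows the documented DynamoDB Streams record format, but the field handling is simplified for illustration:

```python
# A sketch of a DynamoDB Streams consumer written in the shape of an AWS
# Lambda handler. It turns stream records into simple audit entries.
def handler(event, context=None):
    audit_entries = []
    for record in event.get("Records", []):
        entry = {
            "action": record["eventName"],      # INSERT, MODIFY, or REMOVE
            "keys": record["dynamodb"]["Keys"],
        }
        # Capture the new item image when the stream is configured to send it.
        if "NewImage" in record["dynamodb"]:
            entry["new"] = record["dynamodb"]["NewImage"]
        audit_entries.append(entry)
    return audit_entries

# A hand-built sample event mimicking a single INSERT stream record:
sample_event = {
    "Records": [{
        "eventName": "INSERT",
        "dynamodb": {
            "Keys": {"PlayerID": {"S": "KateG"}},
            "NewImage": {"PlayerID": {"S": "KateG"}, "TopScore": {"N": "98"}},
        },
    }]
}
print(handler(sample_event))
```

In a real deployment, the handler would write the entries to a durable store such as S3 rather than returning them.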

Maintaining and monitoring a DynamoDB table

The main tools you will use for monitoring a DynamoDB table are CloudWatch and CloudTrail. CloudWatch monitors the performance metrics of the table, such as the number of reads and writes and the throughput consumed, while CloudTrail records the API calls made against the table, giving you an audit trail of the changes made.

One of the main areas you will need to closely monitor with DynamoDB is the amount of data being read from and written to the table. We will look at pricing in the next section, but DynamoDB is billed based on reads and writes, and if you exceed your provisioned capacity, requests will be throttled and you will start getting errors stating 'ProvisionedThroughputExceededException'. For the full list of common DynamoDB errors, please see https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/CommonErrors.html#CommonErrors-ThrottlingException.
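When you do hit throttling errors, the standard remedy (and what the AWS SDKs do internally) is to retry with exponential backoff and jitter. The following is a simplified sketch of that pattern, with a stand-in function in place of a real DynamoDB call:

```python
import random

# A sketch of retry-with-exponential-backoff for throttled requests.
# fake_request stands in for a real DynamoDB call; this is illustrative,
# not the AWS SDK's actual retry code.
class ProvisionedThroughputExceededException(Exception):
    pass

def call_with_backoff(request, max_attempts=5, base_delay=0.05):
    delays = []
    for attempt in range(max_attempts):
        try:
            return request(), delays
        except ProvisionedThroughputExceededException:
            # Full jitter: wait a random amount up to base * 2^attempt.
            delay = random.uniform(0, base_delay * (2 ** attempt))
            delays.append(delay)  # a real client would time.sleep(delay)
    raise RuntimeError("request kept being throttled")

attempts = {"n": 0}
def fake_request():
    attempts["n"] += 1
    if attempts["n"] < 3:         # simulate throttling on the first two tries
        raise ProvisionedThroughputExceededException()
    return "ok"

result, delays = call_with_backoff(fake_request)
print(result, len(delays))        # ok 2
```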

The following figure shows an example of the metrics you can monitor in CloudWatch. This diagram shows that we are using more write capacity than...

Understanding DynamoDB pricing and limits

DynamoDB is a serverless managed service, which means you do not pick an instance size to control the performance. Instead, DynamoDB is charged based on how much data you read and write to your table. DynamoDB has four main components to its pricing:

  • Read request units
  • Write request units
  • Storage
  • Additional features such as DAX, global tables, and streams

Let's start by looking at read and write capacity units.

Request units

Request units are the main usage mechanism within DynamoDB. The number of requests you need for each task will depend on the amount of data being returned as well as the read/write type:

  • One read request will give you one strongly consistent read request or two eventually consistent requests for every 4 KB of data.
  • Two read requests will give you one transactional read for every 4 KB of data.
  • One write request will give you one standard write for every 1 KB of data...
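The arithmetic above can be captured in a couple of helper functions. This is a sketch of the published rounding rules (reads billed in 4 KB blocks, writes in 1 KB blocks), not an official AWS calculator:

```python
import math

# A sketch of the capacity-unit arithmetic: reads are measured in 4 KB
# blocks and writes in 1 KB blocks. Eventually consistent reads cost half
# a unit per block; transactional reads and writes cost double.
def read_units(item_size_kb, mode="strong"):
    blocks = math.ceil(item_size_kb / 4)       # round up to 4 KB blocks
    multiplier = {"eventual": 0.5, "strong": 1, "transactional": 2}[mode]
    return blocks * multiplier

def write_units(item_size_kb, transactional=False):
    blocks = math.ceil(item_size_kb / 1)       # round up to 1 KB blocks
    return blocks * (2 if transactional else 1)

# Example: 50 strongly consistent reads/sec of 6 KB items, plus
# 20 standard writes/sec of 2.5 KB items:
print(50 * read_units(6, "strong"))   # 100 RCUs
print(20 * write_units(2.5))          # 60 WCUs
```

Note the rounding: a 2.5 KB standard write costs 3 WCUs, because capacity is consumed in whole blocks.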

Deploying and querying a DynamoDB table

Now that we have learned about DynamoDB and its features, let's deploy our own DynamoDB table to practice using the console and API. We will be using the GameScores table we've seen in some of the examples in this chapter to build a simple leaderboard database. We'll be using both the console and the AWS CLI for these steps.

Provisioning a DynamoDB table

We'll start by provisioning a DynamoDB table. We'll be using the Ohio (us-east-2) region:

  1. Open the AWS console in an internet browser and log in using an account that has privileges to create and modify a DynamoDB table.
  2. Navigate to the DynamoDB section.
  3. Click the orange Create table button on the right side of the screen:

Figure 6.4 – Screenshot of Create resources

  4. This will open the Create table page, allowing us to enter the details of our table. Choose the following options. Any options that are not mentioned...
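For reference, a simple version of this table maps onto a JSON definition like the one below, which could be passed to aws dynamodb create-table via --cli-input-json. The single PlayerID partition key and on-demand billing mode are assumptions based on the earlier examples, not the console defaults:

```json
{
  "TableName": "GameScores",
  "AttributeDefinitions": [
    {"AttributeName": "PlayerID", "AttributeType": "S"}
  ],
  "KeySchema": [
    {"AttributeName": "PlayerID", "KeyType": "HASH"}
  ],
  "BillingMode": "PAY_PER_REQUEST"
}
```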

Summary

In this chapter, we have learned about Amazon DynamoDB. We have learned how to create a DynamoDB table and how to use different index types to query the data. We also learned how to scan the table for cases where we cannot use an index. We learned how DynamoDB is priced and some techniques to minimize costs.

In the AWS Certified Database – Specialty exam, your knowledge of DynamoDB will be tested heavily with questions around common error codes, service limits, index types and their key features, and backup and restore methods.

In the next chapter, we will be learning about Redshift and DocumentDB, which are both AWS databases with specific use cases. We will continue to use the knowledge learned in this chapter to interact with Redshift and DocumentDB, as they have many similarities with DynamoDB.

Cheat sheet

This cheat sheet reminds you of the high-level topics and points covered in this chapter and should act as a revision guide and refresher:

  • Amazon DynamoDB is a low-latency, managed, and serverless NoSQL database created by AWS offering millisecond response times.
  • DynamoDB is serverless, which means you do not need to specify the compute needed for your workload; instead, you define capacity units that control how much data can be read and written.
  • You can run the table in on-demand mode, but the costs are generally higher than provisioned mode.
  • If you exceed the amount of capacity reserved, you can receive errors around throttling and the performance of your queries will drop.
  • DynamoDB stores data as items, each of which must have a primary key that uniquely identifies it.
  • DynamoDB supports two secondary index types, global secondary indexes (GSIs) and local secondary indexes (LSIs), which let you query items by attributes other than the primary key.
  • You can take manual backups or use point-in-time recovery (PITR) backups.
  • You can use global tables to provision your DynamoDB table...

Review

To check your knowledge from this chapter, here are five questions that you should now be able to answer. Remember the exam techniques from Chapter 1, AWS Certified Database – Specialty Exam Overview, and remove the clearly incorrect answers first to help you:

  1. You are working as a developer for a small company with a DynamoDB table. The company is complaining of poor performance after increasing the number of records in the table and they say they are seeing "throttling errors." What is the most cost-efficient option for them to consider?
    1. Enable TTL against a timestamp attribute.
    2. Implement DAX.
    3. Turn on DynamoDB Streams.
    4. Turn on autoscaling.
  2. You are designing a new DynamoDB table and need to calculate how many Capacity Units (CUs) to provision. Each item is 3 KB in size and you expect to read a maximum of 100 items and write 10 per second. You will only be using eventually consistent reads and standard writes.
    1. 50 RCUs and 30 WCUs
    2. 100 RCUs and 10 WCUs
    3. 50 RCUs...

Further reading

This chapter covers all you will need to know for the AWS Certified Database – Specialty exam, but you may enjoy learning more about DynamoDB. The following books can teach you more about AWS Lambda and about working with DynamoDB from a developer's point of view:

  • Mastering AWS Lambda:

https://www.packtpub.com/product/mastering-aws-lambda/9781786467690

  • DynamoDB Cookbook:

https://www.packtpub.com/product/dynamodb-cookbook/978178393755
