Chapter 8: Neptune, Quantum Ledger Database, and Timestream

In this chapter, we are going to explore and learn about three different Amazon Web Services (AWS) database technologies: Neptune, Quantum Ledger Database (QLDB), and Timestream. Each of these databases supports a specific workload type. All three are fully managed, and QLDB and Timestream are serverless databases.

Neptune is a graph database that allows you to run queries to quickly find out the connections and relationships between data items. QLDB is a database that works like an audit trail and does not allow any data to be deleted or changed. Timestream is a time-series database that allows you to work with data closely connected to timestamps, allowing you to keep an ordered record of events.

This chapter includes a hands-on lab where we will deploy, configure, and explore Neptune, QLDB, and Timestream instances, including how we can monitor and access them.

In this chapter, we're going to cover the following...

Technical requirements

You will require an AWS account with root access. Nothing we do in this chapter is available in the AWS free tier, so following the hands-on sections will cost a small amount. You will also require AWS Command Line Interface (CLI) access. The AWS guide (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) explains the steps required, but I will summarize them here, followed by a quick way to check that your credentials work:

  1. Open an AWS account if you have not already done so.
  2. Download the AWS CLI latest version from here:

https://docs.aws.amazon.com/cli/latest/userguide/welcome-versions.html#welcome-versions-v2

  3. Create an admin user by going to the following link:

https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started_create-admin-group.html#getting-started_create-admin-group-cli

  4. Create an access key for your administration user by visiting this link:

https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access...
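
Once the CLI is installed and your access key is configured, it is worth confirming that your credentials actually work before starting the labs. The following is a minimal sketch using Python and boto3 (the region shown is only an example; use whichever region you plan to work in):

# Minimal sketch to confirm that your AWS credentials are working.
# Assumes the AWS CLI has been configured (aws configure) and boto3 is
# installed (pip install boto3); the region name is just an example.
import boto3

session = boto3.Session(region_name="us-east-1")
sts = session.client("sts")

identity = sts.get_caller_identity()
print("Account:   ", identity["Account"])
print("Caller ARN:", identity["Arn"])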

Overview of Amazon Neptune

Amazon Neptune is a graph database. As we learned in Chapter 2, Understanding Database Fundamentals, a graph database stores information as nodes and relationships rather than in tables, indexes, or documents. You use a graph database when you need to know how things are connected, or when you need to store data with a large number of links between records and want better performance when querying those links. You can write queries in a relational database management system (RDBMS) that traverse multiple tables, but the more tables and joins you add to the query, the worse the performance becomes; this is where a graph database can make a big difference.

Let's start by looking at the Neptune architecture and how it is deployed within the AWS cloud.

Neptune architecture and features

Amazon Neptune is deployed within a VPC. When it is deployed, you control access to it using subnetworks (subnets) and security...

Working with Neptune

One of the first things to understand about graph databases is how they store data and, specifically, how Neptune stores data. Unlike an RDBMS and some NoSQL systems (such as DynamoDB), graph databases do not use Structured Query Language (SQL) for querying. Instead, Neptune supports two different graph query languages: Gremlin and SPARQL Protocol and RDF Query Language (SPARQL). You can only use one language at a time in your database, and each language has its own requirements for how the data is stored within Neptune and how you can work with it. If you use Gremlin, the data is stored using the Property Graph framework, and if you choose SPARQL, you will be using the Resource Description Framework (RDF). SPARQL looks similar to SQL, with SELECT and INSERT statements, but differs significantly in its WHERE clause handling and overall syntax. Gremlin will appear unfamiliar to database administrators (DBAs) as it uses a structure more similar to...
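
To make the contrast between the two languages more concrete, here is a hedged sketch (in Python, using the requests library) of sending a similar lookup to Neptune's Gremlin and SPARQL HTTP endpoints. The endpoint name, vertex labels, and RDF predicates are illustrative placeholders, and the sketch assumes that IAM database authentication is disabled and that you are connecting from inside the cluster's VPC:

# Hedged sketch: a similar lookup sent to Neptune's Gremlin and SPARQL HTTP
# endpoints. The endpoint, labels, and predicates below are placeholders.
import requests

NEPTUNE = "https://your-neptune-endpoint:8182"  # placeholder cluster endpoint

# Gremlin (Property Graph): start at 'person' vertices and follow 'lives_in' edges
gremlin = "g.V().hasLabel('person').out('lives_in').values('name').limit(5)"
resp = requests.post(f"{NEPTUNE}/gremlin", json={"gremlin": gremlin})
print(resp.json())

# SPARQL (RDF): the same idea expressed as a triple pattern
sparql = """
SELECT ?city WHERE {
  ?person a <http://example.com/Person> .
  ?person <http://example.com/livesIn> ?city .
} LIMIT 5
"""
resp = requests.post(f"{NEPTUNE}/sparql", data={"query": sparql})
print(resp.json())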

Deploying a Neptune cluster

The first step is to create a single, standalone Neptune cluster. Once the cluster is provisioned, we will load some data, and then install and configure a Gremlin client so that we can query the data in the database.

Note

Neptune is not included in the AWS free tier, so following this lab will be chargeable by AWS. This lab is designed to keep costs at a minimum while learning the critical elements required for the exam.

Let's start by creating our Amazon Neptune cluster, as follows:

  1. Log in to the AWS console and navigate to Neptune.
  2. Click Launch Amazon Neptune, as illustrated in the following screenshot:

Figure 8.4 – Launch Amazon Neptune

  3. Complete the Create database form, as follows. If a value is not mentioned, please leave it as its default:
    • Settings | DB cluster identifier: dbcert-neptune
    • Templates: Development and Testing
    • DB instance size | DB instance class: db.t3.medium...

Overview of Amazon QLDB

Amazon QLDB is a fully managed, transparent, immutable, and cryptographically verifiable transaction log database. What does this really mean? Consider running an UPDATE or DELETE statement against a typical RDBMS: if you have logging enabled, that transaction should be stored in the logs, but the database itself keeps no record of the previous value. It would be fairly simple for someone to make changes and for those logs to be deleted or lost (how long are transaction logs kept before being deleted?), at which point all record of that change is also lost. QLDB not only stores the latest version of a record after it has been updated or deleted but also stores all the previous versions within the database itself. Additionally, the database ensures that every new version of a record contains a cryptographic (hash-based) reference to the previous version, meaning that any attempt to modify a record without making a record of the change will cause...
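
To make the chaining idea concrete, here is a simplified, illustrative Python sketch of hash chaining. It is not QLDB's actual internal implementation (QLDB uses SHA-256 digests organized into a journal with a Merkle tree for verification), but it shows why rewriting an old revision without recording the change becomes detectable:

# Illustrative only: a toy hash chain, not QLDB's real journal format.
import hashlib
import json

def revision_hash(document: dict, previous_hash: str) -> str:
    """Hash a revision together with the hash of the revision before it."""
    payload = json.dumps(document, sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode()).hexdigest()

h0 = revision_hash({"Reg": "BG12 YHG", "Owner": "Kate"}, previous_hash="")
h1 = revision_hash({"Reg": "BG12 YHG", "Owner": "Sam"}, previous_hash=h0)

# Silently rewriting the first revision changes its hash, so recomputing the
# chain no longer reproduces h1 and the tampering is detectable.
tampered = revision_hash({"Reg": "BG12 YHG", "Owner": "Mallory"}, previous_hash="")
print(h1 == revision_hash({"Reg": "BG12 YHG", "Owner": "Sam"}, previous_hash=tampered))  # False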

Accessing a QLDB database

QLDB has three methods to query data, as follows:

  • AWS console—QLDB has a built-in graphical query tool.
  • Amazon QLDB shell—You can use a downloadable shell and connect from your local machine to the QLDB instance and run queries.
  • AWS application programming interface (API)—You can download a QLDB driver and make calls to the QLDB instance using a variety of coding languages such as Java, .NET, and Python.

These methods all use a language called PartiQL (pronounced particle) to run queries. PartiQL uses a similar structure to SQL queries, allowing you to run SELECT, UPDATE, and DELETE statements complete with WHERE clauses. Here's an example of this:

SELECT * FROM Cars AS c WHERE c.Reg IN ('BG12 YHG', 'D150 GWE');

Here is the output for the previous query. It follows a syntax called Amazon Ion, which closely resembles JavaScript Object Notation (JSON) syntax:

{
   ...
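
If you would rather query programmatically than through the console or shell, the Amazon QLDB driver for Python (pyqldb) can run the same PartiQL statements. The following is a minimal sketch, assuming the driver is installed (pip install pyqldb) and that you are using the ledger and table names from this chapter's examples:

# Minimal sketch of running PartiQL against QLDB with the Python driver.
# Assumes `pip install pyqldb`, credentials configured as in the Technical
# requirements section, and a dbcert-qldb ledger containing a Cars table.
from pyqldb.driver.qldb_driver import QldbDriver

driver = QldbDriver(ledger_name="dbcert-qldb")

def find_cars(txn):
    # The ? placeholders are bound to the extra arguments, in order
    cursor = txn.execute_statement(
        "SELECT * FROM Cars AS c WHERE c.Reg IN (?, ?)", "BG12 YHG", "D150 GWE"
    )
    return list(cursor)

for row in driver.execute_lambda(find_cars):
    print(row)  # each row is an Amazon Ion value, which prints much like JSON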

Deploying a QLDB database

Let's now use the AWS console to deploy, load data into, and query a QLDB database. First, we will create our ledger.

Note

QLDB is not included in the AWS free tier, so following this lab will be chargeable by AWS. This lab is designed to keep costs at a minimum while learning the critical elements required for the exam.

Proceed as follows:

  1. Go to the AWS console and navigate to Amazon QLDB.
  2. Click Create ledger.
  3. Enter dbcert-qldb as a value for Ledger name and leave all other values at their default settings. Click Create ledger at the bottom of the page.
  4. The database will take a few minutes to create, so wait until the Status column shows as Available.

Now, we need to load data into our ledger. We will use sample data provided by AWS for testing.

  1. Click on Getting started from the menu on the left-hand side and scroll down until you find the Sample application data section, as illustrated in the following screenshot...

Overview of Amazon Timestream

Amazon Timestream is a time-series database. A time-series database is optimized for storing and querying data saved as time-value pairs. It is often used to store readings from sensors or operational metrics, where each timestamp and its associated value need to be tracked for trend analysis.

Timestream is a fully managed, serverless, and scalable database service specifically customized and optimized for Internet of Things (IoT) devices and application sensors, allowing you to store trillions of events per day up to 1,000 times faster than via an RDBMS. Being serverless means you do not need to provision compute capacity, as Timestream automatically scales up and down depending on the current workload.

Timestream features a tiered-storage model that moves older and less frequently accessed data to a cheaper storage tier, saving costs. Timestream has its own adaptive query engine that learns your data access patterns to optimize query...

Accessing a Timestream database

Timestream can be queried using three different methods, as follows:

  • AWS console—Timestream has a built-in graphical query tool.
  • AWS CLI—You can use the AWS CLI to run both write and read queries from your local computer.
  • AWS API/software development kit (SDK)—You can make calls to Timestream using the SDK in a variety of coding languages such as Java, .NET, and Python, or download a Java Database Connectivity (JDBC) driver for tools that expect one.

Timestream supports queries written in SQL, allowing you to run SELECT and INSERT statements combined with WHERE clauses to filter. You cannot run a DELETE statement in Timestream, nor run an UPDATE statement against an existing entry, as this would break the time-ordered record of events. You can also run a scheduled query. Given the time-sensitive nature of Timestream use cases, it's common to create daily or weekly reports showing trends, patterns, and exceptions. Scheduled queries are used to create views...
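
As a quick illustration of the API/SDK route, here is a hedged sketch that runs a read query with boto3. The database and table names (and the one-hour window) are placeholders; substitute the names you create in the lab that follows:

# Hedged sketch of a read query against Timestream using boto3.
# The database and table names below are placeholders.
import boto3

query_client = boto3.client("timestream-query", region_name="us-east-1")

sql = """
SELECT measure_name, time, measure_value::double
FROM "dbcert-timestream"."sample_table"
WHERE time > ago(1h)
ORDER BY time DESC
LIMIT 10
"""

response = query_client.query(QueryString=sql)
for row in response["Rows"]:
    print(row["Data"])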

Loading data into Timestream

Timestream doesn't have a method to bulk-load data, as it is designed to receive large volumes of data from sensors in real time rather than in bulk; a sketch of what such a write looks like follows. You can, however, import a sample dataset for testing when you deploy Timestream.
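
Here is a hedged sketch of a single real-time write using boto3; the database, table, dimension, and measure names are placeholders rather than the sample dataset's real names:

# Hedged sketch of writing one sensor reading into Timestream with boto3.
# Database, table, dimension, and measure names are placeholders.
import time
import boto3

write_client = boto3.client("timestream-write", region_name="us-east-1")

record = {
    "Dimensions": [
        {"Name": "device_id", "Value": "sensor-001"},
        {"Name": "location", "Value": "warehouse-7"},
    ],
    "MeasureName": "temperature",
    "MeasureValue": "21.4",
    "MeasureValueType": "DOUBLE",
    "Time": str(int(time.time() * 1000)),  # current time in milliseconds
    "TimeUnit": "MILLISECONDS",
}

write_client.write_records(
    DatabaseName="dbcert-timestream",
    TableName="sensor_readings",
    Records=[record],
)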

Let's now deploy a Timestream database, import the sample data, and then run some queries.

Deploying a Timestream database

Let's now use the AWS console to deploy, load data into, and query a Timestream database. First, we will create our database.

Note

Timestream is not included in the AWS free tier, so following this lab will be chargeable by AWS. This lab is designed to keep costs at a minimum while learning the critical elements required for the exam.

Proceed as follows:

  1. Go to the AWS console and navigate to Amazon Timestream.
  2. Click Create database.
  3. Choose Sample database. Enter dbcert-timestream as the Database name value and leave all other settings at their default values. Click Create database at the bottom of the page.
  4. The database will be created immediately, complete with sample data loaded.
  5. Click Query editor on the left-hand menu.
  6. Select your database from the dropdown and wait for a list of tables to be loaded.
  7. Click on the table names to view columns from which you can build your queries. Type the following command...

Summary

In this chapter, we have learned about the final three new databases offered by AWS: Neptune, QLDB, and Timestream. We have learned that Neptune is a graph database fully managed by AWS and is used to model and query connections between records. We also learned how to use Gremlin to query data and the Neptune Bulk Loader to load it.

For QLDB, we discovered what immutable means and how QLDB stores data along with all its historical versions, making it impossible to change data without leaving a record.

Finally, we learned how Timestream stores large amounts of time-value data and optimizes the storage and querying of data from sensors and IoT devices.

We have now learned about all the different databases that AWS offers and that are covered in the exam. We have practiced working with the AWS console and the AWS CLI to create, query, and delete the databases. We have also learned how to work with other AWS services such as S3 and IAM.

In the next chapter, we are going to learn about...

Cheat sheet

This cheat sheet summarizes the main key points from this chapter, as follows:

  • Neptune is a graph database optimized for storing and querying connections between items.
  • You can use the Neptune Bulk Loader to import data in various formats from an S3 bucket; this requires an S3 VPC endpoint.
  • Neptune supports querying using the SPARQL language, which is similar to SQL, as well as Gremlin, which is a specific graph querying language.
  • Neptune is a highly redundant, fully managed database system with options for both Multi-AZ and cross-region replication using Neptune Streams.
  • QLDB is an immutable centralized ledger database optimized for workloads that require verifiable data chains with all historic versions and modifications.
  • QLDB uses the PartiQL query language and returns data in Amazon Ion format.
  • QLDB does not offer any backup or restore functionality, but you can export to S3.
  • QLDB scales automatically, so you do not need to provision compute or...

Review

Let's now review your knowledge with this quiz:

  1. You are working as a database consultant for a health insurance company. You are constructing a new Amazon Neptune database cluster, and you try to load data from Amazon S3 using the Neptune Bulk Loader from an EC2 instance in the same VPC as the Neptune database, but you receive the following error message: Unable to establish a connection to the s3 endpoint. The source URL is s3://dbcert-neptune/ and the region code is us-east-1. Kindly confirm your S3 configuration.

Which of the following actions should you take to resolve the issue? (Select two.)

  1. Check that a Neptune VPC endpoint exists.
  2. Check that an Amazon S3 VPC endpoint exists.
  3. Check that Amazon EC2 has an IAM role granting read access to Amazon S3.
  4. Check that Neptune has an IAM role granting read access to Amazon S3.
  5. Check that Amazon S3 has an IAM role granting read access to Neptune.
  2. You are working with an Amazon...

Further reading

To learn the topics of this chapter in detail, you can refer to the following resources:

  • Apache TinkerPop documentation—Gremlin language:

https://tinkerpop.apache.org/gremlin.html

  • curl reference guide:

https://curl.se/docs/httpscripting.html

  • Learn Amazon SageMaker – Second Edition:

https://www.packtpub.com/product/learn-amazon-sagemaker-second-edition/9781801817950
