Mastering DynamoDB

Chapter 1. Getting Started

Amazon DynamoDB is a fully managed, cloud-hosted NoSQL database. It provides fast and predictable performance with the ability to scale seamlessly. It allows you to store and retrieve any amount of data and serve any level of request traffic without any operational burden. DynamoDB also offers flexible data modeling and durability.

With just a few clicks in the Amazon Web Services console, you can create your own DynamoDB table and scale the provisioned throughput up or down without taking your application down even for a millisecond. DynamoDB stores data on solid-state drives (SSDs) and automatically replicates it across multiple AWS Availability Zones, which provides built-in durability, high availability, and reliability.

In this chapter, we are going to review the core concepts of DynamoDB and discover more about its features and implementation.

Before we start discussing details about DynamoDB, let's try to understand what NoSQL databases are and when to choose DynamoDB over a Relational Database Management System (RDBMS). RDBMSes were not designed to cope with the scale and flexibility challenges that developers face when building modern applications with rising data volume, variety, and velocity, nor were they able to take advantage of cheap commodity hardware. An RDBMS also requires a schema before we start adding data, which restricts developers from making their applications flexible. On the other hand, NoSQL databases are fast, provide flexible schema operations, and make effective use of cheap storage.

Considering all these things, NoSQL is becoming popular very quickly amongst the developer community. However, one has to be very cautious about when to go for NoSQL and when to stick to RDBMS. Sticking to relational databases makes sense when you know that the schema is more or less static, strong consistency is a must, and the data is not going to be that big in volume.

However, when you want to build an application that is Internet scalable, where the schema is likely to evolve over time, the storage is going to be really big, and the operations involved can tolerate eventual consistency, NoSQL is the way to go.

There are various types of NoSQL databases. The following is the list of NoSQL database types and popular examples:

Document Store: MongoDB, CouchDB, MarkLogic
Column Store: HBase, Cassandra
Key Value Store: DynamoDB, Azure Table Storage, Redis
Graph Databases: Neo4j, DEX

Most of these NoSQL solutions are open source, except for a few like DynamoDB and Azure Table Storage, which are available as a service over the Internet. DynamoDB, being a key-value store, indexes data only on primary keys, and one has to go through the primary key to access other attributes. Let's start learning more about DynamoDB by having a look at its history.

DynamoDB's history

Amazon's e-commerce platform had a huge set of decoupled services developed and managed individually, and each and every service had an API to be used and consumed by others. Earlier, each service had direct database access, which was a major bottleneck. In terms of scalability, Amazon's requirements were more than any third-party vendors could provide at that time.

DynamoDB was built to address Amazon's high availability, extreme scalability, and durability needs. Earlier, Amazon used to store its production data in relational databases, and services were provided for all required operations. However, they later realized that most of the services accessed data only through its primary key and did not need complex queries to fetch the required data; in addition, maintaining these RDBMS systems required high-end hardware and skilled personnel. So, to overcome these issues, Amazon's engineering team built a NoSQL database that addresses all of them.

In 2007, Amazon released a research paper on Dynamo that combined the best ideas from the database and key-value store worlds and inspired many open source projects at the time; Cassandra, Voldemort, and Riak were a few of them. You can find this paper at http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf.

Even though Dynamo had great features that took care of all engineering needs, it was not widely accepted at that time in Amazon, as it was not a fully managed service. When Amazon released S3 and SimpleDB, engineering teams were quite excited to adopt these compared to Dynamo, as DynamoDB was a bit expensive at that time due to SSDs. So, finally, after rounds of improvement, Amazon released Dynamo as a cloud-based service, and since then, it has been one of the most widely used NoSQL databases.

Before its release as a public cloud service in 2012, DynamoDB was the core storage service for Amazon's e-commerce platform, starting with the shopping cart and session management services. Any downtime or degradation in performance had a major impact on Amazon's business, and any financial impact was strictly not acceptable; DynamoDB proved itself to be the best choice in the end. Now, let's try to understand DynamoDB in more detail.

Data model concepts

To understand DynamoDB better, we need to understand its data model first. DynamoDB's data model includes tables, items, and attributes. A table in DynamoDB is analogous to a table in a relational database, except that it need not have a fixed schema (number of columns, column names, their data types, column order, and column size). It needs only a fixed primary key, the key's data type, and a secondary index if needed; the remaining attributes can be decided at runtime. Items in DynamoDB are the individual records of a table. We can have any number of attributes in an item.

DynamoDB stores the item attributes as key-value pairs. Item size is calculated by adding the length of attribute names and their values.
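
As a rough illustration of the size rule above, the following sketch adds up the UTF-8 byte lengths of attribute names and string values. The class name ItemSizeEstimator, its methods, and the sample attributes are hypothetical, and the sketch assumes string-only attributes; the service's own accounting for numbers, binary values, and sets may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: estimates item size as the sum of the UTF-8 byte
// lengths of every attribute name and every (string) attribute value.
final class ItemSizeEstimator {

    static int estimateBytes(Map<String, String> item) {
        int total = 0;
        for (Map.Entry<String, String> attribute : item.entrySet()) {
            total += attribute.getKey().getBytes(StandardCharsets.UTF_8).length;
            total += attribute.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // The limit is passed in by the caller rather than hard-coded.
    static boolean fitsWithin(Map<String, String> item, int limitBytes) {
        return estimateBytes(item) <= limitBytes;
    }

    public static void main(String[] args) {
        Map<String, String> student = new LinkedHashMap<>();
        student.put("name", "Alice");
        student.put("email", "alice@example.com");
        System.out.println("Estimated size: " + estimateBytes(student) + " bytes");
        System.out.println("Fits in 64 KB: " + fitsWithin(student, 64 * 1024));
    }
}
```

Note that long attribute names count against the item size just as values do, which is why short attribute names are often preferred.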

Tip

DynamoDB had an item-size limit of 64 KB at the time of writing (AWS has since raised it to 400 KB); so, while designing your data model, keep in mind that your item size must not exceed the limit. There are various ways of avoiding the overspill, and we will discuss such best practices in Chapter 4, Best Practices.

The following diagram shows the data model hierarchy of DynamoDB:

Here, we have a table called Student, which can have multiple items in it. Each item can have multiple attributes that are stored in key–value pairs. We will see more details about the data models in Chapter 2, Data Models.

Operations

DynamoDB supports various operations to play with tables, items, and attributes.

Table operations

DynamoDB supports create, update, and delete operations at the table level. The UpdateTable operation can be used to increase or decrease the provisioned throughput. The ListTables operation returns the list of all tables associated with your account for a specific endpoint, and the DescribeTable operation returns detailed information about a given table.

Item operations

Item operations allow you to add, update, or delete an item in a given table. The UpdateItem operation allows us to add, update, or delete attributes of an existing item.

The Query and Scan operations

The Query and Scan operations are used to retrieve information from tables. The Query operation allows us to query a given table by the provided hash key and, optionally, the range key. We can also query secondary indexes. The Scan operation reads all items from a given table. More information on operations can be found in Chapter 2, Data Models.
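
To make the difference between the two operations concrete, here is a minimal in-memory sketch. It is not the DynamoDB API: the class QueryVsScanSketch, its methods, and the sample keys are all hypothetical, and the nested map merely stands in for a table with a hash key and a range key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a table with a hash key and a range key.
final class QueryVsScanSketch {
    // hash key -> (range key -> item attributes); range keys are kept sorted
    private final Map<String, TreeMap<String, Map<String, String>>> partitions = new HashMap<>();

    void putItem(String hashKey, String rangeKey, Map<String, String> attributes) {
        partitions.computeIfAbsent(hashKey, k -> new TreeMap<>()).put(rangeKey, attributes);
    }

    // Query: go straight to one hash key and return its items in range-key order.
    List<Map<String, String>> query(String hashKey) {
        TreeMap<String, Map<String, String>> partition = partitions.get(hashKey);
        if (partition == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(partition.values());
    }

    // Scan: visit every item under every hash key.
    List<Map<String, String>> scan() {
        List<Map<String, String>> all = new ArrayList<>();
        for (TreeMap<String, Map<String, String>> partition : partitions.values()) {
            all.addAll(partition.values());
        }
        return all;
    }

    public static void main(String[] args) {
        QueryVsScanSketch table = new QueryVsScanSketch();
        table.putItem("emp-1", "a@example.com", Collections.singletonMap("name", "Asha"));
        table.putItem("emp-1", "b@example.com", Collections.singletonMap("name", "Asha"));
        table.putItem("emp-2", "c@example.com", Collections.singletonMap("name", "Ravi"));
        System.out.println("query(emp-1) items: " + table.query("emp-1").size());
        System.out.println("scan() items: " + table.scan().size());
    }
}
```

The point of the sketch is the access pattern: a query touches only the items under one hash key, whereas a scan touches everything, which is why scans get more expensive as a table grows.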

Provisioned throughput

Provisioned throughput is a special feature of DynamoDB that allows us to have consistent and predictable performance. We need to specify the read and write capacity units. One read capacity unit is one strongly consistent read per second, or two eventually consistent reads per second, for an item as large as 4 KB, whereas one write capacity unit is one write per second for an item as large as 1 KB. A consistent read reflects all successful writes prior to that read request, whereas a consistent write updates all replicas of a given data object so that a read on this object after this write always reflects the same value.

For items whose size is more than 4 KB, the required read capacity units are calculated by rounding the item size up to the next multiple of 4 KB and dividing by 4. For example, if we want to read an item whose size is 11 KB, then the number of read capacity units required is three, as the next multiple of 4 above 11 is 12. So, 12/4 = 3 is the required number of read capacity units.

Required capacity units for	Consistency	Formula
Reads	Strongly consistent	Number of item reads per second * (item size rounded up to the next 4 KB) / 4 KB
Reads	Eventually consistent	Number of item reads per second * (item size rounded up to the next 4 KB) / 4 KB / 2
Writes	NA	Number of item writes per second * (item size rounded up to the next 1 KB) / 1 KB

If our application exceeds the provisioned throughput for a given table, DynamoDB throttles the excess requests and notifies us with an exception. We can also monitor the provisioned and actual throughput from the AWS management console, which gives us a clear picture of our application's behavior. To understand it better, let's take an example. Suppose we have set the write capacity units to 1,000 for a certain table. If the application starts writing to the table at 1,500 capacity units per second, DynamoDB allows the first 1,000 writes and throttles the rest. As all DynamoDB operations work as RESTful services, a throttled request returns the error code 400 (Bad Request).
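
The throttling arithmetic in this example can be sketched as a simple per-second budget. This is only a toy model under stated assumptions: a provisioned capacity of 1,000 write units, 1,500 one-unit writes attempted within the same second, and no burst allowance. The class ThroughputBudget and its methods are hypothetical names, not part of any AWS SDK.

```java
// Toy model of a per-second capacity budget (hypothetical, not SDK code).
final class ThroughputBudget {
    private final long provisionedUnitsPerSecond;
    private long usedThisSecond = 0;

    ThroughputBudget(long provisionedUnitsPerSecond) {
        this.provisionedUnitsPerSecond = provisionedUnitsPerSecond;
    }

    // Returns true if the request fits in the remaining budget;
    // false models a throttled request.
    boolean tryConsume(long units) {
        if (usedThisSecond + units > provisionedUnitsPerSecond) {
            return false;
        }
        usedThisSecond += units;
        return true;
    }

    // Called when a new one-second window starts.
    void nextSecond() {
        usedThisSecond = 0;
    }

    public static void main(String[] args) {
        ThroughputBudget budget = new ThroughputBudget(1000);
        int accepted = 0;
        int throttled = 0;
        for (int i = 0; i < 1500; i++) {
            if (budget.tryConsume(1)) {
                accepted++;
            } else {
                throttled++;
            }
        }
        System.out.println("accepted=" + accepted + ", throttled=" + throttled);
    }
}
```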

If you have items smaller than 4 KB, each read still consumes a full read capacity unit. We cannot group together multiple items smaller than 4 KB into a single read capacity unit. For instance, if your item size is 3 KB and you want to read 50 items per second, then you need to provision 50 read capacity units in the table definition for strong consistency and 25 read capacity units for eventual consistency.

If you have items larger than 4 KB, then you have to round up the size to the next multiple of 4. For example, if your item size is 7 KB (rounded up to 8 KB) and you need to read 100 items per second, then the required read capacity units would be 200 for strong consistency and 100 for eventual consistency.

In the case of write capacity units, the same logic is followed. If the item size is less than 1 KB, then it is rounded up to 1 KB, and if the item size is more than 1 KB, then it is rounded up to the next whole kilobyte.
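
The rounding rules above can be captured in a few lines. The following is a minimal sketch, assuming item sizes are given in KB; the class CapacityUnits and its method names are hypothetical and only mirror the arithmetic described in this section.

```java
// Hypothetical helper that mirrors the capacity-unit arithmetic in the text.
final class CapacityUnits {

    // Reads: round the item size up to whole 4 KB blocks, multiply by reads per
    // second, and halve the result (rounding up) for eventually consistent reads.
    static long readUnits(double itemSizeKb, long readsPerSecond, boolean stronglyConsistent) {
        long blocks = Math.max(1L, (long) Math.ceil(itemSizeKb / 4.0));
        long strongUnits = blocks * readsPerSecond;
        return stronglyConsistent ? strongUnits : (strongUnits + 1) / 2;
    }

    // Writes: round the item size up to whole 1 KB blocks, multiply by writes per second.
    static long writeUnits(double itemSizeKb, long writesPerSecond) {
        long blocks = Math.max(1L, (long) Math.ceil(itemSizeKb));
        return blocks * writesPerSecond;
    }

    public static void main(String[] args) {
        System.out.println("11 KB, 1 read/s, strong: " + readUnits(11, 1, true));
        System.out.println("3 KB, 50 reads/s, strong: " + readUnits(3, 50, true));
        System.out.println("3 KB, 50 reads/s, eventual: " + readUnits(3, 50, false));
        System.out.println("7 KB, 100 reads/s, strong: " + readUnits(7, 100, true));
        System.out.println("7 KB, 100 reads/s, eventual: " + readUnits(7, 100, false));
    }
}
```

The calls in main reproduce the worked examples from this section: the 11 KB item, the 3 KB item read 50 times per second, and the 7 KB item read 100 times per second.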

The AWS SDK provides automatic retries on ProvisionedThroughputExceededException when configured through the client configuration. This configuration option allows us to set the maximum number of times HttpClient should retry sending the request to DynamoDB. It also implements the default backoff strategy that decides the retry interval.

The following sample code sets a maximum of three automatic retries:

    // Create a configuration object
    final ClientConfiguration cfg = new ClientConfiguration();
    // Set the maximum number of automatic retries to 3
    cfg.setMaxErrorRetry(3);
    // Set the configuration object in the client
    client.setConfiguration(cfg);
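
To give a feel for what a backoff strategy does between retries, here is a minimal sketch of capped exponential backoff. It is not the SDK's own implementation: the class BackoffSketch, the 50 ms base delay, and the 20-second cap are assumptions chosen purely for illustration.

```java
// Illustrative capped exponential backoff (hypothetical, not the SDK's strategy).
final class BackoffSketch {

    // Delay before retry number `attempt` (0-based): baseDelayMs * 2^attempt,
    // never more than maxDelayMs.
    static long delayMillis(int attempt, long baseDelayMs, long maxDelayMs) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        // Cap the exponent so the multiplier stays bounded for large attempt numbers.
        long factor = 1L << Math.min(attempt, 30);
        return Math.min(maxDelayMs, baseDelayMs * factor);
    }

    public static void main(String[] args) {
        // Three retries, matching the retry count configured in the text.
        for (int attempt = 0; attempt < 3; attempt++) {
            System.out.println("retry " + (attempt + 1) + " after "
                    + delayMillis(attempt, 50, 20_000) + " ms");
        }
    }
}
```

Doubling the wait after each failure gives a throttled table time to recover instead of hammering it with immediate retries; production strategies usually add random jitter as well.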

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

DynamoDB features

As mentioned earlier, DynamoDB combines enormous scalability and high availability with predictable performance, which makes it stand out compared to other NoSQL databases. It has many features; we will discuss some of them.

Fully managed

DynamoDB allows developers to focus on the development part rather than deciding which hardware to provision, how to do administration, how to set up the distributed cluster, how to take care of fault tolerance, and so on. DynamoDB handles all scaling needs; it partitions your data in such a manner that the performance requirements get taken care of. Any distributed system that starts scaling is an overhead to manage but DynamoDB is a fully managed service, so you don't need to bother about hiring an administrator to take care of this system.

Durable

Once data is loaded into DynamoDB, it automatically replicates the data into different availability zones in a region. So, even if your data from one data center gets lost, there is always a backup in another data center. DynamoDB does this automatically and synchronously. By default, DynamoDB replicates your data to three different data centers.

Scalable

DynamoDB distributes your data on multiple servers across multiple availability zones automatically as the data size grows. The number of servers could easily range from hundreds to thousands. Developers can easily write and read data of any size, and there are no limitations on the total data size. DynamoDB follows the shared-nothing architecture.

Fast

DynamoDB serves at a very high throughput, providing single-digit millisecond latency. It uses SSD for consistent and optimized performance at a very high scale. DynamoDB does not index all attributes of a table, saving costs, as it only needs to index the primary key, and this makes read and write operations superfast. Any application running on an EC2 instance will show single-digit millisecond latency for an item of size 1 KB. The latencies remain constant even at scale due to the highly distributed nature and optimized routing algorithms.

Simple administration

DynamoDB is very easy to manage. The Amazon web console has a user-friendly interface to create tables and provide necessary details. You can start using the table within a few minutes. Once the data load starts, you don't need to do anything, as the rest is taken care of by DynamoDB. You can monitor the provisioned throughput in Amazon CloudWatch and change the read and write capacity units accordingly if needed.

Fault tolerance

DynamoDB automatically replicates the data to multiple availability zones which helps in reducing any risk associated with failures.

Flexible

DynamoDB, being a NoSQL database, does not force users to define the table schema beforehand. Being a key-value data store, it allows users to decide on the fly what attributes need to be there in an item. Each item of a table can have a different number of attributes.
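
A schema-less item can be pictured as a plain map of attribute names to values. The sketch below is only an illustration: the class FlexibleItemsSketch, the attribute names, and the sample values are hypothetical. It shows two items of the same table carrying different attributes, with only the primary key required on both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of items that share a key but not a schema.
final class FlexibleItemsSketch {

    // Only the primary key attribute is mandatory; all other attributes may vary.
    static boolean hasPrimaryKey(Map<String, Object> item, String primaryKeyName) {
        return item.get(primaryKeyName) != null;
    }

    static List<Map<String, Object>> sampleItems() {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("emp_id", "1");
        first.put("name", "Asha");

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("emp_id", "2");
        second.put("name", "Ravi");
        second.put("company", "Example Corp");
        second.put("skills", Arrays.asList("java", "aws"));

        List<Map<String, Object>> items = new ArrayList<>();
        items.add(first);
        items.add(second);
        return items;
    }

    public static void main(String[] args) {
        for (Map<String, Object> item : sampleItems()) {
            System.out.println(item.size() + " attributes, key present: "
                    + hasPrimaryKey(item, "emp_id") + " -> " + item);
        }
    }
}
```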

Rich data model

DynamoDB has a rich data model, which allows a user to define the attributes with various data types, for example, number, string, binary, number set, string set, and binary set. We are going to talk about these data types in Chapter 2, Data Models, in detail.

Indexing

DynamoDB indexes the primary key of each item, which allows us to access any item quickly and efficiently. It also supports global and local secondary indexes, which allow the user to query on non-primary key attributes.

Secure

Each call to DynamoDB makes sure that only authenticated users can access the data. It also uses the latest and effective cryptographic techniques to secure your data. It can be easily integrated with AWS Identity and Access Management (IAM), which allows users to set fine-grained access control and authorization.

Cost effective

DynamoDB provides a very cost-effective pricing model to host an application of any scale. The pay-per-use model gives users the flexibility to control expenditure. It also provides a free tier, which allows users 100 MB of free data storage with 5 writes/second and 10 reads/second as throughput capacity. More details about pricing can be found at http://aws.amazon.com/dynamodb/pricing/.

How do I get started?

Now that you are aware of all the exciting features of DynamoDB, I am sure you are itching to try it out. So let's try to create a table using the Amazon DynamoDB management console. The prerequisites for this exercise are a valid Amazon account and a valid credit card for billing purposes. Once the account is active and you have signed up for the DynamoDB service, you can get started directly. If you are new to AWS, more information is available at http://docs.aws.amazon.com/gettingstarted/latest/awsgsg-intro/gsg-aws-intro.html.

Amazon's infrastructure is spread across almost 10 regions worldwide and DynamoDB is available in almost all regions. You can check out more details about it at https://aws.amazon.com/about-aws/globalinfrastructure/regional-product-services/.

Creating a DynamoDB table using the AWS management console

Perform the following steps to create a DynamoDB table using the AWS management console:

Go to the Amazon DynamoDB management console at https://console.aws.amazon.com/dynamodb, and you will get the following screenshot:
Click on the Create Table button and you will see a pop-up window asking for various text inputs. Here, we are creating a table called Employee having emp_id as the hash key and email as the range key, as shown in the following screenshot:
Once you click on the Continue button, you will see the next window asking to create indexes, as shown in the next screenshot. These are optional parameters; so, if you do not wish to create any secondary indexes, you can skip this and click on Continue. We are going to talk about the indexes in Chapter 2, Data Models.
Once you click on the Continue button again, the next page will appear asking for the provisioned throughput capacity units. We have already talked about the read and write capacity; so, depending upon your application requirements, you can give the read and write capacity units in the appropriate text box, as shown in the following screenshot:
The next page will ask whether you want to set any throughput alarm notifications for this particular table. You can provide an e-mail ID on which you wish to get the alarms, as shown in the following screenshot. If not, you can simply skip it.
Once you set the required alarms, the next page would be a summary page confirming the details you have provided. If you see all the given details are correct, you can click on the Create button, as shown in the following screenshot:
Once the Create button is clicked, Amazon starts provisioning the hardware and other logistics in the background and takes a couple of minutes to create the table. In the meantime, you can see the table creation status as CREATING on the screen, as shown in the following screenshot:
Once the table is created, you can see the status changed to ACTIVE on the screen, as shown in the following screenshot:
Now that the table Employee is created and active, let's try to put an item in it. Once you double-click on the Explore Table button, you will see the following screen:
You can click on the New Item button to add a new record to the table, which will open up a pop up asking for various attributes that we wish to add in this record. Earlier, we had added emp_id and email as hash and range key, respectively. These are mandatory attributes we have to provide with some optional attributes if you want to, as shown in the following screenshot:
Here, I have added two extra attributes, name and company, with some relevant values. Once done, you can click on the Put Item button to actually add the item to the table.
You can go to the Browse Items tab to see whether the item has been added. You can select Scan to list down all items in the Employee table, which is shown in the following screenshot:

In Chapter 2, Data Models, we will be looking at various examples in Java, .NET, and PHP to play around with tables, items, and attributes.

DynamoDB Local

DynamoDB Local is a lightweight client-side database that mimics the actual DynamoDB database. It enables users to develop and test their code in house, without consuming actual DynamoDB resources. DynamoDB Local supports all DynamoDB APIs, so you can run your code as if it were running against an actual DynamoDB.

To use DynamoDB Local, you need to run a Java service on the desired port and direct your calls from code to this service. Once you have tested your code, you can simply redirect it to an actual DynamoDB.

So, using this, you can code your application without having full Internet connectivity all the time, and once you are ready to deploy your application, simply make a single line change to point your code to an actual DynamoDB and that's it.

Installing and running DynamoDB Local is quite easy and quick; you just have to perform the following steps and you can get started with it:

Download the DynamoDB Local executable JAR, which can be run on Windows, Mac, or Linux. You can download this JAR file from http://dynamodb-local.s3-website-us-west-2.amazonaws.com/dynamodb_local_latest.
This JAR file is compiled on version 7 of JRE, so it might not be suitable to run on the older JRE version.
The given ZIP file contains two important things: a DynamoDBLocal_lib folder that contains various third-party JAR files that are being used, and DynamoDBLocal.jar which contains the actual entry point.
Once you unzip the file, simply run the following command to get started with the local instance:
```
java -Djava.library.path=. -jar DynamoDBLocal.jar
```
Once you press Enter, the DynamoDB Local instance gets started, as shown in the following screenshot:
By default, the DynamoDB Local service runs on port 8000.
In case you are using port 8000 for some other service, you can simply choose your own port number by running the following command:
```
java -Djava.library.path=. -jar DynamoDBLocal.jar --port <YourPortNumber>
```

Now, let's see how to use DynamoDB Local in the Java API. The complete implementation remains the same; the only thing that we need to do is set the endpoint in the client configuration as http://localhost:8000.

Using DynamoDB Local for development in Java is quite easy; you just need to set the previous URL as the endpoint while creating the DynamoDB client, as shown in the following code:

import com.amazonaws.auth.ClasspathPropertiesFileCredentialsProvider;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;

// Instantiate AWS Client with proper credentials
AmazonDynamoDBClient dynamoDBClient = new AmazonDynamoDBClient(
    new ClasspathPropertiesFileCredentialsProvider());
Region usWest2 = Region.getRegion(Regions.US_WEST_2);
dynamoDBClient.setRegion(usWest2);
// Set DynamoDB Local Endpoint
dynamoDBClient.setEndpoint("http://localhost:8000");

Once you are comfortable with your development and you are ready to use the actual DynamoDB, simply remove the setEndpoint call (the last line) from the previous code snippet and you are done. Everything will work as expected.
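
Instead of editing the code when switching environments, the endpoint choice could be driven by a flag. The following is a minimal sketch under stated assumptions: the class EndpointConfig, the system property name dynamodb.local, and the regional endpoint URL passed in main are illustrative choices, not something prescribed by the SDK or by this book.

```java
// Hypothetical helper for choosing between DynamoDB Local and the hosted service.
final class EndpointConfig {

    // Builds the DynamoDB Local endpoint for a given port (8000 is the default port).
    static String localEndpoint(int port) {
        return "http://localhost:" + port;
    }

    // Returns the local endpoint when useLocal is true, otherwise the service endpoint.
    static String resolveEndpoint(boolean useLocal, int localPort, String serviceEndpoint) {
        return useLocal ? localEndpoint(localPort) : serviceEndpoint;
    }

    public static void main(String[] args) {
        // Assumed flag: run with -Ddynamodb.local=false to target the hosted service.
        boolean useLocal = Boolean.parseBoolean(System.getProperty("dynamodb.local", "true"));
        String endpoint = resolveEndpoint(useLocal, 8000,
                "https://dynamodb.us-west-2.amazonaws.com");
        System.out.println("Using endpoint: " + endpoint);
    }
}
```

The resolved string would then be passed to the client's setEndpoint method, so the same build can target either environment.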

DynamoDB Local is useful, but before using it, we should make a note of the following things:

DynamoDB Local ignores the credentials you have provided.
The values provided for the access key and region are used only to create the local database file. The DB file gets created in the same folder from where you are running DynamoDB Local.
DynamoDB Local ignores the settings provided for provisioned throughput. So, even if you specify them at table creation, it will simply ignore them. It does not prepare you to handle provisioned throughput exceeded exceptions, so you need to be cautious about handling them in production.
Last but not least, DynamoDB Local is meant to be used for development and unit testing purposes only and should not be used for production purposes, as it does not have durability or availability SLAs.

Description

If you have interest in DynamoDB and want to know what DynamoDB is all about and become proficient in using it, this is the book for you. If you are an intermediate user who wishes to enhance your knowledge of DynamoDB, this book is aimed at you. Basic familiarity with programming, NoSQL, and cloud computing concepts would be helpful.

What you will learn

Comprehend the DynamoDB data model and how to build the efficient schema of DynamoDB tables

Decipher the architecture of DynamoDB and its core features

Understand how DynamoDB manages ring membership and handles partial failures

Get acquainted with the AWS security token service and learn how DynamoDB deals with authentication and authorization

Integrate DynamoDB with other AWS services in order to form a complete application ecosystem on AWS Cloud

Explore third-party tools and libraries to efficiently use DynamoDB to help to autoscale, test, and back up/archive

Familiarize yourself with mobile application development using DynamoDB at the backend

A. Zubarev Nov 04, 2014

Databases are very much in the spotlight lately and especially the NoSQL breed. While there are dozens of offerings on the market only a handful tops the list, one such offspring in the key-value area is Amazon's DynamoDB. Being a close relative to such popular players on this arena as Redis or Voldemort DynamoDB I figured has many unique points, add-ons and a strong backing by the user community, not only the mighty Amazon corporation. Mastering DynamoDB as a book came out at a very strategic time. It is a great technical read, too. Tanmay (the author) walks you gently into the wonderful NoSQL database world. Then the book takes you, arm with DynamoDB, and make a fearless traveller sailing through high seas of today’s turbulent and fierce data streams and make you prowl the dark alleys of handling the data in the Cloud. The book is structured so it devotes its several first chapters to the nitty-gritties of the DynamoDB and then explains on best practices and best usage scenarios. The book has an advanced chapter for those who like the extremes. For example relational integrity is suddenly discussed in a book about NoSQL (no schema or structure supposed to be there the core, alas not so fast). The book tastefully ends with an overview of the top 10 or so of the sheer third party offerings from either Amazon itself or GitHubers. The best one I liked is the local DynamoDB and the ability to conduct transactions. The module that allows to scale the database appeared to be very much of value, but frankly I was surprised it is not written by Amazon itself.
To say more, the design decision of having a developer (or perhaps an admin) being responsible for assigning and provisioning compute throughput for each table made my eyebrows raise. The author appeared very savvy in the subject of Cloud Data (perhaps I coined it), I actually learned quite a few interesting techniques and found out that Amazon has SLAs for each component, even for their internal systems and especially such a crucial piece as DynamoDB. And they are tight SLAs. Yet, make a lot of sense to me. Nobody argues Amazon does not successfully process huge volumes of data, fast. Anyway, I liked the book and the author much, heck, perhaps even more than the DynamoDB as a database itself.

Amazon Verified review

FRANC C CARTER Oct 18, 2014

Mastering DynamoDB is a medium level introduction to most of the features, uses cases and concepts of DynamoDB. The only obvious omission is that configuration with Cloudformation is not covered, however you could argue that this is best left to a book covering Cloudformation. The section I found most valuable was the third party tools which fill useful gaps not covered by DynamoDB itself at the moment. Given the range of coverage I would expect that most people will find useful pieces of information unless they already have comprehensive experience with DynamoDB. The main weakness is the colloquial and conversational style. This along with a lack of diagrams made the indexing section hard to follow. I would have liked to see more diagrams showing the index designs along with more succinct language. I found Amazon's description of indexing to be clearer than Mastering DynamoDB.

Steven Chu Jan 12, 2023

First the good things:
- Limitations (e.g. 10GB table max size when using local indexes, max item size of 400KB, etc.)
- A nice enumeration of interesting libraries (transaction libraries, geo libraries, etc.)
Now the many, many bad things:
- LOTs of wasted space showing the same query API in 4 different programming languages
- Wasted space showing architecture diagrams that essentially repeat previous chapters -- and do so for essentially the same exact type of web architectures
- No depth on the internals of DynamoDB; I understand that this is AWS proprietary software, but the author did not even take a practical approach to discuss things that other DBs do; for example, I come from a MySQL world where heuristics such as no DB over 1TB since performance can degrade beyond this point, not using foreign keys because they make online schema changes hard, etc. are all lessons learned from years of battle-tested DBA experience; I'd hope for the same lessons in a book dedicated to DynamoDB
Save your money and try and read articles and watch YouTube or maybe wait for a better book.

Chris Snow Aug 13, 2015

This book has poor editing and concepts are not clearly explained. After the confusing description of Hash and Range keys in chapter 2, I put the book away and instead followed a DynamoDB tutorial on youtube and also the hands on tutorial in the Dynamo DB Local Javascript Shell.

Omar Nov 08, 2017

Either book was rushed and/or review was poor. One too many confusing statements that'll force you to go online to make sense of the topic anyway. Used the book as more of an outline rather than an in-depth analysis into Dynamodb. Most of the content seems bloated/irrelevant in order to fill up pages.

Mastering DynamoDB: Master the intricacies of the NoSQL database DynamoDB to take advantage of its fast performance and seamless scalability
