Mastering DynamoDB

By Tanmay Deshpande
  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies

About this book

This book is a practical, example-oriented guide that begins with an introduction to DynamoDB, how it started, what it is, and its features. It then introduces you to DynamoDB's data model, demonstrating CRUD operations on the data model. Once you get an understanding of the data model, you will be able to dive deep into the DynamoDB architecture to understand its flexibility, scalability, and reliability.

The book also gives you plenty of best practices you should follow in order to achieve time and cost efficiency. Later, you will explore some advanced topics such as CloudWatch Monitoring, the AWS security token service, and the use of IAM to perform access control management.

The book discusses a variety of use cases that will help you get a practical sense of DynamoDB. Finally, the book ends with a discussion on using DynamoDB as a backend for Android/iOS mobile applications with sample code that will help you build your own applications.

Publication date:
August 2014
Publisher
Packt
Pages
236
ISBN
9781783551958

 

Chapter 1. Getting Started

Amazon DynamoDB is a fully managed, cloud-hosted, NoSQL database. It provides fast and predictable performance with the ability to scale seamlessly. It allows you to store and retrieve any amount of data, serving any level of network traffic without having any operational burden. DynamoDB gives numerous other advantages like consistent and predictable performance, flexible data modeling, and durability.

With just few clicks on the Amazon Web Services console, you are able create your own DynamoDB table and scale up or scale down provision throughput without taking down your application even for a millisecond. DynamoDB uses Solid State Disks (SSD) to store the data which confirms the durability of the work you are doing. It also automatically replicates the data across other AWS Availability Zones, which provides built-in high availability and reliability.

In this chapter, we are going to revise our concepts about the DynamoDB and will try to discover more about its features and implementation.

Before we start discussing details about DynamoDB, let's try to understand what NoSQL databases are and when to choose DynamoDB over Relational Database Management System (RDBMS). With the rise in data volume, variety, and velocity, RDBMSes were neither designed to cope up with the scale and flexibility challenges developers are facing to build the modern day applications, nor were they able to take advantage of cheap commodity hardware. Also, we need to provide a schema before we start adding data, and this restricted developers from making their application flexible. On the other hand, NoSQL databases are fast, provide flexible schema operations, and make effective use of cheap storage.

Considering all these things, NoSQL is becoming popular very quickly amongst the developer community. However, one has to be very cautious about when to go for NoSQL and when to stick to RDBMS. Sticking to relational databases makes sense when you know that the schema is more over static, strong consistency is must, and the data is not going to be that big in volume.

However, when you want to build an application that is Internet scalable, the schema is more likely to get evolved over time, the storage is going to be really big, and the operations involved are okay to be eventually consistent. Then, NoSQL is the way to go.

There are various types of NoSQL databases. The following is the list of NoSQL database types and popular examples:

  • Document Store: MongoDB, CouchDB, MarkLogic

  • Column Store: Hbase, Cassandra

  • Key Value Store: DynamoDB, Azure, Redis

  • Graph Databases: Neo4J, DEX

Most of these NoSQL solutions are open source except for a few like DynamoDB and Azure, which are available as a service over the Internet. DynamoDB being a key-value store indexes data only upon primary keys, and one has to go through the primary key to access certain attributes. Let's start learning more about DynamoDB by having a look at its history.

 

DynamoDB's history


Amazon's e-commerce platform had a huge set of decoupled services developed and managed individually, and each and every service had an API to be used and consumed by others. Earlier, each service had direct database access, which was a major bottleneck. In terms of scalability, Amazon's requirements were more than any third-party vendors could provide at that time.

DynamoDB was built to address Amazon's high availability, extreme scalability, and durability needs. Earlier, Amazon used to store its production data in relational databases and services had been provided for all required operations. However, they later realized that most of the services access data only through its primary key and they need not use complex queries to fetch the required data, plus maintaining these RDBMS systems required high-end hardware and skilled personnel. So, to overcome all such issues, Amazon's engineering team built a NoSQL database that addresses all the previously mentioned issues.

In 2007, Amazon released one research paper on Dynamo that combined the best of ideas from the database and key-value store worlds, which was inspiration for many open source projects at the time. Cassandra, Voldemort, and Riak were a few of them. You can find the this paper at http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf.

Even though Dynamo had great features that took care of all engineering needs, it was not widely accepted at that time in Amazon, as it was not a fully managed service. When Amazon released S3 and SimpleDB, engineering teams were quite excited to adopt these compared to Dynamo, as DynamoDB was a bit expensive at that time due to SSDs. So, finally after rounds of improvement, Amazon released Dynamo as a cloud-based service, and since then, it is one the most widely used NoSQL databases.

Before releasing to a public cloud in 2012, DynamoDB was the core storage service for Amazon's e-commerce platform, which started the shopping cart and session management service. Any downtime or degradation in performance had a major impact on Amazon's business, and any financial impact was strictly not acceptable, and DynamoDB proved itself to be the best choice in the end. Now, let's try to understand in more detail about DynamoDB.

 

What is DynamoDB?


DynamoDB is a fully managed, Internet scalable, easily administered, and cost effective NoSQL database. It is a part of database as a service-offering pane of Amazon Web Services.

The next diagram shows how Amazon offers its various cloud services and where DynamoDB is exactly placed. AWS RDS is a relational database as a service over the Internet from Amazon, while Simple DB and DynamoDB are NoSQL databases as services. Both SimpleDB and DynamoDB are fully managed, nonrelational services. DynamoDB is build considering fast, seamless scalability, and high performance. It runs on SSDs to provide faster responses and has no limits on request capacity and storage. It automatically partitions your data throughout the cluster to meet expectations, while in SimpleDB, we have a storage limit of 10 GB and can only take limited requests per second.

Also, in SimpleDB, we have to manage our own partitions. So, depending upon your need, you have to choose the correct solution.

To use DynamoDB, the first and foremost requirement is an AWS account. Through the easy-to-use AWS management console, you can directly create new tables, providing necessary information and can start loading data into the tables in few minutes.

 

Data model concepts


To understand DynamoDB better, we need to understand its data model first. DynamoDB's data model includes Tables, Items, and Attributes. A table in DynamoDB is nothing but what we have in relational databases. DynamoDB tables need not have fixed schema (number of columns, column names, their data types, column order, and column size). It needs only the fixed primary key, its data type, and a secondary index if needed, and the remaining attributes can be decided at runtime. Items in DynamoDB are individual records of the table. We can have any number of attributes in an item.

DynamoDB stores the item attributes as key-value pairs. Item size is calculated by adding the length of attribute names and their values.

Tip

DynamoDB has an item-size limit of 64 KB; so, while designing your data model, you have to keep this thing in mind that your item size must not cross this limitation. There are various ways of avoiding the over spill, and we will discuss such best practices in Chapter 4, Best Practices.

The following diagram shows the data model hierarchy of DynamoDB:

Here, we have a table called Student, which can have multiple items in it. Each item can have multiple attributes that are stored in key–value pairs. We will see more details about the data models in Chapter 2, Data Models.

Operations

DynamoDB supports various operations to play with tables, items, and attributes.

Table operations

DynamoDB supports the create, update, and delete operations at the table level. It also supports the UpdateTable operation, which can be used to increase or decrease the provisioned throughput. We have the ListTables operation to get the list of all available tables associated with your account for a specific endpoint. The DescribeTable operation can be used to get detailed information about the given table.

Item operations

Item operations allows you to add, update, or delete an item from the given table. The UpdateItem operation allows us to add, update, or delete existing attributes from a given item.

The Query and Scan operations

The Query and Scan operations are used to retrieve information from tables. The Query operation allows us to query the given table with provided hash key and range key. We can also query tables for secondary indexes. The Scan operation reads all items from a given table. More information on operations can be found in Chapter 2, Data Models.

Provisioned throughput

Provisioned throughput is a special feature of DynamoDB that allows us to have consistent and predictable performance. We need to specify the read and write capacity units. A read capacity unit is one strongly consistent read and two eventually consistent reads per second unit for an item as large as 4 KB, whereas one write capacity unit is one strongly consistent write unit for an item as large as 1 KB. A consistent read reflects all successful writes prior to that read request, whereas a consistent write updates all replications of a given data object so that a read on this object after this write will always reflect the same value.

For items whose size is more than 4 KB, the required read capacity units are calculated by summing it up to the next closest multiple of 4. For example, if we want to read an item whose size is 11 KB, then the number of read capacity units required is three, as the nearest multiple of 4 to 11 is 12. So, 12/4 = 3 is the required number of read capacity units.

Required Capacity Units For

Consistency

Formula

Reads

Strongly consistent

No. of Item reads per second * Item Size

Reads

Eventually consistent

Number of Item reads per second * Item Size/2

Writes

NA

Number of Item writes per second * Item Size

If our application exceeds the maximum provisioned throughput for a given table, then we get notified with a proper exception. We can also monitor the provisioned and actual throughput from the AWS management console, which will give us the exact idea of our application behavior. To understand it better, let's take an example. Suppose, we have set the write capacity units to 100 for a certain table and if your application starts writing to the table by 1,500 capacity units, then DynamoDB allows the first 1,000 writes and throttles the rest. As all DynamoDB operations work as RESTful services, it gives the error code 400 (Bad Request).

If you have items smaller than 4 KB, even then it will consider it to be a single read capacity unit. We cannot group together multiple items smaller than 4 KB into a single read capacity unit. For instance, if your item size is 3 KB and if you want to read 50 items per second, then you need to provision 50 read capacity units in a table definition for strong consistency and 25 read capacity units for eventual consistency.

If you have items larger than 4 KB, then you have to round up the size to the next multiple of 4. For example, if your item size is 7 KB (~8KB) and you need to read 100 items per second, then the required read capacity units would be 200 for strong consistency and 100 capacity units for eventual consistency.

In the case of write capacity units, the same logic is followed. If the item size is less than 1 KB, then it is rounded up to 1 KB, and if item size is more than 1 KB, then it is rounded up to next multiple of 1.

The AWS SDK provides auto-retries on ProvisionedThroughputExceededException when configured though client configuration. This configuration option allows us to set the maximum number of times HttpClient should retry sending the request to DynamoDB. It also implements the default backoff strategy that decides the retry interval.

The following is a sample code to set a maximum of three auto retries:

   // Create a configuration objectfinal ClientConfiguration cfg = new ClientConfiguration();// Set the maximum auto-reties to 3cfg.setMaxErrorRetry(3);
    // Set configuration object in Clientclient.setConfiguration(cfg);

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

DynamoDB features

Like we said earlier, DynamoDB comes with enormous scalability and high availability with predictable performance, which makes it stand out strong compared to other NoSQL databases. It has tons of features; we will discuss some of them.

Fully managed

DynamoDB allows developers to focus on the development part rather than deciding which hardware to provision, how to do administration, how to set up the distributed cluster, how to take care of fault tolerance, and so on. DynamoDB handles all scaling needs; it partitions your data in such a manner that the performance requirements get taken care of. Any distributed system that starts scaling is an overhead to manage but DynamoDB is a fully managed service, so you don't need to bother about hiring an administrator to take care of this system.

Durable

Once data is loaded into DynamoDB, it automatically replicates the data into different availability zones in a region. So, even if your data from one data center gets lost, there is always a backup in another data center. DynamoDB does this automatically and synchronously. By default, DynamoDB replicates your data to three different data centers.

Scalable

DynamoDB distributes your data on multiple servers across multiple availability zones automatically as the data size grows. The number of servers could be easily from hundreds to thousands. Developers can easily write and read data of any size and there are no limitations on data size. DynamoDB follows the shared-nothing architecture.

Fast

DynamoDB serves at a very high throughput, providing single-digit millisecond latency. It uses SSD for consistent and optimized performance at a very high scale. DynamoDB does not index all attributes of a table, saving costs, as it only needs to index the primary key, and this makes read and write operations superfast. Any application running on an EC2 instance will show single-digit millisecond latency for an item of size 1 KB. The latencies remain constant even at scale due to the highly distributed nature and optimized routing algorithms.

Simple administration

DynamoDB is very easy to manage. The Amazon web console has a user-friendly interface to create tables and provide necessary details. You can simply start using the table within a few minutes. Once the data load starts, you don't need to do anything as rest is taken care by DynamoDB. You can monitor Amazon CloudWatch for the provision throughput and can make changes to read and write capacity units accordingly if needed.

Fault tolerance

DynamoDB automatically replicates the data to multiple availability zones which helps in reducing any risk associated with failures.

Flexible

DynamoDB, being a NoSQL database, does not force users to define the table schema beforehand. Being a key-value data store, it allows users to decide what attributes need to be there in an item, on the fly. Each item of a table can have different number of attributes.

Rich Data ModelDynamoDB has a rich data model, which allows a user to define the attributes with various data types, for example, number, string, binary, number set, string set, and binary set. We are going to talk about these data types in Chapter 2, Data Models, in detail.

Indexing

DynamoDB indexes the primary key of each item, which allows us to access any element in a faster and efficient manner. It also allows global and local secondary indexes, which allows the user to query on any non-primary key attribute.

Secure

Each call to DynamoDB makes sure that only authenticated users can access the data. It also uses the latest and effective cryptographic techniques to see your data. It can be easily integrated with AWS Identity and Access Management (IAM), which allows users to set fine-grained access control and authorization.

Cost effective

DynamoDB provides a very cost-effective pricing model to host an application of any scale. The pay-per-use model gives users the flexibility to control expenditure. It also provides free tier, which allows users 100 MB free data storage with 5 writes/second and 10 reads/second as throughput capacity. More details about pricing can be found at http://aws.amazon.com/dynamodb/pricing/.

 

How do I get started?


Now that you are aware of all the exciting features of DynamoDB, I am sure you are itching to try out your hands on it. So let's try to create a table using the Amazon DynamoDB management console. The pre-requisite to do this exercise is having a valid Amazon account and a valid credit card for billing purposes. Once the account is active and you have signed up for the DynamoDB service, you can get started directly. If you are new to AWS, more information is available at http://docs.aws.amazon.com/gettingstarted/latest/awsgsg-intro/gsg-aws-intro.html.

Amazon's infrastructure is spread across almost 10 regions worldwide and DynamoDB is available in almost all regions. You can check out more details about it at https://aws.amazon.com/about-aws/globalinfrastructure/regional-product-services/.

Creating a DynamoDB table using the AWS management console

Perform the following steps to create a DynamoDB table using the AWS management console:

  1. Go to the Amazon DynamoDB management console at https://console.aws.amazon.com/dynamodb, and you will get the following screenshot:

  2. Click on the Create Table button and you will see a pop-up window asking for various text inputs. Here, we are creating a table called Employee having emp_id as the hash key and email as the range key, as shown in the following screenshot:

  3. Once you click on the Continue button, you will see the next window asking to create indexes, as shown in the next screenshot. These are optional parameters; so, if you do not wish to create any secondary indexes, you can skip this and click on Continue. We are going to talk about the indexes in Chapter 2, Data Models.

  4. Once you click on the Continue button again, the next page will appear asking for provision throughput capacity units. We have already talked about the read and write capacity; so, depending upon your application requirements, you can give the read and write capacity units in the appropriate text box, as shown in the following screenshot:

  5. The next page will ask whether you want to set any throughput alarm notifications for this particular table. You can provide an e-mail ID on which you wish to get the alarms, as shown in the following screenshot. If not, you can simply skip it.

  6. Once you set the required alarms, the next page would be a summary page confirming the details you have provided. If you see all the given details are correct, you can click on the Create button, as shown in the following screenshot:

  7. Once the Create button is clicked, Amazon starts provisioning the hardware and other logistics in the background and takes a couple of minutes to create the table. In the meantime, you can see the table creations status as CREATING on the screen, as shown in the following screenshot:

  8. Once the table is created, you can see the status changed to ACTIVE on the screen, as shown in the following screenshot:

  9. Now that the table Employee is created and active, let's try to put an item in it. Once you double-click on the Explore Table button, you will see the following screen:

  10. You can click on the New Item button to add a new record to the table, which will open up a pop up asking for various attributes that we wish to add in this record. Earlier, we had added emp_id and email as hash and range key, respectively. These are mandatory attributes we have to provide with some optional attributes if you want to, as shown in the following screenshot:

    Here, I have added two extra attributes, name and company, with some relevant values. Once done, you can click on the Put Item button to actually add the item to the table.

  11. You can go to the Browse Items tab to see whether the item has been added. You can select Scan to list down all items in the Employee table, which is shown in the following screenshot:

In Chapter 2, Data Models, we will be looking for various examples in Java, .Net, and PHP to play around with tables, items, and attributes.

DynamoDB Local

DynamoDB is a lightweight client-side database that mimics the actual DynamoDB database. It enables users to develop and test their code in house, without consuming actual DynamoDB resources. DynamoDB Local supports all DynamoDB APIs, with which you can run your code like running on an actual DynamoDB.

To use DynamoDB Local, you need to run a Java service on the desired port and direct your calls from code to this service. Once you try to test your code, you can simply redirect it to an actual DynamoDB.

So, using this, you can code your application without having full Internet connectivity all the time, and once you are ready to deploy your application, simply make a single line change to point your code to an actual DynamoDB and that's it.

Installing and running DynamoDB Local is quite easy and quick; you just have to perform the following steps and you can get started with it:

  1. Download the DynamoDB Local executable JAR, which can be run on Windows, Mac, or Linux. You can download this JAR file from http://dynamodb-local.s3-website-us-west-2.amazonaws.com/dynamodb_local_latest.

  2. This JAR file is compiled on version 7 of JRE, so it might not be suitable to run on the older JRE version.

  3. The given ZIP file contains two important things: a DynamoDBLocal_lib folder that contains various third-party JAR files that are being used, and DynamoDBLocal.jar which contains the actual entry point.

  4. Once you unzip the file, simply run the following command to get started with the local instance:

    java -Djava.library.path=. -jar DynamoDBLocal.jar
    
  5. Once you press Enter, the DynamoDB Local instance gets started, as shown in the following screenshot:

    By default, the DynamoDB Local service runs on port 8000.

  6. In case you are using port 8000 for some other service, you can simply choose your own port number by running the following command:

    java -Djava.library.path=. -jar DynamoDBLocal.jar --port <YourPortNumber>
    

Now, let's see how to use DynamoDB Local in the Java API. The complete implementation remains the same; the only thing that we need to do is set the endpoint in the client configuration as http://localhost:8000.

Using DynamoDB for development in Java is quite easy; you just need to set the previous URL as the endpoint while creating DynamoDB Client, as shown in the following code:

// Instantiate AWS Client with proper credentials
AmazonDynamoDBClient dynamoDBClient = new AmazonDynamoDBClient(
  new ClasspathPropertiesFileCredentialsProvider());
Region usWest2 = Region.getRegion(Regions.US_WEST_2);
  dynamoDBClient.setRegion(usWest2);
// Set DynamoDB Local Endpoint
  dynamoDBClient.setEndpoint("http://localhost:8000");

Once you are comfortable with your development and you are ready to use the actual DynamoDB, simply remove the highlighted line from the previous code snippet and you are done. Everything will work as expected.

DynamoDB Local is useful but before using it, we should make a note of following things:

  • DynamoDB Local ignores the credentials you have provided.

  • The values provided in the access key and regions are used to create only the local database file. The DB file gets created in the same folder from where you are running your DynamoDB Local.

  • DynamoDB Local ignores the settings provided for provision throughput. So, even if you specify the same at table creation, it will simply ignore it. It does not prepare you to handle provision throughput exceeded exceptions, so you need to be cautious about handling it in production.

  • Last but not least, DynamoDB Local is meant to be used for development and unit testing purposes only and should not be used for production purposes, as it does not have durability or availability SLAs.

 

Summary


In this chapter, we talked about DynamoDB's history, its features, the concept of provision throughput, and why it is important from the DynamoDB usage point of view. We also saw how you can get started with AWS DynamoDB and create a table and load data. We also learned about installing and running a DynamoDB Local instance utility and how to use it for development.

In the next chapter, we will discuss the DynamoDB data model in more detail and how to use DynamoDB APIs to perform the table, item, and attribute level operations.

About the Author

  • Tanmay Deshpande

    Tanmay Deshpande is a Hadoop and big data evangelist. He currently works with Schlumberger as a Big Data Architect in Pune, India. He has interest in a wide range of technologies, such as Hadoop, Hive, Pig, NoSQL databases, Mahout, Sqoop, Java, cloud computing, and so on. He has vast experience in application development in various domains, such as oil and gas, finance, telecom, manufacturing, security, and retail. He enjoys solving machine-learning problems and spends his time reading anything that he can get his hands on. He has great interest in open source technologies and has been promoting them through his talks. Before Schlumberger, he worked with Symantec, Lumiata, and Infosys. Through his innovative thinking and dynamic leadership, he has successfully completed various projects.

    He regularly blogs on his website http://hadooptutorials.co.in. You can connect with him on LinkedIn at https://www.linkedin.com/in/deshpandetanmay/.

    He has also authored Mastering DynamoDB, published in August 2014, DynamoDB Cookbook, published in September 2015, Hadoop Real World Solutions Cookbook-Second Edition, published in March 2016, Hadoop: Data Processing and Modelling, published in August, 2016, and Hadoop Blueprints, published in September 2016, all by Packt Publishing.

    Browse publications by this author