Amazon DynamoDB is a fully managed, cloud-hosted NoSQL database. It provides fast and predictable performance and scales seamlessly. It lets you store and retrieve any amount of data and serve any level of network traffic without any operational burden. DynamoDB offers numerous other advantages, such as consistent and predictable performance, flexible data modeling, and durability.
With just a few clicks on the Amazon Web Services console, you can create your own DynamoDB table and scale provisioned throughput up or down without taking your application down even for a millisecond. DynamoDB stores data on solid-state drives (SSDs), which ensures durability, and it automatically replicates the data across multiple AWS Availability Zones, providing built-in high availability and reliability.
In this chapter, we are going to review our understanding of DynamoDB and discover more about its features and implementation.
Before we start discussing DynamoDB in detail, let's try to understand what NoSQL databases are and when to choose DynamoDB over a Relational Database Management System (RDBMS). With the rise in data volume, variety, and velocity, RDBMSes were neither designed to cope with the scale and flexibility challenges developers face when building modern-day applications, nor were they able to take advantage of cheap commodity hardware. Also, an RDBMS requires a schema to be defined before any data is added, which restricts developers from keeping their applications flexible. NoSQL databases, on the other hand, are fast, provide flexible schema operations, and make effective use of cheap storage.
Considering all this, NoSQL is quickly becoming popular in the developer community. However, one has to be very cautious about when to go for NoSQL and when to stick with an RDBMS. Sticking with a relational database makes sense when you know that the schema is more or less static, strong consistency is a must, and the data volume is not going to be that big.
However, when you want to build an application that is Internet scale, where the schema is likely to evolve over time, the storage is going to be really big, and the operations involved can be eventually consistent, NoSQL is the way to go.
Document store: MongoDB, CouchDB, MarkLogic
Column store: HBase, Cassandra
Key-value store: DynamoDB, Azure Table Storage, Redis
Graph databases: Neo4j, DEX
Most of these NoSQL solutions are open source, except for a few, such as DynamoDB and Azure Table Storage, which are available as services over the Internet. DynamoDB, being a key-value store, indexes data only on the primary key, and one has to go through the primary key to access other attributes. Let's start learning more about DynamoDB by having a look at its history.
Amazon's e-commerce platform had a huge set of decoupled services, developed and managed individually, and each service exposed an API to be consumed by the others. Earlier, each service had direct database access, which was a major bottleneck. In terms of scalability, Amazon's requirements were beyond what any third-party vendor could provide at that time.
DynamoDB was built to address Amazon's high availability, extreme scalability, and durability needs. Earlier, Amazon used to store its production data in relational databases, with services provided for all required operations. However, they later realized that most of the services accessed data only through their primary keys and did not need complex queries to fetch the required data; moreover, maintaining these RDBMS systems required high-end hardware and skilled personnel. So, to overcome these issues, Amazon's engineering team built a NoSQL database that addressed all of them.
In 2007, Amazon published a research paper on Dynamo that combined the best ideas from the database and key-value store worlds, and it inspired many open source projects at the time; Cassandra, Voldemort, and Riak were a few of them. You can find this paper at http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf.
Even though Dynamo had great features that took care of all engineering needs, it was not widely adopted within Amazon at that time, as it was not a fully managed service. When Amazon released S3 and SimpleDB, engineering teams were more excited to adopt those than Dynamo, as Dynamo was a bit expensive at the time due to SSDs. So, finally, after rounds of improvement, Amazon released Dynamo as a cloud-based service, and since then, it has been one of the most widely used NoSQL databases.
Before its release as a public cloud service in 2012, DynamoDB was a core storage service for Amazon's e-commerce platform, where it powered the shopping cart and session management services. Any downtime or degradation in performance had a major impact on Amazon's business, and any financial impact was strictly unacceptable; DynamoDB proved itself to be the best choice in the end. Now, let's try to understand DynamoDB in more detail.
The next diagram shows how Amazon offers its various cloud services and where exactly DynamoDB is placed. AWS RDS is a relational database as a service over the Internet from Amazon, while SimpleDB and DynamoDB are NoSQL databases offered as services. Both SimpleDB and DynamoDB are fully managed, nonrelational services. DynamoDB is built for fast, seamless scalability and high performance. It runs on SSDs to provide faster responses and has no limits on request capacity or storage. It automatically partitions your data across the cluster to meet your requirements, whereas SimpleDB has a storage limit of 10 GB and can serve only a limited number of requests per second.
Also, in SimpleDB, we have to manage our own partitions. So, depending upon your need, you have to choose the correct solution.
To use DynamoDB, the first and foremost requirement is an AWS account. Through the easy-to-use AWS Management Console, you can directly create new tables by providing the necessary information, and you can start loading data into them within a few minutes.
To understand DynamoDB better, we need to understand its data model first. DynamoDB's data model consists of tables, items, and attributes. A table in DynamoDB is analogous to a table in a relational database. However, DynamoDB tables do not need a fixed schema (number of columns, column names, their data types, column order, and column size); they need only a fixed primary key, its data type, and a secondary index if required, and the remaining attributes can be decided at runtime. Items in DynamoDB are the individual records of a table, and an item can have any number of attributes.
DynamoDB stores item attributes as key-value pairs. Item size is calculated by adding the lengths of the attribute names and their values.
DynamoDB has an item-size limit of 64 KB, so while designing your data model, you have to keep in mind that an item's size must not cross this limit. There are various ways of avoiding the overflow, and we will discuss such best practices in Chapter 4, Best Practices.
Here, we have a table called Student, which can have multiple items in it. Each item can have multiple attributes, stored as key-value pairs. We will see more details about the data model in Chapter 2, Data Models.
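To make this schema flexibility concrete, the following sketch models two items of the Student table as plain Java maps and estimates item size by summing the lengths of attribute names and values, as described above. This is an illustration of the data model only, not the AWS SDK API; the attribute names (student_id, name, dept) are made up for the example, and real DynamoDB sizing also depends on attribute data types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DataModelSketch {
    // Rough item-size estimate: sum of attribute-name lengths and value lengths.
    // (Illustrative only; actual DynamoDB sizing rules are more detailed.)
    static int estimateItemSizeBytes(Map<String, String> item) {
        int size = 0;
        for (Map.Entry<String, String> e : item.entrySet()) {
            size += e.getKey().length() + e.getValue().length();
        }
        return size;
    }

    public static void main(String[] args) {
        // Two items of the same table with different attribute sets:
        // only the primary key (student_id here, a hypothetical name) is mandatory.
        Map<String, String> item1 = new LinkedHashMap<>();
        item1.put("student_id", "S001");
        item1.put("name", "Alice");

        Map<String, String> item2 = new LinkedHashMap<>();
        item2.put("student_id", "S002");
        item2.put("name", "Bob");
        item2.put("dept", "Physics");   // extra attribute, decided at runtime

        System.out.println(estimateItemSizeBytes(item1));             // 23
        System.out.println(estimateItemSizeBytes(item2) <= 64 * 1024); // true, within the 64 KB limit
    }
}
```

Note how item2 simply carries an attribute that item1 does not; no schema change is involved.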
DynamoDB supports the create, update, and delete operations at the table level. The UpdateTable operation can be used to increase or decrease the provisioned throughput. The ListTables operation returns the list of all tables associated with your account for a specific endpoint, and the DescribeTable operation can be used to get detailed information about a given table. The Query and Scan operations are used to retrieve information from tables. The Query operation allows us to query a given table with the provided hash key and range key, and we can also query secondary indexes. The Scan operation reads all items from a given table. More information on operations can be found in Chapter 2, Data Models.
Provisioned throughput is a special feature of DynamoDB that gives us consistent and predictable performance. We need to specify read and write capacity units. One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item as large as 4 KB, whereas one write capacity unit represents one write per second for an item as large as 1 KB. A consistent read reflects all successful writes prior to that read request, whereas a consistent write updates all replicas of a given data object so that a subsequent read on this object will always reflect the same value.
For items larger than 4 KB, the required read capacity units are calculated by rounding the item size up to the next multiple of 4 KB. For example, to read an item whose size is 11 KB, the number of read capacity units required is three, as the nearest multiple of 4 above 11 is 12, and 12/4 = 3.
The required capacity units are calculated as follows:
Strongly consistent read: number of item reads per second * item size (rounded up to the next multiple of 4 KB)
Eventually consistent read: (number of item reads per second * item size) / 2
Write: number of item writes per second * item size (rounded up to the next multiple of 1 KB)
If our application exceeds the maximum provisioned throughput for a given table, we get notified with a proper exception. We can also monitor provisioned versus actual throughput from the AWS Management Console, which gives us an exact idea of our application's behavior. To understand this better, let's take an example. Suppose we have set the write capacity to 1,000 units for a certain table; if our application starts writing to the table at 1,500 capacity units, DynamoDB allows the first 1,000 writes and throttles the rest. As all DynamoDB operations work as RESTful services, it returns the error code 400 (Bad Request).
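The arithmetic of this throttling example can be sketched as follows. This is a conceptual simplification of the behavior described above, not how DynamoDB is implemented internally:

```java
public class ThrottlingSketch {
    // Given provisioned write capacity units and attempted writes in one second,
    // return how many writes would be throttled (rejected with HTTP 400).
    static int throttledWrites(int provisionedUnits, int attemptedWrites) {
        return Math.max(0, attemptedWrites - provisionedUnits);
    }

    public static void main(String[] args) {
        // 1,000 provisioned units, 1,500 attempted writes:
        // the first 1,000 succeed and the remaining 500 are throttled.
        System.out.println(throttledWrites(1000, 1500)); // 500
        // Staying within the provisioned capacity throttles nothing.
        System.out.println(throttledWrites(1000, 800));  // 0
    }
}
```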
Even if your items are smaller than 4 KB, each read still consumes one full read capacity unit; we cannot group multiple items smaller than 4 KB into a single read capacity unit. For instance, if your item size is 3 KB and you want to read 50 items per second, you need to provision 50 read capacity units in the table definition for strong consistency, or 25 read capacity units for eventual consistency.
If you have items larger than 4 KB, you have to round the size up to the next multiple of 4 KB. For example, if your item size is 7 KB (rounded up to 8 KB) and you need to read 100 items per second, the required read capacity units would be 200 for strong consistency and 100 for eventual consistency.
The same logic is followed for write capacity units, except that the unit is 1 KB: if the item size is less than 1 KB, it is rounded up to 1 KB, and if it is more than 1 KB, it is rounded up to the next multiple of 1 KB.
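The rounding rules above can be captured in a small helper. This is illustrative arithmetic, not an AWS SDK call; it reproduces the numbers from the preceding examples:

```java
public class CapacityCalculator {
    // Read capacity units for strongly consistent reads:
    // round the item size up to the next 4 KB, one unit per 4 KB, per read.
    static int readUnitsStrong(int itemSizeKb, int readsPerSecond) {
        int sizeUnits = (itemSizeKb + 3) / 4;   // integer ceil(size / 4)
        return sizeUnits * readsPerSecond;
    }

    // Eventually consistent reads need half the strongly consistent units.
    static int readUnitsEventual(int itemSizeKb, int readsPerSecond) {
        return (readUnitsStrong(itemSizeKb, readsPerSecond) + 1) / 2; // ceil of half
    }

    // Write capacity units: one unit per 1 KB of item size, per write
    // (sizes are passed here as whole KB, already rounded up).
    static int writeUnits(int itemSizeKb, int writesPerSecond) {
        return itemSizeKb * writesPerSecond;
    }

    public static void main(String[] args) {
        System.out.println(readUnitsStrong(11, 1));    // 3   (11 KB -> 12 KB -> 3 units)
        System.out.println(readUnitsStrong(3, 50));    // 50  (3 KB rounds up to 4 KB)
        System.out.println(readUnitsEventual(3, 50));  // 25
        System.out.println(readUnitsStrong(7, 100));   // 200 (7 KB rounds up to 8 KB)
        System.out.println(readUnitsEventual(7, 100)); // 100
    }
}
```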
The AWS SDK provides automatic retries on ProvisionedThroughputExceededException when configured through the client configuration. This configuration option allows us to set the maximum number of times the HTTP client should retry sending a request to DynamoDB. The SDK also implements a default backoff strategy that decides the retry interval.
// Create a configuration object
final ClientConfiguration cfg = new ClientConfiguration();
// Set the maximum auto-retries to 3
cfg.setMaxErrorRetry(3);
// Set the configuration object on the client
client.setConfiguration(cfg);
As we said earlier, DynamoDB comes with enormous scalability and high availability with predictable performance, which makes it stand out among other NoSQL databases. It has tons of features; we will discuss some of them here.
DynamoDB allows developers to focus on development rather than deciding which hardware to provision, how to do administration, how to set up the distributed cluster, how to take care of fault tolerance, and so on. DynamoDB handles all scaling needs; it partitions your data in such a manner that your performance requirements are taken care of. Any distributed system becomes an overhead to manage as it starts scaling, but DynamoDB is a fully managed service, so you don't need to bother hiring an administrator to take care of the system.
Once data is loaded into DynamoDB, it automatically replicates the data into different availability zones in a region. So, even if your data from one data center gets lost, there is always a backup in another data center. DynamoDB does this automatically and synchronously. By default, DynamoDB replicates your data to three different data centers.
DynamoDB automatically distributes your data on multiple servers across multiple availability zones as the data size grows. The number of servers can easily range from hundreds to thousands. Developers can write and read data of any size, and there are no limitations on data size. DynamoDB follows a shared-nothing architecture.
DynamoDB serves very high throughput with single-digit millisecond latency. It uses SSDs for consistent, optimized performance at very high scale. DynamoDB does not index all attributes of a table; it needs to index only the primary key, which saves cost and makes read and write operations super fast. An application running on an EC2 instance will typically see single-digit millisecond latency for an item of 1 KB in size. Latencies remain constant even at scale due to the system's highly distributed nature and optimized routing algorithms.
DynamoDB is very easy to manage. The AWS Management Console has a user-friendly interface to create tables and provide the necessary details, and you can start using a table within a few minutes. Once the data load starts, you don't need to do anything, as the rest is taken care of by DynamoDB. You can monitor provisioned versus actual throughput in Amazon CloudWatch and adjust the read and write capacity units accordingly if needed.
DynamoDB, being a NoSQL database, does not force users to define the table schema beforehand. Being a key-value data store, it allows users to decide on the fly which attributes an item needs, and each item of a table can have a different number of attributes.
Rich Data Model
DynamoDB has a rich data model, which allows a user to define attributes with various data types, for example, number, string, binary, number set, string set, and binary set. We are going to talk about these data types in detail in Chapter 2, Data Models.
DynamoDB indexes the primary key of each item, which allows us to access any item in a fast and efficient manner. It also supports global and local secondary indexes, which allow the user to query on non-primary key attributes.
Each call to DynamoDB ensures that only authenticated users can access the data. It also uses the latest, effective cryptographic techniques to secure your data. It can be easily integrated with AWS Identity and Access Management (IAM), which allows users to set fine-grained access control and authorization.
DynamoDB provides a very cost-effective pricing model for hosting an application of any scale. The pay-per-use model gives users the flexibility to control expenditure. It also provides a free tier, which allows users 100 MB of free data storage with 5 writes/second and 10 reads/second of throughput capacity. More details about pricing can be found at http://aws.amazon.com/dynamodb/pricing/.
Now that you are aware of all these exciting features of DynamoDB, I am sure you are itching to try your hand at it. So let's try to create a table using the Amazon DynamoDB management console. The prerequisites for this exercise are a valid Amazon account and a valid credit card for billing purposes. Once the account is active and you have signed up for the DynamoDB service, you can get started directly. If you are new to AWS, more information is available at http://docs.aws.amazon.com/gettingstarted/latest/awsgsg-intro/gsg-aws-intro.html.
Amazon's infrastructure is spread across almost 10 regions worldwide and DynamoDB is available in almost all regions. You can check out more details about it at https://aws.amazon.com/about-aws/globalinfrastructure/regional-product-services/.
Go to the Amazon DynamoDB management console at https://console.aws.amazon.com/dynamodb, and you will get the following screenshot:
Click on the Create Table button, and you will see a pop-up window asking for various text inputs. Here, we are creating a table, providing emp_id as the hash key.
Once you click on the Continue button, you will see the next window, which asks you to create indexes, as shown in the next screenshot. These are optional parameters, so if you do not wish to create any secondary indexes, you can skip this and click on Continue. We are going to talk about indexes in Chapter 2, Data Models.
Once you click on the Continue button again, the next page will appear, asking for provisioned throughput capacity units. We have already talked about read and write capacity, so depending on your application's requirements, you can enter the read and write capacity units in the appropriate text boxes, as shown in the following screenshot:
The next page will ask whether you want to set any throughput alarm notifications for this particular table. You can provide an e-mail ID on which you wish to get the alarms, as shown in the following screenshot. If not, you can simply skip it.
Once you set the required alarms, the next page is a summary page confirming the details you have provided. If all the given details are correct, you can click on the Create button, as shown in the following screenshot:
Once the Create button is clicked, Amazon starts provisioning the hardware and other logistics in the background and takes a couple of minutes to create the table. In the meantime, you can see the table creation status as CREATING on the screen, as shown in the following screenshot:
You can click on the New Item button to add a new record to the table, which will open a pop up asking for the various attributes that we wish to add to this record. Here, I have added two extra attributes, one of them being company, with some relevant values. Once done, you can click on the Put Item button to actually add the item to the table.
In Chapter 2, Data Models, we will look at various examples in Java, .NET, and PHP to play around with tables, items, and attributes.
DynamoDB Local is a lightweight client-side database that mimics the actual DynamoDB database. It enables users to develop and test their code in-house, without consuming actual DynamoDB resources. DynamoDB Local supports all the DynamoDB APIs, so you can run your code against it just as you would against actual DynamoDB.
To use DynamoDB Local, you need to run a Java service on the desired port and direct the calls from your code to this service. Once you are done testing your code, you can simply redirect it to the actual DynamoDB.
So, you can code your application without having full Internet connectivity all the time, and once you are ready to deploy, simply make a single-line change to point your code to the actual DynamoDB, and that's it.
Download the DynamoDB Local executable JAR, which can be run on Windows, Mac, or Linux, from http://dynamodb-local.s3-website-us-west-2.amazonaws.com/dynamodb_local_latest.
The downloaded ZIP file contains two important things: a DynamoDBLocal_lib folder, which contains the various third-party JAR files being used, and DynamoDBLocal.jar, which contains the actual entry point.
Once you unzip the file, simply run the following command to get started with the local instance:
java -Djava.library.path=. -jar DynamoDBLocal.jar
Once you press Enter, the DynamoDB Local instance gets started, as shown in the following screenshot:
By default, the DynamoDB Local service runs on port 8000.
In case you are using port 8000 for some other service, you can simply choose your own port number by running the following command:
java -Djava.library.path=. -jar DynamoDBLocal.jar --port <YourPortNumber>
Now, let's see how to use DynamoDB Local in the Java API. The complete implementation remains the same; the only thing we need to do is set the endpoint in the client configuration to http://localhost:8000.
Using DynamoDB Local for development in Java is quite easy; you just need to set the previous URL as the endpoint while creating the DynamoDB client, as shown in the following code:
// Instantiate the AWS client with proper credentials
AmazonDynamoDBClient dynamoDBClient = new AmazonDynamoDBClient(
    new ClasspathPropertiesFileCredentialsProvider());
Region usWest2 = Region.getRegion(Regions.US_WEST_2);
dynamoDBClient.setRegion(usWest2);
// Set the DynamoDB Local endpoint
dynamoDBClient.setEndpoint("http://localhost:8000");
Once you are comfortable with your development and are ready to use the actual DynamoDB, simply remove the endpoint-setting line (the last line of the previous code snippet) and you are done. Everything will work as expected.
DynamoDB Local ignores the credentials you have provided.
The values provided for the access key and region are used only to create the local database file. The DB file gets created in the same folder from which you are running DynamoDB Local.
DynamoDB Local ignores the settings provided for provisioned throughput; even if you specify them at table creation, it simply ignores them. It therefore does not prepare you for handling provisioned throughput exceeded exceptions, so you need to be cautious about handling them in production.
Last but not least, DynamoDB Local is meant for development and unit testing purposes only and should not be used in production, as it does not provide any durability or availability SLAs.
In this chapter, we talked about DynamoDB's history, its features, and the concept of provisioned throughput and why it is important from the DynamoDB usage point of view. We also saw how to get started with AWS DynamoDB, create a table, and load data. Finally, we learned how to install and run the DynamoDB Local utility and use it for development.
In the next chapter, we will discuss the DynamoDB data model in more detail and how to use DynamoDB APIs to perform the table, item, and attribute level operations.