
You're reading from DynamoDB Applied Design Patterns

Product type: Book
Published in: Sep 2014
Publisher: Packt Publishing
ISBN-13: 9781783551897
Edition: 1st

Appendix A. Comparing DynamoDB

The era of web-based applications is in full swing, and these applications generate massive amounts of data. With Amazon DynamoDB, you can simply dial the requested capacity of a table up or down without incurring downtime. Amazon DynamoDB takes care of the entire administrative and management burden, so you can concentrate on your business logic.

Achieving comparable robustness and functionality with a relational database is a complex and time-consuming process. If your requirements are dynamic, changing with traffic and usage, and you need predictable throughput when accessing data, Amazon DynamoDB will work best for you. So let's compare it with other data storage services in the NoSQL category.

DynamoDB versus MongoDB


Both DynamoDB and MongoDB are NoSQL databases built for high scalability, high performance, and high availability. The main difference between the two is that DynamoDB is a managed service provided by Amazon AWS, so it can only run in AWS, whereas MongoDB is a software product from MongoDB Inc. (formerly 10gen Inc.) that can be installed and run anywhere. From the data-model point of view, DynamoDB is a key-value database, whereas MongoDB is a document-oriented database. DynamoDB abstracts all the operational details of replication and sharding away from the end user, while with MongoDB we have full access to the source code and can dig into the file formats; this can be an advantage or a disadvantage. MongoDB keeps the (windowed) working set in internal memory, enabling faster access to data, but if our datasets are much larger than the accessible memory, DynamoDB scales to much larger datasets.

DynamoDB is suitable for use cases where data is accessed by one or two dimensions, but if your access patterns require more than two dimensions, then MongoDB is a better option, because it supports any number of indexes. MongoDB has major limitations when running MapReduce jobs, whereas DynamoDB integrates with Elastic MapReduce (EMR), which reduces the complexity of analyzing unstructured data.

If you have an AWS account, DynamoDB is very simple to use: we have to work only on our applications, while the management of the database servers is handled by AWS. This means that if you have limited manpower, it's good to use DynamoDB. If we use DynamoDB, other Amazon services, such as CloudSearch, Elastic MapReduce, and services for database backup and restore, can be integrated easily, which speeds up development and reduces the cost of server management. With MongoDB, we must procure the right servers and handle installation and configuration ourselves. AWS provides excellent performance with DynamoDB, giving single-digit millisecond latency under very heavy data traffic. All data is replicated synchronously across availability zones without any downtime, even during frequent throughput updates. Amazon DynamoDB's pricing policy is pay only for what you use: you buy operations-per-second capability instead of CPU hours or storage space, by specifying the request throughput you want your table to achieve (the capacity you reserve for reads and writes). The official AWS SDK supports Java, JavaScript, Ruby, PHP, Python, and .NET, while MongoDB drivers also cover the likes of C, C++, Perl, Erlang, PowerShell, Prolog, MATLAB, and so on.
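To make the provisioned-throughput idea concrete, the sketch below assembles the parameter dictionary you would hand to an SDK call such as boto3's `create_table`. The table and attribute names are hypothetical, and no network call is made here; it only shows that capacity is declared as two numbers at table-creation time.

```python
def build_create_table_params(table_name, hash_key, read_capacity, write_capacity):
    """Assemble DynamoDB CreateTable parameters with provisioned throughput."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": hash_key, "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": hash_key, "KeyType": "HASH"},
        ],
        # The capacity you reserve for reads and writes; AWS bills on these
        # numbers, not on CPU hours or storage space.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_capacity,
            "WriteCapacityUnits": write_capacity,
        },
    }

params = build_create_table_params("Orders", "OrderId", 100, 50)
```

Dialing capacity up later is just another API call with a new `ProvisionedThroughput` value, which is what "dial up your requirement without downtime" means in practice.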

Let's look more closely into the comparison of DynamoDB and MongoDB:

| Specification | DynamoDB | MongoDB |
| --- | --- | --- |
| Data model | Key-value | Document store |
| Operating system | Cross platform (hosted) | Linux, Windows, Solaris, OS X |
| License | Commercial | Open source |
| Data storage | Solid-state drive (SSD) | Any |
| Secondary indexes | Yes | Yes |
| Accessing method | REST API | JSON |
| Server-side script | No | JavaScript |
| Triggers | No | No |
| Partitioning | Sharding | Sharding |
| Integrity model supports | BASE, MVCC, ACID, Eventual consistency, Log replication, Read committed | BASE |
| Atomicity | Yes | Conditional |
| Transaction | No | Yes |
| Full text search | No | Yes |
| Geospatial indexes | No | Yes |
| Horizontal scalability | Yes | Yes |
| Replication method | Master-slave replica | Master-slave replica |
| Max. size value | 64 KB | 16 MB |
| Object-relational mapping | No | Yes |
| Function-based index | Yes | No |
| Log support | No | Yes |
| Operations performed per second | 1,000 | 10,000 |
| User concepts | Access rights for users and roles can be defined via AWS Identity and Access Management (IAM) | Users can be defined with full access or read-only access |

In the previous table, the Integrity model supports row mentions a few values. They are as follows:

  • BASE: It stands for Basically Available, Soft state, Eventual consistency

  • MVCC: It stands for Multiversion Concurrency Control

  • ACID: It stands for Atomicity, Consistency, Isolation, Durability

DynamoDB versus Cassandra


Let's start with the data model. DynamoDB's storage model is very similar to Cassandra's, in which data is hashed on the row key, and the data under that key is ordered by a specific column. In DynamoDB, an attribute can be single valued (scalar) or multivalued. Cassandra has various attribute types, such as Integer, BigInteger, ASCII, UTF8, and Double, and it also offers composite and dynamic composite columns. It provides the full range of data formats, including structured, semi-structured, and unstructured data, as found in modern applications, whereas DynamoDB has only two attribute types, namely String and Number.

Cassandra supports multi-datacenter deployment across regions, whereas DynamoDB replicates data across multiple availability zones in the same region; cross-region replication is not supported. So if you want to provide low local data latencies in regions across the world, Cassandra is the option, and it also gives full control over data consistency.

Let's take a scenario that requires a large number of increments to a few counters, with the ability to read the current count. Scaling the throughput on an individual counter is quite difficult, because every read/write operation hits the same item. If more than one node is needed to handle a single counter, the read operation becomes slow, since it involves all those nodes. When a request times out, we retry the operation because we don't know whether the previous request succeeded; performing the same update twice in this way frequently causes long latencies or load spikes across the cluster. DynamoDB's atomic counters, by contrast, are more reliable, offer low latency, and support as many increments as the operations we perform.
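The double-counting hazard described above is easy to reproduce in miniature. The following self-contained simulation (no real database involved) shows a client whose first increment succeeds but whose acknowledgement is lost, so the blind retry over-counts:

```python
class Counter:
    """Toy counter standing in for a distributed counter row."""
    def __init__(self):
        self.value = 0

    def increment(self, ack_lost=False):
        self.value += 1          # the write itself is applied...
        if ack_lost:
            raise TimeoutError   # ...but the client never learns that

c = Counter()
try:
    c.increment(ack_lost=True)   # first attempt: applied, ack lost
except TimeoutError:
    c.increment()                # blind retry: applied a second time
# One logical increment, yet the counter now reads 2.
```

This is exactly why scaling a hot counter across nodes is painful, and why a server-side atomic increment primitive is valuable.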

In DynamoDB, overload is handled effectively. If we exceed the provisioned throughput, we rapidly get a ThroughputExceeded error, while no other requests are affected. This is very useful for a heavily loaded site where thousands of requests arrive at a time and latency spikes would otherwise make queues build up quickly.
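A common client-side response to this error is exponential backoff. The sketch below is a generic retry pattern rather than an official SDK API; the fake `flaky_write` stands in for a DynamoDB call that throws while the table is throttled:

```python
import time

class ThroughputExceededError(Exception):
    """Stand-in for the throttling error DynamoDB returns on overload."""
    pass

def with_backoff(operation, max_retries=5, base_delay=0.01):
    """Retry an operation with exponential backoff on throttling errors."""
    for attempt in range(max_retries):
        try:
            return operation()
        except ThroughputExceededError:
            time.sleep(base_delay * (2 ** attempt))  # 10 ms, 20 ms, 40 ms, ...
    raise RuntimeError("gave up after %d retries" % max_retries)

calls = {"n": 0}
def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:           # throttled on the first two attempts
        raise ThroughputExceededError()
    return "ok"

result = with_backoff(flaky_write)
```

Because only the throttled requests fail fast, the rest of the workload keeps flowing while the client backs off.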

In Cassandra, scaling up with virtual nodes is pretty easy, but scaling down remains slow, manual, and error prone. Streaming data while nodes join or leave the ring can fail for a group of nodes, which then requires repair, and data lost during a decommissioning operation has to be restored from a backup. In DynamoDB, scaling up is effortless: a single command, after which we wait a while for the table to be scaled, whereas in a Cassandra cluster it's a multistep as well as multihour process. In DynamoDB, scaling down is also better and much less time consuming, with low latency.

DynamoDB can insert into or delete from an item's set attribute without complex code. Its operational cost is close to zero: once we set up backup jobs at specific time intervals, there is no database to manage, no disk space to monitor, no memory usage to check, and no failed node to replace or repair, so DynamoDB saves costs too. Cassandra supports a logically unlimited amount of data under a specific key; the practical limit is the disk space on a particular node, whereas DynamoDB's item limit is 64 KB, so handling overflow can be tricky. Cassandra handles transactions well, delivering ACID-style guarantees by using a commit log to capture write operations, with built-in redundancy that ensures data durability if the hardware fails.
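As an illustration of the set manipulation just mentioned, the helper below builds the parameter dictionary for an UpdateItem call that uses the ADD (or DELETE) action on a string set. The table, key, and attribute names are made up, and nothing is sent to AWS; this is only the shape of the request.

```python
def build_set_update(table, key, attribute, members, action="ADD"):
    """Build UpdateItem parameters that ADD to or DELETE from a string set."""
    assert action in ("ADD", "DELETE")
    return {
        "TableName": table,
        "Key": {"Id": {"S": key}},
        "AttributeUpdates": {
            attribute: {
                "Action": action,
                "Value": {"SS": list(members)},   # SS = string set
            }
        },
    }

req = build_set_update("Books", "book-42", "Tags", ["nosql", "aws"])
```

A second call with `action="DELETE"` and the same shape removes members from the set; no read-modify-write cycle is needed on the client.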

Now take a look at the tabular comparison between these two databases:

| Specification | DynamoDB | Cassandra |
| --- | --- | --- |
| Data model | Key-value store | Key-value with wide column store |
| Operating system | Cross platform (hosted) | BSD, Linux, OS X, Windows |
| License | Commercial (Amazon) | Open source (Apache) |
| Data storage | Solid-state drive (SSD) | Filesystem |
| Secondary indexes | Yes | No |
| Accessing method | API call | API call, CQL (short for Cassandra Query Language), Apache Thrift |
| Server-side script | No | No |
| Triggers | No | Yes |
| Partitioning | Sharding | Sharding |
| MapReduce | No (can be done with other AWS services) | Yes |
| Integrity model supports | BASE, MVCC, ACID, Eventual consistency, Log replication, Read committed | BASE |
| Composite key support | Yes | Yes |
| Data consistency | Yes | Most operations |
| Distributed counters | Yes | Yes |
| Idempotent write batches | No | Yes |
| Time to live support | No | Yes |
| Conditional updates | Yes | No |
| Indexes on column value | No | Yes |
| Hadoop integration | M/R, Hive | M/R, Hive, Pig |
| Monitorable | Yes | Yes |
| Backups | Low-impact snapshot with incremental | Incremental |
| Deployment policy | Only with AWS | Anywhere |
| Transaction | No | Yes |
| Full text search | No | No |
| Geospatial indexes | No | No |
| Horizontal scalability | Yes | Yes |
| Replication method | Master-slave replica | Master-slave replica |
| Largest value supported | 64 KB | 2 GB |
| Object-relational mapping | No | Yes |
| Log support | No | Yes |
| User concepts | Access rights for users and roles can be defined via AWS Identity and Access Management (IAM) | Users can be defined per object |

DynamoDB versus S3


DynamoDB and S3 (short for Simple Storage Service) are both storage services provided by Amazon that fall under the NoSQL umbrella. S3 is designed for bulk storage: a single object can be up to 5 TB, whereas a DynamoDB item can hold up to 64 KB, although neither places a limit on the number of attributes for a particular item. S3 stores data in buckets, which are the essential containers for data storage in Amazon S3. S3 suits large data that we rarely access, for example data stored for analysis or for backup and restore, and it has a slower response time. DynamoDB is built for high performance; we use it when we need a key-value data store that can handle thousands of read/write operations per second.

DynamoDB is a flexible NoSQL database that we use to store small features, metadata, and index information—in short, data that is more dynamic. In DynamoDB, data retrieval is very fast because the data is stored on SSDs, and the data is replicated across many servers that are located in availability zones.

With S3, we don't need to configure anything to get started: we choose a key and upload the data, and that's how data is stored in S3; afterwards, we only need to keep track of the key we used for a particular application. In DynamoDB, we need to configure a few things: when creating a table, we must specify the primary key and provision the read/write throughput values. S3 doesn't support object locking; if we need it, we have to build it ourselves, whereas DynamoDB supports optimistic locking. Optimistic locking is an approach that ensures the client-side item we are updating is the same as the item currently in DynamoDB, so our database writes are protected from being overwritten by other writes.
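Optimistic locking can be sketched without any AWS dependency: each item carries a version number, and a write succeeds only if the version it read is still current. The in-memory store below is purely illustrative; it mirrors the behaviour DynamoDB provides through conditional writes.

```python
class VersionedStore:
    """Tiny in-memory store enforcing optimistic locking via a version field."""
    def __init__(self):
        self.items = {}  # key -> (version, value)

    def get(self, key):
        return self.items.get(key, (0, None))

    def put(self, key, value, expected_version):
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise ValueError("conditional check failed: stale version")
        self.items[key] = (current_version + 1, value)

store = VersionedStore()
v, _ = store.get("item1")          # both clients read version 0
store.put("item1", "writer A", v)  # A writes first: version becomes 1
try:
    store.put("item1", "writer B", v)  # B's write is rejected as stale
    overwritten = True
except ValueError:
    overwritten = False
```

Writer B must re-read the item, merge its change, and retry; A's update is never silently lost.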

One of the good features of S3 is that every S3 object or document that is stored in it has a web address. By using this web address, a document can be accessed without any impact on the web server or the database.

S3 uses the eventual consistency model, in which, if no new updates are made on a given data item, all accesses to that item will eventually return the last updated value. DynamoDB offers both eventual consistency and strong consistency; with strong consistency, all updates are seen by all parallel processes, nodes, and processors in the same sequential order. DynamoDB doesn't support spatial data natively, but it can store points and, on top of them, you can build a Geohash-based index that also covers lines (for example, roads) and areas (for example, boundaries) representing arbitrary features, with each Geohash index record stored as a DynamoDB item. This works when the geographical feature is small; otherwise, the spatial feature is stored as an object in S3, with a pointer to the S3 location kept in DynamoDB.
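The Geohash idea mentioned above interleaves longitude and latitude bits and renders them in base 32, so nearby points share a common string prefix that can be indexed as an ordinary DynamoDB key. The following minimal encoder/decoder is an independent illustration, not code from any AWS library; the round trip shows the precision a nine-character hash gives.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=9):
    """Encode a lat/lon pair into a Geohash string of the given length."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, ch, even = [], 0, 0, True
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)  # alternate axes
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch <<= 1
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:                 # five bits per base-32 character
            chars.append(_BASE32[ch])
            bits, ch = 0, 0
    return "".join(chars)

def geohash_decode(gh):
    """Decode a Geohash back to the centre of its bounding cell."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    even = True
    for c in gh:
        for i in range(4, -1, -1):
            bit = (_BASE32.index(c) >> i) & 1
            rng = lon_rng if even else lat_rng
            rng[1 - bit] = (rng[0] + rng[1]) / 2  # bit 1 keeps the upper half
            even = not even
    return (sum(lat_rng) / 2, sum(lon_rng) / 2)
```

Storing the hash as a range key lets prefix queries retrieve all points in a cell, which is the essence of a Geohash-based spatial index over a plain key-value store.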

DynamoDB is presently provisioned entirely on SSD devices. An SSD can service an individual read request in a small fraction of the time a magnetic disk needs (about 100 times faster), though its cost per GB of storage is much higher (approximately 10 times). Hence, DynamoDB provides very low latency and high throughput compared to S3, but at a higher cost per unit of storage. As a scenario, 1 GB of DynamoDB storage costs about $1 per month, while S3 storage costs between 4 and 12 cents per GB per month, making S3 roughly eight times (or more) cheaper than DynamoDB. DynamoDB also provides a flexible fee structure based on IO capacity: 1,000 read operations per second cost around 20 cents per hour, and write operations are about 5 times more expensive, because SSDs can perform a read operation much faster than a write operation.
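Using the ballpark figures above (illustrative, and long out of date by now), the storage-cost gap works out as follows:

```python
def monthly_storage_cost(gb, price_per_gb_month):
    """Flat monthly storage cost for a given volume and per-GB price."""
    return gb * price_per_gb_month

dynamodb_cost = monthly_storage_cost(1, 1.00)   # ~$1 per GB-month on DynamoDB
s3_high_cost  = monthly_storage_cost(1, 0.12)   # S3 at its upper-end 12 cents
ratio = dynamodb_cost / s3_high_cost            # S3 is roughly 8x cheaper
```

Even at S3's most expensive tier, the per-GB ratio lands around eight, which is why large, rarely accessed objects belong in S3 with only a pointer kept in DynamoDB.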

Now take a look at the tabular comparison between DynamoDB and S3:

| Specification | DynamoDB | S3 |
| --- | --- | --- |
| Data model | Key-value store | Objects/data stored in buckets |
| Operating system | Cross platform (hosted) | Cross platform (hosted) |
| License | Commercial (Amazon) | Commercial (Amazon) |
| Data storage | Solid-state drive (SSD) | Magnetic disk |
| Secondary indexes | Yes | No |
| Accessing method | API call | HTTP web address (API + publicly accessible URL) |
| Server-side script | No | No |
| Composite key support | Yes | No |
| Data consistency | Yes | Yes |
| Distributed counters | Yes | No |
| Largest value supported per item | 64 KB | 5 TB |

DynamoDB versus Redis


Both DynamoDB and Redis are NoSQL databases that store data in key-value format, but Redis is open source, released under the BSD license. Redis stands for REmote DIctionary Server. It is often called a data structure server because its values can hold many data types: strings, hashes (in which the fields and values are strings), lists of strings, sets of strings, and sorted sets of strings. We can perform atomic operations on these types in Redis. Redis stores the whole dataset in memory and can persist it by dumping the data to disk; it synchronizes data to the disk every couple of seconds, so if the system fails, we lose the data for only a few seconds. Another way to persist data is by appending each command to a log.
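Both persistence modes correspond to standard redis.conf directives; the values below are illustrative defaults rather than a recommendation:

```conf
# RDB snapshotting: dump the dataset to disk when write thresholds are hit
save 900 1        # after 900 s if at least 1 key changed
save 300 10
save 60 10000

# AOF: append every write command to a log, fsync'd about once per second,
# bounding data loss on a crash to the last few seconds of writes
appendonly yes
appendfsync everysec
```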

Redis supports master-slave replication, which lets a slave Redis server hold an exact copy of the master with non-blocking replication: the master keeps handling queries while one or more slaves perform their initial synchronization, and a slave keeps serving queries from its old dataset while it synchronizes. In DynamoDB, by contrast, replication to backup tables happens at scheduled times rather than continuously, so if a primary DynamoDB table loses its data, writes made since the last backup can be lost when restoring. In Redis, a slave can itself accept connections from other slaves, not only from the same master server; in DynamoDB, the tables have to be on the same AWS account.

Other features of Redis include transactions, pub/sub, Lua scripting capabilities, keys with a restricted time to live, and configuration settings that allow Redis to work like a cache. Redis is written in ANSI C and works on almost all operating systems, such as Linux, BSD, and OS X, without any external dependencies.

When data durability is not the major concern, the in-memory design of Redis lets it perform extremely well compared to database systems that write every update or change to disk before acknowledging a committed transaction, and there is no prominent speed difference between read and write IOs. Redis runs as a single process and is single-threaded, so a single Redis instance cannot execute tasks such as stored procedures in parallel. Redis is mainly used for rapidly changing data with a predictable database size that should mostly fit in memory, so it suits real-time applications such as storing real-time stock prices, real-time communication, real-time analytics, and leaderboards; we can use it as a memory cache too. Let's move on to the tabular comparison between these databases, as follows:

| Specification | DynamoDB | Redis |
| --- | --- | --- |
| Data model | Key-value store | Key-value store |
| Operating system | Cross platform (hosted) | BSD, Linux, OS X, Solaris |
| Programming language | Ruby | C |
| License | Commercial (Amazon) | Open source (BSD license) |
| Data storage | Solid-state drive (SSD) | In-memory dataset (RAM) |
| Secondary indexes | Yes | No |
| Accessing method | RESTful HTTP API call | API call, Lua |
| Server-side script | No | Lua |
| Triggers | No | No |
| Partitioning | Sharding | None |
| MapReduce | No (can be done with other AWS services) | No |
| Composite key support | Yes | No |
| Atomicity | Yes | Yes |
| Data consistency | Yes | Yes |
| Isolation | Yes | Yes |
| Durability | Yes | Yes |
| Transactions | No | Optimistic locking |
| Concurrency control | ACID | Locks |
| Partition tolerance | No | Yes |
| Persistence | No | Yes |
| High availability | Yes | No |
| Referential integrity | No | No |
| Revision control | Yes | No |
| Function-based index | Yes | No |
| Full text search | No | No |
| Geospatial indexes | No | No |
| Horizontal scalability | Yes | Yes |
| Replication method | Master-slave replica (synchronous) | Master-slave replica (asynchronous) |
| Largest value supported | 64 KB | 512 MB |
| Object-relational mapping | No | Yes |
| Log support | No | Yes |
| Operations per second | 1,000 | 140,000 |
| Free for commercial use | No | Yes (up to some memory usage) |
| Deployment policy | Only with AWS | On premises |
| Easy to use | No | Yes |
| Backup | Scheduled (to be configured) | Frequent automatic sync |
| User concepts | Access rights for users and roles can be defined via AWS Identity and Access Management (IAM) | A very simple password-based access control |
| Best use | Rapidly varying data; frequently written, rarely read statistical data | Large-to-small database solution |

As per the previous comparisons, you can easily identify the most suitable NoSQL data service for your dynamic applications. In short, DynamoDB provides the following features on Amazon's distributed infrastructure and robust platform:

  • Seamless scaling

  • Secondary indexes

  • Schema-less

  • Strong consistency, atomic counters

  • Integrated monitoring

  • Secure

  • Elastic MapReduce, Redshift, and Data Pipeline integration

  • Management console and APIs
