
You're reading from DynamoDB Applied Design Patterns

Product type: Book
Published in: Sep 2014
Publisher: Packt Publishing
ISBN-13: 9781783551897
Edition: 1st

Appendix A. Comparing DynamoDB

The era of web-based applications is in full swing, and these applications generate massive amounts of data. With Amazon DynamoDB, you can simply dial the requested capacity of a table up or down without incurring downtime. Amazon DynamoDB takes care of the entire administrative and management burden, so you can concentrate on your business logic.

Achieving comparable robustness and functionality with a relational database is a complex and time-consuming process. If your requirements are dynamic, changing with traffic and usage, and you need predictable throughput when accessing data, Amazon DynamoDB will work best for you. So let's compare it with other data storage services in the NoSQL category.

DynamoDB versus MongoDB


Both DynamoDB and MongoDB are NoSQL databases built for high scalability, high performance, and high availability. The main difference between the two is that DynamoDB is a managed service provided by Amazon AWS, so it can only run in AWS, whereas MongoDB is a software product from MongoDB Inc. (formerly 10gen Inc.) that can be installed and run anywhere. From the data-model point of view, DynamoDB is a key-value database, whereas MongoDB is a document-oriented database. DynamoDB abstracts all the operational details of replication and sharding away from the end user, while with MongoDB we have full access to the source code and can dig into the file formats; this can be an advantage or a disadvantage. MongoDB keeps the (windowed) working set in internal memory, enabling faster access to data, but if our datasets are much larger than the accessible memory, DynamoDB scales to much larger datasets.

DynamoDB is suitable for use cases where data is accessed by one or two dimensions, but if your access patterns require more than two dimensions, then MongoDB is a better option, because it supports any number of indexes. MongoDB has major limitations when running MapReduce jobs, whereas DynamoDB integrates with Elastic MapReduce (EMR), which reduces the complexity of analyzing unstructured data.

If you have an AWS account, DynamoDB is very simple to use: we have to work only on our applications, while the management of the database servers is handled by AWS. This means that if you have limited manpower, it's good to use DynamoDB. If we use DynamoDB, other Amazon services, such as CloudSearch, Elastic MapReduce, and services for database backup and restore, can be integrated easily, which speeds up development and reduces the cost of server management. With MongoDB, we must procure the right servers and handle installation and configuration ourselves. AWS provides excellent performance with DynamoDB, giving single-digit millisecond latency under very heavy data traffic. All data is replicated synchronously across availability zones without any downtime, even during frequent throughput updates. Amazon DynamoDB's pricing policy is pay only for what you use: you buy operations-per-second capability instead of CPU hours or storage space, by specifying the request throughput you want your table to achieve (the capacity you reserve for reads and writes). The official AWS SDK supports Java, JavaScript, Ruby, PHP, Python, and .NET, while MongoDB drivers also cover the likes of C, C++, Perl, Erlang, PowerShell, Prolog, MATLAB, and so on.
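To make the provisioned-throughput idea concrete, the sketch below assembles the parameter dictionary you would hand to an SDK call such as boto3's `create_table`. The table and attribute names are hypothetical, and no network call is made here; it only shows that capacity is declared as two numbers at table-creation time.

```python
def build_create_table_params(table_name, hash_key, read_capacity, write_capacity):
    """Assemble DynamoDB CreateTable parameters with provisioned throughput."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": hash_key, "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": hash_key, "KeyType": "HASH"},
        ],
        # The capacity you reserve for reads and writes; AWS bills on these
        # numbers, not on CPU hours or storage space.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_capacity,
            "WriteCapacityUnits": write_capacity,
        },
    }

params = build_create_table_params("Orders", "OrderId", 100, 50)
```

Dialing capacity up later is just another API call with a new `ProvisionedThroughput` value, which is what "dial up your requirement without downtime" means in practice.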

Let's look more closely into the comparison of DynamoDB and MongoDB:

| Specification | DynamoDB | MongoDB |
| --- | --- | --- |
| Data model | Key-value | Document store |
| Operating system | Cross platform (hosted) | Linux, Windows, Solaris, OS X |
| License | Commercial | Open source |
| Data storage | Solid-state drive (SSD) | Any |
| Secondary indexes | Yes | Yes |
| Accessing method | REST API | JSON |
| Server-side script | No | JavaScript |
| Triggers | No | No |
| Partitioning | Sharding | Sharding |
| Integrity model supports | BASE, MVCC, ACID, Eventual consistency, Log replication, Read committed | BASE |
| Atomicity | Yes | Conditional |
| Transaction | No | Yes |
| Full text search | No | Yes |
| Geospatial indexes | No | Yes |
| Horizontal scalability | Yes | Yes |
| Replication method | Master-slave replica | Master-slave replica |
| Max. size value | 64 KB | 16 MB |
| Object-relational mapping | No | Yes |
| Function-based index | Yes | No |
| Log support | No | Yes |
| Operations performed per second | 1,000 | 10,000 |
| User concepts | Access rights for users and roles can be defined via AWS Identity and Access Management (IAM) | Users can be defined with full access or read-only access |

In the previous table, the Integrity model supports row mentions a few values. They are as follows:

  • BASE: It stands for Basically Available, Soft state, Eventual consistency

  • MVCC: It stands for Multiversion Concurrency Control

  • ACID: It stands for Atomicity, Consistency, Isolation, Durability

DynamoDB versus Cassandra


Let's start with the data model. DynamoDB's storage model is very similar to Cassandra's, in which data is hashed on the row key, and the data under that key is ordered by a specific column. In DynamoDB, an attribute can be single valued (scalar) or multivalued. Cassandra has various attribute types, such as Integer, BigInteger, ASCII, UTF8, and Double, and it also offers composite and dynamic composite columns. It provides the full range of data formats, including structured, semi-structured, and unstructured data, as found in modern applications, whereas DynamoDB has only two attribute types, namely String and Number.

Cassandra supports multi-datacenter deployment across regions, whereas DynamoDB replicates data across multiple availability zones in the same region; cross-region replication is not supported. So if you want to provide low local data latencies in regions across the world, Cassandra is the option, and it also gives full control over data consistency.

Let's take a scenario that requires a large number of increments to a few counters, with the ability to read the current count. Scaling the throughput on an individual counter is quite difficult, because every read/write operation hits the same item. If more than one node is needed to handle a single counter, the read operation becomes slow, since it involves all those nodes. When a request times out, we retry the operation because we don't know whether the previous request succeeded; performing the same update twice in this way frequently causes long latencies or load spikes across the cluster. DynamoDB's atomic counters, by contrast, are more reliable, offer low latency, and support as many increments as the operations we perform.
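The double-counting hazard described above is easy to reproduce in miniature. The following self-contained simulation (no real database involved) shows a client whose first increment succeeds but whose acknowledgement is lost, so the blind retry over-counts:

```python
class Counter:
    """Toy counter standing in for a distributed counter row."""
    def __init__(self):
        self.value = 0

    def increment(self, ack_lost=False):
        self.value += 1          # the write itself is applied...
        if ack_lost:
            raise TimeoutError   # ...but the client never learns that

c = Counter()
try:
    c.increment(ack_lost=True)   # first attempt: applied, ack lost
except TimeoutError:
    c.increment()                # blind retry: applied a second time
# One logical increment, yet the counter now reads 2.
```

This is exactly why scaling a hot counter across nodes is painful, and why a server-side atomic increment primitive is valuable.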

In DynamoDB, overload is handled effectively. If we exceed the provisioned throughput, we rapidly get a ThroughputExceeded error, while no other requests are affected. This is very useful for a heavily loaded site where thousands of requests arrive at a time and latency spikes would otherwise make queues build up quickly.
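A common client-side response to this error is exponential backoff. The sketch below is a generic retry pattern rather than an official SDK API; the fake `flaky_write` stands in for a DynamoDB call that throws while the table is throttled:

```python
import time

class ThroughputExceededError(Exception):
    """Stand-in for the throttling error DynamoDB returns on overload."""
    pass

def with_backoff(operation, max_retries=5, base_delay=0.01):
    """Retry an operation with exponential backoff on throttling errors."""
    for attempt in range(max_retries):
        try:
            return operation()
        except ThroughputExceededError:
            time.sleep(base_delay * (2 ** attempt))  # 10 ms, 20 ms, 40 ms, ...
    raise RuntimeError("gave up after %d retries" % max_retries)

calls = {"n": 0}
def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:           # throttled on the first two attempts
        raise ThroughputExceededError()
    return "ok"

result = with_backoff(flaky_write)
```

Because only the throttled requests fail fast, the rest of the workload keeps flowing while the client backs off.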

In Cassandra, scaling up with virtual nodes is pretty easy, but scaling down remains slow, manual, and error prone. Streaming data while nodes join or leave the ring can fail for a group of nodes, which then requires repair, and data lost during a decommissioning operation has to be restored from a backup. In DynamoDB, scaling up is effortless: a single command, after which we wait a while for the table to be scaled, whereas in a Cassandra cluster it's a multistep as well as multihour process. In DynamoDB, scaling down is also better and much less time consuming, with low latency.

DynamoDB can insert into or delete from an item's set attribute without complex code. Its operational cost is close to zero: once we set up backup jobs at specific time intervals, there is no database to manage, no disk space to monitor, no memory usage to check, and no failed node to replace or repair, so DynamoDB saves costs too. Cassandra supports a logically unlimited amount of data under a specific key; the practical limit is the disk space on a particular node, whereas DynamoDB's item limit is 64 KB, so handling overflow can be tricky. Cassandra handles transactions well, delivering ACID-style guarantees by using a commit log to capture write operations, with built-in redundancy that ensures data durability if the hardware fails.
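As an illustration of the set manipulation just mentioned, the helper below builds the parameter dictionary for an UpdateItem call that uses the ADD (or DELETE) action on a string set. The table, key, and attribute names are made up, and nothing is sent to AWS; this is only the shape of the request.

```python
def build_set_update(table, key, attribute, members, action="ADD"):
    """Build UpdateItem parameters that ADD to or DELETE from a string set."""
    assert action in ("ADD", "DELETE")
    return {
        "TableName": table,
        "Key": {"Id": {"S": key}},
        "AttributeUpdates": {
            attribute: {
                "Action": action,
                "Value": {"SS": list(members)},   # SS = string set
            }
        },
    }

req = build_set_update("Books", "book-42", "Tags", ["nosql", "aws"])
```

A second call with `action="DELETE"` and the same shape removes members from the set; no read-modify-write cycle is needed on the client.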

Now take a look at the tabular comparison between these two databases:

| Specification | DynamoDB | Cassandra |
| --- | --- | --- |
| Data model | Key-value store | Key-value with wide column store |
| Operating system | Cross platform (hosted) | BSD, Linux, OS X, Windows |
| License | Commercial (Amazon) | Open source (Apache) |
| Data storage | Solid-state drive (SSD) | Filesystem |
| Secondary indexes | Yes | No |
| Accessing method | API call | API call, CQL (short for Cassandra Query Language), Apache Thrift |
| Server-side script | No | No |
| Triggers | No | Yes |
| Partitioning | Sharding | Sharding |
| MapReduce | No (can be done with other AWS services) | Yes |
| Integrity model supports | BASE, MVCC, ACID, Eventual consistency, Log replication, Read committed | BASE |
| Composite key support | Yes | Yes |
| Data consistency | Yes | Most operations |
| Distributed counters | Yes | Yes |
| Idempotent write batches | No | Yes |
| Time to live support | No | Yes |
| Conditional updates | Yes | No |
| Indexes on column value | No | Yes |
| Hadoop integration | M/R, Hive | M/R, Hive, Pig |
| Monitorable | Yes | Yes |
| Backups | Low-impact snapshot with incremental | Incremental |
| Deployment policy | Only with AWS | Anywhere |
| Transaction | No | Yes |
| Full text search | No | No |
| Geospatial indexes | No | No |
| Horizontal scalability | Yes | Yes |
| Replication method | Master-slave replica | Master-slave replica |
| Largest value supported | 64 KB | 2 GB |
| Object-relational mapping | No | Yes |
| Log support | No | Yes |
| User concepts | Access rights for users and roles can be defined via AWS Identity and Access Management (IAM) | Users can be defined per object |

DynamoDB versus S3


DynamoDB and S3 (short for Simple Storage Service) are both storage services provided by Amazon that fall under the NoSQL umbrella. S3 is designed for bulk storage: a single object can be up to 5 TB, whereas a DynamoDB item can hold up to 64 KB, although neither places a limit on the number of attributes for a particular item. S3 stores data in buckets, which are the essential containers for data storage in Amazon S3. S3 suits large data that we rarely access, for example data stored for analysis or for backup and restore, and it has a slower response time. DynamoDB is built for high performance; we use it when we need a key-value data store that can handle thousands of read/write operations per second.

DynamoDB is a flexible NoSQL database that we use to store small features, metadata, and index information—in short, data that is more dynamic. In DynamoDB, data retrieval is very fast because the data is stored on SSDs, and the data is replicated across many servers that are located in availability zones.

With S3, we don't need to configure anything to get started: we choose a key and upload the data, and that's how data is stored in S3; afterwards, we only need to keep track of the key we used for a particular application. In DynamoDB, we need to configure a few things: when creating a table, we must specify the primary key and provision the read/write throughput values. S3 doesn't support object locking; if we need it, we have to build it ourselves, whereas DynamoDB supports optimistic locking. Optimistic locking is an approach that ensures the client-side item we are updating is the same as the item currently in DynamoDB, so our database writes are protected from being overwritten by other writes.
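Optimistic locking can be sketched without any AWS dependency: each item carries a version number, and a write succeeds only if the version it read is still current. The in-memory store below is purely illustrative; it mirrors the behaviour DynamoDB provides through conditional writes.

```python
class VersionedStore:
    """Tiny in-memory store enforcing optimistic locking via a version field."""
    def __init__(self):
        self.items = {}  # key -> (version, value)

    def get(self, key):
        return self.items.get(key, (0, None))

    def put(self, key, value, expected_version):
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise ValueError("conditional check failed: stale version")
        self.items[key] = (current_version + 1, value)

store = VersionedStore()
v, _ = store.get("item1")          # both clients read version 0
store.put("item1", "writer A", v)  # A writes first: version becomes 1
try:
    store.put("item1", "writer B", v)  # B's write is rejected as stale
    overwritten = True
except ValueError:
    overwritten = False
```

Writer B must re-read the item, merge its change, and retry; A's update is never silently lost.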

One of the good features of S3 is that every S3 object or document that is stored in it has a web address. By using this web address, a document can be accessed without any impact on the web server or the database.

S3 uses the eventual consistency model, in which, if no new updates are made on a given data item, all accesses to that item will eventually return the last updated value. DynamoDB offers both eventual consistency and strong consistency; with strong consistency, all updates are seen by all parallel processes, nodes, and processors in the same sequential order. DynamoDB doesn't support spatial data natively, but it can store points and, on top of them, you can build a Geohash-based index that also covers lines (for example, roads) and areas (for example, boundaries) representing arbitrary features, with each Geohash index record stored as a DynamoDB item. This works when the geographical feature is small; otherwise, the spatial feature is stored as an object in S3, with a pointer to the S3 location kept in DynamoDB.
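The Geohash idea mentioned above interleaves longitude and latitude bits and renders them in base 32, so nearby points share a common string prefix that can be indexed as an ordinary DynamoDB key. The following minimal encoder/decoder is an independent illustration, not code from any AWS library; the round trip shows the precision a nine-character hash gives.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=9):
    """Encode a lat/lon pair into a Geohash string of the given length."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, ch, even = [], 0, 0, True
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)  # alternate axes
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch <<= 1
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:                 # five bits per base-32 character
            chars.append(_BASE32[ch])
            bits, ch = 0, 0
    return "".join(chars)

def geohash_decode(gh):
    """Decode a Geohash back to the centre of its bounding cell."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    even = True
    for c in gh:
        for i in range(4, -1, -1):
            bit = (_BASE32.index(c) >> i) & 1
            rng = lon_rng if even else lat_rng
            rng[1 - bit] = (rng[0] + rng[1]) / 2  # bit 1 keeps the upper half
            even = not even
    return (sum(lat_rng) / 2, sum(lon_rng) / 2)
```

Storing the hash as a range key lets prefix queries retrieve all points in a cell, which is the essence of a Geohash-based spatial index over a plain key-value store.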

DynamoDB is presently provisioned entirely on SSD devices. An SSD can service an individual read request in a small fraction of the time a magnetic disk needs (about 100 times faster), though its cost per GB of storage is much higher (approximately 10 times). Hence, DynamoDB provides very low latency and high throughput compared to S3, but at a higher cost per unit of storage. As a scenario, 1 GB of DynamoDB storage costs about $1 per month, while S3 storage costs between 4 and 12 cents per GB per month, making S3 roughly eight times (or more) cheaper than DynamoDB. DynamoDB also provides a flexible fee structure based on IO capacity: 1,000 read operations per second cost around 20 cents per hour, and write operations are about 5 times more expensive, because SSDs can perform a read operation much faster than a write operation.
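Using the ballpark figures above (illustrative, and long out of date by now), the storage-cost gap works out as follows:

```python
def monthly_storage_cost(gb, price_per_gb_month):
    """Flat monthly storage cost for a given volume and per-GB price."""
    return gb * price_per_gb_month

dynamodb_cost = monthly_storage_cost(1, 1.00)   # ~$1 per GB-month on DynamoDB
s3_high_cost  = monthly_storage_cost(1, 0.12)   # S3 at its upper-end 12 cents
ratio = dynamodb_cost / s3_high_cost            # S3 is roughly 8x cheaper
```

Even at S3's most expensive tier, the per-GB ratio lands around eight, which is why large, rarely accessed objects belong in S3 with only a pointer kept in DynamoDB.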

Now take a look at the tabular comparison between DynamoDB and S3:

| Specification | DynamoDB | S3 |
| --- | --- | --- |
| Data model | Key-value store | Objects/data stored in buckets |
| Operating system | Cross platform (hosted) | Cross platform (hosted) |
| License | Commercial (Amazon) | Commercial (Amazon) |
| Data storage | Solid-state drive (SSD) | Magnetic disk |
| Secondary indexes | Yes | No |
| Accessing method | API call | HTTP web address (API + publicly accessible URL) |
| Server-side script | No | No |
| Composite key support | Yes | No |
| Data consistency | Yes | Yes |
| Distributed counters | Yes | No |
| Largest value supported per item | 64 KB | 5 TB |

DynamoDB versus Redis


Both DynamoDB and Redis are NoSQL databases that store data in key-value format, but Redis is open source, released under the BSD license. Redis stands for REmote DIctionary Server. It is often called a data structure server because its values can hold many data types: strings, hashes (in which the fields and values are strings), lists of strings, sets of strings, and sorted sets of strings. We can perform atomic operations on these types in Redis. Redis stores the whole dataset in memory and can persist it by dumping the data to disk; it synchronizes data to the disk every couple of seconds, so if the system fails, we lose the data for only a few seconds. Another way to persist data is by appending each command to a log.
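Both persistence modes correspond to standard redis.conf directives; the values below are illustrative defaults rather than a recommendation:

```conf
# RDB snapshotting: dump the dataset to disk when write thresholds are hit
save 900 1        # after 900 s if at least 1 key changed
save 300 10
save 60 10000

# AOF: append every write command to a log, fsync'd about once per second,
# bounding data loss on a crash to the last few seconds of writes
appendonly yes
appendfsync everysec
```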

Redis supports master-slave replication, which lets a slave Redis server hold an exact copy of the master with non-blocking replication: the master keeps handling queries while one or more slaves perform their initial synchronization, and a slave keeps serving queries from its old dataset while it synchronizes. In DynamoDB, by contrast, replication to backup tables happens at scheduled times rather than continuously, so if a primary DynamoDB table loses its data, writes made since the last backup can be lost when restoring. In Redis, a slave can itself accept connections from other slaves, not only from the same master server; in DynamoDB, the tables have to be on the same AWS account.

Other features of Redis include transactions, pub/sub, Lua scripting capabilities, keys with a restricted time to live, and configuration settings that allow Redis to work like a cache. Redis is written in ANSI C and works on almost all operating systems, such as Linux, BSD, and OS X, without any external dependencies.

When data durability is not the major concern, the in-memory design of Redis lets it perform extremely well compared to database systems that write every update or change to disk before acknowledging a committed transaction, and there is no prominent speed difference between read and write IOs. Redis runs as a single process and is single-threaded, so a single Redis instance cannot execute tasks such as stored procedures in parallel. Redis is mainly used for rapidly changing data with a predictable database size that should mostly fit in memory, so it suits real-time applications such as storing real-time stock prices, real-time communication, real-time analytics, and leaderboards; we can use it as a memory cache too. Let's move on to the tabular comparison between these databases, as follows:

| Specification | DynamoDB | Redis |
| --- | --- | --- |
| Data model | Key-value store | Key-value store |
| Operating system | Cross platform (hosted) | BSD, Linux, OS X, Solaris |
| Programming language | Ruby | C |
| License | Commercial (Amazon) | Open source (BSD license) |
| Data storage | Solid-state drive (SSD) | In-memory dataset (RAM) |
| Secondary indexes | Yes | No |
| Accessing method | RESTful HTTP API call | API call, Lua |
| Server-side script | No | Lua |
| Triggers | No | No |
| Partitioning | Sharding | None |
| MapReduce | No (can be done with other AWS services) | No |
| Composite key support | Yes | No |
| Atomicity | Yes | Yes |
| Data consistency | Yes | Yes |
| Isolation | Yes | Yes |
| Durability | Yes | Yes |
| Transactions | No | Optimistic locking |
| Concurrency control | ACID | Locks |
| Partition tolerance | No | Yes |
| Persistence | No | Yes |
| High availability | Yes | No |
| Referential integrity | No | No |
| Revision control | Yes | No |
| Function-based index | Yes | No |
| Full text search | No | No |
| Geospatial indexes | No | No |
| Horizontal scalability | Yes | Yes |
| Replication method | Master-slave replica (synchronous) | Master-slave replica (asynchronous) |
| Largest value supported | 64 KB | 512 MB |
| Object-relational mapping | No | Yes |
| Log support | No | Yes |
| Operations per second | 1,000 | 140,000 |
| Free for commercial use | No | Yes (up to some memory usage) |
| Deployment policy | Only with AWS | On premises |
| Easy to use | No | Yes |
| Backup | Scheduled (to be configured) | Frequent automatic sync |
| User concepts | Access rights for users and roles can be defined via AWS Identity and Access Management (IAM) | A very simple password-based access control |
| Best use | Rapidly varying data; frequently written, rarely read statistical data | Large-to-small database solution |

As per the previous comparisons, you can easily identify the most suitable NoSQL data service for your dynamic applications. In short, DynamoDB provides the following features on Amazon's distributed infrastructure and robust platform:

  • Seamless scaling

  • Secondary indexes

  • Schema-less

  • Strong consistency, atomic counters

  • Integrated monitoring

  • Secure

  • Elastic MapReduce, Redshift, and Data Pipeline integration

  • Management console and APIs
