You're reading from DynamoDB Applied Design Patterns

Product type Book

Published in Sep 2014

ISBN-13 9781783551897

Pages 202 pages

Edition 1st Edition

Concepts

Design Patterns

Author (1):

Uchit Hamendra Vyas

Table of Contents (17) Chapters

DynamoDB Applied Design Patterns

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Preface

1. Data Modeling with DynamoDB

2. DynamoDB Interfaces

3. Tools and Libraries of AWS DynamoDB

4. Working with Secondary Indexes

5. Query and Scan Operations in DynamoDB

6. Working with the DynamoDB API

7. Distributed Locking with DynamoDB

8. DynamoDB with Redshift, Data Pipeline, and MapReduce

9. DynamoDB – Best Practices

Comparing DynamoDB

Index

Chapter 5. Query and Scan Operations in DynamoDB

In the previous chapter, we learned how to create a secondary index for a table and saw its role in retrieving items efficiently. We also discussed sharding. In the long run, knowledge of secondary indexes is useful only if we know how to use them for retrieval. DynamoDB retrieves items through two operations, query and scan. In this chapter, we will also learn about parallel scanning, which builds on the sharding concept. The primary objective of any database (whether NoSQL or SQL) is to provide easy storage and fast retrieval of data. So far, we have discussed various configurations that can be applied to a table, such as adding an index, specifying the primary key, and so on. In this chapter, we will cover the following topics:

Querying table items
Scanning table items
Parallel scanning

First, we will discuss the query operation, which makes use of the hash and range key values to retrieve the items. Then we will...

Querying tables

One of the most efficient ways of retrieving data from a DynamoDB table is the query operation. A query must include a condition on the primary key: an equality condition on the hash key and, optionally, a comparison on the range key. The query operation supports the following comparison operations:

EQ: This stands for equal to
LE: This stands for less than or equal to
LT: This stands for less than
GE: This stands for greater than or equal to
GT: This stands for greater than
BETWEEN: This retrieves items whose primary key value is between the specified values
BEGINS_WITH: This retrieves items whose primary key begins with the specified value

These seven comparison operations are evaluated directly against primary key values, so the query retrieves only the matching items and never touches partitions or items that cannot match. There are six more comparison operations that can be performed on the items...
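As a minimal sketch of how such key conditions might be expressed, the following Python builds a request in the shape of the legacy `KeyConditions` parameter of the low-level Query API. The excerpt does not show this parameter, so treat the structure as an assumption about which API flavor is in use. The table name `Thread`, the attributes `ForumName` and `Subject`, and the helper functions are hypothetical, and every value is sent as a string for simplicity.

```python
# Operators the passage lists for key conditions in a query.
KEY_OPERATORS = {"EQ", "LE", "LT", "GE", "GT", "BETWEEN", "BEGINS_WITH"}


def key_condition(operator, *values):
    """Build one condition in the legacy KeyConditions wire format."""
    if operator not in KEY_OPERATORS:
        raise ValueError(f"unsupported key operator: {operator}")
    expected = 2 if operator == "BETWEEN" else 1
    if len(values) != expected:
        raise ValueError(f"{operator} takes {expected} value(s)")
    return {
        "ComparisonOperator": operator,
        # Each value is a typed attribute value; "S" marks a string.
        "AttributeValueList": [{"S": str(v)} for v in values],
    }


def build_query(table, hash_key, hash_value, range_key=None, range_condition=None):
    """Assemble Query parameters: EQ on the hash key, optional range condition."""
    params = {
        "TableName": table,
        "KeyConditions": {hash_key: key_condition("EQ", hash_value)},
    }
    if range_key is not None and range_condition is not None:
        params["KeyConditions"][range_key] = range_condition
    return params


# Hypothetical table and attribute names, for illustration only.
query = build_query(
    "Thread",
    "ForumName", "DynamoDB",
    "Subject", key_condition("BEGINS_WITH", "Query"),
)
```

The resulting dictionary makes no network call on its own. It could be handed to an SDK's low-level query call, for example boto3's `client.query(**query)`, assuming credentials and a matching table exist.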

Scanning tables

A scan operation evaluates every item in the table. By default, it returns every item with all of its attributes. This is why the scan operation is not preferred, and it is always recommended that you use query whenever possible. However, we can retrieve only specific attributes using the AttributesToGet parameter, in the same way as with query. Additionally, we can reduce the number of items returned by the scan using a scan filter condition. For instance, assume that the table holds 100 items, that the scan filter filters out 10 of them, and that the scan uses strongly consistent reads (each of which consumes one capacity unit for an item of up to 1 KB). Can you tell how many capacity units this scan operation consumes? If you think it consumes 100 capacity units, then you're right, because the capacity unit is not a measure of how many items (hoping that every item is less than...
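The arithmetic in this example can be sketched with a small in-memory simulation. It is not a DynamoDB client: `simulate_scan`, the item layout, and the `units_per_item` parameter are hypothetical, and the cost of one capacity unit per item is the passage's simplifying assumption rather than a statement of how the service bills reads.

```python
def simulate_scan(items, keep, units_per_item=1):
    """Read every item, then filter; return (results, consumed_units)."""
    consumed = 0
    results = []
    for item in items:
        # The read is paid for before the filter is evaluated.
        consumed += units_per_item
        if keep(item):
            results.append(item)
    return results, consumed


# 100 hypothetical items; the filter drops the 10 with the lowest ids.
table = [{"id": n} for n in range(100)]
results, consumed = simulate_scan(table, keep=lambda item: item["id"] >= 10)

print(len(results), consumed)  # prints "90 100"
```

Only 90 items come back, yet all 100 reads are charged, which is the point of the example: the filter trims the result, not the work done by the scan.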

Parallel scanning

As we discussed in the section on DynamoDB sharding, table data is partitioned based on the hash key value. Even though sharding smooths out read and write operations, it does not by itself let us scan the partitions in parallel. For example, if the table data is spread across five partitions (each with a throughput capacity of five units), then a scan cannot use more than five capacity units at a time, even though the table as a whole has more provisioned. The throughput of a scan cannot exceed that of the fastest partition (the one with the highest throughput). From these facts, we can infer the following:

A scan operation returns at most 1 MB of data per request
Scan operations can read data from only one partition at a time
For a large table, no matter how high the throughput is, a sequential scan will always take a long time
The scanning speed can never exceed that of the fastest partition (the one with the highest throughput)

To put it simply, even if our television has one hundred channels, we will be able to...
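The idea behind parallel scanning can be sketched as follows: split the table into logical segments and let one worker page through each segment concurrently. The real Scan API exposes this through the `Segment` and `TotalSegments` request parameters, with `LastEvaluatedKey` and `ExclusiveStartKey` for paging, none of which appear in this excerpt. To keep the sketch runnable without an AWS account, `scan_page` below is a hypothetical in-memory stand-in for the service call, and `PAGE_SIZE` stands in for the 1 MB per-request limit.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-in for a table: 1,000 hypothetical items keyed by "id".
TABLE = [{"id": n} for n in range(1000)]
PAGE_SIZE = 100  # stand-in for the 1 MB per-request limit


def scan_page(segment, total_segments, start=0):
    """Fake Scan call: return one page of one segment plus the next start key."""
    owned = [item for item in TABLE if item["id"] % total_segments == segment]
    page = owned[start:start + PAGE_SIZE]
    next_start = start + PAGE_SIZE if start + PAGE_SIZE < len(owned) else None
    return page, next_start


def scan_segment(segment, total_segments):
    """Worker: page through a single segment until it is exhausted."""
    items, start = [], 0
    while start is not None:
        page, start = scan_page(segment, total_segments, start)
        items.extend(page)
    return items


def parallel_scan(total_segments):
    """Scan all segments concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [pool.submit(scan_segment, s, total_segments)
                   for s in range(total_segments)]
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged


items = parallel_scan(total_segments=4)
```

Each worker sees a disjoint slice of the table, so the merged result contains every item exactly once. Against a real table, the body of `scan_page` would be replaced by an SDK Scan call carrying the same segment number and total.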

Summary

In this chapter, we learned to perform simple query and scan operations on a DynamoDB table and its secondary indexes. We also looked at parallel scanning, which is good for growing and high-priority tables.

Web services and REST APIs are becoming more advanced with every passing day, mainly because they are platform independent. So in the next chapter, we will learn the basics of REST and how to perform DynamoDB operations effectively using REST.