Chapter 5. Query and Scan Operations in DynamoDB
In the previous chapter, we learned how to create a secondary index for a table and saw its role in retrieving items efficiently. That knowledge pays off only if we know how to use the index for retrieval. In DynamoDB, items are retrieved with two operations: query and scan. We have also discussed sharding; in this chapter, we will learn about parallel scanning, which builds on the sharding concept. The primary objective of any database (whether NoSQL or SQL) is to provide easy storage and fast retrieval of data. So far, we have discussed various configurations that can be added to a table, such as adding an index and specifying the primary key. In this chapter, we will cover the following topics:
Querying table items
Scanning table items
Parallel scanning
First, we will discuss the query operation, which makes use of the hash and range key values to retrieve the items. Then we will...
One of the most efficient ways of retrieving data from a DynamoDB table is the query operation. A query must always include a comparison condition on the primary key attribute values: the hash key is matched for equality, and the range key (if the table has one) can be compared using any of the following comparison operators:
EQ: This stands for equal to
LE: This stands for less than or equal to
LT: This stands for less than
GE: This stands for greater than or equal to
GT: This stands for greater than
BETWEEN: This retrieves items whose primary key value is between the specified values
BEGINS_WITH: This retrieves items whose primary key value begins with the specified value
These seven comparison operations are applied directly to the primary key values, so only the necessary items are retrieved, without touching the partitions or items that don't hold those values. There are six more comparison operations that can be performed on the items...
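To make the key comparison operators concrete, here is a minimal in-memory sketch in Python. It is not the DynamoDB API: the "table" is just a list of dictionaries, and the attribute names ForumName and Subject, as well as the functions range_key_matches and query, are hypothetical. It assumes the hash key is matched for equality and one of the seven operators is applied to the range key, and it returns results in ascending range key order, which is DynamoDB's default for query.

```python
from typing import Any, Dict, List

# Hypothetical item shape: a plain dict of attribute name -> value.
Item = Dict[str, Any]


def range_key_matches(value: Any, op: str, args: List[Any]) -> bool:
    """Evaluate one of the seven key comparison operators against a range key value."""
    if op == "EQ":
        return value == args[0]
    if op == "LE":
        return value <= args[0]
    if op == "LT":
        return value < args[0]
    if op == "GE":
        return value >= args[0]
    if op == "GT":
        return value > args[0]
    if op == "BETWEEN":
        # Both bounds are inclusive.
        return args[0] <= value <= args[1]
    if op == "BEGINS_WITH":
        # Prefix matching only applies to string values in this sketch.
        return isinstance(value, str) and value.startswith(args[0])
    raise ValueError(f"unsupported key operator: {op}")


def query(items: List[Item], hash_key: str, hash_value: Any,
          range_key: str, op: str, args: List[Any]) -> List[Item]:
    """Hypothetical query: equality on the hash key, one operator on the range key."""
    # Stand-in for DynamoDB locating the items that share this hash key value.
    same_hash = [it for it in items if it[hash_key] == hash_value]
    matched = [it for it in same_hash if range_key_matches(it[range_key], op, args)]
    # Results come back in ascending range key order.
    return sorted(matched, key=lambda it: it[range_key])


# Usage with made-up sample data.
items = [
    {"ForumName": "DynamoDB", "Subject": "Indexes"},
    {"ForumName": "DynamoDB", "Subject": "Backup"},
    {"ForumName": "DynamoDB", "Subject": "Billing"},
    {"ForumName": "S3", "Subject": "Billing"},
]
prefix_b = query(items, "ForumName", "DynamoDB", "Subject", "BEGINS_WITH", ["B"])
print([it["Subject"] for it in prefix_b])  # ['Backup', 'Billing']
```

Only items sharing the hash key value are considered, so the S3 item never appears in a DynamoDB-forum result even when its range key matches.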
A scan operation evaluates every item in the table. By default, it retrieves every item in the table with all of its attributes, which is why scans are costly and generally avoided. It is always recommended that you use a query whenever possible. However, we can retrieve only specific attributes using the AttributesToGet parameter, just as we saw with query. Additionally, we can reduce the number of items returned by a scan using a scan filter condition. For instance, suppose the table holds 100 items, each small enough (1 KB or less) that a strongly consistent read of it consumes one capacity unit, and the scan filter filters out 10 of them. Can you tell how many capacity units this scan operation consumed? If you think it consumed 100 capacity units, you're right, because the capacity unit is not a measure of how many items (hoping that every item is less than...
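The capacity arithmetic above can be checked with a small Python model. It follows the chapter's assumption that a strongly consistent read of an item of up to 1 KB costs one capacity unit (check the current DynamoDB documentation for today's unit size). The function scan_with_filter and the sample attributes are hypothetical. The point it models is that the filter runs only after each item has been read, so filtered-out items are still charged. Trimming attributes, as AttributesToGet does, shrinks the response but not the consumed capacity.

```python
import math
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Item = Dict[str, Any]

# Assumption taken from the chapter's example: one capacity unit per 1 KB
# read with strong consistency, charged per item.
UNIT_SIZE_BYTES = 1024


def scan_with_filter(
    items: List[Item],
    item_size: Callable[[Item], int],
    keep: Callable[[Item], bool],
    attributes_to_get: Optional[Set[str]] = None,
) -> Tuple[List[Item], int]:
    """Hypothetical scan model: read every item, then filter and project."""
    consumed_units = 0
    returned: List[Item] = []
    for item in items:
        # The item is read, and paid for, before the filter looks at it.
        consumed_units += math.ceil(item_size(item) / UNIT_SIZE_BYTES)
        if keep(item):
            if attributes_to_get is not None:
                # Projection shrinks what is returned, not what was read.
                item = {k: v for k, v in item.items() if k in attributes_to_get}
            returned.append(item)
    return returned, consumed_units


# 100 made-up items of about 200 bytes each; the filter drops the 10 archived ones.
table = [{"Id": i, "Status": "archived" if i < 10 else "active", "Body": "x"}
         for i in range(100)]
result, units = scan_with_filter(
    table,
    item_size=lambda it: 200,
    keep=lambda it: it["Status"] == "active",
    attributes_to_get={"Id"},
)
print(len(result), units)  # 90 100
```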
As we discussed in the section on DynamoDB sharding, table data is partitioned based on the hash key value. Even though this sharding smooths out read and write operations, it doesn't by itself let us scan the partitions in parallel. For example, suppose the table data is spread across five partitions, each with a throughput capacity of five units. A sequential scan reads one partition at a time, so even if the table has more than five capacity units provisioned, the scan cannot use more than five. In other words, the throughput of a scan cannot exceed that of the fastest (highest-throughput) partition. Based on these facts, we can infer the following:
A scan operation returns at most 1 MB of data per call
A scan operation reads data from only one partition at a time
For a large table, no matter how much throughput is provisioned, a sequential scan always takes a long time
The scanning speed can never exceed that of the fastest (highest-throughput) partition
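The 1 MB page limit and the resulting slowness of a sequential scan can be illustrated with a small Python model. Each call returns at most 1 MB, and the caller resumes from where the previous page stopped, much as a DynamoDB client passes the LastEvaluatedKey of one response as the ExclusiveStartKey of the next request. In this sketch the resume point is simply a list index, scan_page and sequential_scan are hypothetical names, and item sizes are supplied by the caller rather than computed as DynamoDB would.

```python
from typing import Any, List, Optional, Tuple

PAGE_LIMIT_BYTES = 1024 * 1024  # one scan call returns at most 1 MB


def scan_page(items: List[Any], sizes: List[int],
              start: int) -> Tuple[List[Any], Optional[int]]:
    """Return one page (at most 1 MB) and the index to resume from, or None when done."""
    page: List[Any] = []
    used = 0
    i = start
    while i < len(items):
        # Stop before exceeding the limit, but always return at least one item.
        if page and used + sizes[i] > PAGE_LIMIT_BYTES:
            break
        page.append(items[i])
        used += sizes[i]
        i += 1
    next_start = i if i < len(items) else None  # plays the role of LastEvaluatedKey
    return page, next_start


def sequential_scan(items: List[Any], sizes: List[int]) -> Tuple[List[Any], int]:
    """Keep requesting pages, one after another, until no resume point is returned."""
    results: List[Any] = []
    pages = 0
    start: Optional[int] = 0
    while start is not None:
        page, start = scan_page(items, sizes, start)
        results.extend(page)
        pages += 1
    return results, pages


# 3,000 made-up items of 1,000 bytes each need three round trips.
items = list(range(3000))
sizes = [1000] * len(items)
all_items, page_count = sequential_scan(items, sizes)
print(len(all_items), page_count)  # 3000 3
```

Because each page must finish before the next request can start, the total time grows with the table size regardless of how much throughput the table has.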
To put it simply, even if our television has one hundred channels, we will be able to...
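Parallel scanning removes this bottleneck by splitting the table into segments and scanning them with several workers at once. DynamoDB's Scan request exposes this through its Segment and TotalSegments parameters. The Python sketch below only models the idea. The functions segment_of, scan_segment, parallel_scan, and scan_seconds are hypothetical. The CRC32-based mapping stands in for DynamoDB's own assignment of items to segments, and the timing helper assumes capacity is spread evenly across partitions. With the five-partition, five-unit example above, a scan that needs 1,000 capacity units takes about 200 seconds sequentially but about 40 seconds with five workers.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List

Item = Dict[str, Any]


def segment_of(hash_value: str, total_segments: int) -> int:
    # Hypothetical stand-in for DynamoDB's internal mapping of items to segments.
    return zlib.crc32(hash_value.encode("utf-8")) % total_segments


def scan_segment(items: List[Item], hash_key: str,
                 segment: int, total_segments: int) -> List[Item]:
    # Each worker reads only the items that fall into its own segment.
    return [it for it in items
            if segment_of(it[hash_key], total_segments) == segment]


def parallel_scan(items: List[Item], hash_key: str, total_segments: int) -> List[Item]:
    """Run one worker per segment and merge their results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [pool.submit(scan_segment, items, hash_key, seg, total_segments)
                   for seg in range(total_segments)]
        results: List[Item] = []
        for future in futures:
            results.extend(future.result())
    return results


def scan_seconds(units_needed: float, units_per_partition: float, workers: int) -> float:
    # Assumes each worker can draw the full throughput of one partition.
    return units_needed / (units_per_partition * workers)


users = [{"UserId": f"user-{i}"} for i in range(50)]
merged = parallel_scan(users, "UserId", total_segments=4)
print(len(merged))                # 50
print(scan_seconds(1000, 5, 1))   # 200.0
print(scan_seconds(1000, 5, 5))   # 40.0
```

Every item lands in exactly one segment, so merging the workers' results yields the whole table once, with no duplicates.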
In this chapter, we learned to perform simple query and scan operations on a DynamoDB table and its secondary indexes. Finally, we also saw parallel scanning, which is useful for large, growing, and high-priority tables.
Web services and REST APIs are becoming more advanced with every passing day, mainly because they are platform independent. So in the next chapter, we will learn the basics of REST and how to perform DynamoDB operations effectively using REST.