Chapter 4. Best Practices

When it comes to the public cloud, almost every operation costs money, be it a read or a write. Each operation is metered, either in capacity units or in the number of calls made to the database. So while working on the cloud, we have to be extremely careful about usage, and we also need to make sure that the bills stay predictable and do not come as a surprise to the organization.

Until now, we have seen various features of DynamoDB, its internals and how they work, and how to add, update, and delete data in DynamoDB. Now that you have learned most of the details from DynamoDB's usage point of view, it's time to learn some best practices one should follow in order to make the most of it. I am sure the best practices we are going to cover in this chapter will help save some money for you and your organization.

In this chapter, we will cover the following topics:

  • Table-level best practices

  • Item-level best practices

  • Index...

Table-level best practices


We have already seen what a table is and how it is used. There are various techniques with which we can maximize a table's read/write efficiency.

Choosing a primary key

We have seen the two primary key representations in DynamoDB: the hash key, and the composite hash and range key. The hash key value decides how items get distributed across multiple nodes and, with that, the level of parallelism. It is quite possible that some items in a table are used much more heavily than others. In that case, one particular partition gets hit frequently while the rest range from unused to lightly used, which is a bad thing for the performance and throughput of the system. Now let's discuss some best practices for choosing the right hash key and the right composite hash and range key.

It is recommended that you design your tables so that the table's hash key takes a wide variety of values. This does not mean that your application must access...
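
As an illustration, the following is a minimal sketch (in Python with boto3; the table and attribute names are hypothetical, not from the original text) of creating a table whose hash key naturally has many distinct values, so that items spread evenly across partitions:

    import boto3

    dynamodb = boto3.client('dynamodb', region_name='us-east-1')

    # A hash key such as user_id has many distinct values, so items
    # spread evenly across partitions. A low-cardinality key (for
    # example, a status flag) would concentrate traffic on a few
    # partitions instead.
    dynamodb.create_table(
        TableName='UserOrders',  # hypothetical table name
        AttributeDefinitions=[
            {'AttributeName': 'user_id', 'AttributeType': 'S'},
            {'AttributeName': 'order_date', 'AttributeType': 'S'},
        ],
        KeySchema=[
            {'AttributeName': 'user_id', 'KeyType': 'HASH'},      # hash key
            {'AttributeName': 'order_date', 'KeyType': 'RANGE'},  # range key
        ],
        ProvisionedThroughput={'ReadCapacityUnits': 5,
                               'WriteCapacityUnits': 5},
    )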

Item-level best practices


There are various ways in which we can improve item access, some of which we are going to discuss in this section.

Caching

Sometimes, we might need to use a certain item or set of items more frequently than others. There is also a good chance that such items receive few updates. In this case, you can store those items in a cache and, whenever they are required, simply fetch them from there. Using a cache reduces the number of calls made to DynamoDB, improving both time and cost efficiency.

For example, suppose you have a lookup table whose values are fixed and do not change over time, and a few items in that table are very popular. In that case, you can simply cache those items. The very first time, when the cache is empty, we fetch the data from the actual table itself.

From the next time onwards, the program should check whether an entry for the item is present in the cache. If yes, it can directly use that value...
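
The check-cache-first flow described above might look like the following minimal sketch (Python/boto3 again; the in-process dict and the Lookup table are hypothetical stand-ins for whatever cache layer and table you actually use):

    import boto3

    dynamodb = boto3.client('dynamodb')
    cache = {}  # stand-in for memcached, Redis, or any cache layer

    def get_item_cached(key):
        # Serve from the cache when possible to avoid a billable read.
        if key in cache:
            return cache[key]
        # Cache miss: fetch from DynamoDB, then remember the result.
        response = dynamodb.get_item(
            TableName='Lookup',          # hypothetical table name
            Key={'id': {'S': key}},
        )
        item = response.get('Item')
        if item is not None:
            cache[key] = item
        return item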

Query and scan best practices


Query and scan, as we know, are heavy operations that mostly draw on the read capacity units provisioned for a particular table. It is very important to distribute the load evenly so that the read capacity units are utilized properly. Here are some best practices you should follow in order to avoid exceptions about exceeding the provisioned throughput.

Maintaining even read activity

We know that a scan operation fetches up to 1 MB of data in a single request (one page). We also know that one read capacity unit gives you two eventually consistent reads per second of items up to 4 KB each. This means a single 1 MB scan page costs (1 MB / 4 KB per read) / 2 eventually consistent reads = 128 read capacity units, which is quite high if you have set your provisioned throughput very low. This sudden burst would throttle the provisioned throughput for the given table. Meanwhile, if a very important request arrives, that request would get...
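
One common way to smooth out a scan's read activity is to reduce the page size with the Limit parameter and pause between pages. The following is a minimal sketch of that idea (the page size and pause values are illustrative, not from the original text):

    import time
    import boto3

    dynamodb = boto3.client('dynamodb')

    def gentle_scan(table_name, page_size=25, pause=1.0):
        # Limit caps the items read per request, so each page consumes
        # far fewer read capacity units than a full 1 MB page would.
        kwargs = {'TableName': table_name, 'Limit': page_size}
        while True:
            response = dynamodb.scan(**kwargs)
            for item in response.get('Items', []):
                yield item
            last_key = response.get('LastEvaluatedKey')
            if last_key is None:
                break  # no more pages
            kwargs['ExclusiveStartKey'] = last_key
            time.sleep(pause)  # spread the reads over time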

Local secondary index best practices


We saw what local secondary indexes are in Chapter 2, Data Models. To recap, they are secondary indexes that you can define on certain attributes, which act as an alternative range key used together with your table's hash key. As we have seen, DynamoDB needs to maintain a completely separate index structure for them, so we have to allocate more resources, which makes them a costly affair. It is therefore very important to choose carefully the attribute on which you wish to define a secondary index. It is recommended that you do not define a local secondary index on an attribute you are not going to query much. Indexing should be reserved for tables that do not get heavy writes, as maintaining the indexes on every write is quite costly.

Indexes should be put on tables that contain sparse data and that are infrequently updated. It has been observed that the smaller the index, the better the performance. A secondary index consists of the index keys plus any projected attributes...
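
To make the "index plus projected attributes" point concrete, here is a minimal sketch of defining a local secondary index at table-creation time, projecting only the keys to keep the index small (table, attribute, and index names are hypothetical):

    import boto3

    dynamodb = boto3.client('dynamodb')

    dynamodb.create_table(
        TableName='Orders',  # hypothetical table name
        AttributeDefinitions=[
            {'AttributeName': 'customer_id', 'AttributeType': 'S'},
            {'AttributeName': 'order_id', 'AttributeType': 'S'},
            {'AttributeName': 'order_date', 'AttributeType': 'S'},
        ],
        KeySchema=[
            {'AttributeName': 'customer_id', 'KeyType': 'HASH'},
            {'AttributeName': 'order_id', 'KeyType': 'RANGE'},
        ],
        LocalSecondaryIndexes=[{
            'IndexName': 'OrderDateIndex',
            'KeySchema': [
                # An LSI keeps the table's hash key and swaps in an
                # alternative range key.
                {'AttributeName': 'customer_id', 'KeyType': 'HASH'},
                {'AttributeName': 'order_date', 'KeyType': 'RANGE'},
            ],
            # Projecting only the keys keeps the index small, which
            # keeps both storage and write costs down.
            'Projection': {'ProjectionType': 'KEYS_ONLY'},
        }],
        ProvisionedThroughput={'ReadCapacityUnits': 5,
                               'WriteCapacityUnits': 5},
    )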

Global secondary index best practices


Global secondary indexes allow us to create alternate hash and range keys on non-primary key attributes, which makes querying on those attributes quite an easy task. There are various best practices one should follow while using global secondary indexes, and we are going to discuss them in this section.

As we keep saying, it is very important to choose hash and range key attributes that distribute the load evenly across partitions, which means choosing attributes with a wide variety of values. Consider the example of a student table with columns such as roll number, name, grade, and marks. Here, the grade column would hold values such as A, B, C, and D, while the marks column would hold the marks obtained by each student. The grade column has a very limited number of unique values, so if we create an index on this column, then most of the values...
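
For reference, a global secondary index is declared with its own key schema and its own provisioned throughput, as in this minimal sketch (attribute names follow the hypothetical student table above; choosing a higher-cardinality attribute such as name distributes load far better than grade would):

    import boto3

    dynamodb = boto3.client('dynamodb')

    dynamodb.create_table(
        TableName='Students',  # hypothetical table name
        AttributeDefinitions=[
            {'AttributeName': 'roll_number', 'AttributeType': 'N'},
            {'AttributeName': 'name', 'AttributeType': 'S'},
            {'AttributeName': 'marks', 'AttributeType': 'N'},
        ],
        KeySchema=[
            {'AttributeName': 'roll_number', 'KeyType': 'HASH'},
        ],
        GlobalSecondaryIndexes=[{
            'IndexName': 'NameMarksIndex',
            'KeySchema': [
                # 'name' has far more distinct values than 'grade',
                # so it spreads the index load across partitions.
                {'AttributeName': 'name', 'KeyType': 'HASH'},
                {'AttributeName': 'marks', 'KeyType': 'RANGE'},
            ],
            'Projection': {'ProjectionType': 'KEYS_ONLY'},
            # Unlike an LSI, a GSI carries its own read/write capacity,
            # separate from the table's provisioned throughput.
            'ProvisionedThroughput': {'ReadCapacityUnits': 5,
                                      'WriteCapacityUnits': 5},
        }],
        ProvisionedThroughput={'ReadCapacityUnits': 5,
                               'WriteCapacityUnits': 5},
    )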

Summary


In this chapter, we went through some best practices that one should follow in order to get the most out of DynamoDB. We started with table-level best practices, where we talked about how to choose correct primary keys, how to create table schemas, how to manage time series data, and so on. In item-level best practices, we talked about caching, storing large attributes, one-to-many data modeling, and so on. In query and scan best practices, we saw how to maintain an even data load to improve query performance. We also discussed the use of parallel scans and their benefits.

In the last section, we talked about local and global secondary index best practices. A good understanding of DynamoDB's architecture will help you find more such best practices, which in turn will help you reduce cost and improve performance. So keep learning and keep exploring.

In the next chapter, we will cover some advanced topics, such as DynamoDB monitoring, common useful tools, libraries, AWS authentication service...
