Chapter 4. Best Practices

When it comes to the public cloud, almost every operation costs money, be it a read or a write. Each operation is metered, either in capacity units or in the number of calls made to the database. So while working on the cloud, we have to be extremely careful about usage, and we also need to make sure that the bills stay predictable and do not come as a surprise to the organization.

Until now, we have seen various features of DynamoDB, its internals and how they work, and how to add, update, and delete data in DynamoDB. Now that you have learned most of the details from DynamoDB's usage point of view, it's time to learn some best practices one should follow in order to make the most of it. I am sure the best practices we are going to cover in this chapter will help save some money for you and your organization.

In this chapter, we will cover the following topics:

  • Table-level best practices

  • Item-level best practices

  • Index...

Table-level best practices


We have already seen what a table is and how it is used. There are various techniques with which we can maximize a table's read/write efficiency.

Choosing a primary key

We have seen the two primary key representations in DynamoDB: the hash key, and the composite hash and range key. The hash key value decides how items get distributed across multiple nodes and, with that, the level of parallelism. It is quite possible that some items in a table are used much more heavily than others. In that case, one particular partition gets hit frequently while the rest range from unused to lightly used, which is a bad thing for the performance and throughput of the system. Now let's discuss some best practices for choosing the right hash key and the right composite hash and range key.

It is recommended that you design your tables so that the table's hash key takes a wide variety of values. This does not mean that your application must access...
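
As an illustration, the following is a minimal sketch (in Python with boto3; the table and attribute names are hypothetical, not from the original text) of creating a table whose hash key naturally has many distinct values, so that items spread evenly across partitions:

    import boto3

    dynamodb = boto3.client('dynamodb', region_name='us-east-1')

    # A hash key such as user_id has many distinct values, so items
    # spread evenly across partitions. A low-cardinality key (for
    # example, a status flag) would concentrate traffic on a few
    # partitions instead.
    dynamodb.create_table(
        TableName='UserOrders',  # hypothetical table name
        AttributeDefinitions=[
            {'AttributeName': 'user_id', 'AttributeType': 'S'},
            {'AttributeName': 'order_date', 'AttributeType': 'S'},
        ],
        KeySchema=[
            {'AttributeName': 'user_id', 'KeyType': 'HASH'},      # hash key
            {'AttributeName': 'order_date', 'KeyType': 'RANGE'},  # range key
        ],
        ProvisionedThroughput={'ReadCapacityUnits': 5,
                               'WriteCapacityUnits': 5},
    )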

Item-level best practices


There are various ways in which we can improve item access, some of which we are going to discuss in this section.

Caching

Sometimes, we might need to use a certain item or set of items more frequently than others. There is also a good chance that such items receive few updates. In this case, you can store those items in a cache and, whenever they are required, simply fetch them from there. Using a cache reduces the number of calls made to DynamoDB, improving both time and cost efficiency.

For example, suppose you have a lookup table whose values are fixed and do not change over time, and a few items in that table are very popular. In that case, you can simply cache those items. The very first time, when the cache is empty, we fetch the data from the actual table itself.

From the next time onwards, the program should check whether an entry for the item is present in the cache. If yes, it can directly use that value...
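
The check-cache-first flow described above might look like the following minimal sketch (Python/boto3 again; the in-process dict and the Lookup table are hypothetical stand-ins for whatever cache layer and table you actually use):

    import boto3

    dynamodb = boto3.client('dynamodb')
    cache = {}  # stand-in for memcached, Redis, or any cache layer

    def get_item_cached(key):
        # Serve from the cache when possible to avoid a billable read.
        if key in cache:
            return cache[key]
        # Cache miss: fetch from DynamoDB, then remember the result.
        response = dynamodb.get_item(
            TableName='Lookup',          # hypothetical table name
            Key={'id': {'S': key}},
        )
        item = response.get('Item')
        if item is not None:
            cache[key] = item
        return item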

Query and scan best practices


Query and scan, as we know, are heavy operations that mostly draw on the read capacity units provisioned for a particular table. It is very important to distribute the load evenly so that the read capacity units are utilized properly. Here are some best practices you should follow in order to avoid exceptions about exceeding the provisioned throughput.

Maintaining even read activity

We know that a scan operation fetches up to 1 MB of data in a single request (one page). We also know that one read capacity unit gives you two eventually consistent reads per second of items up to 4 KB each. This means a single 1 MB scan page costs (1 MB / 4 KB per read) / 2 eventually consistent reads = 128 read capacity units, which is quite high if you have set your provisioned throughput very low. This sudden burst would throttle the provisioned throughput for the given table. Meanwhile, if a very important request arrives, that request would get...
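
One common way to smooth out a scan's read activity is to reduce the page size with the Limit parameter and pause between pages. The following is a minimal sketch of that idea (the page size and pause values are illustrative, not from the original text):

    import time
    import boto3

    dynamodb = boto3.client('dynamodb')

    def gentle_scan(table_name, page_size=25, pause=1.0):
        # Limit caps the items read per request, so each page consumes
        # far fewer read capacity units than a full 1 MB page would.
        kwargs = {'TableName': table_name, 'Limit': page_size}
        while True:
            response = dynamodb.scan(**kwargs)
            for item in response.get('Items', []):
                yield item
            last_key = response.get('LastEvaluatedKey')
            if last_key is None:
                break  # no more pages
            kwargs['ExclusiveStartKey'] = last_key
            time.sleep(pause)  # spread the reads over time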

Local secondary index best practices


We saw what local secondary indexes are in Chapter 2, Data Models. To recap, they are secondary indexes that you can define on certain attributes, which act as an alternative range key used together with your table's hash key. As we have seen, DynamoDB needs to maintain a completely separate index structure for them, so we have to allocate more resources, which makes them a costly affair. It is therefore very important to choose carefully the attribute on which you wish to define a secondary index. It is recommended that you do not define a local secondary index on an attribute you are not going to query much. Indexing should be reserved for tables that do not get heavy writes, as maintaining the indexes on every write is quite costly.

Indexes should be put on tables that contain sparse data and that are infrequently updated. It has been observed that the smaller the index, the better the performance. A secondary index consists of the index keys plus any projected attributes...
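
To make the "index plus projected attributes" point concrete, here is a minimal sketch of defining a local secondary index at table-creation time, projecting only the keys to keep the index small (table, attribute, and index names are hypothetical):

    import boto3

    dynamodb = boto3.client('dynamodb')

    dynamodb.create_table(
        TableName='Orders',  # hypothetical table name
        AttributeDefinitions=[
            {'AttributeName': 'customer_id', 'AttributeType': 'S'},
            {'AttributeName': 'order_id', 'AttributeType': 'S'},
            {'AttributeName': 'order_date', 'AttributeType': 'S'},
        ],
        KeySchema=[
            {'AttributeName': 'customer_id', 'KeyType': 'HASH'},
            {'AttributeName': 'order_id', 'KeyType': 'RANGE'},
        ],
        LocalSecondaryIndexes=[{
            'IndexName': 'OrderDateIndex',
            'KeySchema': [
                # An LSI keeps the table's hash key and swaps in an
                # alternative range key.
                {'AttributeName': 'customer_id', 'KeyType': 'HASH'},
                {'AttributeName': 'order_date', 'KeyType': 'RANGE'},
            ],
            # Projecting only the keys keeps the index small, which
            # keeps both storage and write costs down.
            'Projection': {'ProjectionType': 'KEYS_ONLY'},
        }],
        ProvisionedThroughput={'ReadCapacityUnits': 5,
                               'WriteCapacityUnits': 5},
    )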

Global secondary index best practices


Global secondary indexes allow us to create alternate hash and range keys on non-primary key attributes, which makes querying on those attributes quite an easy task. There are various best practices one should follow while using global secondary indexes, and we are going to discuss them in this section.

As we keep saying, it is very important to choose hash and range key attributes that distribute the load evenly across partitions, which means choosing attributes with a wide variety of values. Consider the example of a student table with columns such as roll number, name, grade, and marks. Here, the grade column would hold values such as A, B, C, and D, while the marks column would hold the marks obtained by each student. The grade column has a very limited number of unique values, so if we create an index on this column, then most of the values...
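
For reference, a global secondary index is declared with its own key schema and its own provisioned throughput, as in this minimal sketch (attribute names follow the hypothetical student table above; choosing a higher-cardinality attribute such as name distributes load far better than grade would):

    import boto3

    dynamodb = boto3.client('dynamodb')

    dynamodb.create_table(
        TableName='Students',  # hypothetical table name
        AttributeDefinitions=[
            {'AttributeName': 'roll_number', 'AttributeType': 'N'},
            {'AttributeName': 'name', 'AttributeType': 'S'},
            {'AttributeName': 'marks', 'AttributeType': 'N'},
        ],
        KeySchema=[
            {'AttributeName': 'roll_number', 'KeyType': 'HASH'},
        ],
        GlobalSecondaryIndexes=[{
            'IndexName': 'NameMarksIndex',
            'KeySchema': [
                # 'name' has far more distinct values than 'grade',
                # so it spreads the index load across partitions.
                {'AttributeName': 'name', 'KeyType': 'HASH'},
                {'AttributeName': 'marks', 'KeyType': 'RANGE'},
            ],
            'Projection': {'ProjectionType': 'KEYS_ONLY'},
            # Unlike an LSI, a GSI carries its own read/write capacity,
            # separate from the table's provisioned throughput.
            'ProvisionedThroughput': {'ReadCapacityUnits': 5,
                                      'WriteCapacityUnits': 5},
        }],
        ProvisionedThroughput={'ReadCapacityUnits': 5,
                               'WriteCapacityUnits': 5},
    )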

Summary


In this chapter, we went through some best practices that one should follow in order to get the most out of DynamoDB. We started with table-level best practices, where we talked about how to choose correct primary keys, how to create table schemas, how to manage time series data, and so on. In item-level best practices, we talked about caching, storing large attributes, one-to-many data modeling, and so on. In query and scan best practices, we saw how to maintain an even data load to improve query performance. We also discussed the use of parallel scans and their benefits.

In the last section, we talked about local and global secondary index best practices. A good understanding of DynamoDB's architecture will help you find more such best practices, which in turn will help you reduce cost and improve performance. So keep learning and keep exploring.

In the next chapter, we will cover some advanced topics, such as DynamoDB monitoring, common useful tools, libraries, AWS authentication service...
