Chapter 8. Integrating DynamoDB with other AWS Services

In this chapter, we will cover the following topics:

  • Importing data from AWS S3 to DynamoDB using AWS Data Pipeline

  • Exporting data from DynamoDB to AWS S3 using AWS Data Pipeline

  • Accessing the DynamoDB data using AWS EMR

  • Querying the DynamoDB data using AWS EMR

  • Performing join operations on the DynamoDB data using AWS EMR

  • Exporting data to AWS S3 from DynamoDB using AWS EMR

  • Logging DynamoDB operations using AWS CloudTrail

  • Exporting the DynamoDB data to AWS Redshift

  • Importing the DynamoDB data to AWS CloudSearch

  • Performing a full text search on the DynamoDB data using CloudSearch

Introduction


In the previous chapter, we discussed the best practices that one should follow in order to make the most of DynamoDB's features. In this chapter, we will focus on how to integrate DynamoDB with other AWS services so that we can keep our complete application stack in one place. We will also explore various data import and export techniques that can be applied with little effort.

Importing data from AWS S3 to DynamoDB using AWS Data Pipeline


In this recipe, we will see how to import data from AWS S3 and insert it into a DynamoDB table using AWS Data Pipeline. AWS Data Pipeline helps us create workflows that manage the data flow between various AWS services. You can read more about AWS Data Pipeline at https://aws.amazon.com/datapipeline/.

Getting ready

To get started with this recipe, you should know how to use the DynamoDB console. You should have created a table called productTable, and you should have an AWS S3 bucket containing the data to be imported into DynamoDB. The data needs to be in a special format so that the import process can identify the attributes and their data types. Here is the sample data:

    features  {"m":{"screen":{"s":"4.7\" LED-backlit IPS LCD Multi-Touchscreen Shatter proof glass"},"camera":{"s":"8MP"},"intMem":{"s":"128GB"},"processor":{"s":"Dual-Core 1.4 GHz Cyclone (ARM v8-based)"}}}
    id        {"s":"2"}
    price     {...

Exporting data from DynamoDB to AWS S3 using AWS Data Pipeline


In this recipe, we will see how to export data from the DynamoDB table to S3 using AWS Data Pipeline.

Getting ready

To get started, you need to have a table created and a few entries added to it.

How to do it…

Let's start by creating a pipeline that accepts the details needed to execute the export operation whenever we want:

  1. Go to the AWS Data Pipeline console (https://console.aws.amazon.com/datapipeline/).

  2. Click on the Create Pipeline button. Enter the data pipeline configuration details in the form, as shown in the following screenshot. Here, we will add details such as the pipeline name, description, source DynamoDB table name, target S3 folder, and so on, and we will select a built-in template to export the DynamoDB data to S3:

  3. Next, we need to provide the logging details. Here, we need to provide an S3 folder location so that, if there are any errors or issues, we are able to debug them from the...

Accessing the DynamoDB data using AWS EMR


AWS Elastic MapReduce (EMR) is hosted Hadoop as a service from Amazon. As Hadoop has become one of the most important ETL/analytics tools, it is important to know how to access the DynamoDB data from EMR so that we can use it for analytics. In this recipe, we are going to see how to access the DynamoDB data from EMR for analytics and querying.

Getting ready

To get started, you need to have a DynamoDB table created, and you should have data in it. Also, you need to have an EC2 key pair created, which will be used to connect to the EMR cluster using PuTTY or ssh on a UNIX system. If you haven't created one, read the documentation at http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMR_SetUp_KeyPair.html.

The generated .pem key can be converted into a PuTTY private key (.ppk), which can be used in PuTTY. You can refer to the following docs:

Querying the DynamoDB data using AWS EMR


In the previous recipe, we saw how to access the DynamoDB data from AWS EMR. In this recipe, we are going to see how to query DynamoDB using AWS EMR.

Getting ready

To perform this recipe, you should have performed the earlier recipe and have your EMR cluster still running.

How to do it…

Here, we will use productHiveTable, which we created in the previous recipe. In this recipe, we will see how easy it is to query the DynamoDB data using EMR:

  1. To get started, connect to your EMR cluster and start Hive.

  2. In our e-commerce application, we would like to query the product catalogue data in various ways. As DynamoDB is a NoSQL database, we can only query on hash or range keys, which sometimes makes querying difficult. With Hive, we can query the DynamoDB data effectively.

  3. Let's start with our first query to count the total number of products in our DynamoDB table. For this, we need to execute the following query:

    hive> select count...
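
The query above is cut off in this excerpt. Purely as a sketch, a count over the Hive table mapped to our DynamoDB table might look like the following; productHiveTable and its type column are the names used later in this chapter, and everything else is illustrative:

    -- Count all the products exposed through the DynamoDB-backed Hive table
    hive> SELECT COUNT(*) FROM productHiveTable;

    -- A grouped variant: the number of products per type
    hive> SELECT type, COUNT(*) FROM productHiveTable GROUP BY type;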

Performing join operations on the DynamoDB data using AWS EMR


In the previous recipes, we saw how to use EMR to access the DynamoDB data and query it as well. In this recipe, we will see how to join two DynamoDB tables in order to get a combined view.

Getting ready

To perform this recipe, you should have performed the earlier recipe and should have your EMR cluster still running.

How to do it…

Here, we will use two tables: the Customer table and the Orders table. The Customer table contains detailed information about each customer, while the Orders table contains the details of each order, along with customerId, which links the two tables. We want to execute queries that need information from both tables, which cannot be achieved by DynamoDB alone, so we use EMR:

  1. To get started, we need to make sure that we have two tables created, as mentioned earlier. Now, we will connect to the EMR cluster, and we will create two Hive tables corresponding...
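
The steps are cut off above. Purely as a sketch, the two DynamoDB-backed Hive tables and the join could look like the following; the Hive table names (customerHiveTable, ordersHiveTable) and the column mappings are assumptions for illustration, and only the DynamoDB table names Customer and Orders and the customerId link come from this recipe:

    -- Hypothetical Hive mappings for the two DynamoDB tables
    CREATE EXTERNAL TABLE customerHiveTable (custId string, name string, city string)
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES ("dynamodb.table.name" = "Customer",
    "dynamodb.column.mapping" = "custId:customerId,name:name,city:city");

    CREATE EXTERNAL TABLE ordersHiveTable (orderId string, custId string, amount bigint)
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES ("dynamodb.table.name" = "Orders",
    "dynamodb.column.mapping" = "orderId:orderId,custId:customerId,amount:amount");

    -- Join the two Hive tables on the customer id
    SELECT c.name, o.orderId, o.amount
    FROM customerHiveTable c JOIN ordersHiveTable o ON (c.custId = o.custId);

Hive reads both DynamoDB tables with full scans and performs the join on the EMR cluster, so keep an eye on the tables' read throughput while such queries run.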

Exporting data to AWS S3 from DynamoDB using AWS EMR


Earlier in this chapter, we saw how to use AWS Data Pipeline to export the DynamoDB data to S3. Creating and executing a pipeline is quick and easy, but it gives us very little control over what happens inside it, so in this recipe, we will see how to export the DynamoDB data to S3 using EMR.

Getting ready

To perform this recipe, you should have performed the earlier recipe and have your EMR cluster still running.

How to do it…

Let's export data to AWS S3 from DynamoDB:

  1. To perform this recipe, we need to create two tables. In the earlier recipes, we have already created productHiveTable, as shown in the following code:

    CREATE EXTERNAL TABLE productHiveTable (
    id string, type string, mnfr string, name string,
    price bigint, stock bigint)
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES ("dynamodb.table.name" = "product","dynamodb.column.mapping" = "id:id,type...
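
The property list above is truncated. As a sketch of what the export step itself could look like, one common approach is to create a second, S3-backed Hive table and copy the data into it; the table name productS3Table and the bucket path below are placeholders:

    -- Hypothetical S3-backed table that will receive the export (path is a placeholder)
    CREATE EXTERNAL TABLE productS3Table (
    id string, type string, mnfr string, name string,
    price bigint, stock bigint)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 's3://my-export-bucket/product-export/';

    -- Copy everything from the DynamoDB-backed table into the S3-backed one
    INSERT OVERWRITE TABLE productS3Table SELECT * FROM productHiveTable;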

Logging DynamoDB operations using AWS CloudTrail


This is a very simple recipe that helps us enable the logging of DynamoDB operations using CloudTrail. CloudTrail is a global logging service from AWS that allows us to log all the events that happen on the subscribed services. It creates a JSON document for every operation that occurs and saves it in the provided S3 bucket.

Getting ready

To perform this recipe, you need to know how to use the DynamoDB console.

How to do it…

Let's log the DynamoDB operations using AWS CloudTrail:

  1. To keep track of events happening in various AWS services, we need to first enable logging of the CloudTrail events. To do so, first log in to the AWS CloudTrail console, which is available at https://console.aws.amazon.com/cloudtrail.

  2. We enable the CloudTrail logs and also provide the S3 bucket location where we would like the events to be saved:

  3. By clicking on the Advanced link, you can also enable publishing notifications about the availability of new CloudTrail events in the S3...

Exporting the DynamoDB data to AWS Redshift


AWS Redshift is a petabyte-scale data warehouse provided as a service in the cloud. It gives us SQL-based tools to perform business intelligence on data of virtually any size. It is quite natural to use DynamoDB as our application database and a data warehouse tool for the analytics on top of it. So, in this recipe, we will see how to launch a Redshift cluster and import the DynamoDB data into it.

Getting ready

To perform this recipe, you need to know how to use the DynamoDB console. Also, follow the instructions to install the prerequisites for AWS Redshift from http://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-prereq.html.

How to do it…

To get started, we will first see how to launch the Redshift cluster, and then use the COPY command to import the DynamoDB data into it:

  1. Go to the AWS Redshift console (https://console.aws.amazon.com/redshift/):

  2. Click on the Launch Cluster button. On the next screen, you will need to enter the details for the cluster...
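
The walkthrough is cut off above. Once the cluster is up and you are connected to it with a SQL client, the import itself is essentially a single COPY statement. Here is a minimal sketch, assuming a target Redshift table named product_tbl and placeholder credentials; productTable is the DynamoDB table used earlier in this chapter:

    -- Hypothetical Redshift table matching the DynamoDB attributes
    CREATE TABLE product_tbl (
    id varchar(32), type varchar(64), mnfr varchar(64),
    name varchar(128), price bigint, stock bigint);

    -- Import from DynamoDB; READRATIO limits the share of read capacity the COPY may consume
    COPY product_tbl FROM 'dynamodb://productTable'
    CREDENTIALS 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
    READRATIO 50;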

Importing the DynamoDB data to AWS CloudSearch


AWS CloudSearch is a search engine in AWS that allows us to perform full text searches on uploaded documents. This functionality is often helpful, as DynamoDB only allows exact queries on hash and range keys. In this recipe, we are going to see how to integrate DynamoDB with CloudSearch so that we can search data more effectively.

Getting ready

To perform this recipe, you need to know how to use the AWS console.

How to do it…

To get started, we will first see how to launch the CloudSearch domain and import the DynamoDB table to it:

  1. Go to the AWS CloudSearch console (http://console.aws.amazon.com/cloudsearch/).

  2. Click on the Create New Domain button, which will prompt you to fill in a form with the details. Here, you need to specify the domain name, number of instances, and instance types:

  3. On the next screen, you need to choose where to import the data from. Here, select Analyze sample item(s) from Amazon DynamoDB. You also need to select...

Performing a full text search on the DynamoDB data using CloudSearch


In the previous recipe, we saw how to import DynamoDB to CloudSearch. In this recipe, we will see how to perform a full text search on the same data.

Getting ready

To perform this recipe, you should have performed the earlier recipe.

How to do it…

Let's perform a full text search on the DynamoDB data:

  1. CloudSearch gives us a built-in capability to perform a full text, faceted search. To get started, we need to click on the Run a Test Search link:

  2. On this screen, you will see a textbox where you can type your query. For example, to search for 'Samsung', type it in the textbox and click on the Go button. CloudSearch will search the documents and return the results.

  3. You can also perform a search on a specific attribute by expanding the Options section on the same screen. For example, if you want to search only for those products that are manufactured by PacktPub, you can do this in the following manner, as shown in the following...
