Chapter 8. Integrating DynamoDB with other AWS Services

In this chapter, we will cover the following topics:

  • Importing data from AWS S3 to DynamoDB using AWS Data Pipeline

  • Exporting data from DynamoDB to AWS S3 using AWS Data Pipeline

  • Accessing the DynamoDB data using AWS EMR

  • Querying the DynamoDB data using AWS EMR

  • Performing join operations on the DynamoDB data using AWS EMR

  • Exporting data to AWS S3 from DynamoDB using AWS EMR

  • Logging DynamoDB operations using AWS CloudTrail

  • Exporting the DynamoDB data to AWS Redshift

  • Importing the DynamoDB data to AWS CloudSearch

  • Performing a full text search on the DynamoDB data using CloudSearch

Introduction


In the previous chapter, we discussed the best practices that one should follow in order to make the most of DynamoDB's features. In this chapter, we will focus on how to integrate DynamoDB with other AWS services so that we can keep our complete application stack in one place. We will also explore various data import and export techniques that can be applied with little effort.

Importing data from AWS S3 to DynamoDB using AWS Data Pipeline


In this recipe, we will see how to import data from AWS S3 and insert it into a DynamoDB table using AWS Data Pipeline. AWS Data Pipeline helps us create workflows that manage the data flow between various AWS services. You can read more about AWS Data Pipeline at https://aws.amazon.com/datapipeline/.

Getting ready

To get started with this recipe, you should know how to use the DynamoDB console. You should have created a table called productTable, and you should have an AWS S3 bucket containing the data to be imported into DynamoDB. The data needs to be in a special format so that the import process can identify the attributes and their data types. Here is the sample data:

    features  {"m":{"screen":{"s":"4.7\" LED-backlit IPS LCD Multi-Touchscreen Shatter proof glass"},"camera":{"s":"8MP"},"intMem":{"s":"128GB"},"processor":{"s":"Dual-Core 1.4 GHz Cyclone (ARM v8-based)"}}}
    id        {"s":"2"}
    price     {...

Exporting data from DynamoDB to AWS S3 using AWS Data Pipeline


In this recipe, we will see how to export data from the DynamoDB table to S3 using AWS Data Pipeline.

Getting ready

To get started, you need to have a table created and a few entries added to it.

How to do it…

Let's start by creating a pipeline that accepts the details needed to execute the export operation whenever we want:

  1. Go to the AWS Data Pipeline console (https://console.aws.amazon.com/datapipeline/).

  2. Click on the Create Pipeline button. Enter the data pipeline configuration details in the form, as shown in the following screenshot. Here, we will add details such as the pipeline name, description, source DynamoDB table name, target S3 folder, and so on, and we will select a built-in template to export the DynamoDB data to S3:

  3. Next, we need to provide the logging details. Here, we need to provide an S3 folder location so that, if there are any errors or issues, we are able to debug them from the...

Accessing the DynamoDB data using AWS EMR


AWS Elastic MapReduce (EMR) is hosted Hadoop as a service from Amazon. As Hadoop has become one of the most important ETL/analytics tools, it is important to know how to access the DynamoDB data from EMR so that we can use it for analytics. In this recipe, we are going to see how to access the DynamoDB data from EMR for analytics and querying.

Getting ready

To get started, you need to have a DynamoDB table created, and you should have data in it. Also, you need to have an EC2 key pair created, which will be used to connect to the EMR cluster using PuTTY or ssh on a UNIX system. If you haven't created one, read the documentation at http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMR_SetUp_KeyPair.html.

The generated .pem key can be converted into a PuTTY private key (.ppk), which can be used in PuTTY. You can refer to the following docs:

Querying the DynamoDB data using AWS EMR


In the previous recipe, we saw how to access the DynamoDB data from AWS EMR. In this recipe, we are going to see how to query DynamoDB using AWS EMR.

Getting ready

To perform this recipe, you should have performed the earlier recipe and have your EMR cluster still running.

How to do it…

Here, we will use productHiveTable, which we created in the previous recipe. In this recipe, we will see how easy it is to query the DynamoDB data using EMR:

  1. To get started, connect to your EMR cluster and start Hive.

  2. In our e-commerce application, we would like to query the product catalogue data in various ways. As DynamoDB is a NoSQL database, we can only query on hash or range keys, which sometimes makes querying difficult. With Hive, we can query the DynamoDB data effectively.

  3. Let's start with our first query to count the total number of products in our DynamoDB table. For this, we need to execute the following query:

    hive> select count...
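
The query above is cut off in this excerpt. Purely as a sketch, a count over the Hive table mapped to our DynamoDB table might look like the following; productHiveTable and its type column are the names used later in this chapter, and everything else is illustrative:

    -- Count all the products exposed through the DynamoDB-backed Hive table
    hive> SELECT COUNT(*) FROM productHiveTable;

    -- A grouped variant: the number of products per type
    hive> SELECT type, COUNT(*) FROM productHiveTable GROUP BY type;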

Performing join operations on the DynamoDB data using AWS EMR


In the previous recipes, we saw how to use EMR to access the DynamoDB data and query it as well. In this recipe, we will see how to join two DynamoDB tables in order to get a combined view.

Getting ready

To perform this recipe, you should have performed the earlier recipe and should have your EMR cluster still running.

How to do it…

Here, we will use two tables: the Customer table and the Orders table. The Customer table contains detailed information about each customer, while the Orders table contains the details of each order, along with customerId, which links the two tables. We want to execute queries that need information from both tables, which cannot be achieved by DynamoDB alone, so we use EMR:

  1. To get started, we need to make sure that we have two tables created, as mentioned earlier. Now, we will connect to the EMR cluster, and we will create two Hive tables corresponding...
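
The steps are cut off above. Purely as a sketch, the two DynamoDB-backed Hive tables and the join could look like the following; the Hive table names (customerHiveTable, ordersHiveTable) and the column mappings are assumptions for illustration, and only the DynamoDB table names Customer and Orders and the customerId link come from this recipe:

    -- Hypothetical Hive mappings for the two DynamoDB tables
    CREATE EXTERNAL TABLE customerHiveTable (custId string, name string, city string)
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES ("dynamodb.table.name" = "Customer",
    "dynamodb.column.mapping" = "custId:customerId,name:name,city:city");

    CREATE EXTERNAL TABLE ordersHiveTable (orderId string, custId string, amount bigint)
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES ("dynamodb.table.name" = "Orders",
    "dynamodb.column.mapping" = "orderId:orderId,custId:customerId,amount:amount");

    -- Join the two Hive tables on the customer id
    SELECT c.name, o.orderId, o.amount
    FROM customerHiveTable c JOIN ordersHiveTable o ON (c.custId = o.custId);

Hive reads both DynamoDB tables with full scans and performs the join on the EMR cluster, so keep an eye on the tables' read throughput while such queries run.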

Exporting data to AWS S3 from DynamoDB using AWS EMR


Earlier in this chapter, we saw how to use AWS Data Pipeline to export the DynamoDB data to S3. Creating and executing a pipeline is quick and easy, but it gives us very little control over what happens inside it, so in this recipe, we will see how to export the DynamoDB data to S3 using EMR.

Getting ready

To perform this recipe, you should have performed the earlier recipe and have your EMR cluster still running.

How to do it…

Let's export data to AWS S3 from DynamoDB:

  1. To perform this recipe, we need to create two tables. In the earlier recipes, we have already created productHiveTable, as shown in the following code:

    CREATE EXTERNAL TABLE productHiveTable (
    id string, type string, mnfr string, name string,
    price bigint, stock bigint)
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES ("dynamodb.table.name" = "product","dynamodb.column.mapping" = "id:id,type...
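
The property list above is truncated. As a sketch of what the export step itself could look like, one common approach is to create a second, S3-backed Hive table and copy the data into it; the table name productS3Table and the bucket path below are placeholders:

    -- Hypothetical S3-backed table that will receive the export (path is a placeholder)
    CREATE EXTERNAL TABLE productS3Table (
    id string, type string, mnfr string, name string,
    price bigint, stock bigint)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 's3://my-export-bucket/product-export/';

    -- Copy everything from the DynamoDB-backed table into the S3-backed one
    INSERT OVERWRITE TABLE productS3Table SELECT * FROM productHiveTable;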

Logging DynamoDB operations using AWS CloudTrail


This is a very simple recipe that helps us enable the logging of DynamoDB operations using CloudTrail. CloudTrail is a global logging service from AWS that allows us to log all the events that happen on the subscribed services. It creates a JSON document for every operation that occurs and saves it in the provided S3 bucket.

Getting ready

To perform this recipe, you need to know how to use the DynamoDB console.

How to do it…

Let's log the DynamoDB operations using AWS CloudTrail:

  1. To keep track of events happening in various AWS services, we need to first enable logging of the CloudTrail events. To do so, first log in to the AWS CloudTrail console, which is available at https://console.aws.amazon.com/cloudtrail.

  2. We enable the CloudTrail logs and also provide the S3 bucket location where we would like the events to be saved:

  3. By clicking on the Advanced link, you can also enable publishing notifications about the availability of new CloudTrail events in the S3...

Exporting the DynamoDB data to AWS Redshift


AWS Redshift is a petabyte-scale data warehouse provided as a service in the cloud. It gives us SQL-based tools to perform business intelligence on data of virtually any size. It is quite natural to use DynamoDB as our application database and a data warehouse tool for the analytics on top of it. So, in this recipe, we will see how to launch a Redshift cluster and import the DynamoDB data into it.

Getting ready

To perform this recipe, you need to know how to use the DynamoDB console. Also, follow the instructions to install the prerequisites for AWS Redshift from http://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-prereq.html.

How to do it…

To get started, we will first see how to launch the Redshift cluster, and then use the COPY command to import the DynamoDB data into it:

  1. Go to the AWS Redshift console (https://console.aws.amazon.com/redshift/):

  2. Click on the Launch Cluster button. On the next screen, you will need to enter the details for the cluster...
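
The walkthrough is cut off above. Once the cluster is up and you are connected to it with a SQL client, the import itself is essentially a single COPY statement. Here is a minimal sketch, assuming a target Redshift table named product_tbl and placeholder credentials; productTable is the DynamoDB table used earlier in this chapter:

    -- Hypothetical Redshift table matching the DynamoDB attributes
    CREATE TABLE product_tbl (
    id varchar(32), type varchar(64), mnfr varchar(64),
    name varchar(128), price bigint, stock bigint);

    -- Import from DynamoDB; READRATIO limits the share of read capacity the COPY may consume
    COPY product_tbl FROM 'dynamodb://productTable'
    CREDENTIALS 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
    READRATIO 50;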

Importing the DynamoDB data to AWS CloudSearch


AWS CloudSearch is a search engine in AWS that allows us to perform full text searches on uploaded documents. This functionality is often helpful, as DynamoDB only allows exact queries on hash and range keys. In this recipe, we are going to see how to integrate DynamoDB with CloudSearch so that we can search data more effectively.

Getting ready

To perform this recipe, you need to know how to use the AWS console.

How to do it…

To get started, we will first see how to launch the CloudSearch domain and import the DynamoDB table to it:

  1. Go to the AWS CloudSearch console (http://console.aws.amazon.com/cloudsearch/).

  2. Click on the Create New Domain button, which will prompt you to fill in a form with the details. Here, you need to specify the domain name, number of instances, and instance types:

  3. On the next screen, you need to choose where to import the data from. Here, select Analyze sample item(s) from Amazon DynamoDB. You also need to select...

Performing a full text search on the DynamoDB data using CloudSearch


In the previous recipe, we saw how to import DynamoDB to CloudSearch. In this recipe, we will see how to perform a full text search on the same data.

Getting ready

To perform this recipe, you should have performed the earlier recipe.

How to do it…

Let's perform a full text search on the DynamoDB data:

  1. CloudSearch gives us a built-in capability to perform a full text, faceted search. To get started, we need to click on the Run a Test Search link:

  2. On this screen, you will see a textbox where you can type your query. For example, to search for 'Samsung', type it in the textbox and click on the Go button. CloudSearch will search the documents and return the results.

  3. You can also perform a search on a specific attribute by expanding the Options section on the same screen. For example, if you want to search only for those products that are manufactured by PacktPub, you can do this in the following manner, as shown in the following...
