You're reading from AWS Certified Cloud Practitioner Exam Guide

Product type: Book
Published in: Jan 2022
Publisher: Packt
ISBN-13: 9781801075930
Edition: 1st Edition
Author (1)
Rajesh Daswani

Rajesh Daswani is a senior solutions architect, AWS course content creator, and corporate trainer with over 20 years' experience in core IT infrastructure services and cloud computing. He has delivered corporate training programs and online training for several clients across the UK, USA, and India and published courses for Packt Publishing. Rajesh now delivers courses for the IaaS Academy, an online training provider that delivers on-demand cloud computing training and practice exam simulators to help students and IT professionals ace IT certification exams. You will also find extensive blog articles and exam tips on the IaaS Academy website to help you with your study and revision.

Chapter 11: Analytics on AWS

In this age of information, understanding your data has become extremely important. With current cutting-edge technologies, extensive amounts of data are generated every second – data that needs to be stored and analyzed. Companies perform data analytics to explain, predict, and ultimately gain a competitive advantage in business. Traditional analytics would include retail analytics, supply chain analytics, or stock rotation analytics. With machine learning (ML) and artificial intelligence taking a firm hold on the economy, new forms of analytics have come into play, such as cognitive analytics, fraud analytics, and speech analytics. The list is almost endless, but suffice it to say that understanding your raw data has traditionally required considerable effort, often with a whole business unit dedicated to data analytics alone.

AWS offers a vast array of analytics tools that you can use to ingest, store, and effectively understand the data that's generated...

Technical requirements

To complete the exercises in this chapter, you will need access to your AWS account and be logged in as the IAM user Alice.

Learning about data streaming with Amazon Kinesis

To analyze your business data, you need to ingest that data into a service that can perform the required analysis on it. Businesses generate tons of data from a wide range of sources, including logs generated by applications, content such as videos, images, and documents, clickstream data from websites, IoT data, and more. Ingesting this data is the first step toward understanding it.

However, rather than ingesting all the data first and only then working out how to analyze it, Amazon Kinesis lets you process and analyze data as it arrives and respond to it instantly. Amazon Kinesis is a fully managed service that enables you to process streaming data at any scale in a cost-effective manner. Furthermore, it is serverless, meaning that you do not need to set up and manage expensive infrastructure to process your data. Amazon Kinesis comprises the following four key services:

  • Amazon Kinesis Data...
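
As a rough illustration of ingesting data as it arrives, the following Python sketch builds the arguments for a single PutRecord call to Kinesis Data Streams. The stream name, event fields, and helper function names are hypothetical and chosen only for this example; `put_record` with `StreamName`, `Data`, and `PartitionKey` is the boto3 Kinesis client call.

```python
import json


def build_put_record_params(stream_name, event, partition_key):
    """Build keyword arguments for the Kinesis Data Streams PutRecord call."""
    return {
        "StreamName": stream_name,
        # Kinesis stores opaque bytes, so the event is serialized to JSON first.
        "Data": json.dumps(event).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": partition_key,
    }


def send_event(kinesis_client, stream_name, event, partition_key):
    """Send one event; kinesis_client would be e.g. boto3.client("kinesis")."""
    params = build_put_record_params(stream_name, event, partition_key)
    return kinesis_client.put_record(**params)


# Hypothetical clickstream event and stream name, for illustration only.
example_params = build_put_record_params(
    stream_name="clickstream-demo",
    event={"user_id": "u-123", "page": "/menu"},
    partition_key="u-123",
)
```

Passing the client in as an argument keeps the sketch runnable without AWS credentials; with credentials configured, `send_event(boto3.client("kinesis"), ...)` would submit the record to an existing stream.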

Learning how to query data stored in Amazon S3 with Amazon Athena

Businesses store vast amounts of data in repositories such as Amazon S3. A lot of this data is not necessarily being hosted on regular Amazon RDS or NoSQL databases. In many cases, this is because the dataset is not being regularly updated and queried. Previously, even if you wanted to perform ad hoc queries or analysis against some of that data, you would need to ingest it into a database and then run your queries against the database.

Amazon Athena is a fully managed serverless solution that allows you to interactively query and analyze data directly in Amazon S3 using standard SQL. There is no infrastructure to provision, and you only pay for the queries you run.

Amazon Athena uses Presto, an open source SQL query engine designed for ad hoc analysis. Athena supports standard ANSI SQL, including large joins, window functions, and arrays.
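
To make the SQL support more concrete, here is a minimal sketch of a query that uses a window function, wrapped in the arguments that the boto3 Athena client's `start_query_execution` call expects. The table, columns, database, and results bucket are all hypothetical names invented for this example.

```python
def build_athena_query_params(sql, database, output_location):
    """Build keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes its result files to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table and column names. The window function ranks each
# row's amount within its region, highest first.
SQL = """
SELECT region,
       product,
       amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
FROM sales
"""

params = build_athena_query_params(
    sql=SQL,
    database="example_db",
    output_location="s3://example-query-results/",
)

# With AWS credentials configured, the query could be submitted like this:
#   import boto3
#   athena = boto3.client("athena")
#   query_id = athena.start_query_execution(**params)["QueryExecutionId"]
```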

Data can...

Introduction to Amazon Elasticsearch

Elasticsearch is an open source text search and analytics engine that's capable of storing, analyzing, and performing search functions against big volumes of data in near real time. You can use Elasticsearch to analyze all types of data such as textual, numerical, geospatial, structured, and unstructured data.
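
As a small illustration of a text search, the sketch below builds a request body in the Elasticsearch Query DSL using a `match` query. The field name and search text are hypothetical; the body would typically be sent to an index's `_search` endpoint.

```python
import json


def build_match_query(field, text, size=10):
    """Return an Elasticsearch Query DSL body for a full-text match search."""
    return {
        "size": size,  # maximum number of hits to return
        "query": {"match": {field: text}},
    }


# Hypothetical log field and search terms, for illustration only.
body = build_match_query("message", "timeout error", size=5)
print(json.dumps(body, indent=2))
```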

Amazon's managed offering, Amazon Elasticsearch Service (renamed Amazon OpenSearch Service in September 2021), is a fully managed service with no need to set up and manage any infrastructure, allowing you to focus on your applications and their functionalities. Following the same pay-as-you-consume model, there are also no upfront costs, although you can reserve instances for a 1- or 3-year term for a significant discount over the on-demand pricing model.

Amazon Elasticsearch is designed to be highly scalable and can index all types of content to help you deliver applications for use cases such as the following:

  • Website search
  • Application search
  • Logging and log analytics...

Overview of AWS Glue and Amazon QuickSight

Business data can often be stored in a wide range of services – databases, storage buckets, spreadsheets, and more. Being able to bring all the relevant data together for analysis can sometimes be a big project. Later, you may wish to extract and present that data in a manner that is easy to digest and understand using BI tools or seamlessly integrate insights from that data into your applications, dashboards, and reporting. Two services offered by AWS that can help with these types of requirements are AWS Glue and Amazon QuickSight. We'll take a quick look at each of these services next.

Overview of AWS Glue

AWS Glue is a serverless Extract, Transform, and Load (ETL) service. With AWS Glue, you can discover, prepare, enrich, clean, and transform your data from various sources. You can then load the data into databases, data warehouses, and data lakes. Data from streaming sources can also be loaded for regular reporting...
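
The extract, transform, and load pattern itself can be sketched in a few lines of plain Python. This is not the Glue API: it is a stand-in that uses only the standard library, with made-up sample rows and an in-memory SQLite database playing the role of the target data store.

```python
import csv
import io
import sqlite3

# Made-up sample data for illustration; note the stray spaces and the
# missing amount on the second row.
RAW_CSV = """order_id,product,amount
1, Falafel Wrap ,8.50
2,tofu bowl,
3,Lentil Soup,6.00
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, tidy names, convert types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["product"].strip().title(), float(row["amount"]))
        )
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into a database table."""
    connection.execute(
        "CREATE TABLE sales (order_id INTEGER, product TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
total = connection.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```

A managed ETL service takes care of running steps like these at scale, without you having to provision the servers that execute them.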

Additional analytics services

In this section, we will take a very quick look at some other AWS analytics services that you need to be aware of. Specifically, we will look at the Elastic MapReduce (EMR) service, CloudSearch, and Data Pipeline:

  • Amazon EMR: This provides a managed Hadoop framework to enable you to process vast amounts of big data. You can use open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. Amazon EMR comes with an integrated development environment (IDE) called EMR Studio to help you develop, visualize, and debug data engineering and data science applications written in R, Python, Scala, and PySpark. You can run your EMR workloads on EC2 instances, on Amazon Elastic Kubernetes Service (EKS) clusters, and on-premises using AWS Outposts. In terms of pricing, you are charged at a per-instance rate for every second used, with a 1-minute minimum charge.
  • AWS Data Pipeline: This is a web service that...
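
The per-second billing with a 1-minute minimum described for Amazon EMR above can be worked through with a short calculation. This is a sketch only: the rate used here is a hypothetical figure quoted per instance-hour, not a real AWS price, and the function name is invented for the example.

```python
def emr_instance_charge(seconds_used, hourly_rate, minimum_seconds=60):
    """Charge for one instance billed per second with a minimum duration."""
    # Usage shorter than the minimum is billed as the minimum.
    billable_seconds = max(seconds_used, minimum_seconds)
    return hourly_rate * billable_seconds / 3600


# Hypothetical rate of 0.36 per instance-hour (not a real AWS price).
RATE = 0.36
short_job = emr_instance_charge(20, RATE)    # billed as 60 seconds
long_job = emr_instance_charge(5400, RATE)   # billed as 5,400 seconds
print(round(short_job, 4), round(long_job, 4))
```

Under these assumptions, a 20-second run costs the same as a 60-second run, while a 90-minute run is charged for exactly 5,400 seconds rather than being rounded up to two full hours.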

Exercise 11.1 – analyzing your sales report with Amazon Athena and AWS Glue

In this exercise, you will need to download a sample CSV file, which is available in the Packt GitHub repository for this chapter at https://github.com/PacktPublishing/AWS-Certified-Cloud-Practitioner-Exam-Guide. This is a simple CSV file that contains some sales data for the Vegan Studio, the fictitious company that you have been carrying out a series of exercises for in the previous chapters.

You will need to store this CSV file in an Amazon S3 bucket and then use Amazon Athena to run queries against the data. Ensure that you have downloaded the CSV file and stored it on your computer before you start this exercise.
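
The exercise itself is carried out in the AWS console, but the same workflow could be scripted. The following sketch assumes boto3-style S3 and Athena clients are passed in, that the Glue crawler from the later steps has already created a table, and that the example names from this chapter are in use (vegan-sales-report, vegan-query-results, vegansalesdb). The object key and table name are hypothetical, and your own bucket names will differ because S3 bucket names must be globally unique.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query and wait until it reaches a final state."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING


def upload_and_query(s3, athena, csv_path):
    """Upload the sales CSV, then query it through Athena."""
    # Bucket and database names follow the exercise; the object key and
    # table name are hypothetical.
    s3.upload_file(csv_path, "vegan-sales-report", "sales.csv")
    return run_athena_query(
        athena,
        sql='SELECT * FROM "vegan_sales_report" LIMIT 10',
        database="vegansalesdb",
        output_location="s3://vegan-query-results/",
    )
```

With AWS credentials configured, this could be called as `upload_and_query(boto3.client("s3"), boto3.client("athena"), "sales.csv")`.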

Step 1 – Amazon S3

  1. Log in to your AWS account using the IAM user ID of our senior administrator, Alice.
  2. Navigate to the Amazon S3 dashboard.
  3. Create two new buckets with appropriate names. For example, I have named my buckets vegan-sales-report (to store the CSV...

Exercise 11.2 – cleaning up

In this exercise, you will delete the resources you created in the previous exercise to ensure that there are no unwanted costs:

  1. Navigate to the AWS Glue console.
  2. From the left-hand menu, click the Crawlers link. In the right-hand pane, select vegan-sales-crawler. From the Actions drop-down list, click the Delete Crawler option and then confirm the delete operation.
  3. Next, from the left-hand menu, click Databases. In the right-hand pane, select the vegansalesdb database. Then, from the Actions drop-down list, click the Delete database option.
  4. Click the Delete button in the Delete Database confirmation dialog box that appears.
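
The Glue cleanup steps above can also be scripted. This is a minimal sketch, assuming a boto3 Glue client is passed in; `delete_crawler` and `delete_database` are the boto3 Glue client calls, while the helper function name is invented for this example.

```python
def delete_glue_resources(glue, crawler_name, database_name):
    """Delete a Glue crawler and then a Glue Data Catalog database."""
    glue.delete_crawler(Name=crawler_name)
    glue.delete_database(Name=database_name)
    # Return the names so the caller can log what was removed.
    return [crawler_name, database_name]


# With AWS credentials configured, this could be called as:
#   import boto3
#   delete_glue_resources(
#       boto3.client("glue"), "vegan-sales-crawler", "vegansalesdb"
#   )
```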

Next, you will need to delete the Amazon S3 buckets as they are no longer required:

  1. Navigate to the Amazon S3 console. From the left-hand menu, click on Buckets.
  2. In the right-hand pane, select the vegan-query-results bucket and then click the Empty button. Confirm that you want to empty...

Summary

In this chapter, we discussed several services from AWS that fall within the analytics category. Businesses today possess a vast array of data, and being able to analyze and make sense of that data is extremely important. Information that's obtained from this data can help businesses respond to their customers' needs and demands, address potential issues, and even predict future growth. Ultimately, businesses can gain an advantage over competitors.

You learned about services such as Amazon Kinesis, which allows customers to stream data and respond to it in real time or near real time. You also learned about services that can be used to quickly query your data, such as Amazon Athena, as well as services to help you present that data using BI tools. Most of these analytical services are also offered as fully managed services on a pay-as-you-consume pricing model, making them very affordable for almost any business.

In the next chapter, you will...

Questions

Answer the following questions to test your knowledge of this chapter:

  1. Which AWS service can help you ingest and deliver massive amounts of streaming data into Amazon Redshift for near real-time analytics?
    1. Amazon Athena
    2. Amazon Kinesis Data Firehose
    3. Amazon Kinesis Video Streams
    4. Amazon RDS
  2. Which AWS service can help you query streaming data using standard SQL queries in real time?
    1. Amazon Kinesis Data Streams
    2. Amazon Kinesis Data Analytics
    3. AWS Glue
    4. Amazon QuickSight
  3. You are planning on building an application that will capture video streams from speed cameras on country roads for analysis. You need to be able to capture all the vehicles that break the speed limit and identify the offending drivers via the vehicles' license plates. Which two services on AWS can help you achieve these requirements? (Choose 2 answers.)
    1. Amazon Athena
    2. Amazon Kinesis Data Analytics
    3. Amazon Kinesis Video Streams
    4. Amazon Elasticsearch
    5. Amazon Rekognition
  4. Which AWS service enables you to...