You're reading from Data Engineering with AWS - Second Edition

Product type: Book
Published in: Oct 2023
Publisher: Packt
ISBN-13: 9781804614426
Edition: 2nd Edition
Author (1)
Gareth Eagar

Gareth Eagar has over 25 years of experience in the IT industry, starting in South Africa, working in the United Kingdom for a while, and now based in the USA. Having worked at AWS since 2017, Gareth has broad experience with a variety of AWS services, and deep expertise in building data platforms on AWS. While Gareth currently works as a Solutions Architect, he has also worked in AWS Professional Services, helping architect and implement data platforms for global customers. Gareth frequently speaks on data-related topics.

Data Management Architectures for Analytics

In Chapter 1, An Introduction to Data Engineering, we looked at the challenges introduced by ever-growing datasets, and how the cloud can help solve these analytic challenges. However, there are many different cloud services, open-source frameworks, file formats, and architectures that can be used in analytic projects, depending on the business requirements and objectives.

In this chapter, we will discuss how analytical technologies have evolved and introduce the key technologies and concepts that are foundational for building modern analytical architectures, irrespective of whether you build them on Amazon Web Services (AWS) or elsewhere.

The content in this chapter lays an important foundation, as it will provide an introduction to the concepts that we will build on in the rest of the book.

In this chapter, we will cover the following topics:

  • The evolution of data management for analytics
  • A deeper dive into...

Technical requirements

To complete the hands-on exercises included in this chapter, you will need access to an AWS account in which you have administrator privileges (as covered in Chapter 1, An Introduction to Data Engineering).

You can find the code and other content related to this chapter in the GitHub repository at the following link: https://github.com/PacktPublishing/Data-Engineering-with-AWS-2nd-edition/tree/main/Chapter02.

The evolution of data management for analytics

Innovations in data management and processing over the last several decades have laid the foundations of modern-day analytic systems. When you look at the analytics landscape of a typical mature organization, you will find the footprints of many of these innovations in their data analytics platforms. As a data engineer, you may come across analytic pipelines that were built using technologies from different generations, and you may be expected to understand them. Therefore, it is important to be familiar with some of the key developments in analytics over time, as well as to understand the foundational concepts of analytical data storage, data management, and data pipelines.

Databases and data warehouses

Data processing and analytic systems have evolved over several decades. In the 1980s, the focus was on batch processing, where data would be processed in nightly runs on large mainframe computers. In the 1990s, the use of databases exploded...

A deeper dive into data warehouse concepts and architecture

An Enterprise Data Warehouse (EDW) is the central data repository that contains structured, curated, consistent, and trusted data assets that are organized into a well-modeled schema. The data assets in an EDW are made up of all the relevant information about key business domains and are built by integrating data sourced from the following places:

  • Run-the-business transactional applications (ERPs, CRMs, Line of Business applications) that support all the key business domains across the enterprise.
  • External data sources such as data from partners and third parties.

An enterprise data warehouse provides business users and decision-makers with an easy-to-use, central platform that helps them find and analyze a well-modeled, well-integrated, single version of truth about various business subject areas such as customer, product, sales, marketing, supply chain, and more. Business users analyze data in the warehouse to measure business...
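To make the idea of integrating data from several sources into one queryable, well-modeled store more concrete, here is a minimal sketch. It is not how a production EDW is built: it uses Python's built-in sqlite3 module as a stand-in for a warehouse engine, and the table names, column names, and sample records are all hypothetical.

```python
import sqlite3

# Hypothetical sample records (illustrative only, not from the book).
crm_customers = [(1, "Alice"), (2, "Bob")]  # from a transactional application
partner_sales = [                           # from an external data source
    (1, "widget", 30.0),
    (1, "gadget", 20.0),
    (2, "widget", 15.0),
]


def build_warehouse(customers, sales):
    """Load both sources into one in-memory database with a shared key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE sale ("
        "customer_id INTEGER REFERENCES customer(customer_id), "
        "product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO sale VALUES (?, ?, ?)", sales)
    conn.commit()
    return conn


def sales_by_customer(conn):
    """Answer one business question from the integrated data."""
    return conn.execute(
        "SELECT c.name, SUM(s.amount) "
        "FROM customer c JOIN sale s ON s.customer_id = c.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()


if __name__ == "__main__":
    warehouse = build_warehouse(crm_customers, partner_sales)
    for name, total in sales_by_customer(warehouse):
        print(name, total)
```

Because both sources are loaded against the same customer key, a business user can ask a single question across them instead of reconciling two separate systems by hand.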

Bringing together the best of data warehouses and data lakes

In today’s highly digitized world, data about customers, products, operations, and the supply chain can come from many sources and can have a diverse set of structures. To gain deeper and more complete data-driven insights into a business topic (such as customer journey, customer retention, or product performance), organizations need to analyze all topic-relevant data, of all structures, from all sources, together. A data lake is well suited to storing all these different types of data inexpensively, and it provides a wide variety of tools to work with and consume the data. This includes the ability to transform data with frameworks such as Apache Spark, to train machine learning models on the data using tools such as Amazon SageMaker, and to query the data using SQL with tools such as Amazon Athena, Presto, or Trino. However, there are some limitations with traditional data lakes. For example, traditional implementations...

Hands-on – Using the AWS Command Line Interface (CLI) to create S3 buckets

In Chapter 1, An Introduction to Data Engineering, you created an AWS account and an AWS administrative user, and then ensured you could access your account. Console access lets you use AWS services and perform most functions, but at times it is also useful to interact with AWS services via the Command Line Interface (CLI). In this hands-on section, you will learn how to access the AWS CLI, and then use the CLI to create Amazon S3 buckets (a bucket is a storage container in the Amazon S3 service).
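This section uses the AWS CLI, but the same bucket creation can also be scripted. The following is a minimal Python sketch, not the book's method: it assumes the third-party boto3 SDK is installed and that credentials are already configured. The helper names are hypothetical, and the name check covers only a subset of the S3 bucket naming rules (length, allowed characters, no IP-address-style names).

```python
import re

# Subset of the S3 naming rules: 3-63 characters, lowercase letters, digits,
# dots, and hyphens, starting and ending with a letter or digit.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the subset of rules checked here."""
    if not _NAME_PATTERN.fullmatch(name):
        return False
    if ".." in name or _IP_PATTERN.fullmatch(name):
        return False
    return True


def create_bucket(name: str, region: str = "us-east-1") -> None:
    """Create an S3 bucket (hypothetical helper; makes a real AWS call)."""
    if not is_valid_bucket_name(name):
        raise ValueError(f"Invalid bucket name: {name!r}")
    import boto3  # assumption: boto3 is installed and credentials are set up

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 is the default location and takes no LocationConstraint.
        s3.create_bucket(Bucket=name)
    else:
        s3.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )


if __name__ == "__main__":
    # Only the local name check runs here; no AWS call is made.
    for candidate in ("my-data-bucket-123", "My_Bucket", "192.168.5.4"):
        print(candidate, is_valid_bucket_name(candidate))
```

Checking the name locally before calling AWS gives a faster, clearer error for obviously invalid names. Bucket names must also be globally unique, which only the service itself can verify.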

Accessing the AWS CLI

The AWS CLI can be installed on your personal computer or laptop, or can be accessed from the AWS Console. To access the CLI on your personal computer, you need to generate a set of access keys. Your access keys consist of an Access Key ID (which is comparable to a username) and a Secret Access Key (which is comparable to a password). With these two pieces of information, you can authenticate...
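As a rough illustration of how the two parts of an access key pair fit together, the sketch below parses text in the INI-style layout that the AWS CLI uses for its shared credentials file. The key values are the placeholder examples used in AWS documentation, not real credentials, and the helper names are hypothetical.

```python
import configparser

# Placeholder credentials in the layout of the shared credentials file.
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
"""


def read_access_keys(text: str, profile: str = "default"):
    """Return (access_key_id, secret_access_key) for the given profile."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser[profile]
    return section["aws_access_key_id"], section["aws_secret_access_key"]


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the secret is safe to print."""
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    key_id, secret = read_access_keys(SAMPLE_CREDENTIALS)
    print("Access Key ID:", key_id)
    print("Secret Access Key:", mask(secret))
```

The Access Key ID can be shown freely, much like a username, while the Secret Access Key should be treated like a password and never printed or committed in full.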
