You're reading from Data Engineering with AWS - Second Edition

Product type: Book

Published in: Oct 2023

Publisher: Packt

ISBN-13: 9781804614426

Edition: 2nd Edition

Concepts

Data Engineering

Author (1)

Gareth Eagar

Data Management Architectures for Analytics

Technical requirements

To complete the hands-on exercises included in this chapter, you will need access to an AWS account in which you have administrator privileges (as covered in Chapter 1, An Introduction to Data Engineering).

You can find the code and other content related to this chapter in the GitHub repository at the following link: https://github.com/PacktPublishing/Data-Engineering-with-AWS-2nd-edition/tree/main/Chapter02.

In Chapter 1, An Introduction to Data Engineering, we looked at the challenges introduced by ever-growing data sets, and how the cloud can help solve these analytic challenges. However, there are many different cloud services, open-source frameworks, file formats, and architectures that can be used in analytic projects, depending on business requirements and objectives.

In this chapter, we will discuss how analytical technologies have evolved and introduce the key technologies and concepts that are foundational for building modern analytical architectures, irrespective of whether you build them on AWS or elsewhere.

The content in this chapter lays an important foundation, as it will provide an introduction to concepts that we will build on in the rest of the book.

In this chapter, we will cover the following topics:

The evolution of data management for analytics
A deeper dive into data warehouse concepts and architecture...

The evolution of data management for analytics

Innovations in data management and processing over the last several decades have laid the foundations of modern-day analytic systems. When you look at the analytics landscape of a typical mature organization, you will find the footprints of many of these innovations in their data analytics platforms. As a data engineer, you may come across analytic pipelines that were built using technologies from different generations, and you may be expected to understand them. Therefore, it is important to be familiar with some of the key developments in analytics over time, as well as to understand the foundational concepts of analytical data storage, data management, and data pipelines.

Databases and data warehouses

Data processing and analytic systems have evolved over several decades. In the 1980s, the focus was on batch processing, where data would be processed in nightly runs on large mainframe computers.

In the 1990s, the use of databases exploded...

A deeper dive into data warehouse concepts and architecture

An Enterprise Data Warehouse (EDW) is the central data repository that contains structured, curated, consistent, and trusted data assets that are organized into a well-modeled schema. The data assets in an EDW are made up of all the relevant information about key business domains and are built by integrating data sourced from the following places:

Run-the-business transactional applications (ERPs, CRMs, Line of Business applications) that support all the key business domains across the enterprise.
External data sources such as data from partners and third parties.

An enterprise data warehouse provides business users and decision-makers with an easy-to-use, central platform that helps them find and analyze a well-modeled, well-integrated, single version of truth about various business subject areas such as customer, product, sales, marketing, supply chain, and more. Business users analyze data in the warehouse to measure business...

Bringing together the best of data warehouses and data lakes

In today’s highly digitized world, data about customers, products, operations, and the supply chain can come from many sources and can have a diverse set of structures. To gain deeper and more complete data-driven insights into a business topic (such as customer journey, customer retention, or product performance), organizations need to analyze all topic-relevant data, of all structures, from all sources, together. A data lake is well suited to storing all these different types of data inexpensively, and supports a wide variety of tools for working with and consuming the data. This includes the ability to transform data with frameworks such as Apache Spark, to train machine learning models on the data using tools such as Amazon SageMaker, and to query the data using SQL with tools such as Amazon Athena, Presto, or Trino. However, there are some limitations with traditional data lakes. For example, traditional implementations...

Hands-on – Using the AWS Command Line Interface (CLI) to create S3 buckets

In Chapter 1, An Introduction to Data Engineering, you created an AWS account and an AWS administrative user, and then ensured you could access your account. Console access lets you use AWS services and perform most functions; however, it can also be useful at times to interact with AWS services via the Command Line Interface (CLI). In this hands-on section, you will learn how to access the AWS CLI, and then use the CLI to create Amazon S3 buckets (a bucket is a storage container in the Amazon S3 service).
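As a rough illustration of what creating a bucket from the CLI involves, the sketch below builds (but does not run) an `aws s3 mb` command, the AWS CLI command for making a bucket. It is not taken from the book's exercises: the function names and the example bucket name are hypothetical, and the name check covers only a simplified subset of the S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no consecutive dots).

```python
import re
import shlex
from typing import List, Optional

# Simplified subset of the S3 bucket naming rules (an assumption for this
# sketch, not the complete rule set): 3-63 characters, lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name: str) -> bool:
    """Return True if the name passes the simplified checks above."""
    return bool(_BUCKET_NAME.fullmatch(name)) and ".." not in name


def make_bucket_command(name: str, region: Optional[str] = None) -> List[str]:
    """Build the argument list for `aws s3 mb` without executing it."""
    if not is_plausible_bucket_name(name):
        raise ValueError(f"not a valid-looking bucket name: {name!r}")
    command = ["aws", "s3", "mb", f"s3://{name}"]
    if region is not None:
        # --region is a global AWS CLI option.
        command += ["--region", region]
    return command


if __name__ == "__main__":
    # "dataeng-landing-zone-example" is a hypothetical bucket name.
    cmd = make_bucket_command("dataeng-landing-zone-example", "us-east-1")
    print(shlex.join(cmd))
```

Bucket names are globally unique across AWS, so a name that passes these local checks can still be rejected by the service if another account already owns it.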

Accessing the AWS CLI

The AWS CLI can be installed on your personal computer or laptop, or can be accessed from the AWS Console. To access the CLI on your personal computer, you need to generate a set of access keys.

Your access keys consist of an Access Key ID (which is comparable to a username) and a Secret Access Key (which is comparable to a password). With these two pieces of information, you can authenticate...
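To make the two-part structure of an access key concrete, here is a minimal sketch that parses text in the INI-style layout the AWS CLI uses for its shared credentials file (by default `~/.aws/credentials`, usually written by `aws configure`). The key values are obvious placeholders, and the helper names `read_profile` and `mask` are hypothetical, introduced only for this illustration.

```python
import configparser
from typing import Dict

# Placeholder values only (assumptions for this sketch); never commit or
# print real credentials.
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = EXAMPLEACCESSKEYID
aws_secret_access_key = example-secret-access-key
"""


def read_profile(text: str, profile: str = "default") -> Dict[str, str]:
    """Extract the key pair for one profile from credentials-file text."""
    # Interpolation is disabled so a '%' in a secret is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser[profile]
    return {
        "access_key_id": section["aws_access_key_id"],
        "secret_access_key": section["aws_secret_access_key"],
    }


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters, for safe display."""
    hidden = max(len(secret) - visible, 0)
    return "*" * hidden + secret[hidden:]


if __name__ == "__main__":
    creds = read_profile(SAMPLE_CREDENTIALS)
    print("Access Key ID:    ", creds["access_key_id"])
    print("Secret Access Key:", mask(creds["secret_access_key"]))
```

Treat the Secret Access Key like a password: the sketch masks it before printing for that reason.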

Author (1)

Gareth Eagar

Gareth Eagar has over 25 years of experience in the IT industry, starting in South Africa, working in the United Kingdom for a while, and now based in the USA. Having worked at AWS since 2017, Gareth has broad experience with a variety of AWS services, and deep expertise around building data platforms on AWS. While Gareth currently works as a Solutions Architect, he has also worked in AWS Professional Services, helping architect and implement data platforms for global customers. Gareth frequently speaks on data related topics.