Data Wrangling on AWS

Product type: Book
Published: July 2023
Publisher: Packt
ISBN-13: 9781801810906
Pages: 420
Edition: 1st
Authors: Navnit Shukla, Sankar M, Sampat Palani


Step 2 – importing data

Before we can import data into SageMaker Data Wrangler, we need to create a connection to our data source. SageMaker Data Wrangler provides native, out-of-the-box connectors to Amazon S3, Amazon Athena, Amazon Redshift, Snowflake, Amazon EMR, and Databricks. You can also set up data sources from over 40 software-as-a-service (SaaS) and web applications using Amazon AppFlow, a fully managed integration service that helps you securely transfer data between SaaS applications and AWS services. The Create connection screen shows the connectors in Data Wrangler, along with the additional data sources you can set up using Amazon AppFlow.

Figure 10.5: Data Wrangler data sources

In this chapter, we will use a publicly available example, the Titanic dataset. The Titanic dataset is considered the “Hello World” of machine learning datasets due to the number of commonly used data processing and machine learning techniques...
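If you choose the Amazon S3 connector, the dataset file first has to exist in a bucket that Data Wrangler can read. The following is a minimal sketch of staging a local CSV file in S3 with boto3. It assumes boto3 is installed and AWS credentials are configured; the bucket name, key prefix, and file name are hypothetical placeholders and do not come from the book.

```python
from pathlib import PurePosixPath


def build_s3_key(prefix: str, filename: str) -> str:
    """Join a key prefix and a file name into an S3 object key."""
    # Strip stray slashes so the key never starts or ends with "/".
    return str(PurePosixPath(prefix.strip("/")) / filename)


def build_s3_uri(bucket: str, key: str) -> str:
    """Return the s3:// URI for an object."""
    return f"s3://{bucket}/{key}"


def upload_csv(local_path: str, bucket: str, key: str) -> str:
    """Upload a local file to S3 and return its s3:// URI.

    Requires boto3 and valid AWS credentials. boto3 is imported here
    so the helpers above can be used without it.
    """
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)
    return build_s3_uri(bucket, key)


if __name__ == "__main__":
    # Hypothetical names, for illustration only; nothing is uploaded here.
    key = build_s3_key("data-wrangler/input/", "titanic.csv")
    print(build_s3_uri("my-example-bucket", key))
```

Calling `upload_csv("titanic.csv", "my-example-bucket", key)` would perform the actual upload. The returned URI identifies the object you would then browse to in the Amazon S3 connector.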
