Reader small image

You're reading from  Machine Learning Engineering on AWS

Product typeBook
Published inOct 2022
PublisherPackt
ISBN-139781803247595
Edition1st Edition
Tools
Right arrow
Author (1)
Joshua Arvin Lat
Joshua Arvin Lat
author image
Joshua Arvin Lat

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO for three Australian-owned companies and as director of software development and engineering for multiple e-commerce start-ups in the past. Years ago, he and his team won first place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and has shared his knowledge at several international conferences, discussing practical strategies on machine learning, engineering, security, and management.
Read more about Joshua Arvin Lat

Right arrow

Setting up Lake Formation

Now, it’s time to take a closer look at setting up our serverless data lake on AWS! Before we begin, let’s define what a data lake is and what type of data is stored in it. A data lake is a centralized data store that contains a variety of structured, semi-structured, and unstructured data from different data sources. As shown in the following diagram, data can be stored in a data lake without us having to worry about the structure and format. We can use a variety of file types such as JSON, CSV, and Apache Parquet when storing data in a data lake. In addition to these, data lakes may include both raw and processed (clean) data:

Figure 4.26 – Getting started with data lakes

ML engineers and data scientists can use data lakes as the source of the data used for building and training ML models. Since the data stored in data lakes may be a mixture of both raw and clean data, additional data processing, data cleaning...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Machine Learning Engineering on AWS
Published in: Oct 2022Publisher: PacktISBN-13: 9781803247595

Author (1)

author image
Joshua Arvin Lat

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO for three Australian-owned companies and as director of software development and engineering for multiple e-commerce start-ups in the past. Years ago, he and his team won first place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and has shared his knowledge at several international conferences, discussing practical strategies on machine learning, engineering, security, and management.
Read more about Joshua Arvin Lat