Reader small image

You're reading from  Distributed Data Systems with Azure Databricks

Product typeBook
Published inMay 2021
Reading LevelBeginner
PublisherPackt
ISBN-139781838647216
Edition1st Edition
Languages
Concepts
Right arrow
Author (1)
Alan Bernardo Palacio
Alan Bernardo Palacio
author image
Alan Bernardo Palacio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder in startups, and later on earned a Master's degree from the faculty of Mathematics in the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.
Read more about Alan Bernardo Palacio

Right arrow

Summary

In this chapter, we have tried to cover all the main aspects of how Azure Databricks works. Some of the things we have discovered include how notebooks can be created to execute code, how we can import data to use, how to create and manage clusters, and so on. This is important because when creating ETLs and ML experiments in Azure Databricks within an organization, aside from how to code the ETL in our notebooks, we will need to know how to manage the data and computational resources required, how to share assets, and how to manage the permissions of each one of them.

In the next chapter, we will apply this knowledge to explore in more detail how to create and manage the resources needed to work with data in Azure Databricks, and learn more about custom VNets and the different alternatives that we have in order to interact with them, either through the Azure Databricks UI or the CLI tool.

Previous PageNext Chapter
You have been reading a chapter from
Distributed Data Systems with Azure Databricks
Published in: May 2021Publisher: PacktISBN-13: 9781838647216
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Alan Bernardo Palacio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder in startups, and later on earned a Master's degree from the faculty of Mathematics in the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.
Read more about Alan Bernardo Palacio