Free eBook - Modern Data Architectures with Python

4.6 (7 reviews total)
By Brian Lipp
  1. Part 1: Fundamental Data Knowledge
About this book
Modern Data Architectures with Python teaches you how to incorporate your machine learning and data science work streams into your open data platforms. You’ll learn how to turn your data into open lakehouses that work with any technology, using tried-and-true techniques such as the medallion architecture and Delta Lake. Starting with the fundamentals, this book will help you build pipelines on Databricks, an open data platform, using SQL and Python. You’ll gain an understanding of notebooks and of applications written in Python using standard software engineering tools such as Git, pre-commit, Jenkins, and GitHub. Next, you’ll delve into streaming and batch-based data processing using Apache Spark and Confluent Kafka. As you advance, you’ll learn how to deploy your resources using infrastructure as code and how to automate your workflows and code development. Because a data platform’s ability to work with AI and ML is a vital component, you’ll also explore the basics of ML and how to work with modern MLOps tooling. Finally, you’ll get hands-on experience with Apache Spark, one of the key data technologies in today’s market. By the end of this book, you’ll have the practical and theoretical knowledge to build, manage, orchestrate, and architect your data ecosystems.
Publication date: September 2023
Publisher: Packt
Pages: 318
ISBN: 9781801070492

About the Author
  • Brian Lipp

    Brian Lipp is a technology polyglot, engineer, and solution architect with a wide skillset across many technology domains. His programming background ranges from R, Python, and Scala to Go and Rust. He has worked on big data systems, data lakes, data warehouses, and backend software engineering. Brian earned a Master of Science, CSIS from Pace University in 2009. He is currently a Sr. Data Engineer working with large tech firms to build data ecosystems.

Latest Reviews (7 reviews total)
For a book about "data architecture", it doesn't really address the topic. It ranges broadly across different subjects without covering any of them in detail. Probably better for juniors who may want to get some ideas.
This book is essential for those keen to enhance their grasp on data. Designed for engineers, analysts, and managers, it explores modern data platforms and informed decision-making. It covers data architecture design, insights into analytics, Apache Spark's intricacies, Spark's batch and streaming capabilities, Kafka in data pipelines, MLOps for ML/AI, data visualization techniques, Python app integration with CI tools, practical Databricks applications, data governance essentials, and setting up projects using diverse tools. A captivating read for data enthusiasts!
Embarking on a journey into the world of data, this book is like a friendly guide, showing the way to navigate the complexities of modern technology. It skillfully breaks down the intricacies of emerging data skills, offering practical insights into design methodologies like Data Mesh and data lakehouses. As you delve into the pages, the book demystifies the world of data governance, providing a deeper understanding. Through clear examples in Python, it guides readers in building data pipelines on platforms like Databricks, employing tried-and-true techniques like the medallion architecture and Delta Lake. From understanding data patterns and enhancing performance with Spark internals to exploring MLOps tools and integrating visualization into data practices, this book serves as a bridge for developers, analytics engineers, and managers looking to fortify their organization's data ecosystem. The straightforward language, coupled with practical examples, makes it an invaluable resource for those eager to navigate the data landscape.