You're reading from Modern Data Architectures with Python

Product type: Book
Published in: Sep 2023
Reading Level: Expert
Publisher: Packt
ISBN-13: 9781801070492
Edition: 1st Edition
Languages: Python
Concepts: Data Science
Author: Brian Lipp

Preface

Hello! Data platforms are popping up everywhere, but not all cars in the shop are the same. We are at the dawn of seeing most data stored not in company-owned data centers but in the cloud. Cloud storage is exceptionally cheap, and this abundance of cheap storage drives our choices. Cloud processing is also often significantly more affordable than adequately housing computers in a data center. With this increase in cheap, flexible cloud capability comes elasticity: the ability to grow and shrink as needed. Virtual compute engines do not run directly on physical machines but, instead, run in abstractions such as containers, allowing for temporary use. You no longer need to buy expensive deep-learning hardware; the cloud can give you quick access at a fraction of the cost.

The next step in this evolution was putting together stacks of technology that fit into what was called a data platform. These stacks were often riddled with incompatible technologies that were forced to work together, many times held in place with duct tape. As time went on, a better choice appeared.

With the advent of open technologies to process data such as Apache Spark, we started to see a different path altogether. People began to ask fundamental questions.

What types of data does your platform fully support? It became increasingly important that your data platform equally supports semi-structured and structured data.

What kinds of analysis and ML does your platform support? We started wanting to create, train, and deploy AI and ML on our data platforms using modern tooling stacks. The analysis must be available in various languages and tooling options, not just a traditional JDBC SQL path.

How well does it support streaming? Streaming data has become more and more the norm in many companies. With it comes a significant jump in complexity. A system built to process, store, and work with streaming platforms is critical for many.

Is your platform using only open standards? Open standards might seem like an afterthought, but being able to swap out aged technologies without forced lift-and-shift migrations can be a significant cost saver. Open standards allow various technologies to work together with little extra effort, which is a stark contrast to many closed data systems.

This book will serve as a guide to all of these questions and show you how to work with data platforms efficiently.

Who this book is for

Data is present in every business and working environment. People are constantly trying to understand and use data better.

This book has three different intended readerships:

Engineers: Engineers building data products and infrastructure can benefit from understanding how to build modern open data platforms
Analysts: Analysts who want to understand data better and use it to make critical decisions will benefit from understanding how to better interact with it
Managers: Decision makers who write the checks and consume data often need a better high-level understanding of data platforms

What this book covers

Chapter 1, Modern Data Processing Architecture, provides a significant introduction to designing data architecture and understanding the types of data processing engines.

Chapter 2, Understanding Data Analytics, provides an overview of the world of data analytics and modeling for various data types.

Chapter 3, Apache Spark Deep Dive, provides a thorough understanding of how Apache Spark works and the background knowledge needed to write Spark code.

Chapter 4, Batch and Stream Processing with Apache Spark, provides a solid foundation to work with Spark for batch workloads and structured streaming data pipelines.

Chapter 5, Streaming Data with Kafka, provides a hands-on introduction to Kafka and its uses in data pipelines, including Kafka Connect and Apache Spark.

Chapter 6, MLOps, provides an engineer with all the needed background and hands-on knowledge to develop, train, and deploy ML/AI models using the latest tooling.

Chapter 7, Data and Information Visualization, explains how to develop ad hoc data visualization and common dashboards in your data platform.

Chapter 8, Integrating Continuous Integration into Your Workflow, delves deep into how to build Python applications in a CI workflow using GitHub, Jenkins, and Databricks.

Chapter 9, Orchestrating Your Data Workflows, gives practical hands-on experience with Databricks workflows that transfer to other orchestration tools.

Chapter 10, Data Governance, explores controlling access to data and dealing with data quality issues.

Chapter 11, Building Out the Ground Work, establishes a foundation for our project using GitHub, Python, Terraform, and PyPI, among others.

Chapter 12, Completing Our Project, completes our project, building out GitHub Actions, pre-commit, design diagrams, and lots of Python.

To get the most out of this book

A fundamental knowledge of Python is strongly suggested.

Software/hardware covered in the book	OS requirements
Databricks	Windows, macOS, or Linux
Kafka	Windows, macOS, or Linux
Apache Spark	Windows, macOS, or Linux

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Modern-Data-Architectures-with-Python. If there’s an update to the code, it will be updated in the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.”

A block of code is set as follows:

validator.expect_column_values_to_not_be_null(column="name")
validator.expect_column_values_to_be_between(
    column="age", min_value=0, max_value=100
)

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

adapter = HTTPAdapter(max_retries=retries)

Any command-line input or output is written as follows:

databricks fs ls

Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: “Here we have the main page for workflows; to create a new workflow, there is a Create job button at the top left.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packtpub.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there. You can also get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781801070492

Submit your proof of purchase
That’s it! We’ll send your free PDF and other benefits directly to your email.


Author (1)

Brian Lipp

Brian Lipp is a Technology Polyglot, Engineer, and Solution Architect with a wide skillset in many technology domains. His programming background has ranged from R, Python, and Scala, to Go and Rust development. He has worked on Big Data systems, Data Lakes, data warehouses, and backend software engineering. Brian earned a Master of Science, CSIS from Pace University in 2009. He is currently a Sr. Data Engineer working with large Tech firms to build Data Ecosystems.