You're reading from Data Ingestion with Python Cookbook

Product type: Book

Published in: May 2023

Publisher: Packt

ISBN-13: 9781837632602

Edition: 1st Edition

Concepts

Data Engineering

Author (1)

Gláucia Esppenchutz

What this book covers

Chapter 1, Introduction to Data Ingestion, introduces you to data ingestion best practices and the challenges of working with diverse data sources. It explains the importance of the tools covered in the book, presents them, and provides installation instructions.

Chapter 2, Data Access Principals – Accessing your Data, explores data access concepts related to data governance, covering workflows and management of familiar sources such as SFTP servers, APIs, and cloud providers. It also provides examples of creating data access policies in databases, data warehouses, and the cloud.

Chapter 3, Data Discovery – Understanding Our Data Before Ingesting It, teaches you the significance of carrying out the data discovery process before data ingestion. It covers manual discovery, documentation, and using an open-source tool, OpenMetadata, for local configuration.

Chapter 4, Reading CSV and JSON Files and Solving Problems, introduces you to ingesting CSV and JSON files using Python and PySpark. It demonstrates handling varying data volumes and infrastructures while addressing common challenges and providing solutions.

Chapter 5, Ingesting Data from Structured and Unstructured Databases, covers fundamental concepts of relational and non-relational databases, including everyday use cases. You will learn how to read and handle data from these models, understand vital considerations, and troubleshoot potential errors.

Chapter 6, Using PySpark with Defined and Non-Defined Schemas, delves deeper into common PySpark use cases, focusing on handling defined and non-defined schemas. It also explores reading and understanding complex logs from Spark (PySpark core) and formatting techniques for easier debugging.

Chapter 7, Ingesting Analytical Data, introduces you to analytical data and common formats for reading and writing. It explores reading partitioned data for improved performance and discusses Reverse ETL theory with real-life application workflows and diagrams.

Chapter 8, Designing Monitored Data Workflows, covers logging best practices for data ingestion that facilitate error identification and debugging. Techniques such as monitoring file size, row count, and object count enable improved monitoring through dashboards, alerts, and insights.

Chapter 9, Putting Everything Together with Airflow, consolidates the previously presented information and guides you in building a real-life data ingestion application using Airflow. It covers essential components, configuration, and issue resolution in the process.

Chapter 10, Logging and Monitoring Your Data Ingest in Airflow, explores advanced logging and monitoring in data ingestion with Airflow. It covers creating custom operators, setting up notifications, and monitoring for data anomalies. It also covers configuring notifications for tools such as Slack so that you stay updated on the data ingestion process.

Chapter 11, Automating Your Data Ingestion Pipelines, focuses on automating data ingests using previously learned best practices, enabling reader autonomy. It addresses common challenges with schedulers or orchestration tools and provides solutions to avoid problems in production clusters.

Chapter 12, Using Data Observability for Debugging, Error Handling, and Preventing Downtime, explores data observability concepts, popular monitoring tools such as Grafana, and best practices for log storage and data lineage. It also covers creating visualization graphs to monitor data source issues using Airflow configuration and data ingestion scripts.

Author (1)

Gláucia Esppenchutz

Gláucia Esppenchutz is a data engineer with expertise in managing data pipelines and vast amounts of data using cloud and on-premises technologies. She has worked at companies such as Globo, BMW Group, and Cloudera. Currently, she works at AiFi, specializing in the field of data operations for autonomous systems. She comes from the biomedical field and shifted her career ten years ago to chase the dream of working closely with technology and data. She is in constant contact with the open source community, mentoring people and helping to manage projects, and has collaborated with the Apache, PyLadies, FreeCodeCamp, Udacity, and MentorColor communities.