You're reading from Building ETL Pipelines with Python

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781804615256
Edition: 1st Edition
Authors (2):
Brij Kishore Pandey

Brij Kishore Pandey stands as a testament to dedication, innovation, and mastery in the vast domains of software engineering, data engineering, machine learning, and architectural design. His illustrious career, spanning over 14 years, has seen him wear multiple hats, transitioning seamlessly between roles and consistently pushing the boundaries of technological advancement. He has a degree in electrical and electronics engineering. His work history includes the likes of JP Morgan Chase, American Express, 3M Company, Alaska Airlines, and Cigna Healthcare. He is currently working as a principal software engineer at Automatic Data Processing Inc. (ADP). Originally from India, he resides in Parsippany, New Jersey, with his wife and daughter.

Emily Ro Schoof

Emily Ro Schoof is a dedicated data specialist with a global perspective, showcasing her expertise as a data scientist and data engineer on both national and international platforms. Drawing from a background rooted in healthcare and experimental design, she brings a unique perspective of expertise to her data analytic roles. Emily's multifaceted career ranges from working with UNICEF to design automated forecasting algorithms to identify conflict anomalies using near real-time media monitoring to serving as a subject matter expert for General Assembly's Data Engineering course content and design. Her mission is to empower individuals to leverage data for positive impact. Emily holds the strong belief that providing easy access to resources that merge theory and real-world applications is the essential first step in this process.

Checkpoint for recovery

A robust ETL pipeline is not just about moving data from point A to point B efficiently; it must also recover gracefully from failures and preserve data integrity throughout the process. To accomplish this, effective checkpointing needs to be combined with logging practices.

A “checkpoint” in the ETL process is a point in the data flow where the output of a key data cleansing or transformation step is “bookmarked”: after each manipulation, the output data is stored in a temporary location. Thus, in the event of a failure, once the precise point of failure is identified, you can restart the ETL process from the last successful checkpoint instead of from the beginning. This approach not only saves time and computational resources but also helps maintain data integrity by reducing the risk of duplicate or missed data. Using the same logging instance we defined earlier in this chapter, we can apply the same...
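
The book's own checkpointing code is cut off in this excerpt, so the following is only a minimal sketch of the idea described above, not the authors' implementation. It assumes each step's output is JSON-serializable and is written to a file in a checkpoint directory; the names `run_with_checkpoints`, `drop_missing`, `to_cents`, the file-naming scheme, and the logger name are all hypothetical. Because the chapter's logging instance is not shown here, the sketch creates its own logger.

```python
import json
import logging
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl_checkpoint_demo")  # hypothetical logger name


def run_with_checkpoints(records, steps, checkpoint_dir):
    """Run (name, function) steps in order, saving each step's output.

    On a rerun, a step whose checkpoint file already exists is skipped and
    its saved output is loaded instead of being recomputed.
    """
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    data = records
    for index, (name, func) in enumerate(steps):
        path = checkpoint_dir / f"{index:02d}_{name}.json"
        if path.exists():
            data = json.loads(path.read_text())
            logger.info("Step %s: loaded checkpoint %s", name, path.name)
            continue
        try:
            data = func(data)
        except Exception:
            # Log the failing step; checkpoints from earlier steps stay on disk.
            logger.exception("Step %s failed; earlier checkpoints are kept", name)
            raise
        # Write to a temporary file first, then rename it, so a crash during
        # the write does not leave a half-written checkpoint behind.
        tmp_path = path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps(data))
        tmp_path.replace(path)
        logger.info("Step %s: saved checkpoint %s", name, path.name)
    return data


def drop_missing(rows):
    """Example cleansing step: drop rows without an amount."""
    return [row for row in rows if row.get("amount") is not None]


def to_cents(rows):
    """Example transformation step: convert amounts to integer cents."""
    return [{**row, "amount": round(row["amount"] * 100)} for row in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rows = [{"id": 1, "amount": 1.5}, {"id": 2, "amount": None}]
        steps = [("drop_missing", drop_missing), ("to_cents", to_cents)]
        print(run_with_checkpoints(rows, steps, tmp))
```

If a later step raises, rerunning with the same checkpoint directory resumes from the last saved output rather than from the raw input. In a real pipeline the checkpoint directory would need to be cleared between independent runs, since this sketch treats any existing file as valid.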
