Loading Transformed Data

After data has undergone processing and transformation within an ETL pipeline, the final step involves transferring it to its designated final location. The type of output location that’s used is determined by both the data’s specific utility and the tools available within your environment.

Output data is most commonly stored in structured, relational databases, which offer an easily accessible format conducive to both analytical exploration and future modification.

In this chapter, we’ll get more acquainted with the “L” of ETL pipelines and the best practices for designing efficient load operations. We will discuss how to optimize load activities to fit the output data using either incremental or full data loading activities. Finally, we’ll provide a comprehensive walkthrough for creating a relational database management system (RDBMS) on your local device that you can utilize...

Technical requirements

To effectively utilize the resources and code examples provided in this chapter, ensure that your system meets the following technical requirements:

  • Software requirements:
    • Integrated development environment (IDE): We recommend using PyCharm as the preferred IDE for working with Python, and we might make specific references to PyCharm throughout this chapter. However, you are free to use any Python-compatible IDE of your choice.
    • Jupyter Notebooks should be installed.
    • Python version 3.6 or higher should be installed.
    • Pipenv should be installed to manage dependencies.
  • GitHub repository: The associated code and resources for this chapter can be found in this book’s GitHub repository at https://github.com/PacktPublishing/Building-ETL-Pipelines-with-Python. Fork and clone the repository to your local machine.

Introduction to data loading

Data loading is the final step of the ETL process, and arguably the entire purpose of the data pipeline. The loading phase of your ETL process requires careful preparation to ensure the transformed data is seamlessly transitioned to its destination.

Choosing the load destination

The choice of destination greatly impacts data accessibility, storage, querying capabilities, and overall system performance. Depending on the nature of your project, you might be loading data into relational databases, cloud-based data warehouses, NoSQL stores, or other repositories. Understanding the target system’s requirements and capabilities is a foundational step to designing an efficient loading strategy. Consider factors such as data types, indexing, partitioning, and data distribution.

Python’s adaptability ensures that, regardless of your destination choice, you have the tools at hand to integrate, manipulate, and optimize data loading processes...
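
For instance, here is a minimal sketch of that flexibility (not taken from the book’s code). It assumes pandas, SQLAlchemy, and a PostgreSQL driver such as psycopg2 are installed, and the connection strings, database, and table names are hypothetical placeholders; the same transformed dataset is written to two different destinations simply by swapping the connection string:

import pandas as pd
from sqlalchemy import create_engine

# A small, hypothetical "transformed" dataset standing in for real pipeline output
transformed_df = pd.DataFrame(
    {"customer_id": [1, 2, 3], "total_spend": [120.50, 75.00, 310.25]}
)

# Destination 1: a local PostgreSQL database (assumed credentials and database name)
pg_engine = create_engine("postgresql://postgres:your_password@localhost:5432/etl_demo")
transformed_df.to_sql("customer_spend", pg_engine, if_exists="append", index=False)

# Destination 2: a file-based SQLite database, handy for local testing
sqlite_engine = create_engine("sqlite:///etl_demo.db")
transformed_df.to_sql("customer_spend", sqlite_engine, if_exists="append", index=False)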

Best practices for data loading

There isn’t one universally definitive approach to creating data pipeline loading activities, but some methods are more effective than others. Proper preparation and adherence to best practices empower you to navigate the data loading phase with confidence, optimizing efficiency, accuracy, and reliability in your ETL workflow.

The process of designing a data loading activity reflects the level of understanding you have of the full environmental conditions of your system. You can use the following three principles to design a data loading workflow that is both scalable and reusable:

  • Utilizing techniques such as bulk loading, parallel processing, and optimized SQL queries can significantly enhance loading performance for large datasets. By adopting scalable strategies, you ensure that your data loading solution remains efficient and responsive even as data volume increases (see the bulk-loading sketch after this list).
  • Automation streamlines the loading process, reducing the...
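
As a quick illustration of the bulk-loading principle, the following minimal sketch (not from the book; the products table and its data are hypothetical) uses the sqlite3 module, which also powers this chapter’s demo, to load 10,000 rows in a single call rather than issuing one INSERT statement per row:

import sqlite3

# Hypothetical transformed rows waiting to be loaded
rows = [(i, f"product_{i}", round(i * 1.5, 2)) for i in range(10_000)]

conn = sqlite3.connect("bulk_demo.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)

# A single bulk call runs inside one transaction and avoids 10,000 separate round trips
conn.executemany("INSERT INTO products (id, name, price) VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()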

Optimizing data loading activities by controlling the data import method

From the aforementioned list, this chapter specifically focuses on utilizing different loading strategies to design efficient loading activities. Later in this book, we will dive deeper into Python automation modules and cloud resources (Chapters 8 and 9), as well as monitoring ETL pipelines (Chapter 14). But for now, let’s focus on the two most common methods for data loads: full and incremental.

Creating demo data

We will utilize Python’s built-in sqlite3 module to walk through a demo of how full and incremental data loads can be performed. From the chapter_06/ directory in your PyCharm environment, open the Loading_Transformed_Data.ipynb file by launching Jupyter from the PyCharm terminal with the following command:

(venv) (base) usr@usr-MBP chapter_06  % jupyter notebook

Verify that the following code is in your Jupyter notebook:

# import modules
import sqlite3

# demo...
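
The notebook’s demo code continues beyond what is shown here. As a rough, hypothetical stand-in rather than the book’s actual notebook, the following sketch creates a small demo table with sqlite3 and contrasts the two load styles: a full load that replaces the table’s contents, and an incremental load that appends only records that are not already present:

import sqlite3

conn = sqlite3.connect("load_demo.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS sales (sale_id INTEGER PRIMARY KEY, amount REAL)"
)

# Full load: wipe the target table and reload the complete dataset
full_batch = [(1, 100.0), (2, 250.0), (3, 80.0)]
conn.execute("DELETE FROM sales")
conn.executemany("INSERT INTO sales (sale_id, amount) VALUES (?, ?)", full_batch)
conn.commit()

# Incremental load: append only records whose keys are not already loaded
new_batch = [(3, 80.0), (4, 410.0)]  # sale_id 3 already exists in the table
conn.executemany("INSERT OR IGNORE INTO sales (sale_id, amount) VALUES (?, ?)", new_batch)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 4 rows in total
conn.close()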

Precautions to consider

Back in Chapter 2, we noted that data pipelines serve a wide range of purposes, from daily updates for business analytics dashboards to cyclical long-term storage. Since many organizations make decisions based on the resulting output data, not only is the accuracy of data transformations crucial, but the resulting format and quality of the data loaded need to remain cohesive with the data that already exists within the target location.

In a clean, reproducible, and scalable data ecosystem, the target data output location maintains its own, arguably authoritative, structure that serves as the ground truth for business data within your company. This requires you to rigorously manage the ongoing ETL processes that keep the storage environment up to date. When discussing the differences between full and incremental data loads in the previous section, it became clear that there is a need to distinguish between new, freshly curated data and...
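
One common safeguard, sketched below with a hypothetical events table and column names (this is an illustrative pattern, not the book’s code), is to track a high-water mark in the target location and load only records newer than it, so that re-running the pipeline cannot silently duplicate data that has already been loaded:

import sqlite3

conn = sqlite3.connect("events_demo.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS events (event_id INTEGER PRIMARY KEY, loaded_at TEXT)"
)

# Hypothetical incoming batch from the transform step: (event_id, loaded_at) pairs
incoming = [(1, "2023-09-01"), (2, "2023-09-02"), (3, "2023-09-03")]

# High-water mark: the most recent timestamp already present in the target table
last_loaded = conn.execute("SELECT MAX(loaded_at) FROM events").fetchone()[0]

# Keep only records strictly newer than what the target already holds
fresh = [row for row in incoming if last_loaded is None or row[1] > last_loaded]

conn.executemany("INSERT INTO events (event_id, loaded_at) VALUES (?, ?)", fresh)
conn.commit()
conn.close()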

Tutorial – preparing your local environment for data loading activities

In this section, we will use a local PostgreSQL database as the data load destination; we will do the same for many of the examples throughout this book. PostgreSQL is a free and open source RDBMS with strong SQL compliance. You can learn more about PostgreSQL at https://en.wikipedia.org/wiki/PostgreSQL.

Downloading and installing PostgreSQL

Depending on your device, select the respective download link in your browser (https://www.enterprisedb.com/downloads/postgres-postgresql-downloads) and follow these steps:

  1. Download the PostgreSQL installer for your operating system.
  2. Open the downloaded .dmg file and double-click the installer package to begin the installation.
  3. Once the files are downloaded, the EDB PostgreSQL Setup Wizard will begin. Make sure you choose the following criteria during your installation:

    • Installation Directory: /Library/PostgreSQL/15
    • Port: 5432
    • Select Components:
      • PostgreSQL Server
      • pgAdmin 4
      • Stack Builder
      • Command Line Tools...
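
Once the setup wizard finishes, you can sanity-check the local server from Python. The snippet below is a hypothetical verification step rather than part of the book’s tutorial; it assumes the psycopg2 (or psycopg2-binary) package is installed in your environment and that you substitute the superuser password you chose during installation:

import psycopg2

# Connect to the default postgres database on the port selected during installation
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="postgres",
    user="postgres",
    password="your_password_here",  # replace with the password set in the wizard
)

with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])  # prints the running PostgreSQL server version

conn.close()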

Summary

In this chapter, we introduced the premise of ETL pipeline data loading activities, as well as some of the contingencies of designing these activities correctly. We walked through the essential steps of setting up data storage destinations and structuring schemas in anticipation of the resultant data from our pipeline. We also introduced a blend of Python capabilities to fully or incrementally load data using SQLite. Lastly, we set up our local environment with PostgreSQL, which we will use as our data loading output location for the remainder of this book. In the next chapter, we will guide you through the entire process of creating a fully operational data pipeline.