
You're reading from  Data Engineering with Apache Spark, Delta Lake, and Lakehouse

Product type: Book
Published in: Oct 2021
Publisher: Packt
ISBN-13: 9781801077743
Edition: 1st Edition
Author (1)
Manoj Kukreja

Manoj Kukreja is a Principal Architect at Northbay Solutions who specializes in creating complex Data Lakes and Data Analytics Pipelines for large-scale organizations such as banks, insurance companies, universities, and US/Canadian government agencies. Previously, he worked for Pythian, a large managed service provider where he was leading the MySQL and MongoDB DBA group and supporting large-scale data infrastructure for enterprises across the globe. With over 25 years of IT experience, he has delivered Data Lake solutions using all major cloud providers including AWS, Azure, GCP, and Alibaba Cloud. On weekends, he trains groups of aspiring Data Engineers and Data Scientists on Hadoop, Spark, Kafka and Data Analytics on AWS and Azure Cloud.

Running a data pipeline

Once development and deployment have succeeded, it is time to orchestrate the data pipeline. Data pipeline runs are typically initiated using one of the following three methods:

  • Manually: The simplest way to invoke a data pipeline is to trigger it by hand through the control panel, command-line tools, or REpresentational State Transfer (REST) APIs. This method suits development, testing, and one-off executions but is unsuitable for production. For example, a data engineer may run a pipeline manually during unit testing, or perform a one-off execution because a scheduled run failed.
  • Scheduled: In this method, the data pipeline is invoked by a scheduler. The scheduler can be operating system-based, part of an orchestration tool, or built into the ETL tool itself. This is the most common method of invoking...
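
To make the difference between the manual and scheduled methods concrete, here is a minimal sketch in Python. It is an illustration only, not the book's code: `run_pipeline`, `run_manually`, and `run_on_schedule` are hypothetical names, and the standard library's `sched` module stands in for a real scheduler such as cron or the scheduler of an orchestration tool.

```python
import sched
import time


def run_pipeline(run_log, trigger):
    """Hypothetical stand-in for a real pipeline; it only records how it was invoked."""
    run_log.append(trigger)


def run_manually(run_log):
    """Manual invocation: a person or a one-off script calls the pipeline directly."""
    run_pipeline(run_log, "manual")


def run_on_schedule(run_log, interval_seconds, runs):
    """Scheduled invocation: a scheduler fires the pipeline at fixed intervals."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    for i in range(1, runs + 1):
        # Queue each run at a growing delay so the runs fire in order.
        scheduler.enter(i * interval_seconds, 1, run_pipeline,
                        argument=(run_log, "scheduled"))
    scheduler.run()  # blocks until every queued run has fired


run_log = []
run_manually(run_log)                                     # one-off run, e.g. during testing
run_on_schedule(run_log, interval_seconds=0.01, runs=3)   # short interval for the demo (assumption)
print(run_log)
```

In a real deployment the interval would be hours or days rather than a fraction of a second, and the scheduler would typically live outside the pipeline process so that runs continue without anyone starting them by hand.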