You're reading from Azure Data Engineer Associate Certification Guide

Product type: Book
Published in: Feb 2022
Publisher: Packt
ISBN-13: 9781801816069
Edition: 1st Edition
Author (1)
Newton Alex

Newton Alex leads several Azure Data Analytics teams in Microsoft, India. His team contributes to technologies including Azure Synapse, Azure Databricks, Azure HDInsight, and many open source technologies, including Apache YARN, Apache Spark, and Apache Hive. He started using Hadoop while at Yahoo, USA, where he helped build the first batch processing pipelines for Yahoo's ad serving team. After Yahoo, he became the leader of the big data team at Pivotal Inc., USA, where he was responsible for the entire open source stack of Pivotal Inc. He later moved to Microsoft and started the Azure Data team in India. He has worked with several Fortune 500 companies to help build their data systems on Azure.

Integrating Jupyter/Python notebooks into a data pipeline

You can integrate Jupyter/Python notebooks into an ADF data pipeline by using the Spark activity in ADF. You will need an Azure HDInsight Spark cluster for this exercise.

The prerequisite for integrating Jupyter notebooks is to create linked services to Azure Storage and HDInsight from ADF and have an HDInsight Spark cluster running.

You have already seen how to create linked services in the Developing batch processing solutions by using Data Factory, Data Lake, Spark, Azure Synapse Pipelines, PolyBase, and Azure Databricks section earlier in this chapter, so I won't repeat the steps here.

Select the Spark activity from ADF. Then, under the HDI Cluster tab, set the HDInsight linked service field to the HDInsight linked service that you created, as shown in the following screenshot.

Figure 9.26 – Configuring a Spark activity in ADF
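
The same Spark activity configuration can also be expressed as the JSON that ADF stores behind the UI. The following is a minimal sketch in Python that builds such a definition and prints it. The activity name, both linked service names, the root path, and the entry file path are hypothetical placeholders, not values from this chapter. The sketch also assumes that the notebook code is available as a Python script in the storage account, because the Spark activity's entry file is a Spark program file rather than a notebook file.

```python
import json


def build_spark_activity(name, hdi_linked_service, storage_linked_service,
                         root_path, entry_file_path):
    """Return an ADF Spark activity definition as a plain dictionary."""
    return {
        "name": name,
        # "HDInsightSpark" is the activity type ADF uses for the Spark activity.
        "type": "HDInsightSpark",
        # The HDInsight linked service selected under the HDI Cluster tab.
        "linkedServiceName": {
            "referenceName": hdi_linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            # Container and folder in Azure Storage that hold the Spark files.
            "rootPath": root_path,
            # Path of the script to run, relative to rootPath.
            "entryFilePath": entry_file_path,
            # The Azure Storage linked service that holds the script.
            "sparkJobLinkedService": {
                "referenceName": storage_linked_service,
                "type": "LinkedServiceReference",
            },
        },
    }


# All of the values below are hypothetical and only illustrate the shape.
activity = build_spark_activity(
    name="RunNotebookScript",
    hdi_linked_service="HDInsightLinkedService",
    storage_linked_service="AzureStorageLinkedService",
    root_path="sparkjobs/notebooks",
    entry_file_path="trip_analysis.py",
)

print(json.dumps(activity, indent=2))
```

You could paste the printed JSON into the activities list of a pipeline definition after replacing the placeholder names with the linked services that you created.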

Now, start the Jupyter notebook by going to...
