
You're reading from Data Engineering with Python

Product type: Book
Published in: Oct 2020
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781839214189
Edition: 1st Edition
Author: Paul Crickard

Paul Crickard authored a book on the Leaflet JavaScript module. He has been programming for over 15 years and has focused on GIS and geospatial programming for 7 years. He spent 3 years working as a planner at an architecture firm, where he combined GIS with Building Information Modeling (BIM) and CAD. Currently, he is the CIO at the 2nd Judicial District Attorney's Office in New Mexico.

Chapter 15: Real-Time Edge Data with MiNiFi, Kafka, and Spark

In this chapter, you will learn how Internet-of-Things (IoT) devices, small computers, and sensors can send data into a data pipeline using Apache NiFi. MiNiFi lets computers and devices with little processing power take part in a NiFi data pipeline. It is a lightweight version of NiFi with a stripped-down set of processors and no graphical user interface, designed to send data using a pipeline that you build in NiFi and then deploy to the device.

In this chapter, we're going to cover the following main topics:

  • Setting up MiNiFi on a device
  • Building and deploying a MiNiFi task in NiFi

Setting up MiNiFi

Apache MiNiFi is a lightweight version of NiFi, designed for collecting data at the source. Increasingly, that source is a small IoT device, a sensor, or a low-powered computer such as the Raspberry Pi. To incorporate these devices into your data pipelines, you need a way to get the data off the device. MiNiFi lets you stream that data to NiFi as part of a standard data pipeline.

To get the MiNiFi binary, browse to https://nifi.apache.org/minifi/. The following screenshot shows the MiNiFi home page, which provides information and documentation for the project:

Figure 15.1 – The Apache MiNiFi home page

From the main navigation bar, go to Downloads and select the Download MiNiFi Components option. You will need to decide whether you want to run the MiNiFi Java or MiNiFi C++ version. Which version is appropriate will depend on the specifications of the device where MiNiFi will live. If you need the smallest footprint...

Building a MiNiFi task in NiFi

In this section, you will build a data pipeline and deploy it to MiNiFi. The data pipeline will generate flow files and send them to NiFi. The next section will take this further and use a processor that is not included with MiNiFi.
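In NiFi, a flow file pairs a content payload with a set of key/value attributes. The toy Python model below only illustrates that idea and is not NiFi's API: the `FlowFile` class and the `generate_flow_files` function are hypothetical, loosely mimicking what a flow-file-generating source emits before the pipeline forwards it to NiFi.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy model of a NiFi flow file: a byte payload plus string attributes.

    Hypothetical illustration only; this is not how NiFi represents flow files internally.
    """
    content: bytes
    attributes: dict = field(default_factory=dict)


def generate_flow_files(count: int, payload: bytes) -> list:
    """Hypothetical generator: emit `count` flow files that share the same payload,
    each tagged with a unique `uuid` attribute and an illustrative `filename`."""
    return [
        FlowFile(
            content=payload,
            attributes={"uuid": str(uuid.uuid4()), "filename": f"flowfile-{i}"},
        )
        for i in range(count)
    ]


if __name__ == "__main__":
    for ff in generate_flow_files(3, b"hello from the edge"):
        print(ff.attributes["filename"], len(ff.content))
```

Keeping content and attributes separate mirrors the NiFi idea that processors can route on metadata without reading the payload.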

To use MiNiFi, you will need an older version of NiFi. The current MiNiFi toolkit (0.5.0) breaks because of changes to the properties that NiFi writes into its templates. This is expected to be fixed in 0.6.0, but until then, you will need to use NiFi version 1.9.0 or earlier. You can get older NiFi versions at https://archive.apache.org/dist/nifi/1.9.0/. Extract the NiFi archive using the tar command with the -xvzf flags, then place the resulting folder in your home directory using mv or your file explorer.
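If you prefer to script this step, the extract-and-move sequence can be sketched in Python with the standard `tarfile` and `shutil` modules. This is a minimal sketch, assuming the downloaded archive contains a single top-level folder (such as `nifi-1.9.0/`); the function name `install_nifi` is hypothetical.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def install_nifi(archive: Path, dest_dir: Path) -> Path:
    """Extract a NiFi .tar.gz archive and move its top-level folder into dest_dir.

    Roughly equivalent to `tar -xvzf <archive>` followed by `mv <folder> <dest_dir>`.
    Assumes the archive holds exactly one top-level folder (e.g. nifi-1.9.0/).
    """
    archive, dest_dir = Path(archive), Path(dest_dir)
    with tarfile.open(archive, "r:gz") as tar:
        # The first path component of every entry identifies the top-level folder.
        roots = {Path(name).parts[0] for name in tar.getnames() if Path(name).parts}
        if len(roots) != 1:
            raise ValueError(f"expected one top-level folder, found {sorted(roots)}")
        root = roots.pop()
        target = dest_dir / root
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        # Extract into a throwaway staging directory, then move the folder into place.
        with tempfile.TemporaryDirectory() as staging:
            tar.extractall(staging)
            shutil.move(str(Path(staging) / root), str(target))
    return target
```

For example, `install_nifi(Path("nifi-1.9.0-bin.tar.gz"), Path.home())` would place the folder in your home directory, assuming the archive was saved under that (assumed) file name in the current directory. Staging the extraction in a temporary directory avoids leaving a half-extracted copy behind if the move fails.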

You will also need an older version of Java (Java 8). To install it, use the following command:

sudo apt-get install openjdk-8-jre
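To confirm which Java version is on your path, you can parse the output of `java -version`. The Python sketch below assumes the usual version formats: Java 8 reports itself as `1.8.0_<update>`, while Java 9 and later report the major number first (for example, `11.0.9`). The function names are hypothetical.

```python
import re
import subprocess


def java_major_version(version_text: str) -> int:
    """Extract the major Java version from `java -version` output.

    Handles the legacy "1.8.0_xxx" scheme (Java 8) and the newer "11.0.x" / "17" scheme.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        raise ValueError("no version string found in java -version output")
    first, second = match.group(1), match.group(2)
    # Legacy scheme: "1.8" means Java 8.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major_version() -> int:
    """Run `java -version` and parse it; the JVM prints this banner to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    return java_major_version(result.stderr)
```

A check such as `installed_java_major_version() == 8` tells you whether the default `java` on your path is the Java 8 runtime installed above.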

Lastly, you will also need to make sure that NiFi is configured to allow site-to-site connections...
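In NiFi 1.x, site-to-site settings live in the `nifi.remote.input.*` entries of `conf/nifi.properties`. The Python sketch below sets those entries in place. The host `localhost` and port `1026` are hypothetical placeholders, so substitute values that your MiNiFi device can actually reach, and restart NiFi afterward so the change takes effect. The function names are also hypothetical.

```python
from pathlib import Path

# Hypothetical values: choose a host and port reachable from the MiNiFi device.
SITE_TO_SITE = {
    "nifi.remote.input.host": "localhost",
    "nifi.remote.input.secure": "false",
    "nifi.remote.input.socket.port": "1026",
}


def set_properties(text: str, updates: dict) -> str:
    """Return properties-file text with each key in `updates` set to its value.

    Existing `key=value` lines are rewritten in place; comments and unrelated
    keys are left untouched; keys not found in the file are appended at the end.
    """
    remaining = dict(updates)
    lines = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        key = key.strip()
        if sep and not line.lstrip().startswith("#") and key in remaining:
            lines.append(f"{key}={remaining.pop(key)}")
        else:
            lines.append(line)
    lines.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(lines) + "\n"


def enable_site_to_site(properties_path: Path) -> None:
    """Apply SITE_TO_SITE to a nifi.properties file on disk."""
    path = Path(properties_path)
    path.write_text(set_properties(path.read_text(), SITE_TO_SITE))
```

For example, `enable_site_to_site(Path.home() / "nifi-1.9.0" / "conf" / "nifi.properties")` would update a NiFi 1.9.0 install in your home directory. Rewriting lines in place, rather than appending duplicates, keeps the file readable and avoids ambiguity about which value NiFi picks up.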

Summary

In this chapter, you learned how MiNiFi lets you stream data to a NiFi instance. With MiNiFi, you can capture data from sensors, from smaller devices such as a Raspberry Pi, or from regular servers where the data lives, without needing a full NiFi install. You also learned how to set up and configure a remote processor group that allows you to talk to a remote NiFi instance.

In the Appendix, you will learn how to cluster NiFi so that your data pipelines run on different machines and the load is further distributed. This will allow you to reserve servers for specific tasks, or to spread large amounts of data horizontally across the cluster. By combining NiFi, Kafka, and Spark into clusters, you will be able to process more data than any single machine could handle.
