Setting Up the IoT and AI Environment

The Internet of Things (IoT) and artificial intelligence (AI) are having a dramatic impact on people's lives. Industries such as medicine are being revolutionized by wearable sensors that can monitor patients after they leave the hospital. Machine learning (ML) on industrial devices enables better monitoring and less downtime through techniques such as anomaly detection, predictive maintenance, and prescriptive actions.

Building an IoT device capable of delivering results relies on gathering the right information. This book provides recipes that support the end-to-end IoT/ML life cycle. The next chapter contains recipes for making sure that devices have the right sensors and that the data is the best it can be for ML outcomes, using tools such as exploratory factor analysis and data collection design.

This chapter will cover the following topics:

  • Choosing a device
  • Setting up Databricks
  • Setting up IoT Hub
  • Setting up an IoT Edge device
  • Deploying ML modules to Edge devices
  • Setting up Kafka
  • Installing ML libraries on Databricks

Choosing a device

Before starting with the classic recipe-by-recipe format of a cookbook, we'll cover a couple of foundational topics. Choosing the right hardware sets the stage for AI. Working with IoT means working with constraints. Using ML in the cloud is often a cost-effective solution as long as the data is small; image, video, and sound data will often bog down networks, and if you are using a cellular network, transmitting it can be highly expensive. The adage "there is no money in hardware" refers to the fact that most of the money made from IoT comes from selling services, not from producing expensive devices.

Dev kits  

Often, companies have their devices designed by electrical engineers. This is a cost-effective option: custom boards do not carry extra components, such as unnecessary Bluetooth radios or extra USB ports. However, predicting the CPU and RAM requirements of an ML model at board design time is difficult. Starter kits can be useful until the hardware requirements are understood. The following boards are among the most widely adopted on the market:

  • Manifold 2-C with NVIDIA TX2
  • The i.MX series
  • LattePanda
  • Raspberry Pi Class
  • Arduino
  • ESP8266

They are listed roughly in order of capability. A Raspberry Pi Class device, for example, would struggle with custom vision applications but would do great for audio or general ML applications. One determining factor for many data scientists is the programming language: the ESP8266 and Arduino need to be programmed in a low-level language such as C or C++, while devices such as the Raspberry Pi Class and above can be programmed in higher-level languages such as Python.

Manifold 2-C with NVIDIA TX2

The NVIDIA Jetson is one of the best choices for running complex ML models, such as real-time video analysis, on the Edge. The NVIDIA Jetson comes with a built-in NVIDIA GPU. The Manifold version of the product is designed to fit onto a DJI drone and perform tasks such as image recognition or self-flying. The main downside to the NVIDIA Jetson is its ARM64 architecture: ARM64 does not work well with TensorFlow, although other libraries, such as PyTorch, work fine on it. The Manifold retails for $500, which makes it a high-price option, but this is often necessary when doing real-time ML on the Edge:

Price: $500
Typical models: Reinforcement learning, computer vision
Use cases: Self-flying drones, robotics

The i.MX series

The i.MX series of chips has an open design and boasts impressive RAM and CPU capabilities. The open design helps engineers build boards easily. The i.MX series uses Freescale (now NXP) semiconductors, which have guaranteed production runs of 10 to 15 years, meaning the board design will be stable for years. The i.MX 6 ranges from $200 to $300 in cost and can easily handle CPU-intensive tasks, such as object recognition in live streaming video:

Price: $200+
Typical models: Computer vision, NLP
Use cases: Sentiment analysis, face recognition, object recognition, voice recognition

LattePanda 

Single Board Computers (SBCs) such as the LattePanda are capable of running heavy sensor workloads. These devices can often run Windows or Linux. Like the i.MX series, they are capable of running object recognition on the device; however, the frame rate for recognizing objects can be slow:

Price: $100+
Typical models: Face detection, voice recognition, high-speed Edge models
Use cases: Audio-enabled kiosk, high-frequency heart monitoring

Raspberry Pi Class

Raspberry Pis are a standard starter kit for IoT. With their $35 price tag, they give you a lot of capability for the cost: they can run ML on the Edge using containers. They run Linux or Windows 10 IoT Core, which allows easy plug-and-play of components, and they have a community of developers building tools for the platform. Although Raspberry Pi Class devices are capable of handling most ML tasks, they tend to have performance issues with some of the more intensive tasks, such as video recognition:

Price: $35
Typical models: Decision trees, artificial neural networks, anomaly detection
Use cases: Smart home, industrial IoT

Arduino

At $15, the Arduino is a cost-effective solution. Arduino is supported by a large community and uses the Arduino language, a set of C/C++ functions. If you need to run ML models on an Arduino device, you can package models built in popular frameworks such as PyTorch using the Embedded Learning Library (ELL). ELL allows ML models to be deployed on a device without the overhead of a large operating system. Porting ML models using ELL or TensorFlow Lite can be challenging due to the Arduino's limited memory and compute capacity:

Price: $15
Typical models: Linear regression
Use cases: Sensor reading classification

ESP8266 

At under $5, the ESP8266 and smaller devices represent a class of devices that take data in and transmit it to the cloud for ML evaluation. Besides being inexpensive, they are often low-power devices, so they can run on solar power, network power, or a long-life battery:

Price: $5 or below
Typical models: In the cloud only
Use cases: In the cloud only

Setting up Databricks

Processing large amounts of data is often not possible on a single computer. That is where distributed systems such as Spark come in. Spark, whose creators went on to found Databricks, allows you to parallelize large workloads over many computers.

Spark's development was motivated in part by the Netflix Prize, which offered $1 million to the team that built the best recommendation engine. Spark uses distributed computing to wrangle large and complex datasets. There are distributed equivalents of familiar Python libraries, such as Koalas, a distributed version of pandas. Spark also supports analytics and feature engineering that require a large amount of compute and memory, such as graph theory problems. Spark has two modes: a batch mode for training on large datasets and a streaming mode for scoring data in near real time.
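For a sense of what Koalas offers, here is a minimal, hedged sketch (the file path and column names are placeholders) of pandas-style code running distributed on Spark:

import databricks.koalas as ks  # distributed, pandas-like API on Spark

# Read a CSV into a Koalas DataFrame; the work is spread across the cluster
kdf = ks.read_csv("/data/telemetry.csv")

# Familiar pandas-style operations, executed on Spark under the hood
print(kdf.groupby("device_id")["temperature"].mean().head())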

IoT data tends to be large and imbalanced. A device may have 10 years of data showing it running in normal conditions and only a few records showing that it needs maintenance.

Storing data

Today, there are tools that make it easy to work with large amounts of data. There are a few things to remember, though: storing data at scale in an optimal format can make dealing with large datasets much easier.

Working with the type of large datasets that come from IoT devices can be prohibitively expensive for many companies. Storing data in Delta Lake, for example, can give the user a 340-times performance boost over accessing the same data as JSON. The next three sections introduce three storage methods that can cut a data analytics job down from weeks to hours.

Parquet

Parquet is one of the most common file formats in big data. Parquet's columnar storage format allows it to store highly compressed data, so it takes up less space on disk and less network bandwidth, making it ideal for loading into a DataFrame. Parquet ingestion into Spark has been benchmarked at 34 times the speed of JSON.
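As a minimal, hedged sketch of the round trip in PySpark (paths and column names are placeholders; spark is the session predefined in a Databricks notebook):

# Convert JSON telemetry to Parquet once, then benefit from faster reads
df = spark.read.json("/data/telemetry.json")
df.write.mode("overwrite").parquet("/data/telemetry.parquet")

# Columnar storage means later queries can read only the columns they need
readings = spark.read.parquet("/data/telemetry.parquet").select("device_id", "temperature")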

Avro

The Avro format is a popular storage format for IoT. While it does not achieve the high compression ratio that Parquet does, it is less compute-expensive to write because it uses row-based storage. Avro is a common format for streaming systems such as IoT Hub or Kafka.
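Reading Avro in Spark is a hedged one-liner, assuming a runtime with the Avro data source available (the path is a placeholder):

# Row-oriented Avro files are cheap to append, which suits streaming ingestion
events = spark.read.format("avro").load("/data/events.avro")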

Delta Lake

Delta Lake is an open source project released by Databricks in 2019. It stores files in Parquet format. In addition, it keeps track of data check-ins, enabling the data scientist to look at data as it existed at a given point in time. This can be useful when trying to determine why the accuracy of a particular ML model has drifted. It also keeps metadata about the data, giving it a 10-times performance increase over standard Parquet for analytics workloads.
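A minimal sketch of Delta Lake's time travel, again assuming a Databricks notebook (the path and version number are placeholders):

# Write a Delta table, then read it back as it existed at an earlier check-in
df.write.format("delta").mode("overwrite").save("/delta/telemetry")

historical = (spark.read.format("delta")
              .option("versionAsOf", 0)  # the data as of the first version
              .load("/delta/telemetry"))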

Having covered choosing a device and setting up Databricks, the rest of this chapter follows a modular, recipe-based format.

Setting up IoT Hub

Developing IoT solutions can be complicated. There are many issues to deal with, such as ML, Edge deployments, security, monitoring device state, and ingesting telemetry in the cloud. Cloud providers such as Azure provide a ready-made solution that can have components such as data storage and cloud-to-device messages built in.

In this recipe, we are going to set up IoT Hub in Azure for an IoT Edge device that will be doing ML calculations on the Edge.

Getting ready

Before using IoT Hub, you need to have a device and an Azure subscription. There is a free trial subscription available if you do not already have one. You will also need some sort of device.

How to do it...

To set up IoT Hub, the first thing you will need is a resource group. Resource groups are like folders on Windows or macOS: they let you place all of the resources for a particular project in the same location. The resource groups icon is in the Favorites menu in the left panel of the Azure portal.

The following is what we need to do:

  1. Select Create a resource. From there, the wizard will take you through the steps to create a resource group.
  2. Then, click on the + icon at the top to create an IoT Hub instance.
  3. In the search box, type in IoT Hub. The wizard will take you through how to set up IoT Hub.
One important thing to note on the Scale page: select the S1 or higher pricing tier. The S1 tier gives you bidirectional communication with the device and enables advanced features such as device twins and the ability to push ML models to Edge devices.
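If you prefer scripting, the same setup can be sketched with the Azure CLI (the resource names and region below are placeholders, not values from this recipe):

# Create a resource group, then an S1-tier IoT Hub inside it
az group create --name iot-cookbook-rg --location westus2
az iot hub create --name my-iot-hub --resource-group iot-cookbook-rg --sku S1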

How it works...

IoT Hub is a platform developed specifically for IoT. Issues that affect IoT, such as unreliable communication, are handled through mechanisms such as Advanced Message Queuing Protocol (AMQP) and Message Queuing Telemetry Transport (MQTT). IoT Hub has a rich ecosystem of tools to help IoT developers, such as device twins, cloud-to-device messages, a device security center, Kubernetes integration, and a marketplace for Edge modules.

Setting up an IoT Edge device

In this recipe, we're going to set up an IoT Edge device that can communicate with IoT Hub and also receive new ML containers that it can use to perform ML evaluations on the device.

IoT Edge devices have advantages over traditional IoT devices. The main advantage is their ability to update over the air (OTA). By using containers, models can be deployed easily without having to worry about bricking the device.

Getting ready

Before you create an IoT Edge device, make sure that your device is supported by IoT Edge; some device architectures, such as ARM64, are not. Next, make sure your IoT Hub instance from the previous recipe is up and running. The IoT Edge runtime must be installed on your device. For the purposes of this tutorial, we will assume that you have a Raspberry Pi Class device.

How to do it...

To set up an IoT Edge device, you will need to set up both the cloud and device side. The IoT device needs a place in the cloud to send its information. This recipe has two parts. The first part is configuring the IoT Edge device in IoT Hub. The second is configuring the device to talk to the cloud.

Configuring an IoT Edge device (cloud side)

The steps are as follows:

  1. In the IoT Hub blade, select IoT Edge.
  2. Click on the + Add an IoT Edge device button. This will take you to the Add IoT Edge device wizard.
  3. Give your device a unique device ID and select Save.
  4. A new device will be displayed in the middle of the screen. Click on that device and copy its primary connection string.

The next section explains how to get a device talking to the cloud. To do this, you will need the device connection string, which can be found in the device properties section: click on the device you want and copy its connection string.

Configuring an IoT Edge device (device side)

The first thing to do is install Moby, a scaled-down version of Docker. Docker allows you to push Edge modules down to the device. These modules can collect data from sensors, and they can also be ML modules. The steps are as follows:

  1. Download and install the Moby engine on the device:
curl -L https://aka.ms/moby-engine-armhf-latest -o moby_engine.deb && sudo dpkg -i ./moby_engine.deb
  2. Download and install the Moby CLI:
curl -L https://aka.ms/moby-cli-armhf-latest -o moby_cli.deb && sudo dpkg -i ./moby_cli.deb
  3. Fix the installation:
sudo apt-get install -f
  4. Install the IoT Edge security manager:
curl -L https://aka.ms/libiothsm-std-linux-armhf-latest -o libiothsm-std.deb && sudo dpkg -i ./libiothsm-std.deb
  5. Install the security daemon:
curl -L https://aka.ms/iotedged-linux-armhf-latest -o iotedge.deb && sudo dpkg -i ./iotedge.deb
  6. Fix the installation:
sudo apt-get install -f
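The excerpt's steps end here. To actually attach the device to IoT Hub, the usual IoT Edge 1.0 flow is to paste the connection string copied earlier into the security daemon's configuration file and restart the service; a hedged sketch, assuming the default file layout:

# Open the IoT Edge configuration file
sudo nano /etc/iotedge/config.yaml

# Under "Manual provisioning configuration", set:
#   device_connection_string: "<primary connection string from IoT Hub>"

# Restart the daemon so it picks up the new configuration
sudo systemctl restart iotedge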

How it works...

In this recipe, we created a device identity in the cloud with its own unique key. This is a security measure: because every device has its own key, a compromised device can be shut off individually.

We then added the IoT Edge runtime to the device and connected it to the cloud. At this point, the device is fully connected and ready to receive ML models and send its telemetry to the cloud. The next step is to deploy Edge modules to the device. These Edge modules are Docker containers that can access sensors on the device and either send telemetry to the cloud or run a trained model.

Deploying ML modules to Edge devices

Docker is the primary deployment mechanism for IoT Edge devices. It allows you to create and test containers locally and then deploy them to Edge devices. Dockerfiles can be scripted to target various chip architectures, such as x86 and ARM. In this recipe, we're going to walk through creating an IoT Edge module with ML libraries, deployed from the cloud.

Getting ready

To create an IoT Edge module, first install Visual Studio Code. After Visual Studio Code is up and running, install the Azure IoT Edge extension: find the Extensions icon in the side panel of Visual Studio Code, search for azure iot edge in the extension search bar, and install the extension.

After installing the extension, Visual Studio Code now has a wizard that can create an IoT Edge deployment. With a few modifications, it can be configured to deploy an ML model.

How to do it...

The steps for this recipe are as follows:

  1. In Visual Studio Code, press Ctrl + Shift + P to bring up the command window and find Azure IoT Edge: New IoT Edge Solution.
  2. Select a location for your code.
  3. Enter a solution name.
  4. Choose a language. For the purposes of this book, we will be using Python.
  5. Create a module name.
  6. Select a local port for running your code locally.

How it works...

After completing the wizard, a new solution scaffold appears in your Visual Studio Code explorer.

Let's explore what was created for you. The main entry point for the project is main.py. main.py contains sample code to help speed up development. To deploy main.py, you will use the deployment.template.json file. Right-clicking on deployment.template.json brings up a menu with an option to create a deployment manifest. In the modules folder, there is a sample module with three Dockerfiles: ARM32, AMD64, and AMD64 in debug mode. These are the currently supported chipset architectures. Dockerfile.arm32v7 targets the architecture supported by the Raspberry Pi v3.

To make sure you build ARM32 containers and not AMD64 containers, go into the module.json file and remove any references to the other Dockerfiles. For example, the following platforms section has three Dockerfile references (the paths shown are the defaults generated by the template), of which only the arm32v7 entry should remain:

"platforms": {
    "amd64": "./Dockerfile.amd64",
    "amd64.debug": "./Dockerfile.amd64.debug",
    "arm32v7": "./Dockerfile.arm32v7"
}

There's more...

To install TensorFlow (an ML library), Keras (an abstraction layer on top of TensorFlow that makes it easier to program), and h5py (a serialization library that allows you to serialize and deserialize TensorFlow models), go to the folder for the target Docker container, open the requirements.txt file, and add the following lines:

tensorflow
keras
h5py
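To see why these three libraries travel together, here is a minimal, hedged sketch of loading a trained model inside the module (the model file name and input batch are placeholders; a real module would wire the input to its message handler):

import numpy as np
from keras.models import load_model  # Keras rides on TensorFlow

model = load_model("model.h5")   # the HDF5 file is deserialized via h5py
sensor_batch = np.zeros((1, 4))  # placeholder for a batch of sensor readings
prediction = model.predict(sensor_batch)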

Setting up Kafka

Kafka is an open source project that is inexpensive at scale, can execute ML models with millisecond latency, and has a multi-topic pub/sub model. There are several ways to set it up: because it is open source, you can download the Kafka project and run ZooKeeper and Kafka locally; Confluent, the company founded by Kafka's creators, offers a paid service with many additional features, such as dashboards and KSQL; Kafka is available in Azure, AWS, and Google Cloud as a managed service; and you can also run Kafka as a Docker container for development use.

One downside of using Kafka is that a lot of additional work is needed to make it a good IoT platform. Kafka, for example, is not secure by default. Security is handled through a series of plugins, on the device side through X.509 certificates and on the cloud side through Lightweight Directory Access Protocol (LDAP), Ranger, or Kerberos plugins. Deploying ML models is also not trivial. Any ML...

Getting ready

In this example, we will be using Confluent Kafka via docker-compose. You will need Git, Docker, and docker-compose installed on your computer to run this recipe. To add ML models to Kafka streams, you will need to use a platform that runs on Java, such as H2O or TensorFlow.

How to do it...

The steps for this recipe are as follows:

  1. Clone the repo:
git clone https://github.com/confluentinc/cp-all-in-one
  2. Run docker-compose:
docker-compose up -d --build

Confluent Kafka comes with many containers. After waiting about 10 minutes for the containers to finish launching, go to a browser and navigate to localhost:9021 to see the Kafka Control Center.

How it works...

Kafka uses a journal to record data coming from end users into topics. These topics can then be read by consumers of the data. What has made Kafka a popular tool among the IoT community is its advanced features. Multiple streams can be combined, and streams can be turned into a key/value-based table where the most recent stream updates the table. But most importantly for the purpose of this book, ML algorithms can be run on streaming data with latency times in milliseconds. This recipe shows how to push data into Kafka and then create a Java project to manipulate the data in real time.

There's more...

Streaming data into Kafka is fairly easy. There are producers that send device-to-cloud messages and consumers that receive cloud-to-device messages. In the following example, we are going to implement a producer:

  1. Download an example project:
git clone https://github.com/Microshak/KafkaWeatherStreamer.git
cd KafkaWeatherStreamer
  2. Install the requirements:
pip install -r requirements.txt
  3. Run the weather.py file:
python3 weather.py

You should now be able to look at your Kafka Control Center and see data flowing in. The Kafka Streams API is a real-time platform that can perform ML computations with millisecond latency. The Streams API has the concepts of KTables and KStreams. KStreams are data streaming into Kafka on various topics. KTables are streams turned into tables where the data is updated every time there is a new record associated with its primary key. This allows multiple streams to be joined together similarly to how tables in a database are joined together...
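For a sense of what a producer such as weather.py does internally, here is a minimal, hedged sketch using the kafka-python client (the broker address, topic name, and payload are illustrative):

import json
from kafka import KafkaProducer

# Connect to the local broker started by docker-compose
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Send one device-to-cloud style message to a topic
producer.send("weather", {"station": "s1", "temp_c": 21.3})
producer.flush()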

Installing ML libraries on Databricks

Databricks is a unified big data and analytics platform. It is great for training ML models and working with the kind of large-scale data that is often found in IoT. There are extensions, such as Delta Lake, that allow researchers to view data as it existed at a certain point in time so that they can do analysis when models drift. There are also tools, such as MLflow, that allow data scientists to compare multiple models against each other. In this recipe, we are going to install various ML packages, such as TensorFlow, PyTorch, and GraphFrames, on Databricks. Most ML packages can be installed via PyPI; the pattern used to install TensorFlow, for example, works for various ML frameworks such as OpenAI Gym, Sonnet, Keras, and MXNet. Some tools available in Databricks are not available as Python packages; for those, we use the pattern shown with GraphX and GraphFrames, where packages are installed through Java extensions.

Getting ready

Before we start, it's important to know how the components work with each other. Let's start with workspaces. The workspace area is where data scientists and engineers share results through Databricks notebooks. Notebooks can interoperate with the Databricks filesystem to store Parquet or Delta Lake files. The workspace section also stores files such as Python libraries and JAR files; you can create folders in it to store shared files. I typically create a packages folder to store the Python and JAR files. Before we install the Python packages, let's first examine what a cluster is by going to the Clusters section.

In your Databricks instance, go to the Clusters menu. You can create a new cluster or use one that has already been created. When creating a cluster, you specify the amount of compute needed. Spark can work over large datasets and can also use GPUs for ML-optimized workloads. Some clusters have ML tools...

How to do it...

Traditional ML notebooks can have issues with different versions of ML packages installed. Databricks circumvents this by allowing users to set up resources that have a set of preinstalled packages. In this recipe, we're going to install various ML packages into Databricks. These packages can then be assigned to all new clusters going forward or specific clusters. This gives data scientists the flexibility to work with new versions of ML packages but still support older ML models they have developed. We will look at this recipe in three parts.

Importing TensorFlow

Perhaps the easiest way to import a Python library such as TensorFlow is to use PyPI. Simply go to https://pypi.org/ and search for TensorFlow. This will give you the information needed and the ability to look at different versions. The installation steps are as follows:

  1. Go to https://pypi.org/ and search for TensorFlow.
  2. Copy the name and version number you want, in this format: tensorflow==1.14.0
  3. In the Workspace tab of Databricks, right-click anywhere and, from the dropdown, click on Create and then Library.
  4. On the Create Library page, select PyPI as the library source.
  5. Paste the name of the library and the version number into the Package field.
  6. Click Create.

If you have already created a cluster, you can attach TensorFlow to it. You can also have TensorFlow installed on all clusters.
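Once the library is attached to a running cluster, a quick sanity check in a notebook cell confirms the install (the version shown depends on what you selected on PyPI):

import tensorflow as tf

print(tf.__version__)  # for example, 1.14.0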

Installing PyTorch

PyTorch is a popular ML library, written in native Python, with built-in support for GPUs. Installing PyTorch is very similar to installing TensorFlow: you install it via PyPI from the Create | Library menu, entering the current version of the package (for example, torch==1.1.0.post2). The installation steps are as follows:

  1. Go to https://pypi.org/ and search for PyTorch.
  2. Copy the name and version number you want, in this format: torch==1.1.0.post2
  3. In the Workspace tab of Databricks, right-click anywhere and, from the dropdown, click on Create and then Library.
  4. Select PyPI as the library source.
  5. Paste the name of the library and the version number into the Package field.
  6. Click Create.

If you have already created a cluster, you can attach PyTorch to it. You can also install PyTorch on all clusters. 
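As with TensorFlow, a short check verifies the install and shows whether the cluster's GPUs are visible to PyTorch:

import torch

print(torch.__version__)          # for example, 1.1.0.post2
print(torch.cuda.is_available())  # True on a GPU-enabled cluster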

Installing GraphX and GraphFrames

Spark has some distributed libraries that are not available anywhere else in data science, and GraphFrames is one of them. With graph theory, you can perform analyses such as finding the shortest path, network flow, homophily, centrality, and influence. Because GraphFrames is built on GraphX, a Java library, you need to install the Java JAR first, and then pip install the Python wrapper library that accesses the JAR file. The installation steps are as follows:

  1. Download a JAR file from https://spark-packages.org/package/graphframes/graphframes. You'll need to find a version that matches the version of Spark that you are running in your cluster.
  2. In the Workspace tab of Databricks, right-click anywhere and from the dropdown, click on Create and then Library.
  3. Drag and drop the JAR file into the space titled Drop JAR here.
  4. Click Create.
  5. Then, import another library.
  6. In the Workspace tab of Databricks...
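The remaining steps are truncated in this excerpt. Once both the JAR and its Python wrapper are installed, a hedged smoke test in a notebook might look like this (the vertex and edge data are made up):

from graphframes import GraphFrame

# Vertices need an "id" column; edges need "src" and "dst" columns
v = spark.createDataFrame([("a",), ("b",), ("c",)], ["id"])
e = spark.createDataFrame([("a", "b"), ("b", "c")], ["src", "dst"])

g = GraphFrame(v, e)
g.inDegrees.show()  # a simple centrality-style query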

How it works...

Because it is designed for both data engineers and data scientists, Databricks supports multiple versions of software and multiple languages. It does this by letting the user configure each cluster separately, with its own versions of ML packages. One cluster, for example, might have TensorFlow installed for streaming workloads, another might run the popular Conda environment, and a test cluster might have no TensorFlow installed at all.
