You're reading from  TinyML Cookbook - Second Edition

Product type: Book
Published in: Nov 2023
Publisher: Packt
ISBN-13: 9781837637362
Edition: 2nd Edition
Author (1)
Gian Marco Iodice

Gian Marco Iodice, who co-created the Arm Compute Library in 2017, is team and tech lead in the Machine Learning Group at Arm. The Arm Compute Library is currently the most performant library for ML on Arm, and it's deployed on billions of devices worldwide, from servers to smartphones. Gian Marco holds an MSc degree, with honors, in electronic engineering from the University of Pisa (Italy) and has several years of experience developing ML and computer vision algorithms on edge devices. He now leads ML performance optimization on Arm Mali GPUs. In 2020, Gian Marco cofounded the TinyML UK meetup group to encourage knowledge-sharing, educate, and inspire the next generation of ML developers on tiny and power-efficient devices.

Overview of deep learning

ML is the ingredient that makes our tiny devices capable of making intelligent decisions. These software algorithms heavily rely on the correct data to learn patterns or actions based on experience. As we commonly say, data is everything for ML because it is what makes or breaks an application.

This book will refer to deep learning (DL) as a specific class of ML that can perform complex prediction tasks directly on raw images, text, or sound. These algorithms achieve state-of-the-art accuracy and can be better and faster than humans at solving some data analysis problems.

A complete discussion of DL architectures and algorithms is beyond the scope of this book. However, this section will summarize some essential points relevant to understanding the following chapters.

Deep neural networks

A deep neural network consists of several stacked layers aimed at learning patterns.

Each layer contains several neurons, the fundamental computing elements for artificial neural networks (ANNs) inspired by the human brain.

A neuron produces a single output through a linear transformation, defined as the weighted sum of the inputs plus a constant value called bias, as shown in the following diagram:

Figure 1.3: A neuron representation

The coefficients of this weighted sum are called weights.
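
The weighted sum described above can be sketched in a few lines of C. This is only an illustration: the function name `neuron_output` and its signature are assumptions made for this example and do not come from the book.

```c
#include <assert.h>
#include <stddef.h>

/* Linear part of a neuron: the weighted sum of the inputs plus the bias.
 * The name and signature are illustrative assumptions. */
float neuron_output(const float *inputs, const float *weights,
                    size_t num_inputs, float bias) {
  assert(inputs != NULL && weights != NULL);
  float acc = bias;
  for (size_t i = 0; i < num_inputs; ++i) {
    acc += weights[i] * inputs[i];  /* one multiply-accumulate per input */
  }
  return acc;
}
```

For example, inputs {1, 2, 3}, weights {0.5, -1, 2}, and a bias of 1 give 0.5 - 2 + 6 + 1 = 5.5.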

Weights and bias are obtained after an iterative training process to make the neuron capable of learning complex patterns. However, neurons can only solve simple linear problems with linear transformations. Therefore, non-linear functions, called activations, generally follow the neuron’s output to help the network learn complex patterns:

Figure 1.4: An activation function

An example of a widely adopted activation function is the rectified linear unit (ReLU), which returns the maximum value between the input value and 0:

#include <math.h>

// ReLU: pass positive inputs through and clamp negative inputs to zero.
float relu(float input) {
  return fmaxf(input, 0.0f);
}

Its computational simplicity makes it preferable to other non-linear functions, such as the hyperbolic tangent or the logistic sigmoid, which require more computational resources.

In the following subsection, we will see how the neurons are connected to solve complex visual recognition tasks.

Convolutional neural networks

Convolutional neural networks (CNNs) are specialized deep neural networks predominantly applied to visual recognition tasks.

We can consider CNNs a regularized evolution of the classic fully connected neural networks, which are built from dense layers, also known as fully connected layers.

As we can see in the following diagram, a characteristic of fully connected networks is connecting every neuron to all the output neurons of the previous layer:

Figure 1.5: A fully connected network
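
To make the "every neuron sees every input" idea concrete, here is a minimal sketch of one fully connected layer in C. The function name `fully_connected` and the row-major weight layout are assumptions chosen for this example, and the activation is left out to keep the focus on the connections.

```c
#include <assert.h>
#include <stddef.h>

/* One fully connected (dense) layer without activation.
 * Assumed layout: weights[o * num_inputs + i] links input i to
 * output neuron o, so the layer stores num_outputs * num_inputs weights. */
void fully_connected(const float *inputs, size_t num_inputs,
                     const float *weights, const float *biases,
                     float *outputs, size_t num_outputs) {
  assert(inputs != NULL && weights != NULL);
  assert(biases != NULL && outputs != NULL);
  for (size_t o = 0; o < num_outputs; ++o) {
    float acc = biases[o];
    /* Every output neuron reads every input of the previous layer. */
    for (size_t i = 0; i < num_inputs; ++i) {
      acc += weights[o * num_inputs + i] * inputs[i];
    }
    outputs[o] = acc;
  }
}
```

The nested loop shows why the weight count grows as the product of the input and output sizes.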

Unfortunately, this method of connecting neurons does not work well for training a model for image classification.

For instance, for an RGB image of size 320x240 (width x height), we would need 230,400 (320*240*3) weights for just one neuron. Since our models will undoubtedly need several layers of neurons to solve complex problems, the model will likely overfit, given the unmanageable number of trainable parameters. Overfitting means that the model learns to predict the training data well but struggles to generalize to data not used during training (unseen data).

In the past, data scientists adopted manual feature engineering techniques to extract a reduced set of good features from images. However, this approach was difficult, time-consuming, and domain-specific.

With the rise of CNNs, visual recognition tasks saw improvement thanks to convolution layers, which make feature extraction part of the learning problem.

Based on the assumption that we are dealing with images and inspired by biological processes in the animal visual cortex, the convolution layer borrows the widely adopted convolution operator from image processing to create a set of learnable features.

The convolution operator works like other image processing routines: it slides a window (the filter or kernel) across the entire input image and computes the dot product between the filter’s weights and the underlying pixels, as shown in Figure 1.6:

Figure 1.6: Convolution operator
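
The sliding-window operation can be sketched in C as follows. This is a simplified illustration under several assumptions: a single-channel image, a square filter, a step of one pixel, and no padding at the borders. The function name `conv2d_valid` is hypothetical. As is common in DL, the filter is applied without flipping it.

```c
#include <assert.h>
#include <stddef.h>

/* Slides a k x k filter over a single-channel image (step of one pixel,
 * no padding) and stores the dot product for each window position.
 * output must hold (in_h - k + 1) * (in_w - k + 1) floats. */
void conv2d_valid(const float *input, size_t in_w, size_t in_h,
                  const float *filter, size_t k, float *output) {
  assert(k > 0 && k <= in_w && k <= in_h);
  const size_t out_w = in_w - k + 1;
  const size_t out_h = in_h - k + 1;
  for (size_t oy = 0; oy < out_h; ++oy) {
    for (size_t ox = 0; ox < out_w; ++ox) {
      float acc = 0.0f;
      /* Dot product between the filter and the pixels under it. */
      for (size_t fy = 0; fy < k; ++fy) {
        for (size_t fx = 0; fx < k; ++fx) {
          acc += filter[fy * k + fx] * input[(oy + fy) * in_w + (ox + fx)];
        }
      }
      output[oy * out_w + ox] = acc;
    }
  }
}
```

Note that the same k x k weights are reused at every window position, which is what keeps the number of weights small.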

This approach brings two significant benefits:

  • It extracts the relevant features automatically without human intervention.
  • It reduces the number of input signals per neuron considerably.

For instance, applying a 3x3 filter on the preceding RGB image would only require 27 weights (3*3*3).
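
The two weight counts quoted in this section can be reproduced with a short C sketch. The helper names below are hypothetical and exist only to make the arithmetic explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Weights needed by one fully connected neuron that sees every pixel. */
size_t dense_weights_per_neuron(size_t width, size_t height, size_t channels) {
  assert(width > 0 && height > 0 && channels > 0);
  return width * height * channels;
}

/* Weights needed by one convolution filter covering all input channels. */
size_t conv_weights_per_filter(size_t filter_w, size_t filter_h, size_t channels) {
  assert(filter_w > 0 && filter_h > 0 && channels > 0);
  return filter_w * filter_h * channels;
}
```

For the 320x240 RGB image, `dense_weights_per_neuron(320, 240, 3)` gives 230,400, while `conv_weights_per_filter(3, 3, 3)` gives 27, which is roughly 8,500 times fewer weights.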

Like fully connected layers, convolution layers need several kernels to learn as many features as possible. Therefore, the convolution layer’s output generally produces a set of images (feature maps), commonly kept in a multidimensional memory object called a tensor, as shown in the following illustration:

Figure 1.7: Representation of a 3D tensor

Traditional CNNs for visual recognition tasks usually include the fully connected layers at the network’s end to carry out the prediction stage. Since the output of the convolution layers is a set of images, we generally adopt subsampling strategies to reduce the information propagated through the network and the risk of overfitting when feeding the fully connected layers.

Typically, there are two ways to perform subsampling:

  • Skipping the convolution operator for some input pixels, that is, applying the filter with a stride greater than one. As a result, the output of the convolution layer will have smaller spatial dimensions than the input.
  • Adopting subsampling functions such as pooling layers.
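
Both subsampling strategies can be sketched in C. The example rests on assumptions that the text does not specify: no padding, a single-channel feature map, and max pooling over 2x2 windows as the pooling function. The names `conv_output_size` and `max_pool_2x2` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Output length along one axis for a filter of size k applied every
 * `stride` pixels, with no padding. A stride above 1 skips positions. */
size_t conv_output_size(size_t in_size, size_t k, size_t stride) {
  assert(stride > 0 && k > 0 && k <= in_size);
  return (in_size - k) / stride + 1;
}

/* 2x2 max pooling with a step of 2 on a single-channel feature map.
 * output must hold (in_h / 2) * (in_w / 2) floats. */
void max_pool_2x2(const float *input, size_t in_w, size_t in_h, float *output) {
  assert(in_w >= 2 && in_h >= 2);
  const size_t out_w = in_w / 2;
  const size_t out_h = in_h / 2;
  for (size_t oy = 0; oy < out_h; ++oy) {
    for (size_t ox = 0; ox < out_w; ++ox) {
      float best = input[(2 * oy) * in_w + 2 * ox];
      /* Keep the largest of the four values in the window. */
      for (size_t dy = 0; dy < 2; ++dy) {
        for (size_t dx = 0; dx < 2; ++dx) {
          const float v = input[(2 * oy + dy) * in_w + (2 * ox + dx)];
          if (v > best) best = v;
        }
      }
      output[oy * out_w + ox] = best;
    }
  }
}
```

With a 3x3 filter on a 320-pixel-wide input, a stride of 1 yields 318 output columns, while a stride of 2 yields 159. Similarly, 2x2 pooling halves both the width and the height of a feature map.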

The following figure shows a generic CNN architecture, where the pooling layer reduces the spatial dimensionality, and the fully connected layer performs the classification stage:

Figure 1.8: Traditional CNN with a pooling layer to reduce the spatial dimensionality

When developing DL networks for tinyML, one of the most crucial factors is the model’s size, defined as the number of trainable weights. Because our platforms have limited physical memory, the model needs to be compact to fit the target device. However, memory constraints are not the only challenge we may face. For instance, while trained models often use floating-point arithmetic, the CPUs on our platforms may lack hardware acceleration for it.

Thus, to overcome these limitations, quantization becomes an indispensable technique.

Model quantization

Quantization is the process of performing neural network computations in lower bit precision. The technique most widely adopted for microcontrollers applies quantization after training (post-training quantization) and converts the 32-bit floating-point weights to 8-bit integer values. Because each weight shrinks from four bytes to one, this technique brings a 4x model size reduction, along with a significant latency improvement and little or no accuracy drop.
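
As a rough illustration of the float-to-integer conversion, the following C sketch uses an affine mapping with a scale and a zero point. This scheme is an assumption made for the example, since the text above does not specify the mapping. The function names are hypothetical, and the `scale` and `zero_point` values are assumed to be already known.

```c
#include <assert.h>
#include <stdint.h>

/* Maps a float to int8 as q = round(value / scale + zero_point),
 * saturating to the int8 range. scale and zero_point are assumed inputs. */
int8_t quantize_int8(float value, float scale, int32_t zero_point) {
  assert(scale > 0.0f);
  float shifted = value / scale + (float)zero_point;
  /* Saturate before converting so the cast below stays in range. */
  if (shifted < -128.0f) shifted = -128.0f;
  if (shifted > 127.0f) shifted = 127.0f;
  /* Round to nearest, with halves rounded away from zero. */
  return (int8_t)(shifted >= 0.0f ? shifted + 0.5f : shifted - 0.5f);
}

/* Recovers an approximation of the original float value. */
float dequantize_int8(int8_t q, float scale, int32_t zero_point) {
  return scale * (float)((int32_t)q - zero_point);
}
```

With a scale of 0.5 and a zero point of 0, the value 0.3 is stored as 1 and recovered as 0.5. This rounding error is the price paid for storing each weight in one byte instead of four.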

Other techniques like pruning (setting weights to zero) or clustering (grouping weights into clusters) can help reduce the model size. However, in this book, we will limit the scope to the quantization technique because it is sufficient to showcase the model deployment on microcontrollers.

If you are interested in learning more about pruning and clustering, you can refer to the following practical blog post, which shows the benefit of these two techniques on the model size: https://community.arm.com/arm-community-blogs/b/ai-and-ml-blog/posts/pruning-clustering-arm-ethos-u-npu.

As we know, ML is the component that brings intelligence to our applications. Nevertheless, to ensure the longevity of battery-powered applications, it is essential to use low-power devices. So far, we have mentioned power and energy in general terms, but let’s see what they mean in practice in the following section.
