TinyML Cookbook - Second Edition

Getting Ready to Unlock ML on Microcontrollers

Here we are – on the first step that marks the beginning of our journey into the world of tinyML.

We will start this chapter by giving an overview of this rapidly emerging field, discussing the opportunities and challenges of bringing machine learning (ML) to low-power microcontrollers.

After this introduction, we will delve into the fundamental elements that make tinyML unique from traditional ML in the cloud, on desktops, or even on smartphones. We will revisit some basic ML concepts and introduce new fundamental ones specific to this domain, regarding power consumption and microcontroller development. Don’t worry if you are new to embedded programming. In this chapter and the next, we will provide an introduction to microcontroller programming to ensure everyone has a solid foundation to get started.

Once we have presented the tinyML building blocks, we will focus on setting up a development environment for a simple but meaningful LED application, which will officially kick off our practical journey. In contrast to what we will find in the following chapters, this chapter has a more theoretical structure to get you familiar with the concepts and terminology of this fast-growing technology.

In this chapter, we will cover the following topics:

Introduction to tinyML
Overview of deep learning
Learning the difference between power and energy
Programming microcontrollers
Introduction to the development platforms
Setting up the software development environment
Deploying a sketch on microcontrollers

Introduction to tinyML

Tiny machine learning, or, as we will refer to it, tinyML, is a technology that is gaining huge momentum in various fields, due to its ability to enable non-intrusive smartness. tinyML is not new, as it has already facilitated consumer electronics like smart speakers and smartwatches for many years. However, recent advances in hardware and software have made it more accessible and practical than ever. Therefore, it is no longer a niche technology.

There are at least three factors that make tinyML particularly appealing: cost, energy, and privacy.

The first benefit given by this technology is its cost-effectiveness. Devices used in tinyML are typically low-cost, ranging from a few cents to a few dollars in most cases. As a result, it is an affordable technology for businesses and individuals to drive innovation.

The second unique advantage of tinyML is its ability to run ML on low-power platforms.

The overall goal of tinyML is to allow smartness through low-power devices. This feature enables applications to operate on compact batteries such as coin cells or even plants (https://www.youtube.com/watch?v=_xELDU15_oE) for months, contributing to tackling energy challenges sustainably.

Privacy is the other factor that makes tinyML an attractive technology. While the internet provides tremendous opportunities, there is always a concern regarding user data exposure to unauthorized parties. The risks here could concern compromised privacy or personal identity theft to commit fraud, just to name a couple. tinyML can mitigate this issue by running ML algorithms on-device without sending data to the cloud.

As you may have noticed, so far, we have discussed why tinyML has the potential to enable ubiquitous intelligence. However, what is tinyML in practical terms?

What is tinyML?

tinyML encompasses the set of ML and embedded system technologies to enable the creation of intelligent applications for low-power devices. Generally, these devices have limited memory and processing power, but they are equipped with sensors to sense the physical environment and make decisions based on ML algorithms.

In tinyML, ML and the deployment platform are not independent entities but entities that need to know each other at best. Building an ML architecture without considering the target device capabilities will make it challenging to deploy effective applications. On the other hand, designing power-efficient processors to expand the ML capabilities of these devices would be impossible without knowing the software algorithms involved. Therefore, we can only bring tremendous and compelling tinyML applications to life through a delicate balance between software and hardware.

Throughout this book, we will explore tinyML with microcontrollers as target devices. Why microcontrollers, you ask? Well, let’s just say that they are the perfect match for what we want, and in the following subsection, we will tell you why.

Why ML on microcontrollers?

The first and foremost reason for choosing microcontrollers is their popularity in various fields, such as automotives, consumer electronics, kitchen appliances, healthcare, and telecommunications. These devices are present in our day-to-day electronic devices, and with the emergence of the Internet of Things (IoT), their market growth has been exponential.

Already in 2018, the market research company IDC (https://www.idc.com) reported 28.1 billion microcontrollers sold worldwide. Those are impressive numbers, considering that 1.5 billion smartphones and 67.2 million PCs were sold in the same year. Therefore, tinyML is a significant milestone in the evolution of IoT devices, paving the way for the proliferation of intelligent and connected low-power devices.

The other reasons for choosing microcontrollers are their affordability, ease of programming, and ability to run sophisticated ML algorithms, making them suitable for a wide range of applications.

However, these devices are generally connected to the internet in the IoT space. Therefore, if we can transmit data to a trusted cloud service, why can’t we delegate the ML computation to it, given its superior performance? In other words, why do we need to run ML locally?

Why run ML on-device?

In addition to privacy, as discussed earlier, there are two other reasons to run ML locally:

Reducing latency: Sending data back and forth to and from the cloud is not instant and could affect applications that must respond reliably within a time frame.
Reducing power consumption: Sending and receiving data to and from the cloud is not power-efficient, even when using low-power communication protocols such as Bluetooth.

The following stacked bar chart shows the power consumption breakdown for the on-board components on the Arduino Nano 33 BLE Sense board, one of the microcontroller boards employed in this book:

A diagram of a radio transmitter

Description automatically generated

Figure 1.1: Power consumption breakdown for the Arduino Nano 33 BLE Sense board

Looking at the power consumption breakdown, we can observe that CPU computation uses less power than Bluetooth communication (14% versus 65%). As a result, it is preferable to compute more and transmit less to mitigate the risk of fast battery drain. Typically, the radio module, such as the one used for Bluetooth or other wireless communications, is the component that needs the most power in embedded devices.

Now that we know the benefits of running ML on these tiny devices, what are the practical opportunities and challenges?

The opportunities and challenges for tinyML

tinyML finds its natural home in applications where low power consumption is a critical requirement, such as when a device must operate with a battery for as long as possible.

If we think about it, we are already surrounded by battery-powered products that use ML under the hood. For example, wearable devices, such as smartwatches and fitness tracking bands, can recognize human activities to track our health goals or detect dangerous situations, such as a fall to the ground.

These products are based on tinyML for all intents and purposes because they need on-device ML on a low-power system to interpret sensor data continuously.

However, the use of battery-powered tinyML applications extends beyond wearable devices. For example, there are scenarios where we might need to monitor an environment to detect hazardous conditions, such as detecting fires to prevent them from spreading across a wide area.

There are unlimited use cases for tinyML, and the ones we briefly introduced are only a few.

However, despite the unlimited potential use cases for tinyML, some critical challenges must be addressed. The most significant challenges arise from the computational perspective of our devices, since they are often limited in memory and processing power. We work on systems with a few kilobytes of RAM and, in some cases, processors with no floating-point arithmetic acceleration. Furthermore, the deployment environment could be unfriendly. For example, environmental factors, such as dust and extreme weather conditions, could interfere during the normal execution of our applications.

As we have touched upon deployment environments briefly, let us delve deeper into them in the following subsection.

Deployment environments for tinyML

A tinyML application could live in both centralized and distributed systems.

In a centralized system, the application does not necessarily need to communicate with other devices. Nowadays, we interact with our smartphones, cameras, drones, and kitchen appliances seamlessly with our voices. For example, detecting the magic words “OK, Google,” “Alexa,” and so on in smart assistants is a tinyML application in every respect. In fact, this application can only run locally on a low-power system for a quick response and minimal power usage.

Usually, centralized tinyML applications aim to trigger more power-hungry functionalities, such as activating a media service.

In a distributed system, the device (that is, the node or sensor node) still performs ML locally but also communicates with nearby devices to achieve a common goal, as shown in Figure 1.2:

A picture containing text, device

Description automatically generated

Figure 1.2: A wireless sensor network

Since the nodes are part of a network and typically communicate through wireless technologies, we commonly call the network a wireless sensor network (WSN).

While this scenario may appear to conflict with the power consumption implications of transmitting data, devices may still need to collaborate to obtain meaningful knowledge about their working environment. In fact, specific applications may require a holistic understanding of the distribution of physical quantities, such as temperature, humidity, and soil moisture, rather than knowing the values from a particular node.

For example, consider an application to improve agriculture efficiency. In this scenario, a WSN could assist in identifying areas of the field that require more water than others. In fact, by gathering and analyzing data from multiple nodes across the field, a network can provide a comprehensive understanding of the soil moisture levels, helping farmers reduce their water usage. But that’s not all. Efficient communication protocols are crucial for the network’s lifetime. Therefore, we may think of using tinyML to make them more effective. Since sending raw data consumes too much energy, ML could perform a partial computation to reduce the data to transmit and the frequency of communications.

tinyML presents endless possibilities, and the few mentioned are a small fraction of what is achievable. For those seeking to expand their knowledge and skills in this field, tinyML Foundation is the ideal community to join.

Join the tinyML community!

tinyML Foundation (www.tinyml.org) is a non-profit organization that aims to educate, inspire, and connect the worldwide tinyML community.

Supported by companies such as Arm, Edge Impulse, Google, and Qualcomm, the foundation is energizing a diverse global community of engineers, scientists, academics, and business professionals to envision a world of ubiquitous devices powered by tinyML to create a healthier and sustainable environment.

Through free virtual and in-person initiatives, the tinyML Foundation promotes knowledge sharing, engagement, and connection among experts and newcomers. In 2023, over 13,000 people joined the group, and there have been 47 Meetup groups in 39 countries.

With several Meetup (https://www.meetup.com) groups in different countries, you can join any near you for free (https://www.meetup.com/en-AU/pro/TinyML/) to always be up to date with new tinyML technologies and upcoming events.

After this brief introduction to tinyML, it is time to explore its ingredients in more detail. The following section will start analyzing the element that makes our devices capable of intelligent decisions.

Overview of deep learning

ML is the ingredient that makes our tiny devices capable of making intelligent decisions. These software algorithms heavily rely on the correct data to learn patterns or actions based on experience. As we commonly say, data is everything for ML because it is what makes or breaks an application.

This book will refer to deep learning (DL) as a specific class of ML that can perform complex prediction tasks directly on raw images, text, or sound. These algorithms have state-of-the-art accuracy and can be better and faster than humans in solving some data analysis problems.

A complete discussion of DL architectures and algorithms is beyond the scope of this book. However, this section will summarize some essential points relevant to understanding the following chapters.

Deep neural networks

A deep neural network consists of several stacked layers aimed at learning patterns.

Each layer contains several neurons, the fundamental computing elements for artificial neural networks (ANNs) inspired by the human brain.

A neuron produces a single output through a linear transformation, defined as the weighted sum of the inputs plus a constant value called bias, as shown in the following diagram:

Diagram

Description automatically generated

Figure 1.3: A neuron representation

The coefficients of this weighted sum are called weights.

Weights and bias are obtained after an iterative training process to make the neuron capable of learning complex patterns. However, neurons can only solve simple linear problems with linear transformations. Therefore, non-linear functions, called activations, generally follow the neuron’s output to help the network learn complex patterns:

Figure 1.4: An activation function

An example of a widely adopted activation function is the rectified linear unit (ReLU), which returns the maximum value between the input value and 0:

float relu(float input) {
  return max(input, 0);
}

Its computational simplicity makes it preferable to other non-linear functions, such as a hyperbolic tangent or logistic sigmoid, requiring more computational resources.

In the following subsection, we will see how the neurons are connected to solve complex visual recognition tasks.

Convolutional neural networks

Convolutional neural networks (CNNs) are specialized deep neural networks predominantly applied to visual recognition tasks.

We can consider CNNs as the evolution of a regularized version of the classic fully connected neural networks with dense layers, also known as fully connected layers.

As we can see in the following diagram, a characteristic of fully connected networks is connecting every neuron to all the output neurons of the previous layer:

Figure 1.5: A fully connected network

Unfortunately, this method of connecting neurons does not work well for training a model for image classification.

For instance, if we considered an RGB image of size 320x240 (width x height), we would need 230,400 (320*240*3) weights for just one neuron. Since our models will undoubtedly need several layers of neurons to discern complex problems, the model will likely overfit, given the unmanageable number of trainable parameters. Overfitting implies that the model learns to predict the training data well but struggles to generalize data not used during the training process (unseen data).

In the past, data scientists adopted manual feature engineering techniques to extract a reduced set of good features from images. However, the approach suffered from being difficult, time-consuming, and domain-specific.

With the rise of CNNs, visual recognition tasks saw improvement thanks to convolution layers, which make feature extraction part of the learning problem.

Based on the assumption that we are dealing with images and inspired by biological processes in the animal visual cortex, the convolution layer borrows the widely adopted convolution operator from image processing to create a set of learnable features.

The convolution operator is performed similarly to other image processing routines: sliding a window application (filter or kernel) onto the entire input image and applying the dot product between its weights and the underlying pixels, as shown in Figure 1.6:

A picture containing graphical user interface

Description automatically generated

Figure 1.6: Convolution operator

This approach brings two significant benefits:

It extracts the relevant features automatically without human intervention.
It reduces the number of input signals per neuron considerably.

For instance, applying a 3x3 filter on the preceding RGB image would only require 27 weights (3*3*3).

Like fully connected layers, convolution layers need several kernels to learn as many features as possible. Therefore, the convolution layer’s output generally produces a set of images (feature maps), commonly kept in a multidimensional memory object called a tensor, as shown in the following illustration:

Figure 1.7: Representation of a 3D tensor

Traditional CNNs for visual recognition tasks usually include the fully connected layers at the network’s end to carry out the prediction stage. Since the output of the convolution layers is a set of images, we generally adopt subsampling strategies to reduce the information propagated through the network and the risk of overfitting when feeding the fully connected layers.

Typically, there are two ways to perform subsampling:

Skipping the convolution operator for some input pixels. As a result, the output of the convolution layer will have fewer spatial dimensions than the input ones.
Adopting subsampling functions such as pooling layers.

The following figure shows a generic CNN architecture, where the pooling layer reduces the spatial dimensionality, and the fully connected layer performs the classification stage:

Figure 1.8: Traditional CNN with a pooling layer to reduce the spatial dimensionality

When developing DL networks for tinyML, one of the most crucial factors is the model’s size, defined as the number of trainable weights. Due to the limited physical memory of our platforms, the model needs to be compact to fit the target device. However, memory constraints are not the only challenge we may face. For instance, while trained models often use floating-point precision arithmetic operations, the CPUs on our platforms may lack hardware acceleration.

Thus, to overcome these limitations, quantization becomes an indispensable technique.

Model quantization

Quantization is the process of performing neural network computations in lower bit precision. The widely adopted technique for microcontrollers applies the quantization post-training and converts the 32-bit floating-point weights to 8-bit integer values. This technique brings a 4x model size reduction and a significant latency improvement with little or no accuracy drop.

Other techniques like pruning (setting weights to zero) or clustering (grouping weights into clusters) can help reduce the model size. However, in this book, we will limit the scope to the quantization technique because it is sufficient to showcase the model deployment on microcontrollers.

If you are interested in learning more about pruning and clustering, you can refer to the following practical blog post, which shows the benefit of these two techniques on the model size: https://community.arm.com/arm-community-blogs/b/ai-and-ml-blog/posts/pruning-clustering-arm-ethos-u-npu.

As we know, ML is the component that allows smartness into our application. Nevertheless, to ensure the longevity of battery-powered applications, it is essential to use low-power devices. So far, we have mentioned power and energy in general terms, but let’s see what they mean practically in the following section.

Learning the difference between power and energy

Power matters in tinyML, and its target is in the milliwatt (mW) range or below, which means thousands of times more efficient than a traditional desktop machine.

Although there are cases where we might consider using energy harvesting solutions, such as solar panels, those could not always be possible because of cost and physical dimensions.

However, what do we mean by power and energy? Let’s discover these terms by giving a basic overview of the fundamental physical quantities governing electronic circuits. This knowledge will be crucial for building electronic circuits with microcontrollers in the following chapters.

Voltage versus current

Current is what makes an electronic circuit work, which is the flow of electric charges across surface A of a conductor in a given time, as described in the following diagram:

Figure 1.9: Current is a flow of electric charges across surface A at a given time

The current is defined as follows:

Here, we have the following:

I: Current, measured in amperes (A)
Q: The electric charges across surface A in a given time, measured in coulombs (C)
t: Time, measured in seconds (s)

The current flows in a circuit under the following conditions:

We have a conductive material (for example, copper wire) to allow the electric charge to flow.
We have a closed circuit, so a circuit without interruption provides a continuous path to the current flow.
We have a source of energy, which is a potential difference source called voltage.

The voltage is measured with volts (V) and produces an electric field to allow the electric charge to flow in the circuit. Both the USB port and battery are potential difference sources. The symbolic representation of a power source is given in the following figure:

Figure 1.10: Battery symbol representation

To avoid constantly referring to V+ and V-, we will define the battery’s negative terminal as a reference by convention, assigning it 0 V (GND).

Ohm’s law relates voltage and current, which says through the following formula that the current through a conductor is proportional to the voltage across a resistor:

A resistor is an electrical component used to reduce the current flow. This component, whose symbolic representation is reported in the following figure, has a resistance measured with Ohm () and identified with the letter R:

A picture containing shape

Description automatically generated

Figure 1.11: Resistor symbol representation

Resistors are essential components for any electronic circuit, and for those used in our projects, their value is reported through colored bands on the elements. Standard resistors have four, five, or six bands. The color on the bands denotes the resistance value, as illustrated in the following example via the different shades:

Figure 1.12: Example of a four-band resistor

To easily decode the color bands, we recommend using the online tool at Digi-Key (https://www.digikey.com/en/resources/conversion-calculators/conversion-calculator-resistor-color-code).

With an understanding of the main physical quantities governing electronic circuits, we are now prepared to talk about the difference between power and energy.

Power versus energy

Sometimes, we interchange the words power and energy because we believe they are the same. However, although they are related, they represent distinct physical quantities. Energy is the capacity for doing work (for example, using force to move an object), while power is the energy consumption rate.

In practical terms, power indicates how fast we drain the battery, so high power implies a faster discharge.

Power and energy are related to voltage and current through the following formulas:

The following table presents the physical quantities reported in the power and energy formulas:

Table

Description automatically generated

Figure 1.13: Table reporting the physical quantities in the power and energy formulas

On microcontrollers, the voltage supply is in the order of a few volts (for example, 3.3 V), while the current consumption is in the range of microampere () or milliampere (mA). For this reason, we commonly refer to microwatt () or milliwatt (mW) for power and microjoule () or millijoule (mJ) for energy.

Now, consider the following problem to familiarize yourself with the presented concepts.

Suppose you have a processing task, and you have the option to run it on two different processors with the following power consumptions in the active state:

Figure 1.14: Table reporting two processing units with different power consumptions

What processor would you use to run the task?

Although PU1 has higher (4x) power consumption than PU2, this does not imply that PU1 is less energy efficient. On the contrary, PU1 could be more computationally performant than PU2 (for example, 8x), making it the best choice from an energy perspective, as demonstrated by the following calculations:

Based on the preceding example, we can conclude that PU1 is our better choice because it needs less energy from the battery under the same workload.

Commonly, we adopt OPS per Watt (arithmetic operations performed per Watt) to bind the power consumption to the computational resources of our processors.

In terms of power and energy concepts, that is all we need to know about it. Therefore, the only remaining aspect to discuss concerns the devices used for our tinyML projects: the microcontrollers.

Programming microcontrollers

A microcontroller, often shortened to MCU, is a full-fledged computer because it consists of a processor (which can also be multicore nowadays), a memory system, and some peripherals. Unlike a standard computer, a microcontroller fits entirely on an integrated chip, is incredibly low-power, and is inexpensive.

We often confuse microcontrollers with microprocessors, but they refer to different devices. In contrast to a microcontroller, a microprocessor integrates only the processor on a chip, requiring external connections to a memory system and other components to form a fully operating computer.

The following figure summarizes the main differences between a microprocessor and a microcontroller:

Diagram, schematic

Description automatically generated

Figure 1.15: Microprocessor versus microcontroller

As for all processing units, the target application influences their architectural design choice.

For example, a microprocessor tackles scenarios where the tasks are usually as follows:

Dynamic, which means they can change with user interactions or time
General-purpose
Compute-intensive

A microcontroller addresses completely different scenarios, as the applications can:

Be single-purpose and repetitive
Have time frame constraints
Be battery-powered
Need to fit in a small physical space
Be cost-effective

Tasks are generally single-purpose and repetitive. Therefore, the microcontroller does not require strict re-programmability. Typically, microcontroller applications are less computationally intensive than microprocessor ones and do not have frequent interactions with the user. However, they can interact with the environment or other devices. As an example, consider the thermostat.

The device only requires monitoring the temperature regularly and communicating with the heating system.

Sometimes, tasks must be executed within a specific time frame. This requirement is characteristic of real-time applications (RTAs), where the violation of the time constraint may affect the quality of service (soft real time) or be hazardous (hard real time). A car’s anti-lock braking system (ABS) is an example of a hard RTA because the electronic system must respond within a time frame to prevent the wheels from locking when applying brake pedal pressure.

RTA applications require a latency-predictable device, so all hardware components (CPU, memory, interrupt handler, and so on) must respond in a precise number of clock cycles.

Hardware vendors commonly report latency in the datasheet, expressed in clock cycles.

The time constraint poses some architectural design adaptations and limitations for a general-purpose microprocessor. For instance, the memory management unit (MMU), used to translate virtual memory addresses, is generally not integrated into CPUs for microcontrollers.

Microcontroller applications can be battery-powered, as the device has been designed to be low-power. As per the time frame constraints, power consumption also poses some architectural design differences from a microprocessor. Without going deeper into the hardware details, all the off-chip components generally reduce power efficiency as a rule of thumb. That is the main reason microcontrollers typically integrate memories within a chip.

Microcontrollers typically have lower clock frequency than microprocessors to consume less energy.

Microcontrollers are also an ideal choice for building products that need a compact physical footprint and cost-effectiveness. Since these devices are computers within a chip, the package size is typically a few square millimeters and is economically more advantageous than microprocessors.

In the following table, we have summarized what we have just discussed for easy future reference:

Figure 1.16: Table comparing a microprocessor with a microcontroller

In the next section, we will go deeper into microcontrollers’ architectural aspects by analyzing the memory architecture and internal peripherals crucial for ML model deployment.

Memory architecture

Microcontrollers are CPU-based embedded systems, meaning the CPU is responsible for interacting with all its subcomponents.

All CPUs require at least one memory to read the instructions and store/read variables during the program’s execution. In the microcontroller context, we typically dedicate two separate memories for the instructions and data: program and data memory.

Program memory is non-volatile read-only memory (ROM) reserved for the program to execute. Although its primary goal is to contain the program, it can also store constant data. Thus, program memory is similar to our everyday computers’ hard drives.

Data memory is volatile memory reserved to store/read temporary data. Therefore, it operates similarly to RAM in a personal computer, as its contents are lost when switching off the system.

Given the different program and data memory requirements, we usually employ other semiconductor technologies. In particular, we can find flash technologies for the program memory and static random-access memory (SRAM) for the data memory.

Flash memories are non-volatile and offer low power consumption but are generally slower than SRAM. However, given the cost advantage over SRAM, we can find larger program memory than data memory.

Now that you know the difference between program and data memory, where would you store the weights for a deep neural network model?

The answer to this question depends on whether the model has constant weights. If the weights are constant during inference, it is more efficient to store them in program memory for the following reasons:

Program memory has more capacity than SRAM.
It reduces memory pressure on the SRAM, since other functions require storing variables or chunks of memory at runtime.

We want to remind you that microcontrollers have limited memory resources, so a decision like this can significantly reduce SRAM memory usage.

Microcontrollers offer extra on-chip features to expand their capabilities and make these tiny computers different from each other. These features are the peripherals, which are discussed in the upcoming subsection.

Peripherals

Peripherals are essential in microcontrollers to interface with sensors or other external components.

Each peripheral has a dedicated functionality and is assigned to a metal leg (pin) of the integrated circuit.

You can refer to the peripheral pin assignment section in the microcontroller datasheet to find out each pin’s functionalities.

Hardware vendors typically number the pins anti-clockwise, starting from the top-left corner of the chip, marked with a dot for easy reference, as shown in Figure 1.17:

Figure 1.17: Viewed from the top, pins are numbered anti-clockwise, starting from the top-left corner, marked with a dot

Peripherals can be of various types, and the following subsection will provide a brief overview of those commonly integrated into microcontrollers.

General-purpose input/output (GPIO or IO)

GPIOs do not have a predefined and fixed purpose. Their primary function is to provide or read binary signals that, by nature, can only live in two states: HIGH (1) or LOW (0). The following figure shows an example of a binary signal:

A diagram of a high voltage

Description automatically generated

Figure 1.18: Binary signal

Typical GPIO usages are as follows:

Turning on and off an LED
Detecting whether a button is pressed
Implementing complex digital interfaces/protocols such as VGA

GPIO peripherals are versatile and generally available in all microcontrollers. We will use this peripheral often, such as turning on and off LEDs or detecting whether a button has been pressed.

Analog/digital converters

When developing tinyML applications, we will likely deal with time-varying physical quantities, such as images, audio, and temperature.

Whatever these quantities are, the sensor transforms them into a continuous electrical signal interpretable by the microcontrollers. This electrical signal, which can be either a voltage or current, is called an analog signal.

The microcontroller, in turn, needs to convert the analog signal into a digital format so that the CPU can process the data.

Analog/digital converters act as translators between analog and digital worlds. Thus, we have the analog-to-digital converter (ADC) that converts the electrical signal into a digital format, and the digital-to-analog converter (DAC), which performs the opposite functionality.

In this book, we will use this peripheral to transform the analog signal the microphone generates into a digital format.

Serial communication

Communication peripherals integrate standard communication protocols to control external components. Typical serial communication peripherals available in microcontrollers are I2C, SPI, UART (commonly called serial), and USB.

The serial peripheral will be used extensively in our projects to transmit messages from the microcontroller to our computer (we’ll refer to this communication as over the serial throughout this book). For example, we will use this peripheral to debug our applications and generate media files.

Timers

In contrast to all the peripherals we just described, timers do not interface with external components, since they are used to trigger or synchronize events. For example, a timer can be set up to acquire data from a sensor at a specific time interval.

Having covered the topic of peripherals, we have completed our overview of the tinyML ingredients. With a grasp of the relevant terminology and fundamental concepts about ML, power/energy consumption, and microcontrollers, we can now introduce the development platforms used in this book.

Introduction to the development platforms

The development platforms used in this book are microcontroller boards. A microcontroller board is a printed circuit board (PCB) that combines a microcontroller with the necessary electronic circuit to make it ready for use. In some cases, these platforms could also include additional devices, such as sensors or additional external memory, to target specific end applications.

The Arduino Nano 33 BLE Sense (Arduino Nano for short), Raspberry Pi Pico, and the SparkFun RedBoard Artemis Nano (SparkFun Artemis Nano for short) are the microcontroller boards used in this book.

As we will see in more detail in the upcoming subsections, the platforms have an incredibly small form factor, a USB port for power/programming, and an Arm-based microcontroller. At the same time, they also have unique features that make them ideal for targeting different development scenarios.

Arduino Nano 33 BLE Sense

The Arduino Nano, designed by Arduino (https://www.arduino.cc), is a versatile platform suitable for various tinyML applications. It integrates the nRF52840 microcontroller, powered by an Arm Cortex-M4 CPU that runs at 64 MHz, as well as 1 MB of program memory and 256 KB of data memory, along with various sensors and a Bluetooth radio:

Diagram

Description automatically generated with low confidence

Figure 1.19: Arduino Nano board

When developing on the Arduino Nano, we only need to add a few additional external components, as most are already on-board.

The Arduino Nano 33 BLE Sense underwent an upgrade to the Rev2 version in 2023. This updated version retains the same form factor and processor as the Rev1 but includes enhanced sensors to cover a broader range of applications. The projects featured in this book are compatible with both the Rev1 and Rev2 versions.

Raspberry Pi Pico

Raspberry Pi Pico, designed by Raspberry Pi (https://www.raspberrypi.org), does not provide sensors and the Bluetooth module on-board. Still, it has the RP2040 microcontroller powered by a dual-core Arm Cortex-M0+ processor, running at 13 3MHz with 264 KB of SRAM. The device boasts an external flash memory of 2 MB for the program, making it an excellent choice for tinyML applications that require speed and memory space:

A computer parts and information

Description automatically generated with medium confidence

Figure 1.20: Raspberry Pi Pico board

In this book, this board will be ideal for learning how to interface with external sensors and build electronic circuits.

SparkFun RedBoard Artemis Nano

The SparkFun RedBoard Artemis Nano, designed by SparkFun Electronics (https://www.sparkfun.com/), is a platform that integrates the Apollo3 microcontroller, powered by an Arm Cortex-M4F processor running at 48 MHz with 1 MB of program memory and 384 KB of data memory.

The platform also boosts a digital microphone, making it ideal for those interested in developing always-on voice command applications:

Figure 1.21: SparkFun RedBoard Artemis Nano

This platform is optional but recommended to grasp the concepts presented in the recipes for the Arduino Nano and Raspberry Pi Pico, using an alternative device.

This book will not include a comprehensive discussion about projects for the SparkFun RedBoard Artemis Nano. However, when you come across the There’s more…with the SparkFun Artemis Nano! section at the end of a recipe, you can find the instructions to replicate it on this device.

Although the book will not discuss projects for the SparkFun RedBoard Artemis Nano, the source code for this platform will be accessible on GitHub.

Setting up the software development environment

To develop tinyML applications, we require different software tools and frameworks to cover both ML development and embedded programming.

In the following subsection, we will start by introducing the Arduino development environment used to write and upload programs to the Arduino Nano, Raspberry Pi Pico, and the SparkFun RedBoard Artemis Nano.

Getting ready with Arduino IDE

Arduino Integrated Development Environment (Arduino IDE) is a software application developed by Arduino (https://www.arduino.cc/en/software) to write and upload programs to Arduino compatible boards.

The Arduino Nano, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano are Arduino compatible boards.

Programs are written in C++ and are commonly called sketches by Arduino programmers.

Arduino IDE makes software development accessible and straightforward to developers with no background in microcontroller programming. In fact, the tool abstracts all the complexities we might have when dealing with these platforms, such as cross-compilation and device programming.

To download, install, and set up the Arduino IDE on your computer, you can follow the instructions provided at the following link: https://github.com/PacktPublishing/TinyML-Cookbook_2E/blob/main/Docs/setup_local_arduino_ide.md.

In addition to the standalone version, Arduino offers a browser-based IDE called the Arduino Web Editor (https://create.arduino.cc/editor). The Arduino Web Editor enables even more streamlined programmability, as programs can be written, compiled, and uploaded directly from the web browser to microcontrollers.

To install the Arduino Web Editor, you can follow the guide available on the Arduino website: https://docs.arduino.cc/learn/starting-guide/the-arduino-web-editor.

The free version of the Arduino Web Editor has a daily compilation time limit of 200 seconds. Therefore, users may want to upgrade to a paid plan or use the free local Arduino IDE to avoid the compilation time constraint and have unlimited compilation time.

The Arduino projects presented in this book for the Arduino Nano and Raspberry Pi Pico are compatible with both IDEs, although the screenshots exclusively showcase the cloud-based Arduino Web Editor. However, the SparkFun RedBoard Artemis Nano projects can only be developed using the local Arduino IDE.

To install the SparkFun RedBoard Artemis Nano board in the Arduino IDE, you must follow the instructions provided at the following link: https://github.com/PacktPublishing/TinyML-Cookbook_2E/blob/main/Docs/setup_sparkfun_artemis_nano.md.

From now on, we will use the term Arduino IDE interchangeably for both the Arduino Web Editor and the local Arduino IDE. However, when mentioning the SparkFun RedBoard Artemis Nano, the Arduino IDE will specifically denote the local version.

Having introduced the development environment for microcontroller programming, let’s now introduce the framework and software environment to train ML models, which are TensorFlow and Google Colaboratory.

Getting ready with TensorFlow

TensorFlow (https://www.tensorflow.org) is an end-to-end free and open-source software platform developed by Google for ML. We will use this software to build and train our ML models, using Python in Google Colaboratory.

Colaboratory (https://colab.research.google.com/notebooks) – Colab for short– is a free Python development environment that runs in the browser using Google Cloud. It is like a Jupyter notebook but has some essential differences, such as the following:

It does not need setting up.
It is cloud-based and hosted by Google.
There are numerous Python libraries pre-installed (including TensorFlow).
It is integrated with Google Drive.
It offers free access to GPU and TPU shared resources.
It is easy to share (also on GitHub).

Therefore, TensorFlow does not require setting up because Colab comes with it.

In Colab, we recommend enabling the GPU acceleration on the Runtime tab to speed up the computation on TensorFlow. To do so, navigate to Runtime | Change runtime type, and select GPU from the Hardware accelerator drop-down list, as shown in Figure 1.22:

Graphical user interface, text, application

Description automatically generated

Figure 1.22: Hardware accelerator drop-down list

Since the GPU acceleration is a shared resource among other users, there is limited access to the free version of Colab.

You could subscribe to Colab Pro (https://colab.research.google.com/) to get priority access to the fastest GPUs.

TensorFlow is not the only software from Google that we will use. In fact, once we have produced the ML model, we will need to run it on the microcontroller. For this, Google developed TensorFlow Lite for Microcontrollers.

TensorFlow Lite forMicrocontrollers (https://www.tensorflow.org/lite/microcontrollers) – tflite-micro for short– is the crucial software library to unlock ML applications on low-power microcontrollers. The project is part of TensorFlow and allows you to run DL models on devices with a few KB of memory. Written in C/C++, the library does not require an operating system and dynamic memory allocation.

To build a tflite-micro-based application into any Arduino project, you first need to create the Arduino TensorFlow Lite library (https://github.com/tensorflow/tflite-micro-arduino-examples) and then import it into the Arduino IDE.

For your convenience, we have already produced this library, which is compatible with the Arduino Nano, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano and is available at the following link: https://github.com/PacktPublishing/TinyML-Cookbook_2E/blob/main/ArduinoLibs/Arduino_TensorFlowLite.zip.

At the moment, you do not need to import this library. When it is time to deploy the ML models on microcontrollers, we will guide you through the precise steps to import the library into the Arduino IDE.

For those interested in the process of creating the Arduino TensorFlow Lite library, we have outlined the steps on GitHub, which can be found at the following link: https://github.com/PacktPublishing/TinyML-Cookbook_2E/blob/main/Docs/build_arduino_tflitemicro_lib.md.

In this book, TensorFlow won’t be our only avenue to design and train ML models. Another framework will accompany us in preparing ML models for microcontrollers. This framework is Edge Impulse.

Getting ready with Edge Impulse

Edge Impulse (https://www.edgeimpulse.com) is an all-in-one software platform for ML development from data acquisition to model deployment. It is free for developers, and in a few minutes, we can have an ML model up and running on our microcontrollers. This platform features a wide range of integrated tools for the following:

Data acquisition from sensor data
Data labeling
Applying digital signal processing routines on the input data
Designing, training, and testing ML models via a user-friendly interface
Deploying ML models on microcontrollers
AutoML

Developers just need to sign up on the Edge Impulse website to access all these features directly within the user interface (UI).

We are approaching the end of this first chapter. However, before we wrap up, we want to ensure we can successfully run a basic sketch on our microcontrollers. Therefore, in the upcoming section, we will build a simple Arduino pre-built application, marking the beginning of our journey into tinyML.

Deploying a sketch on microcontrollers

Following the introductory section, we will delve into our first recipe to familiarize ourselves with the Arduino IDE and better understand how to compile and upload a sketch on an Arduino platform. To accomplish this objective, we will use a pre-built Arduino sketch to blink the LED on our microcontroller boards.

Getting ready

An Arduino sketch consists of two functions, setup() and loop(), as shown in the following code block:

void setup() {
}
void loop() {
}

The setup() function is the first function executed by the program when we press the reset button or power up the board. This function is executed only once and is generally responsible for initializing variables and peripherals.

After the setup() function, the program executes the loop() one, which runs iteratively and forever, as illustrated in the following diagram:

Text

Description automatically generated with medium confidence

Figure 1.23: The setup() function runs once

These two functions are required in all Arduino programs.

How to do it…

Open the Arduino IDE, and follow the steps to make the on-board LED of our microcontroller boards blink:

Step 1:

Connect either the Arduino Nano or Raspberry Pi Pico to a laptop/PC through the micro-USB data cable. Next, check that the Arduino IDE reports the board’s name and serial port in the device drop-down menu:

A close-up of a computer code

Description automatically generated

Figure 1.24: The device drop-down menu reporting the board’s name and serial port

If you have connected the Arduino Nano, the device drop-down menu in the Arduino IDE should report Arduino Nano 33 BLE as the board’s name, as shown in Figure 1.24.

Instead, if you connect the Raspberry Pi Pico, the Arduino IDE should report Raspberry Pi Pico as the board’s name.

Near the board’s name, you can also find the serial port. The serial port, which in Figure 1.24 is /dev/ttyACM0, depends on the operating system (OS) and the device driver. This serial port will be our bridge for communication between the microcontroller and the computer.

Step 2:

Open the prebuilt Blink example by clicking on Examples from the left-hand side menu, BUILT IN from the new menu, and then Blink, as shown in the following screenshot:

Graphical user interface, application

Description automatically generated

Figure 1.25: Built-in LED blink example

Once you have clicked on the Blink sketch, the code will be visible in the editor area.

Step 3:

Click on the arrow on the left of the board dropdown to compile and upload the program to the target device, as shown in Figure 1.26:

Figure 1.26: The arrow on the left of the board dropdown will compile and flash the program on the target device

In embedded programming, we generally use the term flashing when referring to the uploading of the program to the microcontroller.

The console output should return Done at the bottom of the page, and the on-board LED should start blinking, which means the sketch has been successfully compiled and uploaded to the microcontroller!

There’s more…with the SparkFun Artemis Nano!

The LED blinking sketch we just uploaded on the Arduino Nano and Raspberry Pi Pico is also available for the SparkFun Artemis Nano microcontroller.

In the local Arduino IDE, the Blink example is in File -> Examples -> 01.Basics -> Blink:

A screenshot of a computer

Description automatically generated

Figure 1.27: Built-in LED blink example in the local Arduino IDE

Once you click the Blink example, a new window with the sketch will be displayed. Before compiling the program, connect the SparkFun Artemis Nano to a laptop/PC through the USB-C data cable and make sure the device drop-down menu shows RedBoard Artemis Nano as the board’s name:

Figure 1.28: The device drop-down menu reporting the SparkFun Artemis Nano board

Then, click on the arrow on the left of the board dropdown to compile and upload the program to the target device. After a few seconds, the console output should return Upload complete, and the on-board LED of the SparkFun Artemis Nano should start blinking!