Designing and writing software for embedded systems poses a different set of challenges than traditional high-level software development. This chapter gives an overview of these challenges and introduces the basic components and the platform that will be used as a reference in this book.
In this chapter, we will discuss the following topics:
- Domain definition
- Flash memory
- Interfaces and peripherals
- Connected systems
- The reference platform
Embedded systems are computing devices performing specific, dedicated tasks with no direct or continued user interaction. Due to the variety of markets and technologies, these objects have different shapes and sizes, but often all have a small size and a limited amount of resources.
In this book, the concepts and the building blocks of embedded systems are analyzed through the development of the software components that interact with its resources and peripherals. The first step is to define the scope for the validity of the techniques and the architectural patterns explained here, within the broader definition of embedded systems.
One part of the embedded market relies on devices with enough power and resources to run a variant of the GNU/Linux operating system. These systems, often referred to as embedded Linux, are outside the scope of this book, as their development includes different strategies of design and integration of the components. A typical hardware platform that is capable of running a system based on the Linux kernel is equipped with a reasonably large amount of RAM, up to a few gigabytes, and sufficient storage space on board to store all the software components provided in the GNU/Linux distribution. Additionally, for Linux memory management to provide separate virtual address spaces to each process on the system, the hardware must be equipped with a memory management unit (MMU), a hardware component that assists the OS in translating physical addresses to virtual addresses, and vice versa, at runtime.
This class of devices presents different characteristics that are often overkill for building tailored solutions, which can use a much simpler design and reduce the production costs of the single units. Hardware manufacturers and chip designers have researched new techniques to improve the performance of microcontroller-based systems, providing, in the past decade, a new generation of platforms that would cut hardware costs, firmware complexity, size, and power consumption to provide a set of features that are most interesting for the embedded market.
Due to their specifications, in some real-life scenarios, embedded systems must be able to execute a series of tasks within a short, measurable, and predictable amount of time. These kinds of systems are called real-time systems, and differ from the approach of multi-task computing used in desktops, servers, and mobile phones. Real-time processing is a goal that is extremely hard, if not impossible, to reach on embedded Linux platforms. The Linux kernel is not designed for hard real-time processing, and even if patches are available to modify the kernel scheduler to help meet these requirements, the results are not comparable to bare-metal, constrained systems that are designed with this purpose in mind.
Some other application domains, such as battery-powered and energy-harvesting devices, can benefit from the low power consumption capabilities of smaller embedded devices and the energy efficiency of the wireless communication technologies often integrated in embedded connected devices. The higher amount of resources and the increased hardware complexity of Linux-based systems often does not scale down enough on energy levels, or requires much more effort to meet similar figures in power consumption.
The type of microcontroller-based systems that we'll analyze in this book are 32-bit systems, capable of running software in a single-threaded, bare-metal application as well as integrating minimalist real-time operating systems, which are very popular in the industrial manufacturing of embedded systems that we use on a daily basis to accomplish specific tasks, and are becoming more and more adopted to define more generic, multiple-purpose development platforms.
In the past, 8-bit microcontrollers have dominated the embedded market. The simplicity of their design allows us to write small applications that can accomplish a set of pre-defined tasks, but are too simple and usually equipped with way too few resources to implement an embedded system, especially since 32-bit microcontrollers have evolved to cover all the use cases for these devices within the same range of price, size, and power consumption.
8-bit microcontrollers nowadays are mostly relegated to the market of educational platform kits, aimed at introducing hobbyists and newcomers to the basics of software development on electronic devices. 8-bit platforms are not covered in this book, because they lack the characteristics that allow advanced system programming, multithreading, and advanced features developed to build professional embedded systems.
In the context of this book, the term embedded systemsis used to indicatea class of systemsrunning onmicrocontroller-based hardware architecture, offering constrained resources but allowing to buildreal-timesystems,throughfeaturesprovided by the hardware architectureto implementsystem programming.
The architecture of an embedded system is centered around its microcontroller, also sometimes referred to as the microcontroller unit (MCU), typically a single integrated circuit containing the processor, RAM, flash memory, serial receivers and transmitters, and other core components. The market offers many different choices among architectures, vendors, price range, features, and integrated resources. These are typically designed to be inexpensive, low-resource, low-energy consuming, self-contained systems on a single integrated circuit, which is the reason why they are often referred to as System-on-Chip (SoC).
Due to the variety of processors, memories, and interfaces that can be integrated, there is no actual reference architecture for microcontrollers. Nevertheless, some architectural elements are common across a wide range of models and brands, and even across different processor architectures.
Some microcontrollers are dedicated to specific applications and exposing a particular set of interfaces to communicate to peripherals and to the outside world. Others are focused on providing solutions with reduced hardware costs, or with very limited energy consumption. Nevertheless, the following set of components is hardcoded into almost every microcontroller:
- Flash memory
- Serial transceivers
Additionally, more and more devices are capable of accessing a network, to communicate with other devices and gateways. Some microcontrollers may provide either well-established standards, such as Ethernet or Wi-Fi interfaces, or specific protocols specifically designed to meet the constraints of embedded systems, such as Sub-GHz radio interfaces or Controller Area Network (CAN) bus, being partially or fully implemented within the IC.
All the components must share a bus line with the processor, which is responsible for coordinating the logic. The RAM, flash memory, and control registers of the transceivers are all mapped in the same physical address space:
A simplified block diagram of the components inside a generic microcontroller
The addresses where the RAM and flash are mapped to depend on the specific model, and are usually provided in the datasheet. A microcontroller is able to run code in its own native machine language, a sequence of instructions conveyed into a binary file that is specific for the architecture it is running on. By default, compilers provide a generic executable file as the output of the compilation and assembly operations, which needs to be converted into the format that is executable for the target.
The processor is designed to execute the instructions stored, in its own specific binary format directly from RAM as well as from its internal flash memory, usually mapped starting from position zero in memory, or from another well-known address specified in the microcontroller manual. The CPU can fetch and execute code from RAM faster, but the final firmware is stored in the flash memory, which is usually bigger than the RAM on almost all microcontrollers, and permits it to retain the data across power cycles and reboots.
Compiling a software operating environment for an embedded microcontroller and loading it onto the flash memory requires a host machine, which is a specific set of hardware and software tools. Some knowledge about the target device's characteristics is also needed to instruct the compiler to organize the symbols within the executable image. For many valid reasons, C is the most popular language in embedded software, although not the only available option. Higher-level languages, such as Rust and C++, can produce embedded code when combined with a specific embedded runtime, or even in some cases by entirely removing the runtime support from the language. This book will focus entirely on C code, because it abstracts less than any other high-level language, thus making easier to describe the behavior of the underlying hardware while looking at the code.
All modern embedded systems platforms also have at least one mechanism (such as JTAG) for debugging purposes, and to upload the software to the flash. When the debug interface is accessed from the host machine, a debugger can interact with the breakpoint unit in the processor, interrupting and resuming the execution, and can also read and write from any address in memory.
A significant part of embedded programming is the communication with the peripherals, using the interfaces that the MCU exposes. Embedded software development requires basic knowledge of electronics, the ability to understand schematics and datasheets, and confidence with the measurement tools, such as logic analyzers or oscilloscopes.
Approaching embedded development means keeping the focus on the specifications as well as the hardware restrictions at all times. Embedded software development is a constant challenge to focus on the most efficient way to perform a set of specific tasks, but keeping in strong consideration the limited resources available. There are a number of compromises to deal with, which are uncommon in other environments. Here are some examples:
- There might be not enough space in the flash to implement a new feature
- There might not be enough RAM to store complex structures or make copies of large data buffers
- The processor might be not fast enough to accomplish all the required calculations and data processing in due time
- Battery-powered and resources-harvesting devices might require lower energy consumption to meet lifetime expectations
Moreover, PC and mobile operating systems make large use of the MMU, a component of the processor that allows runtime translations between physical and virtual addresses. The MMU is a necessary abstraction to implement address space separation among the tasks, and between the tasks and the kernel itself. Embedded microcontrollers do not have an MMU, and usually lack the amount of non-volatile memory required to store kernel, applications, and libraries. For this reason, embedded systems are often running in a single task, with a main loop performing all the data processing and communication in a specific order. Some devices can run embedded operating systems, which are far less complex than their PC counterparts.
Application developers often see the underlying system as a commodity, while embedded development often means that the entire system has to be implemented from scratch, from the boot procedure up to the application logic. In an embedded environment, the various software components are more closely related to each other, because of the lack of more complex abstractions, such as memory separations between the processes and the operating system kernel. A developer approaching embedded systems for the first time might find testing and debugging on some of the systems a bit more intricate than just running the software and reading out the results. This becomes especially true on those systems that have been designed with little or no human interaction interfaces.
A successful approach requires a healthy workflow, which includes well-defined test cases, a list of key performance indicators coming from the analysis of the specifications to identify possibilities of trade-offs, a number of tools and procedures at hand to perform all the needed measurements, and a well-established and efficient prototyping phase.
In this context, security deserves some special consideration. As usual, when writing code at the system level, it is wise to keep in mind the system-wide consequences of possible faults. Most embedded application code run with extended privileges on the hardware, and a single task misbehaving can affect the stability and the integrity of the entire firmware. As we will see, some platforms offer specific memory-protection mechanisms and built-in privilege separation, which are useful for building fail-safe systems even in the absence of a full operating system based on separating process address spaces.
The most popular type of design for embedded software is based on a single loop-based sequential execution model, where modules and components are connected to each other exposing callback interfaces. However, modern microcontrollers offer features and core logic characteristics that can be used by system developers to build a multi-tasking environment to run logically separated applications.
These features are particularly handy in the approach to more complex real-time systems, and interesting to understand the possibilities of the implementation of safety models based on process isolation and memory segmentation.
"640 KB of memory ought to be enough for everyone"
– Bill Gates (founder and former director of Microsoft)
This famous statement has been cited many times in the past three decades to underline the progress in technology and the outstanding achievements of the PC industry. While it may sound like a joke for many software engineers out there, it is still in these figures that embedded programming has to be thought about, more than 30 years after MS-DOS was initially released.
Although most embedded systems are capable of breaking that limit today, especially due to the availability of external DRAM interfaces, the simplest devices that can be programmed in C may have as little as 4 KB of RAM available to implement the entire system logic. Obviously this has to be taken into account when approaching the design of an embedded system, by estimating the amount of memory potentially needed for all the operations that the system has to perform, and the buffers that may be used at any time to communicate with peripherals and nearby devices.
The memory model at the system level is simpler than that of PCs and mobile devices. Memory access is typically done at the physical level, so all the pointers in your code are telling you the physical location of the data they are pointing to. In modern computers, the operating system is responsible for translating physical addresses to a virtual representation of the running tasks. The advantage of the physical-only memory access on those systems that do not have an MMU is the reduced complexity of having to deal with address translations while coding and debugging. On the other hand, some of the features implemented by any modern OS, such as process swapping and dynamically resizing address spaces through memory relocation, become cumbersome and sometimes impossible.
Handling memory is particularly important in embedded systems. Programmers who are used to writing application code expect a certain level of protection to be provided by the underlying OS. In fact, a virtual address space does not allow memory areas to overlap, and the OS can easily detect unauthorized memory accesses and segmentation violations, it then promptly terminates the process and avoids having the whole system compromised. On embedded systems, especially when writing bare-metal code, the boundaries of each address pool must be checked manually. Accidentally modifying a few bits in the wrong memory, or even accessing a different area of memory, may result in a fatal, irrevocable error. The entire system may hang, or, in the worst case, become unpredictable. A safe approach is required when handling memory in embedded systems, in particular when dealing with life-critical devices. Identifying memory errors too late in the development process is complex and often requires more resources than forcing yourself to write safe code and protecting the system from a programmer's mistakes.
Proper memory-handling techniques are explained in Chapter 5, General Purpose Peripherals.
In a server or a personal computer, the executable applications and libraries reside in storage devices. At the beginning of the execution, they are accessed, transformed, possibly uncompressed, and stored in RAM before the execution starts.
The firmware of an embedded device is in general one single binary file containing all the software components, which can be transferred to the internal flash memory of the MCU. Since the flash memory is directly mapped to a fixed address in the memory space, the processor is capable of sequentially fetching and executing single instructions from it with no intermediate steps. This mechanism is called execute in place (XIP). All non-modifiable sections on the firmware don't need to be loaded in memory, and are accessible through direct addressing in the memory space. This includes not only the executable instructions, but also all the variables that are marked as constant by the compiler. On the other hand, supporting XIP requires a few extra steps in the preparation of the firmware image to be stored in flash, and the linker needs to be instructed about the different memory-mapped areas on the target.
The internal flash memory mapped in the address space of the microcontroller is not accessible for writing. Altering the content of the internal flash can be done only by using a block-based access, due to the hardware characteristics of flash memory devices. Before changing the value of a single byte in flash memory, in fact, the whole block containing it must be erased and rewritten. The mechanism offered by most manufacturers to access block-based flash memory for writing is known as In-Application Programming (IAP). Some filesystem implementations take care of abstracting write operations on a block-based flash device, by creating a temporary copy of the block where the write operation is performed.
During the selection of the components for a microcontroller-based solution, it is vital to properly match the size of the flash memory to the space required by the firmware. The flash is in fact often one of the most expensive components in the MCU, so for a deployment on a large scale, choosing an MCU with a smaller flash could be more cost-effective. Developing software with code size in mind is not very usual nowadays within other domains, but it may be required when trying to fit multiple features in such little storage. Finally, compiler optimizations may exist on specific architectures to reduce code size when building the firmware and linking its components.
Additional non-volatile memories that reside outside of the MCU silicon can typically be accessed using specific interfaces, such as Serial Peripheral Interface. External flash memories use different technologies than the internal flash, which is designed to be fast and execute code in place. While being generally more dense and less expensive, external flash memories do not allow direct memory mapping in the physical address space, which makes them not suitable for storing firmware images, as it would be impossible to execute the code fetching the instructions sequentially, unless a mechanism is used to load the executable symbols in RAM, because read access on these kinds of devices is performed one block at a time. On the other hand, write access may be faster when compared to IAP, making these kinds of non-volatile memory devices ideal for storing data retrieved at runtime in some designs.
In order to communicate with peripherals and other microcontrollers, a number of de facto standards are well established in the embedded world. Some of the external pins of the microcontroller can be programmed to carry out communication with external peripherals using specific protocols. A few of the common interfaces available on most architectures are:
- Asynchronous UART-based serial communication
- Serial Peripheral Interface (SPI) bus
- Inter-integrated circuit (I2C) bus
- Universal Serial Bus (USB)
Asynchronous communication is provided by the Universal Asynchronous Receiver-Transmitter (UART). These kind of interfaces, commonly simply known as serial ports, are called asynchronous because they do not need to share a clock signal to synchronize the sender and the receiver, but rather work on pre-defined clock rates that can be aligned while the communication is ongoing. Microcontrollers may contain multiple UARTs that can be attached to a specific set of pins upon request. Asynchronous communication is provided by UART as a full-duplex channel, through two independent wires, connecting the RX pin of each endpoint to the TX pin on the opposite side.
To properly understand each other, the systems at the two endpoints must set up the UART using the same parameters. This includes the framing of the bytes on the wire, and the frame rate. All of these parameters have to be known in advance by both endpoints in order to correctly establish a communication channel. Despite being simpler than the other types of serial communication, UART-based serial communication is still widely used in electronic devices, particularly as an interface toward modems and GPS receivers. Furthermore, using TTL-to-USB serial converters, it is easy to connect a UART to a console on the host machine, which is often handy for providing log messages.
- Serial clock line to synchronize the endpoints
- Master-slave protocol
- One-to-many communication over the same three-wire bus
The master device, usually the microcontroller, shares the bus with one or more slaves. To trigger the communication, a separate slave select (SS) signal is used to address each slave connected to the bus. The bus uses two independent signals for data transfer, one per direction, and a shared clock line that synchronizes the two ends of the communication. Due to the clock line being generated by the master, the data transfer is more reliable, making it possible to achieve higher bitrates than ordinary UART. One of the keys for the continued success of SPI over multiple generations of microcontrollers is the low complexity required for the design of slaves, which can be as simple as a single shift register. SPI is commonly used in sensor devices, LCD displays, flash memory controllers, and network interfaces.
I2C is slightly more complex, and that is because it is designed with a different purpose in mind: interconnecting multiple microcontrollers, as well as multiple slave devices, on the same two-wire bus. The two signals are serial clock (SCL) and serial data (SDA). Unlike SPI or UART, the bus is half-duplex, as the two directions of the flow share the same signal. Thanks to a 7-bit slave-addressing mechanism incorporated in the protocol, it does not require additional signals dedicated to the selection of the slaves. Multiple masters are allowed on the same line, given that all the masters in the system follow the arbitration logic in the case of bus contention.
The USB protocol, originally designed to replace UART and include many protocols in the same hardware connector, is very popular in personal computers, portable devices, and a huge number of peripherals.
This protocol works in host-device mode, with one side of communication, the device, exposing services that can be used by the controller, on the host side. USB transceivers present in many microcontrollers can work in both modes. By implementing the upper layer of the USB standards, different types of devices can be emulated by the microcontroller, such as serial ports, storage devices, and point-to-point Ethernet interfaces, creating microcontroller-based USB devices that can be connected to a host system.
If the transceiver supports host mode, the embedded system can act as a USB host and devices can be connected to it. In this case, the system should implement device drivers and applications to access the functionality provided by the device.
An increasing number of embedded devices designed for different markets are now capable of network communication with their peers in the surrounding area, or with gateways routing their traffic to a broader network or to the internet. The term Internet of Things (IoT) has been used to describe the networks where those embedded devices are able to communicate using internet protocols. This means that IoT devices can be addressed within the network in the same way as more complex systems, such as PCs or mobile devices, and most importantly, use the transport layer protocols typical of internet communications to exchange data. TCP/IP is a suite of protocols standardized by the IETF, and it is the fabric of the infrastructure for the internet and other self-contained, local area networks. The Internet Protocol (IP) provides network connectivity, but on the condition that the underlying link provides packet-based communication and mechanisms to control and regulate access to the physical media. Fortunately, there are many network interfaces that meet these requirements. Alternative protocol families, which are not compatible with TCP/IP, are still in use in several distributed embedded systems, but a clear advantage of using the TCP/IP standard on the target is that, in the case of communication with non-embedded systems, there is no need for a translation mechanism to route the frames outside the scope of the LAN.
Besides the types of links that are widely used in non-embedded systems, such as Ethernet or wireless LAN, embedded systems can benefit from a wide choice of technologies that are specifically designed for the requirements introduced by IoT. New standards have been researched and put into effect to provide efficient communication for constrained devices, defining communication models to cope with specific resource usage limits and energy efficiency requirements.
Recently, new link technologies have been developed in the direction of lower bitrates and power consumption for wide-area network communication. These protocols are designed to provide narrow-band, long-range communication. The frame is too small to fit IP packets, so these technologies are mostly employed to transmit small payloads, such as periodic sensor data, or device configuration parameters if a bidirectional channel is available, and they require some form of gateway to translate the communication to travel across the internet.
The interaction with the cloud services, however, requires, in most cases, to connect all the nodes in the network, and to implement the same technologies used by the servers and the IT infrastructure directly in the host. Enabling TCP/IP communication on an embedded device is not always straightforward. Even though there are several open source implementations available, system TCP/IP code is complex, big in size, and often has memory requirements that may be difficult to meet.
The same observation applies for the Secure Socket Layer (SSL) / Transport Layer Security (TLS) library. Choosing the right microcontroller for the task is again crucial, and if the system has to be connected to the internet and support secure socket communication, then the flash and RAM requirements have to be updated in the design phase to ensure integration with third-party libraries.
- Selection of the correct technologies and protocols
- Limitations on bitrate, packet size, and media access
- Availability of the nodes
- Single points of failure in the topology
- Configuration of the routes
- Authentication of the hosts involved
- Confidentiality of the communication over the media
- Impact of buffering on network speed, latency, and RAM usage
- Complexity of the implementation of the protocol stacks
Chapter 9, Parallel Tasks and Scheduling, analyzes some of the link-layer technologies implemented in embedded systems to provide remote communication, integrating TCP/IP communication to the design of distributed systems integrated in IoT services.
The preferred design strategy for embedded CPU cores is reduced instruction set computer (RISC). Among all the RISC CPU architectures, a number of reference designs are used as guidelines by silicon manufacturers to produce the core logic to integrate into the microcontroller. Each reference design differs from the others in a number of characteristics of the CPU implementation. Each reference design includes one or more families of microprocessors integrated in embedded systems, sharing the following characteristics:
- Word size used for registers and addresses (8-bit, 16-bit, 32-bit, or 64-bit)
- Instruction set
- Register configurations
- Extended CPU features (interrupt controller, FPU, MMU)
- Caching strategies
- Pipeline design
Choosing a reference platform for your embedded system depends on your project needs. Smaller, less feature-rich processors are generally more suited to low energy consumption, have a smaller MCU packaging, and are less expensive. Higher-end systems, on the other hand, come with a bigger set of resources and some of them have dedicated hardware to cope with challenging calculations (such as a floating point unit, or an Advanced Encryption Standard (AES) hardware module to offload symmetric encryption operations). 8-bit and 16-bit core designs are slowly giving way to 32-bit architectures, but some successful designs remain relatively popular in some niche markets and among hobbyists.
ARM is the most ubiquitous reference design supplier in the embedded market, with more than 10 billion ARM-based microcontrollers produced for embedded applications. One of the most interesting core designs in the embedded industry is the ARM Cortex-M family, which includes a range of models scaling from cost-effective and energy-efficient, to high-performance cores specifically designed for multimedia microcontrollers. Despite ranging among three different instruction sets (ARMv6, ARMv7 and ARMv8), all Cortex-M CPUs share the same programming interface, which improves portability across microcontrollers in the same families.
Most of the examples in this book will be based on this family of CPUs. Though most of the concepts expressed will be applicable to other core designs as well, picking a reference platform now opens the door to a more complete analysis of the interactions with the underlying hardware. In particular, some of the examples in this book use specific assembly instructions from the ARMv7 instruction set, which is implemented in some Cortex-M CPU cores.
- ARMv6 (M0, M0+) or ARMv7 (M3, M4, M7) architecture
- Optional 8-region memory protection unit (MPU)
The total memory address space is 4 GB. The beginning of the internal RAM is typically mapped at the fixed address of
0x20000000. The mapping of the internal flash, as well as the other peripherals, depends on the silicon manufacturer. However, the highest 512 MB (
0xFFFFFFFF) addresses are reserved for the System Control Block (SCB), which groups together several configuration parameters and diagnostics that can be accessed by the software at any time to directly interact with the core.
Synchronous communication with peripherals and other hardware components can be triggered through interrupt lines. The processor can receive and recognize a number of different digital input signals and react to them promptly, interrupting the execution of the software and temporarily jumping to a specific location in the memory. Cortex-M supports up to 240 interrupt lines on the high-end cores of the family. The interrupt vector, located at the beginning of the software image in flash, contains the addresses of the interrupt routines that will automatically execute on specific events. Thanks to the NVIC, interrupt lines can be assigned priorities, so that when a higher-priority interrupt occurs while the routine for a lower interrupt is executed, the current interrupt routine is temporarily suspended to allow the higher-priority interrupt line to be serviced. This ensures minimal interrupt latency for these signal lines, which are somewhat critical for the system to execute as fast as possible.
At any time, the software on the target can run in two privilege modes: unprivileged or privileged. The CPU has built-in support for privilege separation between system and application software, even providing two different registers for the two separate stack pointers. We will examine in more detail how to properly implement privilege separation, and how to enforce memory separation when running untrusted code on the target.
A Cortex-M core is present in many microcontrollers, from different silicon vendors. Software tools are similar for all the platforms, but each MCU has a different configuration to take into account. Convergence libraries are available to hide manufacturer-specific details and improve portability across different models and brands. Manufacturers provide reference kits and all the documentation required to get started, which are intended to be used for evaluation during the design phase, and may also be useful for developing prototypes at a later stage. Some of these evaluation boards are equipped with sensors, multimedia electronics, or other peripherals that extend the functionality of the microcontroller.
Approaching embedded software requires, before anything else, a good understanding of the hardware platform and its components. Through the description of the architecture of modern microcontrollers, this chapter pointed out some of the peculiarities of embedded devices, and how developers should efficiently rethink their approach to meeting requirements and solving problems, while at the same time taking into account the features and the limits of the target platform. In the next chapter, we'll analyze the tools and procedures typically used in embedded development, how to organize the workflow, and how to effectively prevent, locate, and fix bugs.