Architecting High-Performance Embedded Systems

Chapter 1: Architecting High-Performance Embedded Systems

This chapter introduces the elements of embedded system architectures and discusses some key system features that are common across a wide variety of embedded applications. An embedded system generally includes at least one microcontroller or microprocessor, sensors, actuators, a power source, and, in many cases, one or more network interfaces. The chapter continues with an exploration of the relationship between embedded systems and the Internet of Things (IoT).

This chapter emphasizes the necessity for many types of embedded systems to function in a real-time manner and presents the basic embedded system operating sequence of reading from input devices, computing outputs, and updating output devices in a repetitive manner while remaining synchronized with the passage of time.

The chapter concludes with an introduction to digital logic and the Field-Programmable Gate Array (FPGA), and identifies the design space within the spectrum of embedded systems most appropriately addressed by these high-performance devices.

After completing this chapter, you will have a broad understanding of the components that make up embedded systems and the relationship of embedded systems to the IoT. You will know why many embedded systems must operate in synchronization with real time and will understand the basic structure of FPGAs and how they can be employed to implement high-performance embedded systems.

We will cover the following topics in this chapter:

Elements of embedded systems
The Internet of Things
Operating in real time
FPGAs in embedded systems

Elements of embedded systems

Embedded systems are everywhere. Almost any electrical device you interact with that is more complicated than a simple light switch contains a digital processor that reads input data from its environment, executes a computational algorithm, and generates some kind of output that interacts with the environment.

From the moment you open your eyes in the morning (in response to an alarm produced by a digital device), to brushing your teeth (with an electric toothbrush that contains a digital processor), to toasting a breakfast bagel (in a digitally controlled toaster oven), to disabling your (digital) home alarm system, you interact with embedded devices. Throughout the day, you provide input to, and receive output from, many other devices, such as television remote controls, traffic signals, and railroad crossings. Highly digitized transportation systems, including automobiles, airplanes, and passenger ferries, each contain dozens, if not hundreds, of embedded processors that manage drive train operation, oversee safety features, maintain a comfortable climate, and provide entertainment for the humans they carry.

Let's take a moment to clarify the sometimes-murky dividing line separating embedded systems from general-purpose computing devices. The attribute that defines an embedded computing system is the integration of digital processing within a device that has some larger purpose beyond mere computing. Devices that do not contain any type of digital processing are not embedded systems. For example, an electric toothbrush that contains only a battery and a motor controlled by an on-off switch is not an embedded system. A toothbrush containing a microcontroller that illuminates a red light when you press down too hard while brushing is an embedded system.

A desktop computer, even though it is capable of performing many tasks, and can be enhanced through the addition of a wide variety of peripherals, is just a computer. An automobile, on the other hand, has as its primary purpose the transportation of passengers. In performing this function, it relies on a variety of subsystems containing embedded processing. Automobiles are embedded systems. Personal computers are not.

A smartphone is more difficult to clearly categorize in terms of membership in the set of embedded systems. When in use as a telephone, it is clearly performing a function consistent with the definition of an embedded system. When using it as a web browser, though, it more closely resembles a small general-purpose computer. Clearly, it is not always possible to definitively determine whether a device is an embedded system.

It is helpful to understand differences in the operating environment of general-purpose computers in comparison to embedded devices. Personal computers and enterprise servers tend to work best in climate-controlled indoor settings. Embedded devices such as those in automobiles are often exposed to far more rugged conditions, including the full effects of rain, snow, wind, dust, and heat.

A large percentage of embedded devices lack any sort of active cooling system (which is standard in personal computers and server computers) and must ensure their internal components remain at safe operating temperatures regardless of external conditions.

Embedded systems, whether they are relatively simple devices or highly complex systems, are typically composed of the following elements.

Power source

All electronic digital devices require some form of power. Most commonly, embedded systems are powered by utility electrical power, batteries, or by the host system in which the device operates. For example, an automobile taillight assembly containing a processor and a CAN bus communication interface is powered by 12 volts Direct Current (DC) provided by the car's electrical system.

It is also possible to power embedded devices from rechargeable batteries connected to solar panels that allow the device to continue operation at nighttime and on cloudy days, or even by harvesting energy from the environment. A self-winding wristwatch uses energy harvested from arm motion to generate mechanical or electrical power. Safety- and security-critical embedded systems often use utility power as the primary power source while also providing batteries as backup power to enable operation during power outages.

Time base

Embedded systems generally require some means of tracking the progress of time, also known as wall clock time, both in the short term (for durations of microseconds and milliseconds) and in the long term, keeping track of the date and time of day. Most commonly, a primary system clock signal is generated using a crystal oscillator or a Microelectromechanical System (MEMS) oscillator that produces an output frequency of a few megahertz.

A crystal oscillator amplifies the resonant vibration of a physical crystal, typically made of quartz, to generate a square wave electrical signal using the piezoelectric effect. A MEMS oscillator contains a vibrating mechanical structure that produces an electrical output using electrostatic transduction.

Once set to the correct time, a clock driven by a crystal oscillator or a MEMS oscillator will exhibit small errors in frequency (typically 1-100 parts per million) that accumulate over periods of days and weeks to gradually drift by seconds and then minutes away from the correct time. To mitigate this problem, most internet-connected embedded devices periodically access a time server to reset their internal clocks to the current time.
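The 1-100 ppm figure can be put in perspective with a little arithmetic. The accumulated error is the fractional frequency error multiplied by the elapsed time. The following C sketch performs that calculation and also inverts it to estimate how often a device should contact a time server. The function names are illustrative only, and the sketch assumes the frequency error stays constant. Under that assumption, a 20 ppm error accumulates to about 1.7 seconds per day, or roughly 52 seconds over 30 days.

```c
#include <assert.h>

#define SECONDS_PER_DAY 86400.0

/* Clock error, in seconds, accumulated over `days` by an oscillator whose
   frequency is off by `ppm` parts per million (assumed constant). */
double clock_drift_seconds(double ppm, double days)
{
    assert(ppm >= 0.0 && days >= 0.0);
    return ppm * 1e-6 * days * SECONDS_PER_DAY;
}

/* Longest interval, in days, between time-server synchronizations that keeps
   the accumulated error at or below `max_error_s` seconds. */
double resync_interval_days(double ppm, double max_error_s)
{
    assert(ppm > 0.0 && max_error_s > 0.0);
    return max_error_s / clock_drift_seconds(ppm, 1.0);
}
```

For example, a 100 ppm oscillator that must stay within 8.64 seconds of the correct time would need to resynchronize about once per day.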

Digital processing

Embedded computing systems, by definition, contain some form of digital processor. The processing function is generally provided by a microcontroller, a microprocessor, or a system on a chip (SoC). A microcontroller is a highly integrated device that contains one or more central processing units (CPUs), random access memory (RAM), read-only memory (ROM), and a variety of peripheral devices. A microprocessor contains one or more CPUs, but has less of the overall system functionality integrated in the same device in comparison to a microcontroller, typically relying on external circuits for RAM, ROM, and peripheral interfaces.

An SoC is even more highly integrated than a microcontroller, generally combining one or more microcontrollers with additional digital hardware resources configured to perform specialized functions at high speed. As we will see in the FPGAs in embedded systems section and in subsequent chapters, SoC designs can be implemented as FPGA devices in architectures combining traditional microcontrollers with custom, high-performance digital logic.

Memory

Embedded systems generally contain RAM for working memory as well as some type of ROM, often flash memory, to store executable program code and other required information such as static databases. The quantity of each type of memory must be sufficient to meet the needs of the embedded system architecture over its planned life cycle. If the device is intended to support firmware upgrades, sufficient memory resources must be provided in the hardware design to support the anticipated range of potential system capability enhancements over its lifetime.

Software and firmware

In traditional computing environments, the executable code that users work with, such as web browsers and email programs, is referred to as software. This term is used to differentiate program code from the hardware that makes up the physical components of the computer system. In general-purpose computers, software is stored as files on some type of disk drive. In embedded systems, executable code is usually stored in some type of ROM, which is a hardware component within the device. Because of this arrangement, the code can be thought of as occupying a middle ground between hardware and software. This middle ground is referred to as firmware. In the early days of embedded systems, code was often burned into a memory device that could not be changed after the initial programming. These devices were more hardware-like (hence more "firm") than most currently produced embedded devices, which often contain rewritable flash memory. Nevertheless, we continue to use the term firmware to describe code programmed into embedded systems.

Specialized circuitry

Embedded systems support a wide variety of applications, some of which are relatively simple processes such as monitoring button presses on a television remote control and producing the corresponding output signal, while other types of systems perform extremely complex processing-intensive work on high data rate input signals. While a simple embedded system may be able to use a tiny microcontroller to perform all of the digital processing required, a more complex system may require processing resources that exceed the capabilities of off-the-shelf microcontrollers and more capable microprocessors such as x86 and ARM processors.

In years past, architects of these more sophisticated embedded designs would turn to an application-specific integrated circuit (ASIC) to implement custom circuitry to perform the processing at the speed needed for proper system operation. An ASIC is an integrated circuit containing a custom digital circuit designed to support a particular application. The production of ASIC devices typically involves a very expensive production setup phase, which makes their use impractical during project prototyping and for small production runs.

Fortunately, much of the capability afforded by ASICs is available in low-cost FPGA devices. Because FPGAs are easily reprogrammable, they are generally used for embedded system prototyping and in low-volume production runs. For high-volume production (thousands or millions of units), the lower per-unit cost of an ASIC can make the production setup costs worthwhile. This book will focus on the use of FPGAs in the prototyping of embedded systems.

Input from the environment

Embedded systems generally require input from their environment, whether it is from a human operating a user interface or from sensors measuring certain aspects of the system or environment in which they operate. For example, a battery-electric vehicle powertrain controller will track various aspects of the vehicle state, such as battery voltage, motor current, vehicle speed, and the position of the accelerator pedal. The system architecture must include hardware peripherals to measure input from each of the sensors with the necessary precision. The overall powertrain control system must be capable of performing measurements from all sensors at the rate required for proper vehicle operation.
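The phrase "with the necessary precision" often comes down to arithmetic on an analog-to-digital converter (ADC). The following C sketch converts raw ADC counts into the voltage at the sensor. It assumes that the full-scale code (2^N - 1) corresponds to the reference voltage and that the signal passes through a resistive divider that scales it down. The divider ratio and the battery-voltage scenario are hypothetical illustrations, not values from any particular vehicle.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a raw reading from an N-bit ADC into the voltage at the sensor.
   Assumes full-scale code (2^bits - 1) equals `vref` volts, and that the
   measured signal was scaled down by a factor of `divider` before the ADC. */
double adc_to_volts(uint32_t counts, unsigned bits, double vref, double divider)
{
    assert(bits > 0 && bits < 32);
    uint32_t full_scale = (1u << bits) - 1u;
    assert(counts <= full_scale);
    return (double)counts / full_scale * vref * divider;
}

/* Smallest change in the measured voltage that one ADC count represents. */
double adc_resolution(unsigned bits, double vref, double divider)
{
    return adc_to_volts(1u, bits, vref, divider);
}
```

With a 12-bit converter and a 3.3 V reference, one count is about 0.81 mV. If a hypothetical battery pack voltage is scaled down 200:1 to fit that range, each count represents about 0.16 V of pack voltage. The architect would compare that figure against the precision the controller actually needs.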

Output to the environment

In addition to reading inputs from the environment, the embedded system will generally produce one or more outputs for use by human operators or by the host system. Continuing the battery-electric vehicle example, the powertrain controller uses the accelerator pedal position, along with other inputs, to compute a command to the controller for the drive motor. This command adjusts the torque output of the drivetrain.

In addition to directly supporting system operation, embedded controllers often provide output for human consumption, such as displaying the vehicle speed in the dashboard. Each output must be updated at a rate sufficient to support proper system operation, including the needs of human perception. When implementing human interfaces, graphical outputs should update smoothly without visible glitches or flicker and audio outputs must avoid timing-related problems such as gaps or skips.

Network communication

While many simple embedded systems operate in a completely self-contained manner, reading their inputs, computing outputs, and updating output devices in an isolated context, more and more embedded system designs support some form of network communication. This capability enables device features such as remote notifications from home video doorbells and the continuous monitoring of machinery on factory floors.

Enhancing an embedded system with an always-available network communication capability can provide significant enhancements to functionality. However, this feature also presents a security risk that may be exploited by malicious actors if developers aren't careful to emphasize security within the system architecture. It is important to understand and address the security risks introduced by the inclusion of communication capabilities in an embedded system architecture.

Embedded system architects combine these elements to produce a system design that performs its intended functions, with appropriate safety margins, across the entire range of anticipated environmental operating conditions.

A suitable system design satisfies additional requirements such as size and weight constraints and power consumption limits, and holds production costs to an acceptable level. The design constraints for an embedded system depend heavily on such attributes as the number of units that will be produced, safety-critical aspects of the system, and the need for operation in rugged conditions.

There may be additional considerations that surface during the selection of the microcontroller or microprocessor architectural family and associated tools, such as the availability of suitable programming language compilers and debuggers. The selection of a processor family may depend in part on the past experience of the development team. It also depends on the cost, availability, and anticipated learning curve associated with the development tools.

Embedded system architectures that include persistent communication capability must address an additional dimension of the design space involving communications between individual devices and centralized nodes (typically servers accessed over the internet) and interactions between users and the embedded systems.

The widespread deployment of small-scale embedded systems with network connectivity has introduced the term Internet of Things (IoT). The next section discusses the relevance of IoT to the architectures of embedded systems.

The Internet of Things

Conceptually, the IoT represents an effort to maximize the utility of large numbers of disparate embedded devices through massive network communication. The feature that distinguishes IoT devices from more mundane embedded systems is the presence of a communication path between each device and one or more central nodes that gather data from the sea of devices and, in many cases, allow authorized users to issue commands to individual devices and to collections of devices.

During the IoT device development process, particularly when developing devices that will have access to sensitive personal information (such as home security cameras), responsible embedded system architects must undertake extensive measures to ensure the security of the end devices. IoT devices are often installed in consumers' homes, and security breakdowns that allow malicious actors to take control of cameras, microphones, or security systems must be prevented to the maximum extent possible. Although the system designer cannot prevent every security mistake an end user might commit, a more secure system can assist the user by taking steps such as guiding the selection of strong passwords and by being resistant to common types of attacks such as brute force password guessing.
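One common way to make a device resistant to brute force password guessing is to impose a growing delay after repeated failures. The following C sketch shows one such policy. It allows a few free attempts, then doubles the lockout with each further failure, up to a cap. The thresholds and the function name are hypothetical choices for illustration. A real login handler would also reset the failure count after a successful login and keep the count where an attacker cannot clear it, for example across reboots.

```c
#include <assert.h>

#define FREE_ATTEMPTS   3u     /* failures tolerated before any delay */
#define MAX_LOCKOUT_S   3600u  /* upper bound on a single lockout, in seconds */

/* Seconds a (hypothetical) login handler would refuse further attempts after
   `failures` consecutive wrong passwords: zero for the first few failures,
   then doubling with each additional failure, capped at MAX_LOCKOUT_S. */
unsigned lockout_seconds(unsigned failures)
{
    if (failures <= FREE_ATTEMPTS)
        return 0;
    unsigned excess = failures - FREE_ATTEMPTS;
    if (excess >= 12u)          /* 1u << 12 = 4096 already exceeds the cap */
        return MAX_LOCKOUT_S;
    unsigned delay = 1u << excess;
    return delay < MAX_LOCKOUT_S ? delay : MAX_LOCKOUT_S;
}
```

With these settings, an attacker who has exhausted the free attempts is limited to roughly one guess per hour. An automated guessing attack therefore becomes impractical, while an owner who mistypes a password once or twice is not delayed at all.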

Examples of IoT devices and systems include the following:

A home alarm system consisting of window and door sensors and motion sensors: This type of system generally includes a smartphone app providing immediate notification of alarm events. This system not only notifies the alarm company to initiate a response to alarm events, it also notifies the homeowner of the occurrence of those events. Clearly, this type of alarm system must be resistant to cyberattacks that would render the alarm function ineffective.
Electrical lights and power outlets: Many different illumination devices are available with internet-based monitoring and control, including light bulbs, light fixtures, and power strips capable of switching lights on and off. The app associated with each of these devices allows remote control of individual lights as well as the scheduling of light turn-on and turn-off times throughout the day. As with IoT alarm systems, security is an important feature that must be fully integrated into the system design.
Smart speakers: IoT speakers such as Amazon Echo and Google Nest provide a voice interface that allows users to make requests in natural language. Users preface commands with a word or phrase to wake up the speaker, such as "Alexa" or "Hey Google," followed by a command or request. These devices enable interaction with a variety of other IoT devices, including alarm systems and lighting control. An example voice command is "Alexa, turn on the lights."
Medical monitoring and treatment: A wide variety of embedded devices is deployed in hospitals and home environments to monitor aspects of patient health, such as temperature, blood oxygen, heart rate, breathing, and many more. These devices often communicate with a centralized database to enable tracking of current and historical health patterns by medical professionals. Other digital systems perform active treatment functions, such as infusing medications and assisting with breathing.
Industrial applications: Embedded systems are widely used in factory lines, energy generation systems, energy transmission systems, and in the oil and gas industries to monitor and control complex systems and processes. For example, a broad range of sensors and actuators is required to perform real-time monitoring and management of the operation of an oil pipeline that may be thousands of miles long.

This book is focused on the architecture and design of embedded systems. We will examine all aspects of the design of IoT embedded systems, including network communication. We will discuss IoT security requirements for embedded systems as well as the communication protocols used to monitor and control IoT embedded devices.

Embedded devices usually operate under tight time constraints. The next section introduces the key aspects of real-time operation and the approaches embedded systems use to synchronize with the passage of time.

Operating in real time

To satisfy an embedded system's real-time requirements, the system must sense the state of its environment, compute a response, and output that response within a prescribed time interval. These timing constraints generally take two forms: periodic operation and event-driven operation.

Periodic operation

Embedded systems that perform periodic updates are designed to remain synchronized with the passage of real-world time over long periods. These systems maintain an internal clock and use the passage of time as measured by the system clock to trigger the execution of each processing cycle. Most commonly, processing cycles repeat at fixed time intervals. Embedded systems typically perform processing at rates ranging from 10 to 1,000 updates per second, though particular applications may update at rates outside this range. Figure 1.1 shows the processing cycle of a simple periodically updated embedded system:

Figure 1.1 – Periodically updated embedded system

In the system of Figure 1.1, processing starts at the upper box, where initialization is performed for the processor itself and for the input/output (I/O) devices used by the system. The initialization process includes configuring a timer that triggers an event, typically an interrupt, at regularly spaced points in time. In the second box from the top, processing pauses while waiting for the timer to generate the next event. Depending on the capabilities of the processor, waiting may take the form of an idle loop that polls a timer output signal, or the system may enter a low power state waiting for the timer interrupt to wake the processor.

After the timer event occurs, the next step, in the third box from the top, consists of reading the current state of the inputs to the device. In the following box, the processor performs the computational algorithm and produces the values the device will write to the output peripherals. Output to the peripherals takes place in the final box at the bottom of the diagram. After the outputs have been written, processing returns to wait for the next timer event, forming an infinite loop.
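The loop of Figure 1.1 can be sketched in C. So that the example runs on an ordinary host, this version assumes a POSIX system and uses `clock_nanosleep` with an absolute wake-up time in place of a hardware timer interrupt. On a microcontroller, the wait step would instead block on the timer interrupt or enter a low-power state. The input, compute, and output functions are hypothetical placeholders. Each deadline is computed by adding the period to the previous deadline rather than to the time the processor woke up, so small scheduling delays do not accumulate into long-term drift.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical placeholders for the device-specific boxes of Figure 1.1. */
static int sensor_value;    /* stands in for an input peripheral */
static int actuator_value;  /* stands in for an output peripheral */
static void read_inputs(void)     { sensor_value++; }
static int  compute_outputs(void) { return sensor_value * 2; }
static void write_outputs(int v)  { actuator_value = v; }

/* Advance an absolute time by `ns` nanoseconds, normalizing tv_nsec. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Run `cycles` passes of the periodic loop. A deployed device would loop
   forever; the bound exists only so this sketch terminates. Returns the
   number of cycles whose processing finished after the next deadline. */
int run_periodic(long period_ns, int cycles)
{
    struct timespec next, now, deadline;
    int overruns = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);          /* initialization */
    for (int i = 0; i < cycles; i++) {
        timespec_add_ns(&next, period_ns);           /* absolute time of next event */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;                                        /* resume waiting if interrupted by a signal */
        read_inputs();
        write_outputs(compute_outputs());

        /* Real-time check: did this cycle finish before the next one is due? */
        clock_gettime(CLOCK_MONOTONIC, &now);
        deadline = next;
        timespec_add_ns(&deadline, period_ns);
        if (now.tv_sec > deadline.tv_sec ||
            (now.tv_sec == deadline.tv_sec && now.tv_nsec >= deadline.tv_nsec))
            overruns++;
    }
    return overruns;
}
```

Counting overruns is one simple way to detect when the computation no longer fits within the prescribed time interval.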

Event-driven operation

Embedded systems that respond to discrete events may spend the vast majority of their time in an idle state and only come to life when an input is received, at which time the system executes an algorithm to process the input data, generates output, writes the output to a peripheral device, and then goes back to the idle state. A pushbutton-operated television remote control is a good example of an event-driven embedded device. Figure 1.2 shows the processing steps for an event-driven embedded device:

Figure 1.2 – Event-driven embedded system

Most of the processing steps in an event-driven embedded system are similar to those of the periodic system, except the initiation of each pass through the computational algorithm is triggered by an input to the device. Each time an input event occurs, the system reads the input device that triggered the event, along with any other inputs that are needed. The processor computes the outputs, writes outputs to the appropriate devices, and returns to wait for the next event, again forming an infinite loop. The system may have inputs for many different events, such as presses and releases of each of the keys on a keypad.
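In a common arrangement for event-driven firmware, interrupt handlers do as little as possible. They record each input event in a small queue, and the main loop drains the queue, computes, and updates outputs. The following C sketch shows a single-producer, single-consumer ring buffer of keypad events of this kind. The event types, the `keys_down` output, and the function names are hypothetical. On real hardware, sharing data between an interrupt handler and the main loop may also require a compiler or memory barrier, or briefly masking interrupts. This sketch relies only on `volatile` indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EVENT_QUEUE_SIZE 8u  /* power of two, so free-running indices wrap cleanly */

typedef enum { EVT_KEY_PRESS, EVT_KEY_RELEASE } event_type_t;

typedef struct {
    event_type_t type;
    uint8_t key;             /* which keypad key changed state */
} event_t;

static event_t queue[EVENT_QUEUE_SIZE];
static volatile uint32_t head;  /* advanced only by the producer (interrupt handler) */
static volatile uint32_t tail;  /* advanced only by the consumer (main loop) */
static unsigned keys_down;      /* hypothetical output state: keys currently held */

/* Producer side, intended to be called from a keypad interrupt handler.
   Returns false (dropping the event) if the queue is full. */
bool event_post(event_t e)
{
    if (head - tail == EVENT_QUEUE_SIZE)
        return false;
    queue[head % EVENT_QUEUE_SIZE] = e;
    head = head + 1;
    return true;
}

/* Consumer side: removes the oldest event, or returns false if none is pending. */
static bool event_get(event_t *out)
{
    if (head == tail)
        return false;
    *out = queue[tail % EVENT_QUEUE_SIZE];
    tail = tail + 1;
    return true;
}

/* Body of the event loop in Figure 1.2: read each pending event, compute the
   new state, and update the output. On hardware, the caller would then wait
   in a low-power state until the next interrupt. Returns events handled. */
int process_pending_events(void)
{
    event_t e;
    int handled = 0;
    while (event_get(&e)) {
        if (e.type == EVT_KEY_PRESS)
            keys_down++;
        else if (keys_down > 0)
            keys_down--;
        handled++;
    }
    return handled;
}

unsigned keys_held(void) { return keys_down; }
```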

Many embedded systems must support both periodic and event-driven behaviors. An automobile is one example. While driving, the drivetrain processors sense inputs, perform computations, and update outputs to manage the vehicle speed, steering, and braking at regular time intervals. In addition to these periodic operations, the system contains other input signals and sensors that indicate the occurrence of events, such as shifting into gear or the involvement of the vehicle in a collision.

For a small, microcontroller-based embedded system, the developer might write the entirety of the code, including all timing-related functions, input, and output via peripheral interfaces, and the algorithms needed to compute outputs given the inputs. Implementing the blocks of Figure 1.1 or Figure 1.2 for a small system might consist of a few hundred lines of C code or assembly language.

At the higher end of system complexity, where the processor might need to update various outputs at different rates and respond to a variety of event-type input signals, it becomes necessary to segment the code between the time-related activities, such as scheduling cyclic updates, and the code that performs the computational algorithms of the system. This segmentation becomes particularly critical in highly complex systems that contain hundreds of thousands or even millions of lines of code. Real-time operating systems provide this capability.

Real-time operating systems

When a system architecture is of sufficient complexity that the separation of time-related functionality from computational algorithms becomes beneficial, it is common to implement an operating system to manage lower-level functionality, such as scheduling time-based updates and managing responses to interrupt-driven events. This allows application developers to focus on the algorithms required by the system design and on integrating those algorithms with the capabilities provided by the operating system.

An operating system is a multilayer suite of software providing an environment in which applications perform useful functions, such as managing the operation of a car engine. These applications execute algorithms consisting of processor instruction sequences and perform I/O interactions with the peripheral devices needed to complete their tasks.

Operating systems can be broadly categorized into real-time and general-purpose operating systems. A real-time operating system (RTOS) provides features to ensure that responses to inputs occur within a specified time limit, as long as some assumptions about how the application code behaves remain true. Real-time applications that perform tasks such as managing the operation of a car engine or a kitchen appliance typically run under an RTOS to ensure that the electrical and mechanical components they control receive responses to any change in inputs within a specified time.

Embedded systems often perform multiple functions simultaneously. The automobile is a good example, where one or more processors continuously monitor and control the operation of the powertrain, receive input from the driver, manage the climate control, and operate the sound system. One method of handling this diversity of tasks is to assign a separate processor to perform each function. This makes the development and testing of the software associated with each function straightforward, though a possible downside is that the design ends up with a plethora of processors, many of which don't have very much work to do.

Alternatively, a system architect may assign more than one of these functions to a single processor. If the functions assigned to the processor perform updates at the same rate, integration in this manner may be straightforward, particularly if the functions do not need to interact with each other.

In the case where multiple functions that execute at different rates are combined in the same processor, the complexity of the integration will increase, particularly if the functions must transfer data among themselves.
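A common approach to combining functions with different rates on one processor without an RTOS, sometimes called a cyclic executive, is to run a single base-rate timer and invoke the slower functions only on every Nth tick. The following C sketch assumes a 100 Hz base tick. The three function names and their rates are hypothetical. Unlike a preemptive RTOS, this arrangement cannot interrupt a slow function. Every function that runs on a given tick must therefore finish within one base period.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*task_fn)(void);

typedef struct {
    task_fn fn;
    unsigned divider;  /* run once every `divider` base ticks */
} rate_group_t;

/* Hypothetical functions sharing one processor at different rates. */
static unsigned fast_runs, medium_runs, slow_runs;
static void control_loop(void)   { fast_runs++; }    /* every tick: 100 Hz */
static void sensor_filter(void)  { medium_runs++; }  /* every 5th tick: 20 Hz */
static void update_display(void) { slow_runs++; }    /* every 10th tick: 10 Hz */

static const rate_group_t schedule[] = {
    { control_loop,   1 },
    { sensor_filter,  5 },
    { update_display, 10 },
};

/* Called once per base-rate timer event (every 10 ms at 100 Hz). */
void on_base_tick(unsigned long tick)
{
    for (size_t i = 0; i < sizeof schedule / sizeof schedule[0]; i++)
        if (tick % schedule[i].divider == 0)
            schedule[i].fn();
}
```

Because the dividers are multiples of one another, the slower functions always run on the same ticks as the faster ones. This makes it predictable when data produced at one rate becomes visible to a function running at another rate.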

In the context of an RTOS, separate functions that execute in a logically simultaneous manner are called tasks. A task is a block of code with an independent flow of execution that is scheduled in a periodic or event-driven manner by the operating system. Some operating systems use the term thread to represent a concept similar to a task. A thread is a flow of code execution, while the term task generally describes a thread of execution combined with the other system resources it requires.

Modern RTOS implementations support an arbitrary number of tasks, each of which may execute at a different update rate and priority. The priority of an RTOS task determines when it is allowed to execute relative to other tasks that may be simultaneously ready to execute. Higher-priority tasks get the first chance to execute when the operating system is making scheduling decisions.

An RTOS may be preemptive, meaning it has the authority to pause the execution of a lower-priority task when a higher-priority task becomes ready to run. When this happens, which typically occurs when it becomes time for the higher-priority task to perform its next update, or when a blocked I/O operation initiated by the higher-priority task completes, the system saves the state of the lower-priority task and transfers control to the higher-priority task. After the higher-priority task finishes and returns to the waiting state, the system switches back to the lower-priority task and resumes its execution.
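The effect of priority-based preemption can be seen in a small tick-by-tick simulation. The following C sketch models only the scheduling decision, not a real RTOS kernel. At each tick, periodic tasks are released, and the highest-priority task with work remaining runs for that tick. A lower-priority task's `remaining` counter stands in for the state the RTOS would save when that task is preempted. The task parameters are illustrative. As in FreeRTOS, a larger number means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    int priority;   /* larger number = higher priority */
    int period;     /* ticks between releases; every task is released at tick 0 */
    int cost;       /* ticks of execution needed per release */
    int remaining;  /* work left in the current release (the "saved state") */
    int completed;  /* releases finished */
    int missed;     /* releases still unfinished when the next one arrived */
} sim_task_t;

/* Simulate `ticks` ticks of fixed-priority preemptive scheduling on one CPU.
   trace[t] receives the index of the task that ran during tick t, or -1 if idle. */
void simulate(sim_task_t *tasks, size_t n, int ticks, int *trace)
{
    for (int t = 0; t < ticks; t++) {
        /* Timer-driven releases: a task becomes ready at the start of each period. */
        for (size_t i = 0; i < n; i++) {
            if (t % tasks[i].period == 0) {
                if (tasks[i].remaining > 0)
                    tasks[i].missed++;
                tasks[i].remaining = tasks[i].cost;
            }
        }
        /* Scheduling decision: the highest-priority ready task runs, preempting
           any lower-priority task that was part-way through its work. */
        int run = -1;
        for (size_t i = 0; i < n; i++)
            if (tasks[i].remaining > 0 &&
                (run < 0 || tasks[i].priority > tasks[(size_t)run].priority))
                run = (int)i;
        trace[t] = run;
        if (run >= 0 && --tasks[run].remaining == 0)
            tasks[run].completed++;
    }
}
```

Consider a high-priority task A (period 4, cost 1) and a low-priority task B (period 10, cost 5). B starts at tick 1 and is preempted at tick 4 when A is released again. B then resumes where it left off and finishes at tick 6, so the first ten ticks run A, B, B, B, A, B, B, idle, A, idle.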

As we'll see in later chapters, there are several additional features available in popular RTOS implementations such as FreeRTOS. There are also some significant performance constraints that developers of applications running in an RTOS environment must be aware of to avoid problems such as higher-priority tasks entirely blocking the execution of lower-priority tasks, and the possibility of deadlock between communicating tasks.

In the next section, we will introduce the basics of digital logic and examine the capabilities of modern FPGA devices.

FPGAs in embedded systems

A gate array is a digital integrated circuit containing a large number of logic elements that can be connected in an arbitrary manner to form complex digital devices. Many FPGAs even support the implementation of a full-blown microcontroller together with an array of I/O devices. A microcontroller or microprocessor implemented using the gates of an FPGA is referred to as a soft processor.

Early versions of gate arrays were one-time programmable devices in which a circuit design would be implemented within a device at the factory where the device was constructed, or perhaps by system developers using a programming device connected to their desktop computers. Once a device had been programmed, it could not be changed. Since that time, the technology of gate arrays has improved and now reprogrammable gate arrays are widely available.

Today, there is a tremendous variety of Field-Programmable Gate Arrays (FPGAs) available even to system developers of modest means. As the name implies, FPGAs are gate arrays that can be reprogrammed at any time, even after an embedded system has been assembled and delivered to its end user.

Before we get into the specifics of FPGA devices, we'll introduce some underlying concepts related to digital circuits, specifically logic gates and flip-flops.

Digital logic gates

A modern FPGA device contains what we might think of as a large box of digital parts that can be used to assemble complex logic circuits. The simplest of these components include the AND, OR, and XOR gates that perform basic logic functions. Each of these gates has two inputs and one output. The NOT gate is even simpler, with one input and one output. Logic gates operate on the binary input values 0 and 1 and produce an output of 0 or 1 as determined by the inputs.

In reality, the binary values in these circuits are represented by a voltage, with 0 usually represented as a low voltage (near zero volts) and 1 as a higher voltage that depends on the technology of the circuitry in which the gates are implemented. A common level for the 1 value in modern circuitry is 3.3 volts.

We will briefly discuss the behavior of each of these gates and present the gate's schematic symbol and the truth table that defines the gate's behavior. The behavior of a logic gate can be represented as a truth table where, for each possible combination of inputs, the output is given. Each column represents one input or output signal, with the output shown at the right side of the table. Each row presents one set of input values with the output of the gate given those inputs.

The AND gate outputs a 1 when both of its inputs are 1, otherwise the output is 0. Figure 1.3 is the AND gate schematic symbol:

Figure 1.3 – AND gate schematic symbol

The following table is the truth table for the AND gate:

A  B  Output
0  0  0
0  1  0
1  0  0
1  1  1

The OR gate outputs a 1 if either of its inputs is 1, otherwise the output is 0. Figure 1.4 is the OR gate schematic symbol:

Figure 1.4 – OR gate schematic symbol

The following table is the truth table for the OR gate:

A  B  Output
0  0  0
0  1  1
1  0  1
1  1  1

The XOR gate outputs a 1 if exactly one of its inputs is 1, otherwise the output is 0. Figure 1.5 is the XOR gate schematic symbol:

Figure 1.5 – XOR gate schematic symbol

The following table is the truth table for the XOR gate:

A  B  Output
0  0  0
0  1  1
1  0  1
1  1  0

The NOT gate has a single input and an output that is the inverse of its input: An input of 0 produces an output of 1, and an input of 1 produces an output of 0. Figure 1.6 is the NOT gate schematic symbol:

Figure 1.6 – NOT gate schematic symbol

In Figure 1.6, the triangle represents an amplifier, a device that turns a weaker input signal into a stronger output signal. The circle represents the inversion operation.

The following table is the truth table for the NOT gate:

A  Output
0  1
1  0

Each of the AND, OR, and XOR gates can be implemented with an inverting output. The function of an inverting gate is the same as described, except the output is the opposite of the output from the non-inverting gate. The schematic symbol for an AND, OR, or XOR gate with inverted output has a small circle added at the output side of the symbol, just as on the output of the NOT gate. The names of the gates with inverted outputs are NAND, NOR, and XNOR. The letter N in each of these names indicates NOT. For example, NAND means NOT AND, which is functionally equivalent to an AND gate followed by a NOT gate.
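As a minimal sketch, the seven gates described above can be written as concurrent signal assignments in VHDL, the language used for the full adder listing later in this chapter. The entity name LOGIC_GATES and its port names are hypothetical, chosen only for illustration. VHDL also provides nand, nor, and xnor operators directly; the NOT-of-AND form is used here to mirror the description of an inverting gate as the base gate followed by a NOT gate.

```vhdl
library IEEE;
  use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical entity exposing every basic gate side by side
entity LOGIC_GATES is
  port (
    A, B   : in  std_logic;
    Y_AND  : out std_logic;
    Y_OR   : out std_logic;
    Y_XOR  : out std_logic;
    Y_NOT  : out std_logic;  -- NOT uses only input A
    Y_NAND : out std_logic;
    Y_NOR  : out std_logic;
    Y_XNOR : out std_logic
  );
end entity LOGIC_GATES;

architecture DATAFLOW of LOGIC_GATES is
begin
  Y_AND  <= A and B;
  Y_OR   <= A or B;
  Y_XOR  <= A xor B;
  Y_NOT  <= not A;
  -- Inverting gates: the base gate followed by a NOT
  Y_NAND <= not (A and B);
  Y_NOR  <= not (A or B);
  Y_XNOR <= not (A xor B);
end architecture DATAFLOW;
```

Each assignment describes a piece of wiring rather than a step in a program, so all seven outputs follow their inputs at the same time.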

Flip-flops

A device that changes its output state only when a clock signal makes a specified transition (either low-to-high or high-to-low) is referred to as an edge-sensitive device. A flip-flop is an edge-sensitive device that holds one bit of data as its output signal. The flip-flop updates the data value it contains based on the state of its input signal when the clock input receives the specified transition.

The positive edge-triggered D flip-flop is a common digital circuit component that finds use in a variety of applications. This type of flip-flop has a data input called the D input, and it typically also includes set and reset input signals that force the stored value to 1 (set) or to 0 (reset).

The D flip-flop has a clock input that triggers the transfer of the D input to the Q output on the clock's rising edge. The Q̄ output (the overbar here means NOT) always has the opposite binary value from the Q output. Except within an extremely narrow window of time surrounding the rising edge of the clock signal, the flip-flop does not respond to the value of the D input. When active (at the 1 level), the S (set) and R (reset) inputs override any activity on the D and clock inputs.

Figure 1.7 shows the schematic symbol for the D flip-flop. The clock input is indicated by the small triangle on the left side of the symbol:

Figure 1.7 – D flip-flop

The truth table for the D flip-flop is shown below. The upward-pointing arrows (↑) in the CLK column indicate the rising edge of the clock signal. On rows with an upward-pointing arrow in the CLK column, the Q and Q̄ columns give the state of the outputs following the rising clock edge. In this table, the value X indicates don't care, meaning it does not matter what value that signal has in determining the Q output. The output Qprev represents the most recent value of Q produced through the action of the S, R, D, and CLK inputs:

S  R  D  CLK     Q      Q̄
0  0  0  ↑       0      1
0  0  1  ↑       1      0
0  0  X  not ↑   Qprev  Q̄prev
1  0  X  X       1      0
0  1  X  X       0      1
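The behavior of the D flip-flop can be sketched in VHDL as a process that responds to the clock, set, and reset inputs. The entity name D_FLIP_FLOP and its port names are hypothetical. The chapter does not say what happens when S and R are both 1, so this sketch assumes that reset takes priority. Because the process changes the stored bit only when S or R is active or on rising_edge(CLK), the stored bit otherwise keeps its previous value, which is the Qprev behavior described above.

```vhdl
library IEEE;
  use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical positive edge-triggered D flip-flop with
-- asynchronous, active-high set and reset
entity D_FLIP_FLOP is
  port (
    CLK   : in  std_logic;
    S     : in  std_logic;  -- set: forces Q to 1
    R     : in  std_logic;  -- reset: forces Q to 0
    D     : in  std_logic;
    Q     : out std_logic;
    Q_BAR : out std_logic
  );
end entity D_FLIP_FLOP;

architecture BEHAVIORAL of D_FLIP_FLOP is
  signal q_int : std_logic := '0';  -- the stored bit
begin
  process (CLK, S, R)
  begin
    if R = '1' then
      q_int <= '0';          -- reset overrides D and CLK (assumed to win over S)
    elsif S = '1' then
      q_int <= '1';          -- set overrides D and CLK
    elsif rising_edge(CLK) then
      q_int <= D;            -- D is sampled only on the rising edge
    end if;
    -- No else branch: in all other cases q_int keeps its previous value
  end process;

  Q     <= q_int;
  Q_BAR <= not q_int;        -- always the opposite of Q
end architecture BEHAVIORAL;
```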

Any digital circuit composed of a collection of logic gates is referred to as combinational logic when the output at any moment depends only on the current state of the inputs. In other words, the output does not depend on previous input values. Combinational logic circuits have no memory of past inputs or outputs.

Armed with this background information on logic gates and flip-flops, we will next discuss the implementation of circuits composed of these and related components in FPGAs.

Elements of FPGAs

The digital parts available within an FPGA typically fall into the categories of lookup tables, flip-flops, block RAM, and DSP slices. We will briefly examine each of these components.

Lookup tables

Lookup tables are used extensively in FPGAs to implement combinational logic circuits constructed from simple logic gates such as NOT, AND, OR, and XOR, as well as the siblings of the last three of these with inverted outputs: NAND, NOR, and XNOR.

Rather than implementing a logic gate circuit in hardware with the actual gates in its design, it is always possible to represent the same circuit using a simple lookup table. Given any combination of input signals, the correct output can be retrieved from a memory circuit addressed by the inputs. A typical FPGA lookup table has six single-bit input signals and a single-bit output. This is equivalent to a single-bit-wide memory device with six address inputs holding 64 bits of data ($2^6 = 64$). Circuits that require fewer than six inputs can treat some of the inputs as don't care inputs. Circuits with greater complexity can combine multiple lookup tables to produce their results.
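To make the lookup table idea concrete, the following is a behavioral VHDL model of a six-input lookup table: a 64-bit constant indexed by the six inputs. LUT6_MODEL and INIT are hypothetical names for this sketch. Vendor libraries provide their own lookup table primitives, and synthesis tools normally generate them from logic descriptions rather than having designers write them by hand.

```vhdl
library IEEE;
  use IEEE.STD_LOGIC_1164.ALL;
  use IEEE.NUMERIC_STD.ALL;

-- Hypothetical model of a 6-input lookup table: a 64 x 1-bit memory
-- whose contents (INIT) define the logic function
entity LUT6_MODEL is
  generic (
    INIT : std_logic_vector(63 downto 0)  -- bit n = output when inputs encode n
  );
  port (
    I : in  std_logic_vector(5 downto 0);  -- the six inputs act as an address
    O : out std_logic
  );
end entity LUT6_MODEL;

architecture BEHAVIORAL of LUT6_MODEL is
begin
  -- Reading the stored bit addressed by the inputs replaces a gate network
  O <= INIT(to_integer(unsigned(I)));
end architecture BEHAVIORAL;
```

For example, with INIT set to x"9696969696969696", the output is the XOR of I(0), I(1), and I(2). The byte 0x96 (binary 10010110) lists that three-input XOR for addresses 0 through 7, and repeating it across all 64 entries makes I(3), I(4), and I(5) don't care inputs.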

Flip-flops

For a digital circuit to retain any record of past events, some form of memory is required. As presented in the previous section, a flip-flop is a high-speed single-bit memory storage device. Digital circuitry that generates outputs based on a combination of current inputs and past inputs is called sequential logic. This is in contrast to combinational logic, where outputs depend only on the current state of the inputs. As with lookup tables, FPGAs contain large numbers of flip-flops to support the construction of complex sequential logic circuits.
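A small sequential circuit shows current and past inputs working together. In this hypothetical RISE_DETECT sketch, a flip-flop stores the value BUTTON had at the most recent rising clock edge, and combinational logic compares that stored value with the current input. PULSE is 1 only while BUTTON is 1 but was 0 at the last clock edge, so the same current input can produce different outputs depending on the past.

```vhdl
library IEEE;
  use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical rising-edge detector: sequential logic built from
-- one flip-flop plus one gate
entity RISE_DETECT is
  port (
    CLK    : in  std_logic;
    BUTTON : in  std_logic;
    PULSE  : out std_logic
  );
end entity RISE_DETECT;

architecture BEHAVIORAL of RISE_DETECT is
  signal prev : std_logic := '0';  -- flip-flop: BUTTON at the last rising edge
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      prev <= BUTTON;
    end if;
  end process;

  -- Combinational part: current input AND NOT remembered input
  PULSE <= BUTTON and not prev;
end architecture BEHAVIORAL;
```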

Block RAM

A Block RAM (BRAM) is a dedicated region of memory within an FPGA. In comparison to traditional processor hardware, flip-flops can be likened to processor registers, while BRAM is more like cache memory. Cache memory in a processor temporarily stores copies of recently accessed memory contents in a memory area where the processor can access them again, if needed, much faster than by reaching out to main memory. FPGA synthesis tools allocate BRAM to circuit designs in a manner that optimizes the performance of the digital circuit.
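Designers rarely wire up BRAM by hand. A common approach, sketched here under the assumption of a simple single-port memory, is to describe an array that is written and read on the clock edge and let the synthesis tool decide how to implement it. SIMPLE_RAM and its default 1024 x 16-bit size are hypothetical choices. The registered read, in which DOUT updates on the clock edge after the address is presented, matches the synchronous read behavior of BRAM; depending on the memory size and tool settings, a synthesis tool may still implement a small array in lookup-table-based distributed memory instead.

```vhdl
library IEEE;
  use IEEE.STD_LOGIC_1164.ALL;
  use IEEE.NUMERIC_STD.ALL;

-- Hypothetical single-port RAM written in a style that synthesis
-- tools commonly map to block RAM
entity SIMPLE_RAM is
  generic (
    ADDR_BITS : natural := 10;   -- 2**10 = 1024 words (assumed size)
    DATA_BITS : natural := 16
  );
  port (
    CLK  : in  std_logic;
    WE   : in  std_logic;        -- write enable
    ADDR : in  std_logic_vector(ADDR_BITS - 1 downto 0);
    DIN  : in  std_logic_vector(DATA_BITS - 1 downto 0);
    DOUT : out std_logic_vector(DATA_BITS - 1 downto 0)
  );
end entity SIMPLE_RAM;

architecture BEHAVIORAL of SIMPLE_RAM is
  type RAM_T is array (0 to 2 ** ADDR_BITS - 1)
    of std_logic_vector(DATA_BITS - 1 downto 0);
  signal ram : RAM_T := (others => (others => '0'));
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if WE = '1' then
        ram(to_integer(unsigned(ADDR))) <= DIN;
      end if;
      -- Synchronous read; during a write this returns the old contents
      DOUT <= ram(to_integer(unsigned(ADDR)));
    end if;
  end process;
end architecture BEHAVIORAL;
```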

DSP slices

A DSP slice is a section of digital logic optimized to perform the central computation of digital signal processing: the Multiply-Accumulate (MAC) operation. MAC processing involves multiplying two lists of numbers element by element and adding the products together. As a simple example, if two sequences are defined as $a_0, a_1, a_2$ and $b_0, b_1, b_2$, the result of a MAC operation on these sequences is $a_0 b_0 + a_1 b_1 + a_2 b_2$. Many DSP algorithms are built upon repetitive MAC operations performed with a list of algorithm-specific coefficients on a stream of input data.
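A MAC computation maps naturally onto a clocked register that adds one new product to a running sum on each clock edge. The following sketch assumes 16-bit signed operands and a 40-bit accumulator, leaving 8 guard bits above the 32-bit product to reduce the risk of overflow across many steps; the entity name MAC and its ports are hypothetical. Synthesis tools often map a multiply-add of this shape onto a DSP slice, but the exact mapping depends on the device and the operand widths.

```vhdl
library IEEE;
  use IEEE.STD_LOGIC_1164.ALL;
  use IEEE.NUMERIC_STD.ALL;

-- Hypothetical multiply-accumulate unit: ACC <= ACC + A * B per enabled edge
entity MAC is
  port (
    CLK   : in  std_logic;
    CLEAR : in  std_logic;             -- synchronous: restart the sum at 0
    EN    : in  std_logic;             -- accumulate A * B on this edge
    A, B  : in  signed(15 downto 0);
    ACC   : out signed(39 downto 0)    -- 8 guard bits above the 32-bit product
  );
end entity MAC;

architecture BEHAVIORAL of MAC is
  signal acc_reg : signed(39 downto 0) := (others => '0');
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if CLEAR = '1' then
        acc_reg <= (others => '0');
      elsif EN = '1' then
        -- 16 x 16 multiply gives 32 bits; sign-extend to 40 bits, then add
        acc_reg <= acc_reg + resize(A * B, acc_reg'length);
      end if;
    end if;
  end process;

  ACC <= acc_reg;
end architecture BEHAVIORAL;
```

For example, presenting the pairs (2, 5), (-3, 6), and (4, 7) on three successive enabled clock edges after a CLEAR leaves ACC equal to 2*5 + (-3)*6 + 4*7 = 10 - 18 + 28 = 20.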

Other functional elements

Every FPGA manufacturer expends significant effort to ensure each FPGA model provides the highest performance possible for use in a wide range of application areas. In order to better meet a diversity of needs, FPGAs often include hardware implementations of additional categories of low-level digital components such as shift registers, carry logic, and multiplexers. The inclusion of these hardware elements enables the synthesis of better-performing algorithms in comparison to an FPGA that generates these low-level components from the more generic resources available within the device.

The next section introduces the FPGA synthesis process, which converts a high-level description of an FPGA algorithm into a circuit implementation within a specific FPGA device.

FPGA synthesis

Although an FPGA device contains a large collection of low-level digital building blocks used to implement complex digital devices, it is important for system developers who are new to FPGA technology to understand that, in most cases, designers do not need to work directly at the level of these components. Instead, digital designers specify the system configuration as a combination of higher-level predefined functional blocks, such as a soft processor, and custom digital logic defined using a hardware description language. It is also possible to specify FPGA algorithms using programming languages such as C and C++.

The process of converting the high-level description of device functionality into the allocation and interconnection of the lookup tables, flip-flops, BRAM, and other device components is called FPGA synthesis. The synthesis process is conceptually similar to the software compilation process that converts human-readable source code to a binary program that can be executed by a processor.

Hardware design languages

It is easy to represent simple digital circuits using logic diagrams based on the schematic symbols presented earlier in this chapter. When designing digital devices that are very complex, however, the use of logic diagrams quickly becomes unwieldy. As an alternative to the logic diagram, a number of hardware description languages have been developed over the years.

The two most popular hardware design languages are VHDL and Verilog. VHDL is a multilevel acronym where the V stands for VHSIC, which means Very High-Speed Integrated Circuit, and VHDL stands for VHSIC Hardware Description Language. The syntax and some of the semantics of VHDL are based on the Ada programming language. Verilog has capabilities similar to VHDL. Although the two languages are not equivalent, it is broadly true that almost any digital design that you might implement in one of these languages can be implemented in the other language.

To provide a quick comparison between schematic diagram-based logic design and designing with a hardware description language, we will look at a simple adder circuit. A full adder adds two data bits plus an incoming carry bit and produces a one-bit sum and a carry output bit. This circuit, shown in Figure 1.8, is called a full adder because it includes the incoming carry in the calculation. A half adder, in comparison, adds only the two data bits without an incoming carry:

Figure 1.8 – Full adder circuit

The full adder uses logic gates to produce its output as follows: The sum bit S is 1 only if the total number of 1 bits in the collection A, B, Cin is an odd number. Otherwise, S is 0. The two XOR gates perform this logical operation. Cout is 1 if both A and B are 1, or if just one of A and B is 1 and Cin is also 1. Otherwise, Cout is 0.

The VHDL code in the following listing defines a digital circuit that performs the equivalent full adder function:

-- Load the standard libraries
 
library IEEE;
  use IEEE.STD_LOGIC_1164.ALL;
 
-- Define the full adder inputs and outputs
 
entity FULL_ADDER is
  port (
    A     : in    std_logic;
    B     : in    std_logic;
    C_IN  : in    std_logic;
    S     : out   std_logic;
    C_OUT : out   std_logic
  );
end entity FULL_ADDER;
 
-- Define the behavior of the full adder
 
architecture BEHAVIORAL of FULL_ADDER is
 
begin
 
  S     <= (A XOR B) XOR C_IN;
  C_OUT <= (A AND B) OR ((A XOR B) AND C_IN);
 
end architecture BEHAVIORAL;

This code is a fairly straightforward textual description of the full adder in Figure 1.8. Here, the section introduced with entity FULL_ADDER is defines the input and output signals of the full adder component. The architecture section toward the end of the code describes how the circuit logic operates to produce the outputs S and C_OUT given the inputs A, B, and C_IN. The term std_logic refers to a single-bit data type that, in addition to 0 and 1, can represent conditions such as an unknown value or a high-impedance state. The <= characters represent wire-like connections that drive the output on the left-hand side with the value computed on the right-hand side.

It is important, especially for FPGA developers coming from a software background, to understand that there is no concept of sequential execution in VHDL code. The statements in the BEHAVIORAL section at the end of the code that associate the outputs S and C_OUT with logical expressions are defining a digital circuit equivalent to Figure 1.8. They are not specifying computations that execute in sequence as in a traditional software program.
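The same concurrency applies when full adders are chained into a wider adder. The following sketch, with the hypothetical entity name RIPPLE_ADDER_4, replicates the full adder equations four times using a for ... generate statement and feeds each stage's carry output into the next stage's carry input. Although the stages are written as a loop, nothing loops at run time: the generate statement is expanded into four copies of the full adder circuit that all operate concurrently, and the assignment to c(0) could just as well appear after the generate block.

```vhdl
library IEEE;
  use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical 4-bit ripple-carry adder built from four full adder stages
entity RIPPLE_ADDER_4 is
  port (
    A, B  : in  std_logic_vector(3 downto 0);
    C_IN  : in  std_logic;
    S     : out std_logic_vector(3 downto 0);
    C_OUT : out std_logic
  );
end entity RIPPLE_ADDER_4;

architecture DATAFLOW of RIPPLE_ADDER_4 is
  signal c : std_logic_vector(4 downto 0);  -- c(i) is the carry into bit i
begin
  c(0) <= C_IN;

  -- One full adder per bit, using the same equations as FULL_ADDER
  STAGES : for i in 0 to 3 generate
    S(i)     <= (A(i) xor B(i)) xor c(i);
    c(i + 1) <= (A(i) and B(i)) or ((A(i) xor B(i)) and c(i));
  end generate STAGES;

  C_OUT <= c(4);
end architecture DATAFLOW;
```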

The benefits of using FPGAs in embedded system designs

For embedded system architects who are new to developing with FPGAs, the many benefits of using these devices may not be immediately obvious. Although FPGAs certainly are not appropriate for every embedded system design, it is worth considering whether FPGA technology suits your next system design.

Some of the benefits of developing embedded systems with FPGAs are as follows:

Processor customization: Because the soft processors used in FPGAs are programmed into the device, it is standard for the developers of these products to provide a variety of configuration alternatives to the end user. Some common options are a choice between a 32-bit and a 64-bit processor, the inclusion or exclusion of a floating-point unit, and the inclusion or exclusion of instructions that require significant hardware resources, such as integer division. These are just a few of the options that are likely to be available. The soft processor configuration can be modified even late in the development cycle to optimize trade-offs between system performance and FPGA resource utilization.
Flexible peripheral configuration: Since the I/O interfaces in an FPGA design are defined in the design files rather than fixed in silicon, designers can include exactly the I/O devices they need and omit I/O hardware they don't need. As with processor customization, it is straightforward to modify the types and the number of I/O devices even late in the development cycle.
High-level synthesis: Modern FPGA development tools support the definition of computationally intensive algorithms in traditional programming languages, including C and C++. This allows system developers with a software skill set to develop algorithms in a traditional software development environment and directly transition the same code into an optimized FPGA implementation. The FPGA version of the algorithm is relieved of traditional processor-based restrictions, such as sequential instruction execution and a fixed memory architecture. The high-level synthesis tools will generate an FPGA implementation that exploits execution parallelization and defines a memory architecture best suited to the algorithm. A custom hardware algorithm can be combined with a soft processor to implement a complete, high-performance digital system on a single FPGA device.
Hardware acceleration for parallelizable applications: Any algorithm that benefits from parallelization is a candidate for implementation as custom FPGA logic. Rather than executing an algorithm sequentially with processor instructions, FPGA hardware can often perform the processing in parallel much faster. Many modern FPGA devices contain dedicated hardware to support digital signal processing (DSP) operations. These capabilities are available for use by many types of parallel algorithms, such as digital filtering and neural networks.
Extensive debugging capabilities: Soft processors often provide options to enable a variety of debugging capabilities, such as instruction tracing, multiple complex breakpoints, and the ability to monitor the innermost operations of the processor and its interactions with other system components at the hardware level. As system development wraps up, developers can remove resource-intensive debugging capabilities from the final design to enable deployment in a smaller and less costly FPGA device.
Rapid prototyping of ASIC designs: For embedded system designs intended to support the high production volumes that make ASIC usage cost-effective, it is helpful to perform early prototyping with FPGAs to validate the system's digital design before investing in an ASIC implementation. In this context, FPGAs support rapid development iterations, allowing extensive testing of the new features introduced in each build.

Xilinx FPGAs and development tools

There are several manufacturers of FPGA devices and the development tools associated with them. To avoid trying to cover multiple vendors and their FPGA devices and development toolchains, and to avoid discussing these topics at too abstract a level, we are going to select one vendor and one set of development tools for use in the examples and projects developed in this book. This is not to suggest that another vendor's devices and tools aren't as good, or possibly better, for the applications we will discuss. We are simply choosing to use Xilinx FPGA devices and development tools to make the steps we are taking concrete and to allow you to follow along.

The Vivado Design Suite is available as a free download from Xilinx, though you will need to create a Xilinx user account to access the download page. Visit https://www.xilinx.com/ and select the option to create your account. Once you are logged in on the website, visit https://www.xilinx.com/support/download.html and download the Vivado Design Suite.

Vivado can be installed on Windows and Linux operating systems. Our projects in the coming chapters can be developed with Vivado running under either operating system.

Vivado includes a set of simulation capabilities that will allow you to develop and execute FPGA implementations within the simulation environment at zero cost. When you decide you need to see your FPGA design run on an actual FPGA, the best option for the projects we will be covering is the Arty A7-100T. This board currently costs US$249 and is available at https://store.digilentinc.com/arty-a7-artix-7-fpga-development-board-for-makers-and-hobbyists/.