You're reading from The FPGA Programming Handbook - Second Edition

Product type: Book
Published in: Apr 2024
Publisher: Packt
ISBN-13: 9781805125594
Edition: 2nd Edition
Authors (2):
Frank Bruno

Frank Bruno is an experienced high-performance design engineer specializing in FPGAs with some ASIC experience. He has experience working for companies like SpaceX, GM Cruise, Belvedere Trading, Allston Trading, and Number Nine. He is currently working as an FPGA engineer for Belvedere Trading.

Guy Eschemann

Guy Eschemann was an Electrical Engineer with over twenty years of experience designing FPGA-based embedded systems for automotive, industrial, medical, aerospace, military, and telecom applications. He was working as an FPGA engineer at plc2 Design GmbH.

Introducing FPGAs

A gate array in ASIC terms is a sea of gates with some number of mask steps that can be configured for a given application. This makes for a less expensive product, since the company designing the ASIC only needs to pay for the masks necessary for configuration. The FPGA takes this one step further by providing the programmability of the fabric as part of the device. This increases the cost of each device, as you are paying for interconnect you are not using and for the storage elements necessary to configure the FPGA fabric, but it allows for cost reductions as these parts become standard devices that can be mass-produced.

If we look at the functions in the previous section through the adder example, we can see one commonality: they can all be produced using a truth table. This becomes key in FPGA development. We can regard these truth tables as Read-Only Memory (ROM) representations of the functions. In fact, we can regard them as Programmable ROMs (PROMs) in the case of building up our FPGA.

If you look at the truth table for the full adder, you can see that there are three inputs and two outputs. If we look at the inputs as an address, then we could use storage to look up the data stored at that location and use it as an output. A ROM can be very tiny since it is hardcoded to look up certain values. A PROM uses some storage elements that can be changed to implement new functions, which is closer to what we will use in the FPGA.
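To make the "inputs as an address" idea concrete, here is a minimal Python sketch of the full adder stored as an eight-entry ROM. The names `FULL_ADDER_ROM` and `full_adder` are purely illustrative, and the bit ordering of the address (A as the most significant bit) is an assumption made for this example.

```python
# Full adder as a ROM: the three inputs form the address,
# and each entry stores the two outputs as (sum, carry_out).
FULL_ADDER_ROM = [
    (0, 0),  # a=0 b=0 cin=0
    (1, 0),  # a=0 b=0 cin=1
    (1, 0),  # a=0 b=1 cin=0
    (0, 1),  # a=0 b=1 cin=1
    (1, 0),  # a=1 b=0 cin=0
    (0, 1),  # a=1 b=0 cin=1
    (0, 1),  # a=1 b=1 cin=0
    (1, 1),  # a=1 b=1 cin=1
]


def full_adder(a, b, cin):
    """Look up (sum, carry_out) instead of computing them with gates."""
    address = (a << 2) | (b << 1) | cin
    return FULL_ADDER_ROM[address]


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                print(a, b, cin, "->", full_adder(a, b, cin))
```

Changing the contents of the list changes the function without touching the lookup itself, which is the property a PROM adds over a hardcoded ROM.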

Let’s take the example of the fundamental logic functions. We can reproduce any of them by utilizing a two-input lookup table, which could look something like this:

Figure 1.7: Two-input LUT examples

In Figure 1.7, we can see an oversimplified example with four storage elements, in this case, flip-flops (configuration storage), but in the case of an actual FPGA, we would more likely have a much simpler structure utilizing far fewer transistors. We will discuss storage elements in Chapter 4, Counting Button Presses. The storage elements are connected to one another such that their configuration can be loaded. By attaching other Lookup Tables (LUTs) to the chain, multiple LUTs can be configured at startup or, in the case of partial reconfiguration, during normal operation.

We have a component called a MUX, or multiplexor, that acts as the address lookup by selecting the storage element pointed to by the address made up of inputs A and B. This gives us the value at that location. If you look back at our two-input gates, you should be able to see that we can map any two-input gate to this structure. By adding a flip-flop, we can see the final structure of the LUT take shape.
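The structure described here can be modeled in a few lines of Python. This is only a behavioral sketch under stated assumptions: the class name `Lut2`, its methods, and the choice of A as the upper address bit are all hypothetical, and a real FPGA configuration cell is far simpler than a list in software.

```python
class Lut2:
    """Behavioral model of a two-input LUT with four configuration bits."""

    def __init__(self):
        # Four storage elements; index = address formed from (A, B).
        self.config = [0, 0, 0, 0]

    def shift_in(self, bit):
        """Shift one bit into the configuration chain.

        Returns the bit that falls off the end, which could feed the
        next LUT in a longer chain.
        """
        shifted_out = self.config[-1]
        self.config = [bit] + self.config[:-1]
        return shifted_out

    def load(self, bits):
        """Load four bits so that bits[i] ends up at address i."""
        for bit in reversed(bits):
            self.shift_in(bit)

    def read(self, a, b):
        """The MUX: select the storage element addressed by A and B."""
        return self.config[(a << 1) | b]


if __name__ == "__main__":
    gates = {"AND": [0, 0, 0, 1], "OR": [0, 1, 1, 1], "XOR": [0, 1, 1, 0]}
    lut = Lut2()
    for name, table in gates.items():
        lut.load(table)
        outputs = [lut.read(a, b) for a in (0, 1) for b in (0, 1)]
        print(name, outputs)
```

The same `Lut2` object becomes an AND, OR, or XOR gate purely by reloading its four bits, which mirrors how one physical structure can implement any two-input gate.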

The power of the simplicity of the structure is the ability to replicate this design many times over. In the case of modern FPGAs, they are built of many tiles or columns of logic such as this, allowing a much simpler piece to be designed, implemented, verified, and then replicated to produce the large gate count devices available. This allows for a range of lower-cost devices with fewer columns of resources to larger devices with many more, some even using Stacked Silicon Interconnects (SSIs), which allows multiple ASIC dies to be attached via an interconnect substrate.

In 1985, Xilinx introduced the XC2064, which was what we would consider the first FPGA utilizing an array of 64 three-input LUTs with one flip-flop. The breakthrough with this design was that it was modular and had good interconnect resources. This entire part would be approximately equivalent to one CLB in the Artix-7 we will be targeting.

The configurable logic blocks (CLBs) would be useless if we didn’t have a way for them to communicate with each other. This is where programmable interconnect gets involved. The FPGA fabric consists of LUTs with associated flip-flops making up slices and, ultimately, CLBs. These blocks are all connected using a rich topology of routing channels, allowing for almost limitless configuration. FPGAs also contain many other resources that we will explore over the course of this book: block RAMs, Serializer/Deserializer (SERDES) cores, Digital Signal Processing (DSP) elements, and many types of programmable Input/Output (I/O).

We’ve seen the theory of a simple device, but this isn’t really that useful today. Next, we will look at the FPGAs, which, although a few years old, are still great devices to choose for hobbyists and low-cost designs.

Exploring the Xilinx Artix-7 and 7 series devices

The FPGA we will be looking at in this book is an Artix-7 device. These devices have the highest performance per watt of the Xilinx 7 series devices. For a reasonable price, they feature a large amount of relatively high-performance logic to implement your designs. The FPGA components we will introduce here are common in the Artix (low-end), Kintex (mid-range), and Virtex (high-end) parts in the 7 series.

Configurable logic blocks

ASICs are made up of logic gates based upon libraries provided by ASIC foundries, such as Taiwan Semiconductor Manufacturing Company and Tower Semiconductor. These libraries can contain everything from AND, OR, and NOT gates to more complicated math cells and storage elements. When developing an FPGA, you will be targeting the same Boolean logic equations as you would in an ASIC. We will be using a very similar flow. However, the synthesis process will target the CLBs of the FPGA:

Figure 1.8: CLB internals

A CLB consists of a pair of slices, each of which contains four six-input LUTs and their eight flip-flops. Vivado (or optionally a third-party synthesis tool such as Synopsys Synplify) compiles the SystemVerilog/Verilog or VHDL code and maps it to these CLB elements. To fully explore the details of the CLB, I would suggest reading the Xilinx UG474, 7 Series FPGAs CLB user guide (https://www.xilinx.com/support/documentation/user_guides/ug474_7Series_CLB.pdf). At a high level, each LUT is flexible enough to implement any Boolean function of six inputs, or two arbitrarily defined five-input functions if they share common inputs. There is also dedicated high-speed carry logic for arithmetic functions, which will be discussed in later chapters.
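As a rough illustration of that flexibility, the following Python sketch treats a six-input LUT as a 64-bit truth table. The function names (`lut6`, `make_init`, `lut6_dual`) are hypothetical, and the way the dual five-input mode is modeled (lower 32 bits for one function, upper 32 bits for the other, with the sixth input held high) is my reading of UG474 rather than something stated in this chapter, so treat it as an assumption.

```python
def lut6(init, inputs):
    """Return the bit of the 64-bit truth table selected by six inputs.

    inputs[0] is taken as the least significant address bit.
    """
    address = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> address) & 1


def make_init(func, n_inputs=6):
    """Build a truth-table value by evaluating func on every input pattern."""
    init = 0
    for address in range(1 << n_inputs):
        bits = [(address >> i) & 1 for i in range(n_inputs)]
        if func(bits):
            init |= 1 << address
    return init


def lut6_dual(init, inputs5):
    """Model two five-input functions that share the same five inputs.

    Assumption: the lower 32 bits hold one function and the upper 32 bits
    hold the other, selected by forcing the sixth input to 0 or 1.
    """
    low = lut6(init, list(inputs5) + [0])
    high = lut6(init, list(inputs5) + [1])
    return low, high


if __name__ == "__main__":
    and6 = make_init(all)
    print(hex(and6))
    print(lut6(and6, [1, 1, 1, 1, 1, 1]))

    xor5 = make_init(lambda bits: sum(bits) % 2, n_inputs=5)
    and5 = make_init(all, n_inputs=5)
    packed = (and5 << 32) | xor5
    print(lut6_dual(packed, [1, 0, 0, 0, 0]))
```

In a real flow, the synthesis tool derives these truth-table values from the HDL; the sketch only shows why any six-input function fits in 64 configuration bits.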

The slices come in two formats: SLICEL (L = logic), which can only implement logic functions and storage elements, and SLICEM (M = memory), which can additionally be reconfigured for use as a small memory element (distributed RAM) or a shift register. There are roughly two to three times as many SLICELs as SLICEMs. The following table, for the suggested development board for this book, shows the breakdown:

| Board | Device | Slices | SLICEL | SLICEM | 6-Input LUTs | Distributed RAM (Kb) | Shift Register (Kb) | Flip-Flops |
|---|---|---|---|---|---|---|---|---|
| Nexys A7 | 7A100T | 15,850 | 11,100 | 4,750 | 63,400 | 1,188 | 594 | 126,800 |

Table 1.2: Nexys A7 FPGA resources
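The figures in Table 1.2 follow from the slice counts, and a short Python check makes the relationships explicit. The per-slice numbers (four LUTs, eight flip-flops) come from the text above; the 64 bits of distributed RAM and 32 bits of shift register per SLICEM LUT are taken from UG474 and are not stated in this chapter, so they are labeled as assumptions here.

```python
# Slice counts for the 7A100T, from Table 1.2.
SLICES = 15_850
SLICEL = 11_100
SLICEM = 4_750

LUTS_PER_SLICE = 4
FLIP_FLOPS_PER_SLICE = 8

# Assumptions (from UG474, not this chapter): bits available per SLICEM LUT.
RAM_BITS_PER_LUT = 64
SHIFT_BITS_PER_LUT = 32

luts = SLICES * LUTS_PER_SLICE
flip_flops = SLICES * FLIP_FLOPS_PER_SLICE

# Kb here means 1,024 bits.
distributed_ram_kb = SLICEM * LUTS_PER_SLICE * RAM_BITS_PER_LUT / 1024
shift_register_kb = SLICEM * LUTS_PER_SLICE * SHIFT_BITS_PER_LUT / 1024

slicel_to_slicem = SLICEL / SLICEM

if __name__ == "__main__":
    print("LUTs:", luts)
    print("Flip-flops:", flip_flops)
    print("Distributed RAM (Kb):", distributed_ram_kb)
    print("Shift register (Kb):", shift_register_kb)
    print("SLICEL per SLICEM:", round(slicel_to_slicem, 2))
```

The distributed RAM and shift register results come out at 1,187.5 and 593.75, which the table rounds to 1,188 and 594.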

Although it is possible to instantiate and force the functionality of lower-level components, such as slices or LUTs, this is beyond the scope of this book, and a feature not widely used. We will be targeting CLB usage through the synthesis of the HDL that we write.

Storage

Aside from the SLICEMs in the CLBs, which can be used as memories or shift registers, FPGAs contain Block RAMs (BRAMs) that are larger storage elements. The 7 series parts all have a 36 Kb BRAM that can be split into two 18 Kb BRAMs. The recommended development board, the Nexys A7 with the 100T part, has 135 BRAMs of 36 Kb each.

In normal computer parlance, Kb is kilobits, while KB is kilobytes – lowercase b for bits, and uppercase B for bytes.
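As a quick worked example of the bits-versus-bytes distinction, this small Python sketch totals the block RAM on the 100T part using the counts given above. The variable names are illustrative only.

```python
BRAM_COUNT = 135   # 36 Kb blocks on the 7A100T, from the text above
BRAM_KBITS = 36    # capacity of one block in kilobits (Kb)

total_kbits = BRAM_COUNT * BRAM_KBITS   # kilobits (Kb)
total_kbytes = total_kbits / 8          # kilobytes (KB), 8 bits per byte
half_blocks = BRAM_COUNT * 2            # if every block is split into 18 Kb halves

if __name__ == "__main__":
    print(f"{total_kbits} Kb = {total_kbytes} KB")
    print(f"{half_blocks} blocks of 18 Kb")
```

So the device offers 4,860 Kb of block RAM, which is 607.5 KB.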

BRAMs can be configured as follows:

  • True dual port memories – Two read/write ports.
  • Simple dual port memories – 1 read/1 write. In this case, a 36-Kb BRAM can be up to 72 bits wide and an 18-Kb BRAM up to 36 bits wide.
  • A single port.
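The width limits quoted in the list fix how deep the memory can be at that width, since depth times width cannot exceed the block size. The helper below is a hypothetical sketch of that arithmetic; it assumes the whole block, including any bits normally reserved for parity, is counted in the quoted width, which is my understanding of the 7 series memory resources guide rather than something this chapter states.

```python
def bram_depth(size_kbits, width_bits):
    """Return how many words of width_bits fit in a block of size_kbits."""
    total_bits = size_kbits * 1024
    if total_bits % width_bits:
        raise ValueError("width does not divide the block size evenly")
    return total_bits // width_bits


if __name__ == "__main__":
    for size, width in ((36, 72), (36, 36), (18, 36), (18, 18)):
        print(f"{size} Kb at {width} bits wide: {bram_depth(size, width)} words")
```

At the maximum simple dual port widths, both block sizes hold 512 words; narrower configurations trade width for depth.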

The contents of BRAMs can be loaded at initialization and configured via a file or initial block in the code. This can be useful for implementing ROMs or start-up conditions.

BRAMs in 7 series devices also contain logic to implement FIFOs. This saves CLB resources and reduces synthesis overhead and potential timing problems in a design. We will go over FIFOs in a later chapter.

All 36-Kb BRAMs have dedicated Error Correction Code (ECC) functions. As this is something more related to high-reliability applications, such as medical-, automotive-, or space-based, we will not go into detail on it in this book.

Clocking

7 series devices implement a rich clocking methodology, which can be explored in detail in the UG472 7 series FPGAs clocking resources user guide (https://www.xilinx.com/support/documentation/user_guides/ug472_7Series_Clocking.pdf). For most purposes, our discussion in the Phase Locked Loop (PLL) section will give you everything you need to know; however, the referenced document will delve into far more detail.

I/Os

For the most part, we will limit ourselves to the I/Os supported by the two targeted development boards. In general, the 7 series devices handle a variety of interfaces from 3.3 V CMOS/TTL to LVDS and memory interface types. The boards we are using will dictate the I/Os defined in our project files. For more information on all the supported types, you can reference the UG471 7 Series FPGAs SelectIO resources user guide.

DSP48E1 – Xilinx DSP core in 7 series devices

FPGAs have a large footprint in DSP applications that use a lot of multipliers and, more specifically, Multiply Accumulate (MAC) functions. One of the first innovations in FPGAs was to include hard multipliers, which were quickly replaced by DSP blocks that could implement MAC functions. Figure 1.9 shows the DSP48E1 as implemented in the Artix-7.

Figure 1.9: Xilinx UG479 7 series DSP48E1

One of the most expensive operations in an FPGA is arithmetic. In an ASIC, the largest and slowest operation is typically a multiplication operation, and the smaller or faster operation is an add operation. For this reason, for many years, FPGA manufacturers have been implementing hard arithmetic cores in their fabric. This makes the opposite true in an FPGA, where the slower operation is typically an adder, especially as the widths get larger. The reason for this is that the multiply has been hardened into a complex, pipelined operation. We will explore the DSP operator more in later chapters. The UG479 7 Series DSP48E1 user guide (https://www.xilinx.com/support/documentation/user_guides/ug479_7Series_DSP48E1.pdf) is a good reference if you are interested in delving into the details.
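For readers new to the term, a MAC simply multiplies pairs of values and adds each product to a running total. The Python sketch below models that behavior with a fixed-width accumulator. The function names are hypothetical, and the 48-bit accumulator width is taken from UG479 rather than from this chapter, so it is exposed as a parameter and should be read as an assumption.

```python
def to_signed(value, bits):
    """Wrap an integer into a two's complement value of the given width."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mac(samples, coefficients, acc_bits=48):
    """Multiply-accumulate: sum of samples[i] * coefficients[i].

    The accumulator wraps at acc_bits, as a fixed-width hardware
    register would. acc_bits=48 is an assumption based on UG479.
    """
    accumulator = 0
    for sample, coefficient in zip(samples, coefficients):
        accumulator = to_signed(accumulator + sample * coefficient, acc_bits)
    return accumulator


if __name__ == "__main__":
    # A three-tap dot product, the core of a small FIR filter.
    print(mac([1, 2, 3], [4, 5, 6]))
```

In hardware, each loop iteration corresponds to one pass through the hardened multiplier and adder, which is why a MAC maps so efficiently onto a DSP block.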

ASMBL architecture

The 7 series devices are the fourth generation where Xilinx has used the Advanced Silicon Modular Block (ASMBL) architecture for implementation purposes. The idea behind this is to enable FPGA platforms to be optimized for different target applications. Looking at the 7 series families, we can see how different configurations of slices are brought together to achieve these goals. We can see how the pieces we covered in this chapter are arranged as columns to give us the resources we will be using for our example projects ahead. Figure 1.10 below shows the internal configuration of the FPGA, which is built by implementing vertical columns of different element types. From the point of view of the FPGA vendor, this simplifies the design and testing of new devices, since larger devices can be made by adding more columns or by expanding columns vertically.

Figure 1.10: Example FPGA internal layout based upon ASMBL

Now that we have looked at what makes up the Artix-7 and other 7 series devices, we need to get the Xilinx tools installed so that we can get to our first project.

We’ve looked at FPGAs and the specific FPGA we will target in the book. Now let’s look at the evaluation board that we will target in the book.
