You're reading from Mastering Embedded Linux Programming - Third Edition

Product type: Book

Published in: May 2021

Publisher: Packt

ISBN-13: 9781789530384

Edition: 3rd Edition

Concepts

Embedded Systems

Authors (2):

Frank Vasquez

Chris Simmonds

Chapter 20: Profiling and Tracing

Interactive debugging using a source-level debugger, as described in the previous chapter, can give you an insight into the way a program works, but it constrains your view to a small body of code. In this chapter, we will look at the larger picture to see whether the system is performing as intended.

Programmers and system designers are notoriously bad at guessing where bottlenecks are. So if your system has performance issues, it is wise to start by looking at the full system and then work down, using more sophisticated tools as you go. In this chapter, I'll begin with the well-known top command as a means of getting an overview. Often the problem can be localized to a single program, which you can analyze using the Linux profiler, perf. If the problem is not so localized and you want to get a broader picture, perf can do that as well. To diagnose problems associated with the kernel, I will describe some trace tools, Ftrace, LTTng, and BPF...

Technical requirements

To follow along with the examples, make sure you have the following:

A Linux-based host system
Buildroot 2020.02.9 LTS release
Etcher for Linux
A Micro SD card reader and card
A Raspberry Pi 4
A 5 V 3A USB-C power supply
An Ethernet cable and port for network connectivity

You should have already installed the 2020.02.9 LTS release of Buildroot for Chapter 6, Selecting a Build System. If you have not, then refer to the System requirements section of The Buildroot user manual (https://buildroot.org/downloads/manual/manual.html) before installing Buildroot on your Linux host according to the instructions from Chapter 6.

All of the code for this chapter can be found in the Chapter20 folder from the book's GitHub repository: https://github.com/PacktPublishing/Mastering-Embedded-Linux-Programming-Third-Edition.

The observer effect

Before diving into the tools, let's talk about what the tools will show you. As is the case in many fields, measuring a certain property affects the observation itself. Measuring the electric current in a power supply line requires measuring the voltage drop over a small resistor. However, the resistor itself affects the current. The same is true for profiling: every system observation has a cost in CPU cycles, and that resource is no longer spent on the application. Measurement tools also mess up caching behavior, eat memory space, and write to disk, which all make it worse. There is no measurement without overhead.

I've often heard engineers say that the results of a profiling job were totally misleading. That is usually because they were performing the measurements on something not approaching a real situation. Always try to measure on the target, using release builds of the software, with a valid dataset, using as few extra services as possible...

Beginning to profile

When looking at the entire system, a good place to start is with a simple tool such as top, which gives you an overview very quickly. It shows you how much memory is being used, which processes are eating CPU cycles, and how this is spread across different cores and over time.

If top shows that a single application is using up all the CPU cycles in user space, then you can profile that application using perf.

If two or more processes have a high CPU usage, there is probably something that is coupling them together, perhaps data communication. If a lot of cycles are spent on system calls or handling interrupts, then there may be an issue with the kernel configuration or with a device driver. In either case, you need to start by taking a profile of the whole system, again using perf.

If you want to find out more about the kernel and the sequencing of events there, use Ftrace, LTTng, or BPF.

There could be other problems that top will not help you with. If...

Profiling with top

The top program is a simple tool that doesn't require any special kernel options or symbol tables. There is a basic version in BusyBox and a more functional version in the procps package, which is available in the Yocto Project and Buildroot. You may also want to consider using htop, which is functionally similar to top but has a nicer user interface (some people think).

To begin with, focus on the summary line of top, which is the second line if you are using BusyBox and the third line if you are using top from procps. Here is an example, using BusyBox's top:

Mem: 57044K used, 446172K free, 40K shrd, 3352K buff, 34452K cached
CPU: 58% usr 4% sys 0% nic 0% idle 37% io 0% irq 0% sirq
Load average: 0.24 0.06 0.02 2/51 105
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
105 104 root R 27912 6% 61% ffmpeg -i track2.wav
[…]

The summary line shows the percentage of time spent running in various states, as shown in this table:

...
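
When you log top output over a long run, it can help to turn the summary line into numbers. The following is a minimal sketch, not part of BusyBox or procps: the function name parse_busybox_cpu_line is hypothetical, and it assumes the "CPU:" line format shown in the BusyBox example above.

```python
import re

def parse_busybox_cpu_line(line):
    """Return {state: percent} from a BusyBox top 'CPU:' summary line.

    Assumes the layout shown in the example above, e.g. '58% usr 4% sys ...'.
    """
    if not line.startswith("CPU:"):
        raise ValueError("not a BusyBox top CPU summary line")
    # Each field looks like '58% usr': capture the number, then the state name.
    fields = re.findall(r"(\d+(?:\.\d+)?)%\s+(\w+)", line)
    return {state: float(value) for value, state in fields}

summary = parse_busybox_cpu_line(
    "CPU: 58% usr 4% sys 0% nic 0% idle 37% io 0% irq 0% sirq")
# Pick the state that accounts for the largest share of CPU time.
busiest = max(summary, key=summary.get)
print(busiest, summary[busiest])
```

For the example line, the largest share is user space, with I/O wait a clear second, which points at the application and its I/O rather than at the kernel.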

The poor man's profiler

You can profile an application just by using GDB to stop it at arbitrary intervals to see what it is doing. This is the poor man's profiler. It is easy to set up and is one way of gathering profile data.

The procedure is simple:

1. Attach to the process using gdbserver (for a remote debug) or GDB (for a native debug). The process stops.
2. Observe the function it stopped in. You can use the backtrace GDB command to see the call stack.
3. Type continue so that the program resumes.
4. After a while, press Ctrl + C to stop it again, and go back to step 2.

If you repeat steps 2 to 4 several times, you will quickly get an idea of whether it is looping or making progress, and if you repeat them often enough, you will get an idea of where the hotspots in the code are.
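
If you save the backtrace from each stop, counting the innermost frames gives a rough histogram of hotspots. The sketch below is one possible way to do that; count_hotspots and the sample backtraces are hypothetical, and the regular expression assumes the usual GDB frame layout ("#0  func (...)" or "#0  0xADDR in func (...)"), so it will not cope with every symbol name.

```python
import re
from collections import Counter

# Innermost frame of a GDB backtrace, with or without a leading address:
#   "#0  decode_frame (ctx=0x55) at codec.c:120"
#   "#0  0x00007ffff7af4154 in read () from /lib/libc.so.6"
FRAME0 = re.compile(r"^#0\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")

def count_hotspots(backtraces):
    """Count how often each function appears as frame #0 across samples."""
    counts = Counter()
    for backtrace in backtraces:
        for line in backtrace.splitlines():
            match = FRAME0.match(line.strip())
            if match:
                counts[match.group(1)] += 1
                break  # only the innermost frame of each sample is counted
    return counts

# Hypothetical samples, one string per stop.
samples = [
    "#0  decode_frame (ctx=0x55) at codec.c:120\n#1  main () at main.c:40",
    "#0  0x00007ffff7af4154 in read () from /lib/libc.so.6\n"
    "#1  0x0000555555555200 in fill_buffer () at io.c:33",
    "#0  decode_frame (ctx=0x55) at codec.c:131\n#1  main () at main.c:40",
]
hotspots = count_hotspots(samples)
print(hotspots.most_common())
```

With only a handful of samples the counts are noisy; the more stops you record, the more the distribution resembles what a statistical profiler would report.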

There is a whole web page dedicated to this idea at http://poormansprofiler.org, together with scripts that make it a little easier. I have used this technique many times...

Introducing perf

perf is an abbreviation of the Linux performance event counter subsystem, perf_events, and also the name of the command-line tool for interacting with perf_events. Both have been part of the kernel since Linux 2.6.31. There is plenty of useful information in the Linux source tree in tools/perf/Documentation as well as at https://perf.wiki.kernel.org.

The initial impetus for developing perf was to provide a unified way to access the registers of the performance monitoring unit (PMU), which is part of most modern processor cores. Once the API was defined and integrated into Linux, it became logical to extend it to cover other types of performance counters.

At its heart, perf is a collection of event counters with rules about when they actively collect data. By setting the rules, you can capture data from the whole system, just the kernel, or just one process and its children, and do it across all CPUs or just one CPU. It is very flexible. With this one tool, you...
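
To make the idea of "rules" concrete, here is a small sketch that maps those choices onto perf record options. The helper perf_record_args is hypothetical and only builds an argument list; it assumes the standard perf record options -a (all CPUs, system-wide), -p (attach to a PID), -C (restrict to a CPU), -g (record call graphs), and the :k event modifier (count kernel mode only).

```python
def perf_record_args(pid=None, cpu=None, kernel_only=False, seconds=10):
    """Build a 'perf record' command line for the chosen scope.

    pid=None     -> sample the whole system (-a)
    pid=N        -> sample only the process with that PID (-p N)
    cpu=N        -> restrict sampling to one CPU (-C N)
    kernel_only  -> count cycles only in kernel mode (cycles:k)
    """
    event = "cycles:k" if kernel_only else "cycles"
    args = ["perf", "record", "-g", "-e", event]
    if pid is not None:
        args += ["-p", str(pid)]
    else:
        args += ["-a"]
    if cpu is not None:
        args += ["-C", str(cpu)]
    # 'sleep' bounds the measurement: perf records until the command exits.
    args += ["--", "sleep", str(seconds)]
    return args

# Whole system, kernel only, CPU 0, for 5 seconds.
print(" ".join(perf_record_args(cpu=0, kernel_only=True, seconds=5)))
```

The list can be passed to subprocess.run() on a target where perf is installed; which events are actually available depends on the kernel configuration and the hardware.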

Tracing events

The tools we have seen so far all use statistical sampling. You often want to know more about the ordering of events so that you can see them and relate them to each other. Function tracing involves instrumenting the code with tracepoints that capture information about the event, and may include some or all of the following:

A timestamp
Context, such as the current PID
Function parameters and return values
A call stack

It is more intrusive than statistical profiling and it can generate a large amount of data. The latter problem can be mitigated by applying filters when the sample is captured and later on when viewing the trace.

I will cover three trace tools here: the kernel function tracer Ftrace, LTTng, and BPF.

Introducing Ftrace

The kernel function tracer Ftrace evolved from work done by Steven Rostedt and many others as they were tracking down the causes of high scheduling latency in real-time applications. Ftrace appeared in Linux 2.6.27 and has been actively developed since then. There are a number of documents describing kernel tracing in the kernel source in Documentation/trace.

Ftrace consists of a number of tracers that can log various types of activity in the kernel. Here, I am going to talk about the function and function_graph tracers and the event tracepoints. In Chapter 21, Real-Time Programming, I will revisit Ftrace and use it to show real-time latencies.

The function tracer instruments each kernel function so that calls can be recorded and timestamped. As a matter of interest, the kernel is compiled with the -pg switch to inject the instrumentation. The function_graph tracer goes further and records both the entry and exit of functions so that it can create a call graph...
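
Ftrace is controlled through files in the tracefs filesystem, so a capture can be scripted. The following is a minimal sketch under some assumptions: tracefs is mounted at /sys/kernel/tracing (on some systems it is /sys/kernel/debug/tracing instead), the script runs as root, and the helper names write_control and capture_function_trace are hypothetical. The control files it uses (current_tracer, tracing_on, and trace) are the standard Ftrace ones.

```python
import os

# Assumption: tracefs mount point; adjust if your system uses debugfs instead.
TRACEFS = "/sys/kernel/tracing"

def write_control(tracefs, name, value):
    """Write a value to one of the Ftrace control files."""
    with open(os.path.join(tracefs, name), "w") as control:
        control.write(value)

def capture_function_trace(run, tracer="function", tracefs=TRACEFS):
    """Trace the kernel while run() executes and return the trace text.

    tracer may be 'function' or 'function_graph', provided the kernel
    was built with the corresponding tracer enabled.
    """
    write_control(tracefs, "tracing_on", "0")        # stop any running trace
    write_control(tracefs, "current_tracer", tracer)  # select the tracer
    write_control(tracefs, "tracing_on", "1")        # start tracing
    try:
        run()                                         # the workload to observe
    finally:
        write_control(tracefs, "tracing_on", "0")    # always stop tracing
    with open(os.path.join(tracefs, "trace")) as trace:
        return trace.read()
```

On a target you might call capture_function_trace(lambda: time.sleep(1), tracer="function_graph") and write the result to a file. Because every kernel function is traced, even one second can produce a very large trace.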

Using LTTng

The Linux Trace Toolkit (LTT) project was started by Karim Yaghmour as a means of tracing kernel activity and was one of the first trace tools generally available for the Linux kernel. Later, Mathieu Desnoyers took up the idea and re-implemented it as a next-generation trace tool, LTTng. It was then expanded to cover user space traces as well as the kernel. The project website is at https://lttng.org/ and contains a comprehensive user manual.

LTTng consists of three components:

A core session manager
A kernel tracer implemented as a group of kernel modules
A user space tracer implemented as a library

In addition to those, you will need a trace viewer such as Babeltrace (https://babeltrace.org) or the Eclipse Trace Compass plugin to display and filter the raw trace data on the host or target.

LTTng requires a kernel configured with CONFIG_TRACEPOINTS, which is enabled when you select Kernel hacking | Tracers | Kernel Function Tracer.
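
Before building LTTng, you can check whether a kernel configuration already has the option set. This is a small sketch with hypothetical helper names; it assumes the usual .config syntax (CONFIG_FOO=y, or "# CONFIG_FOO is not set"), and read_running_config only works if the running kernel exposes /proc/config.gz (CONFIG_IKCONFIG_PROC).

```python
import gzip

def config_enabled(config_text, option):
    """Return True if option is built in (=y) or modular (=m) in a .config."""
    wanted = (f"{option}=y", f"{option}=m")
    return any(line.strip() in wanted for line in config_text.splitlines())

def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's configuration, if the kernel exposes it."""
    with gzip.open(path, "rt") as config:
        return config.read()

# Hypothetical fragment of a kernel .config file.
fragment = "CONFIG_FTRACE=y\nCONFIG_TRACEPOINTS=y\n# CONFIG_KPROBES is not set\n"
print(config_enabled(fragment, "CONFIG_TRACEPOINTS"))
```

When checking a Buildroot or Yocto Project kernel before it is deployed, read the .config file from the kernel build directory and pass its contents to config_enabled instead.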

The description...

Using BPF

BPF (Berkeley Packet Filter) is a technology that was first introduced in 1992 to capture, filter, and analyze network traffic. In 2013, Alexei Starovoitov undertook a rewrite of BPF with help from Daniel Borkmann. Their work, then known as eBPF (extended BPF), was merged into the kernel in 2014, where it has been available since Linux 3.15. BPF provides a sandboxed execution environment for running programs inside the Linux kernel. BPF programs are written in C and are just-in-time (JIT) compiled to native code. Before that can happen, the intermediate BPF bytecode must first pass through a series of safety checks so that a program cannot crash the kernel.

Despite its networking origins, BPF is now a general-purpose virtual machine running inside the Linux kernel. By making it easy to run small programs on specific kernel and application events, BPF has quickly emerged as the most powerful tracer for Linux. Like what cgroups did for containerized deployments, BPF has the...

Using Valgrind

I introduced Valgrind in Chapter 18, Managing Memory, as a tool for identifying memory problems using the memcheck tool. Valgrind has other useful tools for application profiling. The two I am going to look at here are Callgrind and Helgrind. Since Valgrind works by running the code in a sandbox, it can check the code as it runs and report certain behaviors, which native tracers and profilers cannot do.

Callgrind

Callgrind is a call graph-generating profiler that also collects information about processor cache hit rate and branch prediction. Callgrind is only useful if your bottleneck is CPU-bound. It's not useful if heavy I/O or multiple processes are involved.

Valgrind does not require kernel configuration, but it does need debug symbols. It is available as a target package in both the Yocto Project and Buildroot (BR2_PACKAGE_VALGRIND).

You run Callgrind in Valgrind on the target like so:

# valgrind --tool=callgrind <program>

This produces...

Using strace

I started the chapter with a simple and ubiquitous tool, top, and I will finish with another: strace. It is a very simple tracer that captures system calls made by a program and, optionally, its children. You can use it to do the following:

Learn which system calls a program makes.
Find those system calls that fail, together with the error code. I find this useful if a program fails to start but doesn't print an error message or if the message is too general.
Find which files a program opens.
Find out which syscalls a running program is making, for example, to see whether it is stuck in a loop.

There are many more examples online; just search for strace tips and tricks. Everybody has their own favorite story, for example, https://alexbilson.dev/posts/strace-debug/.

strace uses the ptrace(2) function to hook calls as they are made from user space to the kernel. If you want to know more about how ptrace works, the manual page is detailed...
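
Finding failed system calls in a long strace log is easy to automate. The sketch below assumes the log was written with something like strace -f -o app.trace <program> and that failing calls are shown in the usual "= -1 ERRNO (description)" form; the function name failed_syscalls and the sample log are hypothetical.

```python
import re

# A failing call looks like:
#   openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file ...)
# An optional "[pid 1234] " prefix may precede the call when following children.
FAILED = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(.*\)\s+=\s+-1\s+([A-Z0-9_]+)\s+\(")

def failed_syscalls(strace_output):
    """Return a list of (syscall, errno_name) for every failed call."""
    failures = []
    for line in strace_output.splitlines():
        match = FAILED.match(line)
        if match:
            failures.append((match.group(1), match.group(2)))
    return failures

# Hypothetical strace output for a program that cannot find its config file.
log = r'''openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF"..., 832) = 832
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)'''

for syscall, error in failed_syscalls(log):
    print(syscall, error)
```

Not every failure is a bug: many programs probe for optional files, so a log normally contains a number of harmless ENOENT results alongside the one you are looking for.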

Summary

Nobody can complain that Linux lacks options for profiling and tracing. This chapter has given you an overview of some of the most common ones.

When faced with a system that is not performing as well as you would like, start with top and try to identify the problem. If it proves to be a single application, then you can use perf record/report to profile it, bearing in mind that you will have to configure the kernel to enable perf and you will need debug symbols for the binaries and kernel. If the problem is not so well localized, use perf or BCC tools to get a system-wide view.

Ftrace comes into its own when you have specific questions about the behavior of the kernel. The function and function_graph tracers provide a detailed view of the relationship and sequence of function calls. The event tracers allow you to extract more information about functions, including the parameters and return values. LTTng performs a similar role, making use of the event trace mechanism,...

Frank Vasquez is an independent software consultant specializing in consumer electronics. He has over a decade of experience designing and building embedded Linux systems. During that time, he has shipped numerous devices including a rackmount DSP audio server, a diver-held sonar camcorder, and a consumer IoT hotspot. Before his career as an embedded Linux engineer, Frank was a database kernel developer at IBM where he worked on DB2. He lives in Silicon Valley.

Chris Simmonds

Chris Simmonds is a software consultant and trainer living in southern England. He has almost two decades of experience in designing and building open-source embedded systems. He is the founder and chief consultant at 2net Ltd, which provides professional training and mentoring services in embedded Linux, Linux device drivers, and Android platform development. He has trained engineers at many of the biggest companies in the embedded world, including ARM, Qualcomm, Intel, Ericsson, and General Dynamics. He is a frequent presenter at open source and embedded conferences, including the Embedded Linux Conference and Embedded World.
