You're reading from Mastering Embedded Linux Programming - Third Edition

Product type: Book
Published in: May 2021
Publisher: Packt
ISBN-13: 9781789530384
Edition: 3rd Edition
Authors (2):
Frank Vasquez

Frank Vasquez is an independent software consultant specializing in consumer electronics. He has over a decade of experience designing and building embedded Linux systems. During that time, he has shipped numerous devices including a rackmount DSP audio server, a diver-held sonar camcorder, and a consumer IoT hotspot. Before his career as an embedded Linux engineer, Frank was a database kernel developer at IBM where he worked on DB2. He lives in Silicon Valley.

Chris Simmonds

Chris Simmonds is a software consultant and trainer living in southern England. He has almost two decades of experience in designing and building open-source embedded systems. He is the founder and chief consultant at 2net Ltd, which provides professional training and mentoring services in embedded Linux, Linux device drivers, and Android platform development. He has trained engineers at many of the biggest companies in the embedded world, including ARM, Qualcomm, Intel, Ericsson, and General Dynamics. He is a frequent presenter at open source and embedded conferences, including the Embedded Linux Conference and Embedded World.

Chapter 20: Profiling and Tracing

Interactive debugging using a source-level debugger, as described in the previous chapter, can give you an insight into the way a program works, but it constrains your view to a small body of code. In this chapter, we will look at the larger picture to see whether the system is performing as intended.

Programmers and system designers are notoriously bad at guessing where bottlenecks are. So, if your system has performance issues, it is wise to start by looking at the full system and then work down, using more sophisticated tools as you go. In this chapter, I'll begin with the well-known top command as a means of getting an overview. Often the problem can be localized to a single program, which you can analyze using the Linux profiler, perf. If the problem is not so localized and you want to get a broader picture, perf can do that as well. To diagnose problems associated with the kernel, I will describe some trace tools: Ftrace, LTTng, and BPF...

Technical requirements

To follow along with the examples, make sure you have the following:

  • A Linux-based host system
  • Buildroot 2020.02.9 LTS release
  • Etcher for Linux
  • A Micro SD card reader and card
  • A Raspberry Pi 4
  • A 5 V, 3 A USB-C power supply
  • An Ethernet cable and port for network connectivity

You should have already installed the 2020.02.9 LTS release of Buildroot for Chapter 6, Selecting a Build System. If you have not, then refer to the System requirements section of The Buildroot user manual (https://buildroot.org/downloads/manual/manual.html) before installing Buildroot on your Linux host according to the instructions from Chapter 6.

All of the code for this chapter can be found in the Chapter20 folder from the book's GitHub repository: https://github.com/PacktPublishing/Mastering-Embedded-Linux-Programming-Third-Edition.

The observer effect

Before diving into the tools, let's talk about what the tools will show you. As is the case in many fields, measuring a certain property affects the observation itself. Measuring the electric current in a power supply line requires measuring the voltage drop over a small resistor. However, the resistor itself affects the current. The same is true for profiling: every system observation has a cost in CPU cycles, and that resource is no longer spent on the application. Measurement tools also mess up caching behavior, eat memory space, and write to disk, which all make it worse. There is no measurement without overhead.
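
To make the cost of observation concrete, here is a minimal Python sketch (not taken from the book's repository) that runs the same function twice: once untouched, and once with a do-nothing profiling hook installed through sys.setprofile. The workload busy_work and the helper names are invented for illustration. The hook only counts events, yet the observed run is typically slower than the plain one.

```python
import sys
import time


def busy_work(n):
    """Hypothetical workload: sum the absolute values of 0..n-1."""
    total = 0
    for i in range(n):
        total += abs(i)  # each builtin call is visible to the profiling hook
    return total


def timed(func, *args):
    """Run func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def timed_with_observer(func, *args):
    """Run func with a do-nothing profiling hook installed.

    Returns (result, elapsed seconds, number of events the hook saw).
    """
    events = 0

    def observer(frame, event, arg):
        nonlocal events
        events += 1  # the "measurement": just count calls and returns

    sys.setprofile(observer)
    try:
        result, elapsed = timed(func, *args)
    finally:
        sys.setprofile(None)  # always remove the hook again
    return result, elapsed, events
```

Calling timed(busy_work, 200000) and timed_with_observer(busy_work, 200000) back to back and comparing the two elapsed times gives a rough feel for the overhead. The exact ratio depends on the interpreter and the machine, so treat the numbers as indicative only.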

I've often heard engineers say that the results of a profiling job were totally misleading. That is usually because they were performing the measurements on something not approaching a real situation. Always try to measure on the target, using release builds of the software, with a valid dataset, using as few extra services as possible...

Beginning to profile

When looking at the entire system, a good place to start is with a simple tool such as top, which gives you an overview very quickly. It shows you how much memory is being used, which processes are eating CPU cycles, and how this is spread across different cores and over time.

If top shows that a single application is using up all the CPU cycles in user space, then you can profile that application using perf.

If two or more processes have a high CPU usage, there is probably something that is coupling them together, perhaps data communication. If a lot of cycles are spent on system calls or handling interrupts, then there may be an issue with the kernel configuration or with a device driver. In either case, you need to start by taking a profile of the whole system, again using perf.

If you want to find out more about the kernel and the sequencing of events there, use Ftrace, LTTng, or BPF.

There could be other problems that top will not help you with. If...

Profiling with top

The top program is a simple tool that doesn't require any special kernel options or symbol tables. There is a basic version in BusyBox and a more functional version in the procps package, which is available in the Yocto Project and Buildroot. You may also want to consider using htop, which is functionally similar to top but has a nicer user interface (some people think).

To begin with, focus on the summary line of top, which is the second line if you are using BusyBox and the third line if you are using top from procps. Here is an example, using BusyBox's top:

Mem: 57044K used, 446172K free, 40K shrd, 3352K buff, 34452K cached
CPU: 58% usr 4% sys 0% nic 0% idle 37% io 0% irq 0% sirq
Load average: 0.24 0.06 0.02 2/51 105
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
105 104 root R 27912 6% 61% ffmpeg -i track2.wav
[…]

The summary line shows the percentage of time spent running in various states, as shown in this table:

...
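
The percentages in the summary line come from the cumulative counters on the cpu line of /proc/stat: take two snapshots, subtract, and divide each delta by the total. The following Python sketch shows that arithmetic. The function names and the sample counter values are made up for illustration, and the labels follow the BusyBox summary line shown earlier.

```python
# Order of the first seven counters on the "cpu" line of /proc/stat,
# labelled the way BusyBox top prints them.
FIELDS = ("usr", "nic", "sys", "idle", "io", "irq", "sirq")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line of /proc/stat into a dict of ticks."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(v) for v in parts[1:8])))


def cpu_percentages(before, after):
    """Percentage of time spent in each state between two snapshots."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the live aggregate counters (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


# Two invented snapshots, taken some interval apart.
SNAPSHOT_1 = parse_cpu_line("cpu  100 0 50 800 40 0 10 0 0 0")
SNAPSHOT_2 = parse_cpu_line("cpu  158 0 54 801 77 0 10 0 0 0")
EXAMPLE = cpu_percentages(SNAPSHOT_1, SNAPSHOT_2)
```

On a Linux target, calling read_cpu_line() twice with a short sleep in between and passing both results to cpu_percentages gives a live reading. This sketch ignores the later counters on the line (steal and guest time), which is a simplification.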

The poor man's profiler

You can profile an application just by using GDB to stop it at arbitrary intervals to see what it is doing. This is the poor man's profiler. It is easy to set up and is one way of gathering profile data.

The procedure is simple:

  1. Attach to the process using gdbserver (for a remote debug) or GDB (for a
    native debug). The process stops.
  2. Observe the function it stopped in. You can use the backtrace GDB command
    to see the call stack.
  3. Type continue so that the program resumes.
  4. After a while, press Ctrl + C to stop it again, and go back to step 2.

If you repeat steps 2 to 4 several times, you will quickly get an idea of whether it is looping or making progress, and if you repeat them often enough, you will get an idea of where the hotspots in the code are.
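
If you save the output of each backtrace to a file, counting the hotspots is easy to automate. Here is a minimal Python sketch that tallies the innermost frame (#0) of each captured backtrace. It assumes GDB's usual frame format, and the sample backtraces, function names, and addresses are invented for illustration.

```python
import re
from collections import Counter

# Matches frame #0 lines such as:
#   #0  0x0000aaaac0de1234 in mix_samples (buf=0x1) at mixer.c:42
#   #0  mix_samples (buf=0x1) at mixer.c:45
FRAME0 = re.compile(
    r"^#0\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)\s*\(", re.MULTILINE
)


def hotspot_counts(gdb_output):
    """Count how often each function was the innermost frame."""
    return Counter(FRAME0.findall(gdb_output))


# Three invented samples, concatenated as they might appear in a log file.
SAMPLE_LOG = """\
#0  0x0000aaaac0de1234 in mix_samples (buf=0x1) at mixer.c:42
#1  0x0000aaaac0de2000 in main () at main.c:10
#0  mix_samples (buf=0x1) at mixer.c:45
#1  0x0000aaaac0de2000 in main () at main.c:10
#0  0x0000ffff9a001000 in read () from /lib/libc.so.6
#1  0x0000aaaac0de2100 in main () at main.c:12
"""

COUNTS = hotspot_counts(SAMPLE_LOG)
```

With only a handful of samples the counts are very noisy, so this is a way of spotting an obvious hotspot or a stuck loop rather than a precise measurement.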

There is a whole web page dedicated to this idea at http://poormansprofiler.org, together with scripts that make it a little easier. I have used this technique many times...

Introducing perf

perf is an abbreviation of the Linux performance event counter subsystem, perf_events, and also the name of the command-line tool for interacting with perf_events. Both have been part of the kernel since Linux 2.6.31. There is plenty of useful information in the Linux source tree in tools/perf/Documentation as well as at https://perf.wiki.kernel.org.

The initial impetus for developing perf was to provide a unified way to access the registers of the performance monitoring unit (PMU), which is part of most modern processor cores. Once the API was defined and integrated into Linux, it became logical to extend it to cover other types of performance counters.

At its heart, perf is a collection of event counters with rules about when they actively collect data. By setting the rules, you can capture data from the whole system, just the kernel, or just one process and its children, and do it across all CPUs or just one CPU. It is very flexible. With this one tool, you...

Tracing events

The tools we have seen so far all use statistical sampling. You often want to know more about the ordering of events so that you can see them and relate them to each other. Function tracing involves instrumenting the code with tracepoints that capture information about the event, and may include some or all of the following:

  • A timestamp
  • Context, such as the current PID
  • Function parameters and return values
  • A callstack

It is more intrusive than statistical profiling and it can generate a large amount of data. The latter problem can be mitigated by applying filters when the sample is captured and later on when viewing the trace.
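
The items in that list map naturally onto a record per event. As a user-space analogy only (this is not how the kernel tracers are implemented), here is a Python sketch of a tracepoint written as a decorator. It captures a timestamp, the PID, the parameters, the return value, and the call stack, and applies a filter at capture time. All names are invented for illustration.

```python
import functools
import os
import time
import traceback

TRACE_LOG = []  # captured events, oldest first


def tracepoint(keep=lambda event: True):
    """Decorator that records one event per call if keep(event) is true."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            timestamp = time.monotonic()  # taken on entry
            # Names of the callers, outermost first, without this wrapper.
            callstack = [f.name for f in traceback.extract_stack()[:-1]]
            result = func(*args, **kwargs)
            event = {
                "timestamp": timestamp,
                "pid": os.getpid(),
                "function": func.__name__,
                "args": args,
                "kwargs": kwargs,
                "return": result,
                "callstack": callstack,
            }
            if keep(event):  # filter when the event is captured
                TRACE_LOG.append(event)
            return result

        return wrapper

    return decorate


@tracepoint(keep=lambda event: event["return"] > 10)
def scale(value, factor=2):
    """Hypothetical function under trace."""
    return value * factor
```

Calling scale(3) and then scale(8) leaves a single event in TRACE_LOG, because the filter drops the first call. Filtering at capture time keeps the log small, at the price of losing events you might later wish you had kept.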

I will cover three trace tools here: the kernel function tracer Ftrace, LTTng, and BPF.

Introducing Ftrace

The kernel function tracer Ftrace evolved from work done by Steven Rostedt and many others as they were tracking down the causes of high scheduling latency in real-time applications. Ftrace appeared in Linux 2.6.27 and has been actively developed since then. There are a number of documents describing kernel tracing in the kernel source in Documentation/trace.

Ftrace consists of a number of tracers that can log various types of activity in the kernel. Here, I am going to talk about the function and function_graph tracers and the event tracepoints. In Chapter 21, Real-Time Programming, I will revisit Ftrace and use it to show real-time latencies.

The function tracer instruments each kernel function so that calls can be recorded and timestamped. As a matter of interest, the kernel is compiled with the -pg switch to inject the instrumentation. The function_graph tracer goes further and records both the entry and exit of functions so that it can create a call graph...
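
Ftrace is controlled through files in the tracefs filesystem, usually mounted at /sys/kernel/tracing (or /sys/kernel/debug/tracing on older systems). As a rough sketch of selecting one of the tracers named above, the Python helpers below write to the current_tracer and tracing_on files and read back the trace file. The helper names are my own, the mount point is an assumption about your target, and you need root privileges to run this against the real tracefs.

```python
from pathlib import Path

TRACEFS = Path("/sys/kernel/tracing")  # assumed mount point


def select_tracer(name, tracefs=TRACEFS):
    """Switch to the named tracer (for example 'function_graph') and start it."""
    tracefs = Path(tracefs)
    available = (tracefs / "available_tracers").read_text().split()
    if name not in available:
        raise ValueError(f"tracer {name!r} not in {available}")
    (tracefs / "tracing_on").write_text("0")  # pause while reconfiguring
    (tracefs / "current_tracer").write_text(name)
    (tracefs / "tracing_on").write_text("1")  # start capturing


def stop_and_read(tracefs=TRACEFS):
    """Stop tracing and return the contents of the trace buffer as text."""
    tracefs = Path(tracefs)
    (tracefs / "tracing_on").write_text("0")
    return (tracefs / "trace").read_text()
```

A typical session would call select_tracer("function_graph"), run the workload of interest for a moment, and then print the result of stop_and_read(). The tracefs argument exists so that the helpers can be pointed at a different mount point, or at a scratch directory when experimenting.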

Using LTTng

The Linux Trace Toolkit (LTT) project was started by Karim Yaghmour as a means of tracing kernel activity and was one of the first trace tools generally available for the Linux kernel. Later, Mathieu Desnoyers took up the idea and re-implemented it as a next-generation trace tool, LTTng. It was then expanded to cover user space traces as well as the kernel. The project website is at https://lttng.org/ and contains a comprehensive user manual.

LTTng consists of three components:

  • A core session manager
  • A kernel tracer implemented as a group of kernel modules
  • A user space tracer implemented as a library

In addition to those, you will need a trace viewer such as Babeltrace (https://babeltrace.org) or the Eclipse Trace Compass plugin to display and filter the raw trace data on the host or target.

LTTng requires a kernel configured with CONFIG_TRACEPOINTS, which is enabled when you select Kernel hacking | Tracers | Kernel Function Tracer.
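
Before building the tracer, it is worth checking that the kernel on the target really has the option set. Here is a small Python sketch that looks up an option in kernel configuration text. The helper names and the sample configuration are invented, and reading /proc/config.gz only works if the kernel was built with CONFIG_IKCONFIG_PROC.

```python
import gzip
import re


def config_value(config_text, option):
    """Return the value of option ('y', 'm', or a literal), or None if unset."""
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


def read_running_config(path="/proc/config.gz"):
    """Read the configuration of the running kernel (needs IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as f:
        return f.read()


# Invented fragment in the format used by the kernel's .config file.
SAMPLE_CONFIG = """\
CONFIG_FTRACE=y
CONFIG_TRACEPOINTS=y
# CONFIG_FUNCTION_TRACER is not set
"""
```

On the target, config_value(read_running_config(), "CONFIG_TRACEPOINTS") should return "y" if tracepoints are available. The same check works against the .config file in your kernel build directory.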

The description...

Using BPF

BPF (Berkeley Packet Filter) is a technology that was first introduced in 1992 to capture, filter, and analyze network traffic. In 2013, Alexei Starovoitov undertook a rewrite of BPF with help from Daniel Borkmann. Their work, then known as eBPF (extended BPF), was merged into the kernel in 2014, where it has been available since Linux 3.15. BPF provides a sandboxed execution environment for running programs inside the Linux kernel. BPF programs are written in C and are just-in-time (JIT) compiled to native code. Before that can happen, the intermediate BPF bytecode must first pass through a series of safety checks so that a program cannot crash the kernel.

Despite its networking origins, BPF is now a general-purpose virtual machine running inside the Linux kernel. By making it easy to run small programs on specific kernel and application events, BPF has quickly emerged as the most powerful tracer for Linux. Just as cgroups did for containerized deployments, BPF has the...

Using Valgrind

I introduced Valgrind in Chapter 18, Managing Memory, as a tool for identifying memory problems using the memcheck tool. Valgrind has other useful tools for application profiling. The two I am going to look at here are Callgrind and Helgrind. Since Valgrind works by running the code in a sandbox, it can check the code as it runs and report certain behaviors, which native tracers and profilers cannot do.

Callgrind

Callgrind is a call graph-generating profiler that also collects information about processor cache hit rate and branch prediction. Callgrind is only useful if your bottleneck is CPU-bound. It's not useful if heavy I/O or multiple processes are involved.

Valgrind does not require kernel configuration, but it does need debug symbols. It is available as a target package in both the Yocto Project and Buildroot (BR2_PACKAGE_VALGRIND).

You run Callgrind in Valgrind on the target like so:

# valgrind --tool=callgrind <program>

This produces...

Using strace

I started the chapter with a simple and ubiquitous tool, top, and I will finish with another: strace. It is a very simple tracer that captures system calls made by a program and, optionally, its children. You can use it to do the following:

  • Learn which system calls a program makes.
  • Find those system calls that fail, together with the error code. I find this useful if a program fails to start but doesn't print an error message or if the message is too general.
  • Find which files a program opens.
  • Find out which syscalls a running program is making, for example, to see whether it is stuck in a loop.

There are many more examples online; just search for strace tips and tricks. Everybody has their own favorite story, for example, https://alexbilson.dev/posts/strace-debug/.
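
Finding the failing system calls in a long capture is tedious by eye, so it is a natural job for a script. The Python sketch below picks out the calls that returned -1 along with their error codes. It assumes strace's default output format, and the sample lines, paths, and program name are invented for illustration.

```python
import re

# A failed call looks like:
#   openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file ...)
# An optional "[pid 123] " prefix appears when child processes are followed.
FAILED = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?(?P<call>\w+)\((?P<args>.*)\)"
    r"\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)\s+\((?P<msg>[^)]*)\)\s*$"
)


def failed_syscalls(strace_output):
    """Return (syscall, errno name) for each failed call, in order."""
    failures = []
    for line in strace_output.splitlines():
        match = FAILED.match(line)
        if match:
            failures.append((match.group("call"), match.group("errno")))
    return failures


# Invented capture in the usual strace format.
SAMPLE_TRACE = """\
execve("./app", ["./app"], 0x7ffd5c3b1a40 /* 12 vars */) = 0
openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
[pid   105] connect(3, {sa_family=AF_INET}, 16) = -1 ECONNREFUSED (Connection refused)
"""

FAILURES = failed_syscalls(SAMPLE_TRACE)
```

Not every failure is a bug: programs routinely probe for optional files and get ENOENT back, so treat the list as a set of leads to follow rather than a list of errors.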

strace uses the ptrace(2) system call to hook calls as they are made from user space to the kernel. If you want to know more about how ptrace works, the manual page is detailed...

Summary

Nobody can complain that Linux lacks options for profiling and tracing. This chapter has given you an overview of some of the most common ones.

When faced with a system that is not performing as well as you would like, start with top and try to identify the problem. If it proves to be a single application, then you can use perf record/report to profile it, bearing in mind that you will have to configure the kernel to enable perf and you will need debug symbols for the binaries and kernel. If the problem is not so well localized, use perf or BCC tools to get a system-wide view.

Ftrace comes into its own when you have specific questions about the behavior of the kernel. The function and function_graph tracers provide a detailed view of the relationship and sequence of function calls. The event tracers allow you to extract more information about functions, including the parameters and return values. LTTng performs a similar role, making use of the event trace mechanism,...

Further reading

I highly recommend Systems Performance: Enterprise and the Cloud, Second Edition, and BPF Performance Tools: Linux System and Application Observability, both by Brendan Gregg.
