
Hands-On System Programming with Linux

By Kaiwan N. Billimoria , Tigran Aivazian
About this book
The Linux OS and its embedded and server applications are critical components of today’s software infrastructure in a decentralized, networked universe. The industry's demand for proficient Linux developers is only rising with time. Hands-On System Programming with Linux gives you a solid theoretical base and practical industry-relevant descriptions, and covers the Linux system programming domain. It delves into the art and science of Linux application programming— system architecture, process memory and management, signaling, timers, pthreads, and file IO. This book goes beyond the use API X to do Y approach; it explains the concepts and theories required to understand programming interfaces and design decisions, the tradeoffs made by experienced developers when using them, and the rationale behind them. Troubleshooting tips and techniques are included in the concluding chapter. By the end of this book, you will have gained essential conceptual design knowledge and hands-on experience working with Linux system programming interfaces.
Publication date: October 2018
Publisher: Packt
Pages: 794
ISBN: 9781788998475

 

Linux System Architecture

This chapter informs the reader about the system architecture of the Linux ecosystem. It first conveys the elegant Unix philosophy and design fundamentals, then delves into the details of the Linux system architecture. The importance of the ABI, CPU privilege levels, and how modern operating systems (OSes) exploit them, along with the Linux system architecture's layering, and how Linux is a monolithic architecture, will be covered. The (simplified) flow of a system call API, as well as kernel-code execution contexts, are key points.

In this chapter, the reader will be taken through the following topics:

  • The Unix philosophy in a nutshell
  • Architecture preliminaries
  • Linux architecture layers
  • Linux—a monolithic OS
  • Kernel execution contexts

Along the way, we'll use simple examples to make the key philosophical and architectural points clear.

 

Technical requirements

A modern desktop PC or laptop is required; Ubuntu Desktop specifies the following as recommended system requirements for installation and usage of the distribution:

  • 2 GHz dual core processor or better
  • RAM
    • Running on a physical host: 2 GB or more system memory
    • Running as a guest: The host system should have at least 4 GB RAM (the more, the better and smoother the experience)
  • 25 GB of free hard drive space
  • Either a DVD drive or a USB port for the installer media
  • Internet access is definitely helpful

We recommend the reader use one of the following Linux distributions (can be installed as a guest OS on a Windows or Linux host system, as mentioned):

Note that these distributions are, in their default form, OSS and non-proprietary, and free to use as an end user.

In some instances, the entire code snippet isn't included in the book; the complete code is available in the book's GitHub repository: https://github.com/PacktPublishing/Hands-on-System-Programming-with-Linux.
Also, for the Further reading section, refer to the preceding GitHub link.
 

Linux and the Unix operating system

Moore's law famously states that the number of transistors in an IC will double (approximately) every two years (with an addendum that the cost would halve at pretty much the same rate). This law, which remained quite accurate for many years, is one of the things that clearly underscored what people came to realize, and even celebrate, about the electronics and the Information Technology (IT) industry; the sheer speed with which innovation and paradigm shifts in technology occur here is unparalleled. So much so that we now hardly raise an eyebrow when, every year, even every few months in some cases, new innovations and technology appear, challenge, and ultimately discard the old with little ceremony.

Against this backdrop of rapid all-consuming change, there lives an engaging anomaly: an OS whose essential design, philosophy, and architecture have changed hardly at all in close to five decades. Yes, we are referring to the venerable Unix operating system.

Organically emerging from a doomed project at AT&T's Bell Labs (Multics) in around 1969, Unix took the world by storm. Well, for a while at least.

But, you say, this is a book about Linux; why all this information about Unix? Simply because, at heart, Linux is the latest avatar of the venerable Unix OS. Linux is a Unix-like operating system (among several others). The code, by legal necessity, is unique; however, the design, philosophy, and architecture of Linux are pretty much identical to those of Unix.

 

The Unix philosophy in a nutshell

To understand anyone (or anything), one must strive to first understand their (or its) underlying philosophy; to begin to understand Linux is to begin to understand the Unix philosophy. Here, we shall not attempt to delve into every minute detail; rather, an overall understanding of the essentials of the Unix philosophy is our goal. Also, when we use the term Unix, we very much also mean Linux!

The way that software (particularly, tools) is designed, built, and maintained on Unix slowly evolved into what might even be called a pattern that stuck: the Unix design philosophy. At its heart, here are the pillars of the Unix philosophy, design, and architecture:

  • Everything is a process; if it's not a process, it's a file
  • One tool to do one task
  • Three standard I/O channels
  • Combine tools seamlessly
  • Plain text preferred
  • CLI, not GUI
  • Modular, designed to be repurposed by others
  • Provide the mechanism, not the policy

Let's examine these pillars a little more closely, shall we?

Everything is a process – if it's not a process, it's a file

A process is an instance of a program in execution. A file is an object on the filesystem; besides a regular file with plain text or binary content, it could also be a directory, a symbolic link, a device-special file, a named pipe, or a (Unix-domain) socket.

The Unix design philosophy abstracts peripheral devices (such as the keyboard, monitor, mouse, a sensor, and touchscreen) as files – what it calls device files. By doing this, Unix allows the application programmer to conveniently ignore the details and just treat (peripheral) devices as though they are ordinary disk files.

The kernel provides a layer to handle this very abstraction: it's called the Virtual Filesystem Switch (VFS). So, with this in place, the application developer can open a device file and perform I/O (reads and writes) upon it, all using the usual APIs provided (relax, these APIs will be covered in a subsequent chapter).

In fact, every process inherits three files on creation:

  • Standard input (stdin: fd 0): The keyboard device, by default
  • Standard output (stdout: fd 1): The monitor (or terminal) device, by default
  • Standard error (stderr: fd 2): The monitor (or terminal) device, by default
fd is the common abbreviation, especially in code, for file descriptor; it's an integer value that refers to the open file in question.

Also, note that we mention it's a certain device by default – this implies the defaults can be changed. Indeed, this is a key part of the design: changing standard input, output, or error channels is called redirection, and by using the familiar <, > and 2> shell operators, these file channels are redirected to other files or devices.
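The independence of the output channels can be tried out with a small sketch. The emit function below is hypothetical (it is not part of the book's code); it writes one line to stdout and one to stderr. Besides the 2> operator mentioned above, the sketch assumes two related forms of the same shell syntax: >&2 (send this output to wherever fd 2 points) and 2>&1 (point fd 2 at wherever fd 1 currently points).

```shell
# emit: a hypothetical helper that writes to both output channels.
emit() {
    echo "this goes to stdout (fd 1)"
    echo "this goes to stderr (fd 2)" >&2
}

# Keep only stdout: whatever is written to fd 2 is discarded.
only_stdout=$(emit 2> /dev/null)

# Keep only stderr: first point fd 2 at the capture, then discard fd 1.
only_stderr=$(emit 2>&1 > /dev/null)

echo "captured stdout: $only_stdout"
echo "captured stderr: $only_stderr"
```

Each channel is redirected without the emit function knowing or caring where its output ends up.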

On Unix, there exists a class of programs called filters.

A filter is a program that reads from its standard input, possibly modifies the input, and writes the filtered result to its standard output.

Filters on Unix are very common utilities, such as cat, wc, sort, grep, perl, head, and tail.
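Writing a filter of one's own takes very little. As a minimal sketch, the hypothetical shout function below (the name is made up for illustration) reads its stdin, converts lowercase letters to uppercase with the standard tr utility, and writes the result to its stdout.

```shell
# shout: a hypothetical filter; reads stdin, upper-cases it, writes stdout.
shout() {
    tr '[:lower:]' '[:upper:]'
}

# Because it only talks to stdin and stdout, the same filter works with
# the keyboard, a redirected file, or another program's output.
printf 'apple\nbanana\n' | shout
```

Nothing in shout refers to a file or a device; where the data comes from and goes to is decided entirely by whoever invokes it.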

Filters allow Unix to easily sidestep design and code complexity. How?

Let's take the sort filter as a quick example. Okay, we'll need some data to sort. Let's say we run the following commands:

$ cat fruit.txt
orange
banana
apple
pear
grape
pineapple
lemon
cherry
papaya
mango
$

Now we consider four scenarios of using sort; based on the parameter(s) we pass, we are actually performing explicit or implicit input-, output-, and/or error-redirection!

Scenario 1: Sort a file alphabetically (one parameter, input implicitly redirected to file):

$ sort fruit.txt
apple
banana
cherry
grape
lemon
mango
orange
papaya
pear
pineapple
$

All right!

Hang on a second, though. If sort is a filter (and it is), it should read from its stdin (the keyboard) and write to its stdout (the terminal). It is indeed writing to the terminal device, but it's reading from a file, fruit.txt.

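This is deliberate: if a filename is passed as a parameter, the sort program opens that file and reads its input from it instead of from standard input.

Also, note that sort fruit.txt produces the same output as sort < fruit.txt; in the latter case, it is the shell that opens the file and connects it to sort's stdin.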

Scenario 2: Sort any given input alphabetically (no parameters, input and output from and to stdin/stdout):

$ sort 
mango
apple
pear
^D
apple
mango
pear
$

Once you type sort and press the Enter key, the sort process comes alive and just waits. Why? It's waiting for you, the user, to type something. Why? Recall that a filter, by default, reads its input from standard input or stdin: the keyboard device! So, we type in some fruit names. When we're done, we press Ctrl + D. This is the default character sequence that signifies end-of-file (EOF), or in cases such as this, end-of-input. Voila! The input is sorted and written. To where? To the sort process's stdout, the terminal device; hence we see it.

Scenario 3: Sort any given input alphabetically and save the output to a file (explicit output redirection):

$ sort > sorted.fruit.txt
mango
apple
pear
^D
$

Similar to Scenario 2, we type in some fruit names and then Ctrl + D to tell sort we're done. This time, though, note that the output is redirected (via the > meta-character) to the sorted.fruit.txt file!

So, as expected, the file contains the following output:

$ cat sorted.fruit.txt
apple
mango
pear
$

Scenario 4: Sort a file alphabetically and save the output and errors to a file (explicit input-, output-, and error-redirection):

$ sort < fruit.txt > sorted.fruit.txt 2> /dev/null
$

Here, the sorted contents of fruit.txt are saved to the sorted.fruit.txt file, just as the output was saved in the preceding scenario, with the added advantage that any error output is redirected as well. We redirect the error output (recall that file descriptor 2 refers to stderr) to the /dev/null special device file; /dev/null is a device file whose job is to act as a sink (a black hole). Anything written to the null device just disappears forever! (Who said there isn't magic on Unix?) Its complement is /dev/zero; the zero device is an infinite source of zeros. Reading from it returns zero bytes (the ASCII NUL character, value 0, not the digit 0); it has no end-of-file!
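Both special device files can be poked at safely from the shell. The following is a minimal sketch; the variable names are made up for illustration, and it assumes a head utility that supports the -c (byte count) option, as the GNU and BSD versions do.

```shell
# Anything written to /dev/null is discarded; reading it yields EOF at once.
echo "this text vanishes" > /dev/null
null_bytes=$(wc -c < /dev/null | tr -d ' ')

# /dev/zero never ends, so we must say how much to read: here, 4 bytes.
# od prints them in hex; they are NUL bytes (00), not the digit '0' (30).
zero_hex=$(head -c 4 /dev/zero | od -An -tx1 | tr -d ' \n')

echo "bytes read from /dev/null: $null_bytes"
echo "first 4 bytes of /dev/zero: $zero_hex"
```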

One tool to do one task

In the Unix design, one tries to avoid creating a Swiss Army knife; instead, one creates a tool for a very specific, designated purpose and for that one purpose only. No ifs, no buts; no cruft, no clutter. This is design simplicity at its best.

"Simplicity is the ultimate sophistication."
- Leonardo da Vinci

Take a common example: when working on the Linux CLI (command-line interface), you would like to figure out which of your locally mounted filesystems has the most available (disk) space.

We can get the list of locally mounted filesystems by an appropriate switch (just df would do as well):

$ df --local
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 20640636 1155492 18436728 6% /
udev 10240 0 10240 0% /dev
tmpfs 51444 160 51284 1% /run
tmpfs 5120 0 5120 0% /run/lock
tmpfs 102880 0 102880 0% /run/shm
$

To sort the output, one would need to first save it to a file; one could use a temporary file, tmp, for this purpose, and then sort it using the sort utility, of course. Finally, we delete the temporary file. (Yes, there's a better way, piping; refer to the Combine tools seamlessly section.)

Note that the available space is the fourth column, so we sort accordingly:

$ df --local > tmp
$ sort -k4nr tmp
rootfs 20640636 1155484 18436736 6% /
tmpfs 102880 0 102880 0% /run/shm
tmpfs 51444 160 51284 1% /run
udev 10240 0 10240 0% /dev
tmpfs 5120 0 5120 0% /run/lock
Filesystem 1K-blocks Used Available Use% Mounted on
$

Whoops! The output includes the heading line. Let's first use the versatile sed utility, a powerful non-interactive editor tool, to eliminate the first line, the header, from the output of df:

$ df --local > tmp
$ sed --in-place '1d' tmp
$ sort -k4nr tmp
rootfs 20640636 1155484 18436736 6% /
tmpfs 102880 0 102880 0% /run/shm
tmpfs 51444 160 51284 1% /run
udev 10240 0 10240 0% /dev
tmpfs 5120 0 5120 0% /run/lock
$ rm -f tmp

So what? The point is, on Unix, there is no one utility to list mounted filesystems and sort them by available space simultaneously.

Instead, there is a utility to list mounted filesystems: df. It does a great job of it, with option switches to choose from. (How does one know which options? Learn to use the man pages, they're extremely useful.)

There is a utility to sort text: sort. Again, it's the last word in sorting text, with plenty of option switches to choose from for pretty much every conceivable sort one might require.

The Linux man pages: man is short for manual; in a terminal window, type man man to get help on using man. Notice that the manual is divided into nine sections. For example, to get the manual page on the stat system call, type man 2 stat, as all system calls are in section 2 of the manual. The convention used is cmd(section) or API(section); thus, we refer to it as stat(2).

As expected, we obtain the results. So what exactly is the point? It's this: we used three utilities, not one: df, to list the mounted filesystems (and their related metadata); sed, to eliminate the header line; and sort, to sort whatever input it's given (in any conceivable manner).

df can query and list mounted filesystems, but it cannot sort them. sort can sort text; it cannot list mounted filesystems.

Think about that for a moment.

Combine them all, and the whole is more than the sum of its parts! Unix tools typically do one task and they do it to its logical conclusion; no one does it better!

Having said this, I would like to point out, a tiny bit sheepishly, the highly renowned tool Busybox. Busybox (http://busybox.net) is billed as The Swiss Army Knife of Embedded Linux. It is indeed a very versatile tool; it has its place in the embedded Linux ecosystem precisely because it would be too expensive on an embedded box to have separate binary executables for each and every utility (and it would consume more RAM). Busybox solves this problem by having a single binary executable (along with symbolic links to it from each of its applets, such as ls, ps, df, and sort).
So, outside of the embedded scenario and all the resource limitations it implies, do follow the One tool to do one task rule!

Three standard I/O channels

Several popular Unix tools (technically, filters) are, again, deliberately designed to read their input from a standard file descriptor called standard input (stdin), possibly modify it, and write their resultant output to a standard file descriptor called standard output (stdout). Any error output can be written to a separate error channel called standard error (stderr).

In conjunction with the shell's redirection operators (> for output-redirection, < for input-redirection, and 2> for stderr redirection), and even more importantly with piping (refer to the Combine tools seamlessly section), this enables a program designer to greatly simplify the design. There's no need to hardcode (or even softcode, for that matter) input and output sources or sinks. It just works, as expected.

Let's review a couple of quick examples to illustrate this important point.

Word count

How many lines of source code are there in the C netcat.c source file I downloaded? (Here, we use a small part of the popular open source netcat utility code base.) We use the wc utility. Before we go further, what's wc? word count (wc) is a filter: it reads input from stdin, counts the number of lines, words, and characters in the input stream, and writes this result to its stdout. Further, as a convenience, one can pass filenames as parameters to it; passing the -l option switch has wc only print the number of lines:

$ wc -l src/netcat.c
618 src/netcat.c
$

Here, the input is a filename passed as a parameter to wc.

Interestingly, we should by now realize that if we do not pass it any parameters, wc would read its input from stdin, which by default is the keyboard device. An example is shown as follows:

$ wc -l
hey, a small
quick test
of reading from stdin
by wc!
^D
4
$

Yes, we typed in 4 lines to stdin; thus the result is 4, written to stdout – the terminal device by default.

Here is the beauty of it:

$ wc -l < src/netcat.c > num
$ cat num
618
$

As we can see, wc is a great example of a Unix filter.
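The two outputs seen above (618 src/netcat.c versus just 618) hint at who opens the file in each case. As a rough sketch, using a throwaway file created with mktemp (the variable names are made up for illustration), the difference can be reproduced as follows:

```shell
sample=$(mktemp)
printf 'line one\nline two\nline three\n' > "$sample"

# Filename as a parameter: wc opens the file itself, so it knows the name
# and prints it after the count.
with_param=$(wc -l "$sample")

# Input redirection: the shell opens the file and hands it to wc as stdin;
# wc never sees a filename, so it prints only the count.
with_redirect=$(wc -l < "$sample" | tr -d ' ')

echo "parameter:   $with_param"
echo "redirection: $with_redirect"
rm -f "$sample"
```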

cat

Unix, and of course Linux, users learn to quickly get familiar with the daily-use cat utility. At first glance, all cat does is spit out the contents of a file to the terminal.

For example, say we have two plain text files, myfile1.txt and myfile2.txt:

$ cat myfile1.txt
Hello,
Linux System Programming,
World.
$ cat myfile2.txt
Okey dokey,
bye now.
$

Okay. Now check this out:

$ cat myfile1.txt myfile2.txt
Hello,
Linux System Programming,
World.
Okey dokey,
bye now.
$

Instead of needing to run cat twice, we ran it just once, by passing the two filenames to it as parameters.

In theory, one can pass any number of parameters to cat: it will use them all, one by one!

Not just that, one can use shell wildcards too (* and ?; in reality, the shell will first expand the wildcards, and pass on the resultant path names to the program being invoked as parameters):

$ cat myfile?.txt
Hello,
Linux System Programming,
World.
Okey dokey,
bye now.
$

This, in fact, illustrates another key point: any number of parameters or none is considered the right way to design a program. Of course, there are exceptions to every rule: some programs demand mandatory parameters.
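One way to honor the "any number of parameters, or none" convention in a tool of one's own is sketched below. The nlines function is hypothetical (it is not from the book); it counts lines in each file named as a parameter and falls back to behaving as a plain filter on stdin when no parameters are given.

```shell
# nlines: a hypothetical helper that accepts any number of file parameters.
# With no parameters it behaves as a pure filter and reads stdin.
nlines() {
    if [ "$#" -eq 0 ]; then
        wc -l | tr -d ' '
        return
    fi
    for f in "$@"; do
        # Redirecting stdin keeps wc from printing the filename itself.
        printf '%s: %s\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
    done
}

printf 'one\ntwo\nthree\n' | nlines        # no parameters: reads stdin
```

Invoked with filenames, for instance nlines myfile1.txt myfile2.txt, it prints one line per file; wildcards work too, since the shell expands them before nlines ever runs.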

Wait, there's more. cat, too, is an excellent example of a Unix filter (recall: a filter is a program that reads from its standard input, possibly modifies its input in some manner, and writes the result to its standard output).

So, quick quiz, if we just run cat with no parameters, what would happen?
Well, let's try it out and see:

$ cat
hello,
hello,
oh cool
oh cool
it reads from stdin,
it reads from stdin,
and echoes whatever it reads to stdout!
and echoes whatever it reads to stdout!
ok bye
ok bye
^D
$

Wow, look at that: cat blocks (waits) at its stdin, the user types in a string and presses the Enter key, cat responds by copying its stdin to its stdout – no surprise there, as that's the job of cat in a nutshell!

From this, one realizes the following:

  • cat fname is the same as cat < fname
  • cat > fname creates or overwrites the fname file

There's no reason we can't use cat to append several files together:

$ cat fname1 fname2 fname3 > final_fname
$

There's no reason this must be done with only plain text files; one can join together binary files too.

In fact, that's what the utility does: it concatenates files. Thus its name which, as is the norm on Unix, is highly abbreviated: from concatenate to just cat. Again, clean and elegant, the Unix way.

cat shunts out file contents to stdout, in order. What if one wants to display a file's contents in reverse order (last line first)? Use the Unix tac utility (yes, that's cat spelled backward)!

Also, FYI, we saw that cat can be used to efficiently join files. Guess what: the split(1) utility can be used to break a file up into pieces.
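That split and cat are complementary can be checked with a quick round trip. This is only a sketch: the file names and the piece. prefix are made up for illustration, everything happens in a throwaway directory created with mktemp, and the seq utility is assumed to be available to generate sample data.

```shell
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)

# Build a 10-line sample file, then break it into pieces of 4 lines each.
seq 1 10 > "$workdir/numbers.txt"
split -l 4 "$workdir/numbers.txt" "$workdir/piece."
npieces=$(ls "$workdir"/piece.* | wc -l | tr -d ' ')

# The shell expands piece.* in sorted order, so cat rejoins them in order.
cat "$workdir"/piece.* > "$workdir/rejoined.txt"

if cmp -s "$workdir/numbers.txt" "$workdir/rejoined.txt"; then
    roundtrip="identical"
else
    roundtrip="different"
fi
echo "$npieces pieces; original and rejoined files are $roundtrip"
rm -rf "$workdir"
```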

Combine tools seamlessly

We just saw that common Unix utilities are often designed as filters, giving them the ability to read from their standard input and write to their standard output. This concept is elegantly extended to seamlessly combine together multiple utilities, using an IPC mechanism called a pipe.

Also, we recall that the Unix philosophy embraces the do one task only design. What if we have one program that does task A and another that does task B and we want to combine them? Ah, that's exactly what pipes do! Refer to the following code:

prg_does_taskA | prg_does_taskB

A pipe essentially is redirection performed twice: the output of the left-hand program becomes the input to the right-hand program. Of course, this implies that the program on the left must write to stdout, and the program on the right must read from stdin.

An example: sort the list of mounted filesystems by space available (in reverse order).

As we have already discussed this example in the One tool to do one task section, we shall not repeat the same information.

Option 1: Using a temporary file (refer to the One tool to do one task section):

$ df --local > tmp
$ sed --in-place '1d' tmp
$ sort -k4nr tmp
rootfs 20640636 1155484 18436736 6% /
tmpfs 102880 0 102880 0% /run/shm
tmpfs 51444 160 51284 1% /run
udev 10240 0 10240 0% /dev
tmpfs 5120 0 5120 0% /run/lock
$ rm -f tmp

Option 2: Using pipes, clean and elegant:

$ df --local | sed '1d' | sort -k4nr
rootfs 20640636 1155492 18436728 6% /
tmpfs 102880 0 102880 0% /run/shm
tmpfs 51444 160 51284 1% /run
udev 10240 0 10240 0% /dev
tmpfs 5120 0 5120 0% /run/lock
$

Not only is this elegant, it is also far superior performance-wise, as writing to memory (the pipe is a memory object) is much faster than writing to disk.

One can extend this notion and combine multiple tools over multiple pipes; in effect, one can build a super tool from several regular tools by combining them.

As an example: display the three processes taking the most (physical) memory; only display their PID, virtual size (VSZ), resident set size (RSS) (RSS is a fairly accurate measure of physical memory usage), and the name:

$ ps au | sed '1d' | awk '{printf("%6d %10d %10d %-32s\n", $2, $5, $6, $11)}' | sort -k3n | tail -n3
10746 3219556 665252 /usr/lib64/firefox/firefox
10840 3444456 1105088 /usr/lib64/firefox/firefox
1465 5119800 1354280 /usr/bin/gnome-shell
$

Here, we've combined five utilities, ps, sed, awk, sort, and tail, over four pipes. Nice!
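Since real ps output differs on every machine, the stages of such a pipeline are easier to study on fixed input. The sketch below feeds a hand-written, entirely hypothetical sample (laid out in the same columns as ps au) through the same sed, awk, sort, and tail stages; only the printf format is simplified.

```shell
# Hypothetical, hand-written sample in the column layout of 'ps au'.
sample_ps() {
    cat <<'EOF'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
alice 101 0.0 0.1 20000 3000 pts/0 Ss 10:00 0:00 bash
alice 202 1.5 4.0 900000 80000 pts/1 Sl 10:01 0:09 editor
bob 303 0.2 0.5 50000 9000 pts/2 S+ 10:02 0:01 vim
bob 404 3.0 9.9 990000 200000 pts/3 Sl 10:03 0:30 browser
EOF
}

# Same stages as above: drop the header, keep PID/VSZ/RSS/name,
# sort numerically on RSS (now the third column), keep the last three.
top3=$(sample_ps | sed '1d' |
    awk '{printf("%d %d %d %s\n", $2, $5, $6, $11)}' |
    sort -k3n | tail -n3)
echo "$top3"
```

Removing stages from the right, one at a time, shows what each tool contributes to the final result.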

Another example: display the process, not including daemons*, taking up the most memory (RSS):

$ ps aux | awk '{if ($7 != "?") print $0}' | sort -k6n | tail -n1

* A daemon is a system background process with no controlling terminal, which is why ps shows ? in its TTY column (the seventh field) and why the awk filter above skips such lines. This concept is covered in Daemon Processes, available here: https://www.packtpub.com/sites/default/files/downloads/Daemon_Processes.pdf.

Plain text preferred

Unix programs are generally designed to work with text as it's a universal interface. Of course, there are several utilities that do indeed operate on binary objects (such as object and executable files); we aren't referring to them here. The point is this: Unix programs are designed to work on text as it simplifies the design and architecture of the program.

A common example: an application, on startup, parses a configuration file. The configuration file could be formatted as a binary blob. On the other hand, having it as a plain text file renders it easily readable (invaluable!) and therefore easier to understand and maintain. One might argue that parsing binary would be faster. Perhaps to some extent this is so, but consider the following:

  • With modern hardware, the difference is probably not significant
  • A standardized plain text format (such as XML) would have optimized code to parse it, yielding both benefits
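How little code a plain text configuration needs can be seen in the following sketch. It assumes a simple key=value layout rather than XML, purely to keep the example short; the settings, the config_text variable, and the get_setting function are all made up for illustration.

```shell
# A hypothetical plain-text configuration, readable with any pager or editor.
config_text='# sample settings (made up for this sketch)
port=8080
logfile=/tmp/app.log'

# get_setting KEY: a filter that prints the value of KEY read from stdin.
get_setting() {
    while IFS='=' read -r key value; do
        case "$key" in
            '#'*|'') continue ;;          # skip comments and blank lines
        esac
        if [ "$key" = "$1" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1                              # key not found
}

printf '%s\n' "$config_text" | get_setting port
```

Because the format is plain text, the same file can also be inspected with cat, searched with grep, and edited by hand when something goes wrong.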

Remember, simplicity is key!

CLI, not GUI

The Unix OS, and all its applications, utilities, and tools, were always built to be used from a command-line-interface (CLI), typically, the shell. From the 1980s onward, the need for a Graphical User Interface (GUI) became apparent.

Robert Scheifler of MIT, considered the chief design architect behind the X Window System, built an exceedingly clean and elegant architecture, a key component of which is this: the GUI forms a layer (well, actually, several layers) above the OS, providing libraries for GUI clients, that is, applications.

The GUI was never designed to be intrinsic to applications or the OS—it's always optional.

This architecture still holds up today. Having said that, especially on embedded Linux, performance reasons are seeing the advent of newer architectures, such as the frame buffer and Wayland. Also, though Android, which uses the Linux kernel, necessitates a GUI for the end user, the system developer's interface to Android, ADB, is a CLI.

A huge number of production-embedded and server Linux systems run purely on CLI interfaces. The GUI is almost like an add-on feature, for the end user's ease of operation.

Wherever appropriate, design your tools to work in the CLI environment; adapting them to a GUI at a later point is then straightforward.
Cleanly and carefully separating the business logic of the project or product from its GUI is a key to good design.

Modular, designed to be repurposed by others

From its very early days, the Unix OS was deliberately designed and coded with the tacit assumption that multiple programmers would work on the system. Thus, the culture of writing clean, elegant, and understandable code, to be read and worked upon by other competent programmers, was ingrained.

Later, with the advent of the Unix wars, proprietary and legal concerns overrode this sharing model. Interestingly, history shows that the Unixes were fading in relevance and industry use, until the timely advent of none other than the Linux OS – an open source ecosystem at its very best! Today, the Linux OS is widely acknowledged as the most successful GNU project. Ironic indeed!

Provide mechanisms, not policies

Let's understand this principle with a simple example.

When designing an application, you need to have the user enter a login name and password. The function that performs the work of getting and checking the password is called, let's say, mygetpass(). It's invoked by the mylogin() function: mylogin() → mygetpass().

Now, the protocol to be followed is this: if the user gets the password wrong three times in a row, the program should not allow access (and should log the case). Fine, but where do we check this?

The Unix philosophy: do not implement the policy (if the password is wrong three times, abort) inside the mygetpass() function. Instead, just have mygetpass() return a Boolean (true when the password is right, false when it is wrong), and have the calling function, mylogin(), implement whatever logic is required.

Pseudocode

The following is the wrong approach:

mygetpass()
{
    numtries = 0

    while (true) {
        <get the password>

        if (password-is-right)
            break

        numtries++
        if (numtries >= 3) {
            <write and log failure message>
            <abort>
        }
    }
    <password correct, continue>
}

mylogin()
{
    mygetpass()
}

Now let's take a look at the right approach: the Unix way! Refer to the following code:

mygetpass()
{
    <get the password>

    if (password-is-wrong)
        return false;

    return true;
}

mylogin()
{
    maxtries = 3

    while (maxtries--) {
        if (mygetpass() == true) {
            <move along, call other routines>
            return
        }
    }

    // If we're here, we've failed to provide the
    // correct password
    <write and log failure message>
    <abort>
}

The job of mygetpass() is to get a password from the user and check whether it's correct; it returns success or failure to the caller – that's it. That's the mechanism. It is not its job to decide what to do if the password is wrong – that's the policy, and left to the caller.
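The pseudocode above can be turned into a small compilable sketch. To keep it self-contained and testable, this version makes a few assumptions that are not in the original: the mechanism is a hypothetical check_password() that receives the attempt as a parameter (instead of prompting the user), the secret is a hard-coded placeholder, and the policy function login_with_policy() receives the successive attempts as an array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mechanism: check one attempt and report success or failure; that's it.
 * The hard-coded secret is a placeholder for this sketch only. */
bool check_password(const char *attempt)
{
    static const char secret[] = "s3cret";

    return attempt != NULL && strcmp(attempt, secret) == 0;
}

/* Policy: the caller decides how many tries are allowed.
 * attempts[] stands in for successive inputs typed by the user.
 * Returns true on a successful login, false once the tries run out. */
bool login_with_policy(const char *const attempts[], size_t nattempts,
                       int maxtries)
{
    for (size_t i = 0; i < nattempts && maxtries > 0; i++, maxtries--) {
        if (check_password(attempts[i]))
            return true;
    }
    /* The caller would write and log the failure message here. */
    return false;
}
```

Changing the policy (five tries instead of three, or a lockout timer) now touches only login_with_policy(); the mechanism stays exactly as it is.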

Now that we've covered the Unix philosophy in a nutshell, what are the important takeaways for you, the system developer on Linux?

Learning from, and following, the Unix philosophy when designing and implementing your applications on the Linux OS will provide a huge payoff. Your application will do the following:

  • Be a natural fit on the system; this is very important
  • Have greatly reduced complexity
  • Have a modular design that is clean and elegant
  • Be far more maintainable
 

Linux system architecture

In order to clearly understand the Linux system architecture, one needs to first understand a few important concepts: the processor Application Binary Interface (ABI), CPU privilege levels, and how these affect the code we write. Accordingly, and with a few code examples, we'll delve into these here, before diving into the details of the system architecture itself.

Preliminaries

If one is posed the question, "what is the CPU for?", the answer is pretty obvious: the CPU is the heart of the machine – it reads in, decodes, and executes machine instructions, working on memory and peripherals. It does this by incorporating various stages.

Very simplistically, in the Instruction Fetch stage, it reads in machine instructions (which we represent in various human-readable ways – in hexadecimal, assembly, and high-level languages) from memory (RAM) or CPU cache. Then, in the Instruction Decode phase, it proceeds to decipher the instruction. Along the way, it makes use of the control unit, its register set, ALU, and memory/peripheral interfaces.

The ABI

Let's imagine that we write a C program, and run it on the machine.

Well, hang on a second. C code cannot possibly be directly deciphered by the CPU; it must be converted into machine language. So, we understand that on modern systems we will have a toolchain installed – this includes the compiler, linker, library objects, and various other tools. We compile and link the C source code, converting it into an executable format that can be run on the system.

The processor Instruction Set Architecture (ISA) documents the machine's instruction formats, the addressing schemes it supports, and its register model. In addition, CPU Original Equipment Manufacturers (OEMs) release a document that describes how software is expected to work on the machine; this document is generally called the ABI. The ABI describes more than just the ISA: it covers the machine instruction formats, the register set details, the calling convention, the linking semantics, and the executable file format, such as ELF. Try a quick Google search for x86 ABI; it should reveal interesting results.

The publisher makes the full source code for this book available on their website; we urge the reader to perform a quick Git clone on the following URL. Build and try it: https://github.com/PacktPublishing/Hands-on-System-Programming-with-Linux.

Let's try this out. First, we write a simple Hello, World type of C program:

 $ cat hello.c
/*
* hello.c
*
****************************************************************
* This program is part of the source code released for the book
* "Linux System Programming"
* (c) Kaiwan N Billimoria
* Packt Publishers
*
* From:
* Ch 1 : Linux System Architecture
****************************************************************
* A quick 'Hello, World'-like program to demonstrate using
* objdump to show the corresponding assembly and machine
* language.
*/
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int main(void)
{
    int a;

    printf("Hello, Linux System Programming, World!\n");
    a = 5;
    exit(0);
}
$

We build the application via the Makefile, with make. Ideally, the code must compile with no warnings:

$ gcc -Wall -Wextra hello.c -o hello
hello.c: In function ‘main’:
hello.c:23:6: warning: variable ‘a’ set but not used [-Wunused-but-set-variable]
int a;
^
$
Important! Do not ignore compiler warnings with production code. Strive to get rid of all warnings, even the seemingly trivial ones; this will help a great deal with correctness, stability, and security.

In this trivial example code, we understand and anticipate the unused variable warning that gcc emits, and just ignore it for the purpose of this demo.

The exact warning and/or error messages you see on your system could differ from what you see here. This is because my Linux distribution (and version), compiler/linker, library versions, and perhaps even CPU, may differ from yours. I built this on an x86_64 box running the Fedora 27/28 Linux distribution.

Similarly, we build the debug version of the hello program (again, ignoring the warning for now), and run it:

$ make hello_dbg
[...]
$ ./hello_dbg
Hello, Linux System Programming, World!
$

We use the powerful objdump utility to see the intermixed source-assembly-machine language of our program (objdump's -S or --source option switch intermixes source code with disassembly):

$ objdump --source ./hello_dbg
./hello_dbg: file format elf64-x86-64

Disassembly of section .init:

0000000000400400 <_init>:
400400: 48 83 ec 08 sub $0x8,%rsp

[...]

int main(void)
{
400527: 55 push %rbp
400528: 48 89 e5 mov %rsp,%rbp
40052b: 48 83 ec 10 sub $0x10,%rsp
int a;

printf("Hello, Linux System Programming, World!\n");
40052f: bf e0 05 40 00 mov $0x4005e0,%edi
400534: e8 f7 fe ff ff callq 400430 <puts@plt>
a = 5;
400539: c7 45 fc 05 00 00 00 movl $0x5,-0x4(%rbp)
exit(0);
400540: bf 00 00 00 00 mov $0x0,%edi
400545: e8 f6 fe ff ff callq 400440 <exit@plt>
40054a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)

[...]

$
The exact assembly and machine code you see on your system will, in all likelihood, differ from what you see here; this is because my Linux distribution (and version), compiler/linker, library versions, and perhaps even CPU, may differ from yours. I built this on an x86_64 box running Fedora Core 27.

Alright. Let's take the line of source code a = 5; for which objdump reveals the corresponding machine and assembly language:

    a = 5;
400539: c7 45 fc 05 00 00 00 movl $0x5,-0x4(%rbp)

We can now clearly see the following:

C source | Assembly language | Machine instructions
a = 5; | movl $0x5,-0x4(%rbp) | c7 45 fc 05 00 00 00

So, when the process runs, at some point it will fetch and execute the machine instructions, producing the desired result. Indeed, that's exactly what a programmable computer is designed to do!
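To make the mapping between the machine bytes and the assembly a little more concrete, here is a small sketch that decodes just this one instruction form. The function name decode_movl_imm_to_rbp_slot is hypothetical, and the code deliberately handles only the exact encoding shown above (opcode c7 with a ModRM byte of 45); it is in no way a general x86 decoder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the 7-byte x86 encoding of "movl $imm32, disp8(%rbp)":
 *   c7      opcode: MOV r/m32, imm32
 *   45      ModRM: mod=01 (8-bit displacement), reg=000, rm=101 (rbp)
 *   disp8   signed 8-bit offset from %rbp
 *   imm32   32-bit immediate value, stored little-endian
 * Returns false if the bytes are not of this exact form. */
bool decode_movl_imm_to_rbp_slot(const uint8_t insn[7],
                                 int *disp, uint32_t *imm)
{
    if (insn[0] != 0xc7 || insn[1] != 0x45)
        return false;

    *disp = (int8_t)insn[2];            /* 0xfc is -4 as a signed byte */
    *imm  = (uint32_t)insn[3]
          | ((uint32_t)insn[4] << 8)
          | ((uint32_t)insn[5] << 16)
          | ((uint32_t)insn[6] << 24);
    return true;
}
```

Feeding it the bytes c7 45 fc 05 00 00 00 yields a displacement of -4 and an immediate of 5, which matches movl $0x5,-0x4(%rbp): the local variable a lives four bytes below the frame pointer.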

Though we have shown examples of displaying (and even writing a bit of) assembly and machine code for the Intel CPU, the concepts and principles behind this discussion hold up for other CPU architectures, such as ARM, PPC, and MIPS. Covering similar examples for all these CPUs goes beyond the scope of this book; however, we urge the interested reader to study the processor datasheet and ABI, and try it out.

Accessing a register's content via inline assembly

Now that we've written a simple C program and seen its assembly and machine code, let's move on to something a little more challenging: a C program with inline assembly to access the contents of a CPU register.

Details on assembly-language programming are outside the scope of this book; refer to the Further reading section on the GitHub repository.

x86_64 has several registers; let's just go with the ordinary RCX register for this example. We do make use of an interesting trick: the x86 ABI calling convention states that the return value of a function will be the value placed in the accumulator, that is, RAX for the x86_64. Using this knowledge, we write a function that uses inline assembly to place the content of the register we want into RAX. This ensures that this is what it will return to the caller!

Assembly micro-basics include the following:

AT&T syntax:
movq <src_reg>, <dest_reg>
Register: prefix the name with %
Immediate value: prefix with $

For more, see the Further reading section on the GitHub repository.

Let's take a look at the following code:

$ cat getreg_rcx.c
/*
* getreg_rcx.c
*
****************************************************************
* This program is part of the source code released for the book
* "Linux System Programming"
* (c) Kaiwan N Billimoria
* Packt Publishers
*
* From:
* Ch 1 : Linux System Architecture
****************************************************************
* Inline assembly to access the contents of a CPU register.
* NOTE: this program is written to work on x86_64 only.
*/
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

typedef unsigned long u64;

static u64 get_rcx(void)
{
    /* Pro Tip: x86 ABI: query a register's value by moving its value into RAX.
     * [RAX] is returned by the function! */
    __asm__ __volatile__(
        "push %rcx\n\t"
        "movq $5, %rcx\n\t"
        "movq %rcx, %rax");
    /* at&t syntax: movq <src_reg>, <dest_reg> */
    __asm__ __volatile__("pop %rcx");
}

int main(void)
{
    printf("Hello, inline assembly:\n [RCX] = 0x%lx\n",
           get_rcx());
    exit(0);
}
$ gcc -Wall -Wextra getreg_rcx.c -o getreg_rcx
getreg_rcx.c: In function ‘get_rcx’:
getreg_rcx.c:32:1: warning: no return statement in function returning non-void [-Wreturn-type]
}
^
$ ./getreg_rcx
Hello, inline assembly:
[RCX] = 0x5
$

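There; it works as expected. Do note, though, that the compiler warning is a fair one: in C, using the return value of a non-void function that falls off its end without a return statement is undefined behavior. The trick works here only because the compiler happens to leave RAX untouched between the inline assembly and the function's return; an optimized build is under no obligation to do so.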
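For comparison, here is a hedged sketch of a variant that does not depend on RAX being left alone. It uses GCC's extended inline assembly with an output operand, so the value reaches a C variable that is then returned normally. This is an alternative written for illustration, not the book's code; the function name get_rcx_after_set is hypothetical, and the sketch assumes GCC (or Clang) on x86_64.

```c
#if !defined(__x86_64__)
#error "This sketch targets x86_64 with GCC-style inline assembly"
#endif

typedef unsigned long u64;

/* Load 5 into RCX, then copy RCX into a C variable via an output operand.
 * "=r"(val) lets the compiler pick the destination register; the "rcx"
 * clobber tells it that RCX gets overwritten, so no push/pop is needed.
 * In extended asm, register names are written with a doubled %. */
u64 get_rcx_after_set(void)
{
    u64 val;

    __asm__ __volatile__(
        "movq $5, %%rcx\n\t"
        "movq %%rcx, %0"
        : "=r"(val)     /* output operand */
        :               /* no input operands */
        : "rcx");       /* clobbered register */
    return val;
}
```

Because the function now has a proper return statement, it should build without the -Wreturn-type warning, and the result no longer hinges on what the optimizer does with RAX.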

Accessing a control register's content via inline assembly

Among the many fascinating registers on the x86_64 processor, there happen to be six control registers, named CR0 through CR4, and CR8. There's really no need to delve into detail regarding them; suffice it to say that they are crucial to system control.

For the purpose of an illustrative example, let's consider the CR0 register for a moment. Intel's manual states: CR0—contains system control flags that control operating mode and states of the processor.

Intel's manuals can be downloaded conveniently as PDF documents from here (includes the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3 (3A, 3B and 3C): System Programming Guide):

https://software.intel.com/en-us/articles/intel-sdm

Clearly, CR0 is an important register!
We modify our previous program to access and display its content (instead of the ordinary RCX register). The only relevant code (which has changed from the previous program) is the function that queries the CR0 register value:

static u64 get_cr0(void)
{
    /* Pro Tip: x86 ABI: query a register's value by moving its value into RAX.
     * [RAX] is returned by the function! */
    __asm__ __volatile__("movq %cr0, %rax");
    /* at&t syntax: movq <src_reg>, <dest_reg> */
}

Build and run it:

$ make getreg_cr0
[...]
$ ./getreg_cr0
Segmentation fault (core dumped)
$

It crashes!

Well, what happened here? Read on.

CPU privilege levels

As mentioned earlier in this chapter, the essential job of the CPU is to read in machine instructions from memory, decipher, and execute them. In the early days of computing, this is pretty much all the processor did. But then, engineers, thinking deeper on it, realized that there is a critical issue with this: if a programmer can feed an arbitrary stream of machine instructions to the processor, which it, in turn, blindly and obediently executes, herein lies scope to do damage, to hack the machine!

How? Recall from the previous section the Intel processor's CR0 control register: Contains system control flags that control operating mode and states of the processor. If one has unlimited (read/write) access to the CR0 register, one could toggle bits that could do the following:

  • Turn hardware paging on or off
  • Disable the CPU cache
  • Change caching and alignment attributes
  • Disable WP (write protect) on memory (technically, pages) marked as read-only by the OS

Wow, a hacker could indeed wreak havoc. At the very least, only the OS should be allowed this kind of access.

Precisely for reasons such as the security, robustness, and correctness of the OS and the hardware resources it controls, all modern CPUs include the notion of privilege levels.

The modern CPU will support at least two privilege levels, or modes, which are generically called the following:

  • Supervisor
  • User

You need to understand that code, that is, machine instructions, runs on the CPU at a given privilege level or mode. A person designing and implementing an OS is free to exploit the processor privilege levels. This is exactly how modern OSes are designed. Take a look at the following table Generic CPU Privilege Levels:

Privilege level or mode name | Privilege level | Purpose | Terminology
Supervisor | High | OS code runs here | kernel-space
User | Low | Application code runs here | user-space (or userland)
Table 1: Generic CPU Privilege Levels

Privilege levels or rings on the x86

To understand this important concept better, let's take the popular x86 architecture as a real example. Right from the i386 onward, the Intel processor supports four privilege levels or rings: Ring 0, Ring 1, Ring 2, and Ring 3. On Intel CPUs, this is how the levels work:

Figure 1: CPU ring levels and privilege

Let's express Figure 1 in the form of a table (Table 2: x86 privilege or ring levels):

Privilege or ring level | Privilege | Purpose
Ring 0 | Highest | OS code runs here
Ring 1 | < ring 0 | <Unused>
Ring 2 | < ring 1 | <Unused>
Ring 3 | Lowest | Application code runs here (userland)
Table 2: x86 privilege or ring levels

Originally, ring levels 1 and 2 were intended for device drivers, but modern OSes typically run driver code at ring 0 itself. Some hypervisors (VirtualBox being one) used to use Ring 1 to run the guest kernel code; this was the case earlier, when no hardware virtualization support (Intel VT-x, AMD-V) was available.

The ARM (32-bit) processor has seven modes of execution; of these, six are privileged, and only one is the non-privileged mode. On ARM, generically, the equivalent to Intel's Ring 0 is Supervisor (SVC) mode, and the equivalent to Intel's Ring 3 is User mode.

For interested readers, there are more links in the Further reading section on the GitHub repository.

The following diagram clearly shows how all modern OSes (Linux, Unix, Windows, and macOS) running on an x86 processor exploit processor privilege levels:

Figure 2: User-Kernel separation

Importantly, the processor ISA assigns every machine instruction a privilege level (or levels) at which it is allowed to execute. A machine instruction that is allowed to execute at the User privilege level can automatically also be executed at the Supervisor privilege level. This distinction between what can and cannot be done in each mode also applies to register access.

To use the Intel terminology, the Current Privilege Level (CPL) is the privilege level at which the processor is currently executing code.

For example, assume that on a given processor:

  • The foo1 machine instruction has an allowed privilege level of Supervisor (or Ring 0 for x86)
  • The foo2 machine instruction has an allowed privilege level of User (or Ring 3 for x86)

So, for a running application that executes these machine instructions, the following table emerges:

Machine instruction | Allowed-at mode | CPL (current privilege level) | Works?
foo1 | Supervisor (0) | 0 | Yes
foo1 | Supervisor (0) | 3 | No
foo2 | User (3) | 0 | Yes
foo2 | User (3) | 3 | Yes
Table 3: Privilege levels: an example
So, thinking about it, foo2, being allowed at User mode, would also be allowed to execute at any CPL. In other words, if the CPL is numerically less than or equal to the allowed privilege level (CPL <= allowed level), it works; otherwise, it does not.
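The rule is simple enough to write down as a one-line predicate. The following is only a model of the rule as stated here, using x86 ring numbers; the enum and the function name instruction_allowed are hypothetical, and real hardware of course performs this check internally.

```c
#include <stdbool.h>

/* x86 ring numbers: numerically lower means more privileged. */
enum ring { RING0 = 0, RING1 = 1, RING2 = 2, RING3 = 3 };

/* An instruction works only if the current privilege level (CPL) is
 * numerically less than or equal to the least privileged ring at which
 * the instruction is allowed. */
bool instruction_allowed(enum ring cpl, enum ring allowed_at)
{
    return cpl <= allowed_at;
}
```

Evaluating it for foo1 (allowed at Ring 0) and foo2 (allowed at Ring 3), at CPL 0 and CPL 3, gives the same four Yes/No answers as the table.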

When one runs an application on, say, Linux, the application runs as a process (more on this later). But what privilege (or mode or ring) level does the application code run at? Refer to the preceding table: User Mode (Ring 3 on x86).

Aha! So now we see. The preceding code example, getreg_rcx.c, worked because it attempted to access the content of the general-purpose RCX register, which is allowed in User Mode (Ring 3, as well as at the other levels, of course)!

But the code of getreg_cr0.c failed; it crashed, because it attempted to access the content of the CR0 control register, which is disallowed in User Mode (Ring 3), and allowed only at the Ring 0 privilege! Only OS or kernel code can access the control registers. This holds true for several other sensitive assembly-language instructions as well. This approach makes a lot of sense.

Technically, it crashed because the processor raised a General Protection Fault (GPF).
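On Linux, the kernel reports this fault to the offending process as the SIGSEGV signal, which is why the shell printed Segmentation fault. As a rough sketch (not part of the book's code base), the following function attempts the privileged read and reports whether the fault occurred, instead of letting the process die. The function name cr0_read_faults_in_user_mode is hypothetical, and the sketch assumes GCC-style inline assembly on x86_64 Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static sigjmp_buf fault_env;

/* SIGSEGV handler: jump back to the point saved by sigsetjmp(). */
static void on_sigsegv(int signo)
{
    (void)signo;
    siglongjmp(fault_env, 1);
}

/* Try to read CR0 from user mode.
 * Returns true if the attempt faulted (SIGSEGV was delivered),
 * false if it did not fault or the handler could not be installed. */
bool cr0_read_faults_in_user_mode(void)
{
    struct sigaction sa, old;
    bool faulted;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigsegv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) == -1)
        return false;

    if (sigsetjmp(fault_env, 1) == 0) {
        unsigned long val;

        /* Privileged instruction: allowed only at Ring 0. */
        __asm__ __volatile__("movq %%cr0, %0" : "=r"(val));
        (void)val;
        faulted = false;
    } else {
        faulted = true;     /* we arrived here via siglongjmp() */
    }

    sigaction(SIGSEGV, &old, NULL);     /* restore the previous handler */
    return faulted;
}
```

When run as an ordinary process, the function is expected to return true: the process executes at Ring 3, so the processor refuses the instruction.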

Linux architecture

The Linux system architecture is a layered one. In a very simplistic way, but ideal to start on our path to understanding these details, the following diagram illustrates the Linux system architecture:

Figure 3: Linux – Simplified layered architecture

Layers help, because each layer need only be concerned with the layer directly above and below it. This leads to many advantages:

  • Clean design, reduces complexity
  • Standardization, interoperability
  • Ability to swap layers in and out of the stack
  • Ability to easily introduce new layers as required
On the last point, there exists the FTSE. To quote directly from Wikipedia:

The "fundamental theorem of software engineering (FTSE)" is a term originated by Andrew Koenig to describe a remark by Butler Lampson attributed to the late David J. Wheeler:

"We can solve any problem by introducing an extra level of indirection."

Now that we understand the concept of CPU modes or privilege levels, and how modern OSes exploit them, a better diagram (expanding on the previous one) of the Linux system architecture would be as follows:

Figure 4: Linux system architecture

In the preceding diagram, P1, P2, …, Pn are nothing but userland processes (Process 1, Process 2) or in other words, running applications. For example, on a Linux laptop, we might have the vim editor, a web browser, and terminal windows (gnome-terminal) running.

Libraries

Libraries, of course, are archives (collections) of code; as we well know, using libraries helps tremendously with code modularity, standardization, preventing the reinvent-the-wheel syndrome, and so on. A Linux desktop system might have libraries numbering in the hundreds, and possibly even a few thousand!

The classic K&R hello, world C program uses the printf API to write the string to the display:

printf("hello, world\n");

Obviously, the code of printf is not part of the hello, world source. So where does it come from? It's part of the standard C library; on Linux, due to its GNU origins, this library is commonly called GNU libc (glibc).

Glibc is a critical and required component on a Linux box. It not only contains the usual standard C library routines (APIs), it is, in fact, the programming interface to the operating system! How? Via its lower layer, the system calls.

System calls

System calls are actually kernel functionality that can be invoked from userspace via glibc stub routines. They serve a critical function; they connect userspace to kernel-space. If a user program wants to request something of the kernel (read from a file, write to the network, change a file's permissions), it does so by issuing a system call. Therefore, system calls are the only legal entry point to the kernel. There is no other way for a user-space process to invoke the kernel.

For a list of all the available Linux system calls, see section 2 of the man pages (https://linux.die.net/man/2/). One can also do: man 2 syscalls to see the man page on all supported system calls.

Another way to think of this: the Linux kernel internally has literally thousands of APIs (or functions). Of these, only a small fraction are made visible or available, that is, exposed, to userspace; these exposed kernel APIs are system calls! Again, as an approximation, modern Linux glibc has around 300 system calls.

On an x86_64 Fedora 27 box running the 4.13.16-302.fc27.x86_64 kernel, there are close to 53,000 kernel APIs!

Here is the key thing to understand: system calls are very different from all other (typically library) APIs. As they ultimately invoke kernel (OS) code, they have the ability to cross the user-kernel boundary; in effect, they have the ability to switch from normal unprivileged User mode to completely privileged Supervisor or kernel mode!

How? Without delving into the gory details, system calls essentially work by invoking special machine instructions that have the built-in ability to switch the processor mode from User to Supervisor. All modern CPU ABIs will provide at least one such machine instruction; on the x86 processor, the traditional way to implement system calls is to use the special int 0x80 machine instruction. Yes, it is indeed a software interrupt (or trap). From Pentium Pro and Linux 2.6 onward, the sysenter/syscall machine instructions are used. See the Further reading section on the GitHub repository.

From the viewpoint of the application developer, a key point regarding system calls is that they appear to be regular functions (APIs) that can be invoked by the developer; this design is deliberate. The reality: the system call APIs that one invokes (such as open(), read(), chmod(), dup(), and write()) are merely stubs. They are a neat mechanism to get at the actual code that is in the kernel: getting there involves populating a register (the accumulator on x86) with the system call number and passing parameters via other general-purpose registers, executing that kernel code path, and returning to user mode when done. Refer to the following table:

CPU | Machine instruction(s) used to trap to Supervisor (kernel) Mode from User Mode | Allocated register for system call number
x86[_64] | int 0x80 or syscall | EAX / RAX
ARM | swi / svc | R7 (EABI)
Aarch64 | svc | X8
MIPS | syscall | $v0
Table 4: System calls on various CPU architectures
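To see that the familiar wrappers really are thin stubs, one can issue a system call by its number. The sketch below uses glibc's generic syscall() function together with the SYS_write constant from <sys/syscall.h>; it sets up the registers for the architecture and executes the trap instruction on our behalf. The wrapper name raw_write is hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>    /* SYS_write: the system call number */
#include <unistd.h>         /* syscall() */

/* Invoke the write system call by number, bypassing the write(2) stub.
 * Returns the number of bytes written, or -1 on error (errno is set). */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "hi\n", 3) should behave just like write(1, "hi\n", 3), since both end up executing the same kernel code path.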

Linux – a monolithic OS

Operating systems are generally considered to adhere to one of two major architectural styles: monolithic or microkernel.

Linux is decidedly a monolithic OS.

What does that mean?

The English word monolith literally means a large single upright block of stone:

Figure 5: Corinthian columns (they're monolithic!)

On the Linux OS, applications run as independent entities called processes. A process may be single-threaded (original Unix) or multithreaded. Regardless, for now, we will consider the process as the unit of execution on Linux; a process is defined as an instance of a program in execution.

When a user-space process issues a library call, the library API, in turn, may or may not issue a system call. For example, issuing the atoi(3) API does not cause glibc to issue a system call, as it does not require kernel support to implement the conversion of a string into an integer. (A note on notation: in <api-name>(n), n is the man page section.)

To help clarify these important concepts, let's check out the famous and classic K&R Hello, World C program again:

#include <stdio.h>
main()
{
    printf("hello, world\n");
}

Okay, that should work. Indeed it does. But the question is, how exactly does the printf(3) API write to the monitor device?

The short answer: it does not. The reality is that printf(3) only has the intelligence to format a string as specified; that's it. Once done, printf actually invokes the write(2) API, a system call. The write system call does have the ability to write the buffer content to a special device file: the monitor device, seen by write as stdout. Go back to our discussion regarding The Unix philosophy in a nutshell: if it's not a process, it's a file! Of course, it gets really complex under the hood in the kernel; to cut a long story short, the kernel code of write ultimately switches to the correct driver code; the device driver is the only component that can directly work with peripheral hardware. It performs the actual write to the monitor, and return values propagate all the way back to the application.

In the following diagram, P is the hello, world process at runtime:

Fig 6: Code flow: printf-to-kernel

Also, from the diagram, we can see that glibc is considered to consist of two parts:

  • Arch-independent glibc: The regular libc APIs (such as [s|sn|v]printf, memcpy, memcmp, atoi)
  • Arch-dependent glibc: The system call stubs
Here, by arch, we mean CPU.
Also the ellipses (...) represent additional logic and processing within kernel-space that we do not show or delve into here.

Now that the code flow path of hello, world is clearer, let's get back to the monolithic stuff!

It's easy to assume that it works this way:

  1. The hello, world app (process) issues the printf(3) library call.
  2. printf issues the write(2) system call.
  3. We switch from User to Supervisor (kernel) Mode.
  4. The kernel takes over – it writes hello, world onto the monitor.
  5. Switch back to non-privileged User Mode.

Actually, that's NOT the case.

The reality is, in the monolithic design, there is no kernel; to word it another way, the kernel is actually part of the process itself. It works as follows:

  1. The hello, world app (process) issues the printf(3) library call.
  2. printf issues the write(2) system call.
  3. The process invoking the system call now switches from User to Supervisor (kernel) Mode.
  4. The process runs the underlying kernel code, the underlying device driver code, and thus, writes hello, world onto the monitor!
  5. The process is then switched back to non-privileged User Mode.

To summarize, in a monolithic kernel, when a process (or thread) issues a system call, it switches to privileged Supervisor or kernel mode and runs the kernel code of the system call (working on kernel data). When done, it switches back to unprivileged User mode and continues executing userspace code (working on user data).

This is very important to understand:


Fig 7: Life of a process in terms of privilege modes

The preceding diagram attempts to illustrate that the X axis is the timeline, and the Y axis represents User Mode (at the top) and Supervisor (kernel) Mode (at the bottom):

  • time t0: A process is born in kernel mode (the code to create a process is within the kernel of course). Once fully born, it is switched to User (non-privileged) Mode and it runs its userspace code (working on its userspace data items as well).
  • time t1: The process, directly or indirectly (perhaps via a library API), invokes a system call. It now traps into kernel mode (refer to Table 4, which shows the machine instructions used to do so on each CPU) and executes kernel code in privileged Supervisor Mode (working on kernel data items as well).
  • time t2: The system call is done; the process switches back to non-privileged User Mode and continues to execute its userspace code. This process continues, until some point in the future.
  • time tn: The process dies, either deliberately by invoking the exit API, or it is killed by a signal. It now switches back to Supervisor Mode (as the exit(3) library API invokes the _exit(2) system call), executes the kernel code of _exit(), and terminates.
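The birth and death of a process described in the timeline above can be observed from userspace. In this hedged sketch, the parent creates a child with fork(2), the child dies deliberately via exit(3), and the parent collects the status that the kernel recorded, using waitpid(2). The function name child_exit_status is hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that terminates via exit(3) (which ends up in the
 * _exit(2) system call) and return the exit status the kernel recorded.
 * Returns -1 on error or if the child did not exit normally. */
int child_exit_status(int code)
{
    pid_t pid = fork();     /* the child is born in kernel code */

    if (pid == -1)
        return -1;
    if (pid == 0)
        exit(code);         /* child: dies deliberately */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Every step that matters here (creating the process, terminating it, and reaping its status) is a system call, so each one is a trip from User Mode into Supervisor Mode and back.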

In fact, most modern operating systems are monolithic (especially the Unix-like ones).

Technically, Linux is not considered 100 percent monolithic. It's considered to be mostly monolithic, but also modular, due to the fact that the Linux kernel supports modularization (the plugging in and out of kernel code and data, via a technology called Loadable Kernel Modules (LKMs)).
Interestingly, MS Windows (specifically, from the NT kernel onward) follows a hybrid architecture that is both monolithic and microkernel.
 

Execution contexts within the kernel

Kernel code always executes in one of two contexts:

  • Process
  • Interrupt
It's easy to get confused here. Remember, this discussion applies to the context in which kernel code executes, not userspace code.

Process context

Now we understand that one can invoke kernel services by issuing a system call. When this occurs, the calling process runs the kernel code of the system call in kernel mode. This is termed process context: kernel code is now running in the context of the process that invoked the system call.

Process context code has the following attributes:

  • Always triggered by a process (or thread) issuing a system call
  • Top-down approach
  • Synchronous execution of kernel code by a process

Interrupt context

At first glance, there appears to be no other way that kernel code executes. Well, think about this scenario: the network receive path. A network packet destined for your Ethernet MAC address arrives at the hardware adapter, the hardware detects that it's meant for it, collects it, and buffers it. It now must let the OS know; more technically, it must let the Network Interface Card (NIC) device driver know, so that it can fetch and process packets as they arrive. It kicks the NIC driver into action by asserting a hardware interrupt.

Recall that device drivers reside in kernel-space, and therefore their code runs in Supervisor or kernel Mode. The driver's Interrupt Service Routine (ISR), running with kernel privilege, now executes, fetches the packet, and sends it up the OS network protocol stack for processing.

The NIC driver's ISR code is kernel code, and it has run, but in what context? It's obviously not in the context of any particular process. In fact, the hardware interrupt probably interrupted some process. Thus, we just call this interrupt context.

The interrupt context code has the following attributes:

  • Always triggered by a hardware interrupt (not a software interrupt, fault or exception; that's still process context)
  • Bottom-up approach
  • Asynchronous execution of kernel code by an interrupt
If, at some point, you do report a kernel bug, it helps if you point out the execution context.

Technically, within interrupt context, we have further distinctions, such as hard-IRQs and softirqs, bottom halves, and tasklets. However, this discussion goes beyond the scope of this book.

 

Summary

This chapter started by explaining the Unix design philosophy, including the central principles or pillars of the Unix philosophy, design, and architecture. We then described the Linux system architecture, where we covered the meaning of CPU-ABI (Application Binary Interface), ISA, and toolchain (using objdump to disassemble a simple program, and accessing CPU registers with inline assembly). CPU privilege levels and their importance in the modern OS were discussed, leading in to the Linux system architecture layers application, libraries, system calls, and the kernel. The chapter finished with a discussion on how Linux is a monolithic OS and then explored kernel execution contexts.

In the next chapter, the reader will delve into the mysteries of, and get a solid grasp of, virtual memory: what exactly it means, why it's in all modern OSes, and the key benefits it provides. We will also discuss relevant details of how the process virtual address space is made up.

About the Authors
  • Kaiwan N. Billimoria

    Kaiwan N. Billimoria taught himself BASIC programming on his dad's IBM PC back in 1983. He was programming in C and Assembly on DOS until he discovered the joys of Unix, and by around 1997, Linux! Kaiwan has worked on many aspects of the Linux system programming stack, including Bash scripting, system programming in C, kernel internals, device drivers, and embedded Linux work. He has actively worked on several commercial/FOSS projects. His contributions include drivers to the mainline Linux OS and many smaller projects hosted on GitHub. His Linux passion feeds well into his passion for teaching these topics to engineers, which he has done for well over two decades now. He's also the author of Hands-On System Programming with Linux, Linux Kernel Programming (and its Part 2 book) and Linux Kernel Debugging. It doesn't hurt that he is a recreational ultrarunner too.

  • Tigran Aivazian

    Tigran Aivazian has a Master's degree in Computer Science and a Master's degree in Theoretical Physics. He has written BFS and Intel Microcode Update drivers which became part of the official Linux kernel. He is the author of a book “Linux 2.4 Kernel Internals” which is available in several languages at the Linux Documentation Project. He worked at VERITAS as a Linux Kernel Architect, improving the kernel and teaching OS internals. For the Bible societies Tigran produced scholarly Bible editions in Hebrew, Greek, Syriac, Slavonic and ancient Armenian. Recently he published “The British Study Edition of the Urantia Papers”. He is currently working on the foundations of Quantum Mechanics on a branch of physics called Quantum Infodynamics.

Latest Reviews (8 reviews total)
Top! Love these books. Very simple process of purchase. Thank you!
I give 5 stars and would like to buy another one next month.
A very theoretical work on how the system is organized.