
The Software Developer's Guide to Linux

Working with Processes

As a developer, you are already intuitively familiar with processes. They are the fruits of your labor: after writing and debugging code, your program finally executes, transforming into a beautiful operating system process!

A process on Linux can be a long-running application, a quick shell command like ls, or anything that the kernel spawns to do some work on the system. If something is getting done in Linux, a process is doing it. Your web browser, text editor, and vulnerability scanner all run as processes, and so does every command you’ve learned so far, including the ones you use to read files.

Linux’s process model is important to understand because the abstraction it gives you – the Linux process – is what all the commands and tools you’ll use to manage processes depend on. Gone are the details you’re used to seeing from a developer’s perspective: variables, functions, and threads have all been encapsulated as “a process.” You’re left with a different, external set of knobs to manipulate and gauges to check: process ID, status, resource usage, and all the other process attributes we’ll be covering in this chapter.

First, we’ll take a close look at the process abstraction itself, and then we’ll dive into useful, practical things you can do with Linux processes. While we’re covering the practical aspects, we’ll pause to add detail to a few aspects that are a common source of problems, like permissions, and give you some heuristics for troubleshooting processes.

In this chapter, you’ll learn about the following topics:

What a Linux process is, and how to see the processes currently running on your system
The attributes a process has, so you know what information you can gather while troubleshooting
Common commands for viewing and finding processes
More advanced topics that can come in handy for developers writing programs that run as Linux processes: signals and inter-process communication, the /proc virtual filesystem, seeing open file handles with the lsof command, and how processes are created in Linux

You’ll also get a practical review of everything you’ve learned in an example troubleshooting session that uses the theory and commands we cover in this chapter. Now, let’s dive into what exactly a Linux process is.

Process basics

When we refer to a “process” in Linux, we’re referring to the operating system’s internal model of what exactly a running program is. Linux needs a general abstraction that works for all programs, which can encapsulate the things the operating system cares about. A process is that abstraction, and it enables the OS to track some of the important context around programs that are executing; namely:

Memory usage
Processor time used
Other system resource usage (disk access, network usage)
Communication between processes
Related processes that a program starts, for example, firing off a shell command

You can get a listing of all system processes (at least the ones your user is allowed to see) by running the ps program with the aux flags:

Figure 2.1: List of system processes
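To see where ps gets its data, here is a minimal sketch, assuming a Linux system with the /proc virtual filesystem mounted (covered in more detail later in this chapter). It lists each PID alongside its command name by reading /proc/<PID>/comm. The function name list_processes is our own, and ps itself reports far more than this.

```python
import os

def list_processes():
    """Return a {pid: command name} dict built from /proc (Linux only)."""
    processes = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():  # skip non-process entries such as /proc/cpuinfo
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                processes[int(entry)] = f.read().strip()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            # The process exited between listdir() and open(), or we may not read it
            continue
    return processes

if __name__ == "__main__":
    for pid, name in sorted(list_processes().items()):
        print(f"{pid:>7}  {name}")
```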

We’ll cover the attributes most relevant to your work as a developer in this chapter.

What is a Linux process made of?

From the perspective of the operating system, a “process” is simply a data structure that makes it easy to access information like:

Process ID (PID in the ps output above). PID 1 is the init system – the original parent of all other processes, which bootstraps the system. The kernel starts it as one of the first things it does after it begins executing. When a process is created, it gets the next available process ID, in sequential order (the numbering wraps around once it reaches the system’s maximum PID). Because init is so important to the normal functioning of the operating system, it cannot be killed, even by the root user. Different Unix operating systems use different init systems – for example, most Linux distributions use systemd, while macOS uses launchd, and many other Unixes use SysV init. Regardless of the specific implementation, we’ll refer to this process by the name of the role it fills: “init.”

Note

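In containers, processes are namespaced: the same process has one PID inside the container and a different one in the host’s (“real”) PID namespace. For example, the container’s first process sees itself as PID 1, while the host might know it as PID 3210. Every other process in the container is likewise numbered from 1 to n on the inside (where n is the number of running processes in the container) and has its own, unrelated PID on the host. From the host you can see all of the container’s processes under their host PIDs, but from inside the container you see only the container’s own processes.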

Parent Process ID (PPID). Each process is spawned by a parent. If the parent process dies while the child is still alive, the child becomes an “orphan.” Orphaned processes are re-parented to init (PID 1).
Status (STAT in the ps output above). man ps will show you an overview:
- D – uninterruptible sleep (usually IO)
- I – idle kernel thread
- R – running or runnable (on run queue)
- S – interruptible sleep (waiting for an event to complete)
- T – stopped by job control signal
- t – stopped by debugger during tracing
- X – dead (should never be seen)
- Z – defunct (“zombie”) process, terminated but not reaped by its parent
Priority status (“niceness” – does this process allow other processes to take priority over it?).
A process owner (USER in the ps output above): the effective user ID (EUID).
Effective Group ID (EGID), which, together with the EUID, determines what the process is allowed to access.
An address map of the process’s memory space.
Resource usage – open files, network ports, and other resources the process is using (VSZ and RSS for memory usage in the ps output above).
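Several of these attributes can be read straight from the kernel. The following sketch, assuming Linux and Python 3, parses /proc/self/stat (the current process’s one-line status record, whose field layout is documented in the proc(5) man page) to pull out the PID, command name, state, PPID, and niceness. The helper name process_attributes is hypothetical.

```python
import os

def process_attributes(pid="self"):
    """Read PID, command, state, PPID, and niceness from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name (field 2) is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the *last* closing parenthesis.
    head, _, tail = data.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    fields = tail.split()  # fields[0] is field 3 of the stat line
    return {
        "pid": int(pid_str),
        "command": comm,
        "state": fields[0],       # field 3: usually one of the STAT letters listed above
        "ppid": int(fields[1]),   # field 4: parent process ID
        "nice": int(fields[16]),  # field 19: niceness
    }

print(process_attributes())
```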

(Citation: from the Unix and Linux System Administration Handbook, 5th edition, p.91.)

Let’s take a closer look at a few of the process attributes that are most important for developers and occasional troubleshooters to understand.

Process ID (PID)

Each process is uniquely identifiable by its process ID, which is just a unique integer that is assigned to a process when it starts. Much like a relational database with IDs that uniquely identify each row of data, the Linux operating system keeps track of each process by its PID.

A PID is by far the most useful label for you to use when interacting with processes.
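Because the PID is the handle for almost everything else, a common first question is simply “does this PID still exist?” Here is a minimal sketch, assuming Python 3 on Linux or another Unix: sending signal 0 with kill performs the existence and permission checks without actually delivering a signal (the shell equivalent is kill -0 PID). A zombie process still counts as existing until its parent reaps it. The pid_exists name is our own.

```python
import os

def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        # 0 and negative values address process groups in kill(2), not single PIDs
        return False
    try:
        os.kill(pid, 0)  # signal 0: run the existence/permission checks, send nothing
    except ProcessLookupError:
        return False     # ESRCH: no such process
    except PermissionError:
        return True      # EPERM: it exists, but we may not signal it
    return True

print(pid_exists(os.getpid()))
```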

Effective User ID (EUID) and Effective Group ID (EGID)

These determine which system user and group your process is running as. Together, user and group permissions determine what a process is allowed to do on the system.

As you’ll see in Chapter 5, Introducing Files, files have user and group ownership set on them, which determines who their permissions apply to. If a file’s ownership and permissions are essentially a lock, then a process with the right user/group permissions is like a key that opens the lock and allows access to the file. We’ll dive deeper into this later, when we talk about permissions.
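To check which user and group a process is running as, the process can simply ask the kernel about itself. This sketch uses Python’s standard os, pwd, and grp modules to look up the current process’s EUID and EGID and translate them into names, roughly what the id command or the USER column of ps shows. Falling back to the bare number is an assumption aimed at containers, where a UID often has no /etc/passwd entry.

```python
import grp
import os
import pwd

def effective_identity():
    """Return (euid, user name, egid, group name) for the current process."""
    euid, egid = os.geteuid(), os.getegid()
    try:
        user = pwd.getpwuid(euid).pw_name
    except KeyError:  # UID with no /etc/passwd entry (common in containers)
        user = str(euid)
    try:
        group = grp.getgrgid(egid).gr_name
    except KeyError:  # GID with no /etc/group entry
        group = str(egid)
    return euid, user, egid, group

print("EUID=%d (%s) EGID=%d (%s)" % effective_identity())
```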

Environment variables

You’ve probably used environment variables in your applications – they’re a way for the operating system environment that launches your process to pass in data that the process needs. This commonly includes things like configuration directives (LOG_DEBUG=1) and secret keys (AWS_SECRET_KEY), and every programming language has some way to read them out from the context of the program.

For example, this Python script gets the user’s home directory from the HOME environment variable, and then prints it:

import os

# HOME is set by the environment that launched this process (e.g., your login shell)
home_dir = os.environ['HOME']
print("The home directory for this user is", home_dir)

In my case, running this program in the python3 REPL on a Linux machine results in the following output:

The home directory for this user is /home/dcohen
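Environment variables flow from the launching process to the one it launches. As a sketch (the function child_reads_env is hypothetical), this Python snippet starts a child Python process with one extra variable added to a copy of its own environment and reports what the child sees. The parent’s own environment is left unchanged.

```python
import os
import subprocess
import sys

def child_reads_env(name, value):
    """Launch a child Python process with one extra environment variable and
    return what the child sees for that variable."""
    env = dict(os.environ)  # start from a copy of this process's environment
    env[name] = value       # add (or override) one variable for the child only
    child_code = f"import os; print(os.environ.get({name!r}, '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

print(child_reads_env("LOG_DEBUG", "1"))  # prints: 1
```

Passing env= replaces the child’s whole environment, which is why the sketch copies os.environ first instead of passing only the one variable.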

Working directory

A process has a “current working directory,” just like your shell (which is, after all, just another process). Typing pwd in your shell prints the shell’s current working directory. A process can change its working directory at any time, so don’t rely on it too much.
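A quick way to see that the working directory is just mutable per-process state is to change it and look. This sketch, assuming Linux (for the /proc/self/cwd symlink) and Python 3, moves the current process into a temporary directory, confirms the change through both os.getcwd() and /proc, then moves back. Any relative path opened in between would have resolved against the temporary directory, which is why code that cares about file locations usually builds absolute paths.

```python
import os
import tempfile

def demo_cwd_change():
    """Change this process's working directory and confirm it through /proc."""
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            via_python = os.getcwd()
            via_proc = os.readlink("/proc/self/cwd")  # Linux-specific symlink
        finally:
            os.chdir(original)  # restore before the temporary directory is deleted
    return original, via_python, via_proc

original, via_python, via_proc = demo_cwd_change()
print(f"started in {original}; moved to {via_python}; /proc agrees: {via_proc == via_python}")
```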

This concludes our overview of the process attributes that you should know about. In the next section, we’ll step away from theory and look at some commands you can use to start working with processes right away.

Advanced process concepts and tools

This marks the beginning of the “advanced” section of this chapter. While you don’t need to master all the concepts in this section to work effectively with Linux processes, they can be extremely helpful. If you have a few extra minutes, we recommend at least familiarizing yourself with each one.

Signals

How does systemctl tell your web server to re-read its configuration files? How can you politely ask a process to shut down cleanly? And how can you kill a malfunctioning process immediately, because it’s bringing your production application to its knees?

In Unix and Linux, all of this is done with signals. Signals are numbered messages that can be sent to a process, either by another process or by the kernel itself. They give processes a simple way to communicate with each other and with the operating system.

These messages can be used to communicate a variety of things to a process, for example, indicating that a particular event has happened or that a specific action or response is required.

Practical uses of signals

Let’s look at a few examples of the practical value that the signal mechanism enables. Signals can be used to implement inter-process communication; for example, one process can send a signal to another process indicating that it’s finished with a particular task and that the other process can now start working. This allows processes to coordinate their actions and work together in a smooth and efficient manner, much like execution threads in programming languages (but without the associated memory sharing).
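Here is a minimal sketch of that “I’m done, you can start” pattern, assuming Python 3 on Linux. The parent installs a SIGUSR1 handler, starts a child, and the child signals its parent when it finishes. SIGUSR1 is our choice here (it’s one of the application-defined signals listed later in this section), and the function name is hypothetical. Python only lets the main thread install signal handlers, so this must run there.

```python
import os
import signal
import subprocess
import sys
import time

def wait_for_ready_signal(timeout=5.0):
    """Start a child process and wait for it to send us SIGUSR1 as a 'done' message."""
    received = []

    def on_sigusr1(signum, frame):
        received.append(signum)

    # Install the handler *before* starting the child, so the signal can't arrive early
    previous = signal.signal(signal.SIGUSR1, on_sigusr1)
    try:
        # The child "finishes its task" and then signals its parent (us)
        child_code = "import os, signal; os.kill(os.getppid(), signal.SIGUSR1)"
        subprocess.run([sys.executable, "-c", child_code], check=True)
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)  # Python runs pending signal handlers around these sleeps
    finally:
        # Restore whatever handler was installed before
        signal.signal(signal.SIGUSR1, previous if previous is not None else signal.SIG_DFL)
    return bool(received)

print("child signalled completion:", wait_for_ready_signal())
```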

Another common application of process signals is to handle program errors. For example, a process can be designed to catch the SIGSEGV signal, which indicates a segmentation fault. When a process receives this signal, it can trap that signal and then take action to log the error, dump core for debugging purposes, or clean up any resources that were being used before shutting down gracefully.

Process signals can also be used to implement graceful shutdowns. For example, when a system is shutting down, a signal can be sent to all processes to give them a chance to save their state and clean up any resources they were using, via “trapping” signals.

Trapping

Many of the signals can be “trapped” by the processes that receive them: this is essentially the same idea as catching and handling an error in a programming language.

If the receiving process has a handler function for the signal that’s being sent, then that handler function is run. That’s how programs re-read their configuration without restarting, and finish their database writes and close their file handles after receiving the shutdown signal.
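As a sketch of trapping in practice, assuming Python 3 on a Unix-like system, the hypothetical Service class below registers handlers for SIGHUP (re-read configuration) and SIGTERM (stop after finishing the current unit of work). The “configuration” is just a counter standing in for a real file, and the demo sends both signals to its own PID with os.kill, which is what the kill command does from the shell.

```python
import os
import signal
import time

class Service:
    """Hypothetical long-running service that traps SIGHUP and SIGTERM."""

    def __init__(self):
        self.config_loads = 0
        self.running = True
        self.load_config()
        # Install our handlers, remembering the previous ones so run() can restore them
        self._previous = {
            signal.SIGHUP: signal.signal(signal.SIGHUP, self.on_sighup),
            signal.SIGTERM: signal.signal(signal.SIGTERM, self.on_sigterm),
        }

    def load_config(self):
        # Stand-in for re-reading a real configuration file
        self.config_loads += 1

    def on_sighup(self, signum, frame):
        self.load_config()    # re-read config without restarting

    def on_sigterm(self, signum, frame):
        self.running = False  # finish the current work, then leave the loop

    def run(self, max_seconds=5.0):
        deadline = time.monotonic() + max_seconds
        try:
            while self.running and time.monotonic() < deadline:
                time.sleep(0.01)  # stand-in for one unit of real work
            # Cleanup (flushing writes, closing file handles) would happen here
        finally:
            for signum, handler in self._previous.items():
                signal.signal(signum, handler if handler is not None else signal.SIG_DFL)
        return self.config_loads

if __name__ == "__main__":
    service = Service()
    os.kill(os.getpid(), signal.SIGHUP)   # like `kill -1 <pid>`
    os.kill(os.getpid(), signal.SIGTERM)  # like `kill <pid>`
    print("configuration loaded", service.run(), "times")
```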

The kill command

However, it’s not just processes that communicate via signals: the frighteningly named (and, technically speaking, incorrectly named) kill is a program that allows users to send signals to processes, too.

One of the most common uses of signals sent by users via the kill command is to interrupt a process that is no longer responding. For example, if a process is stuck in an infinite loop, a “kill” signal can be sent to force it to stop.

The kill command allows you to send a signal to a process by specifying its PID. If the process you’d like to terminate has PID 2600, you’d run:

kill 2600

This command would send signal 15 (SIGTERM, or “terminate”) to the process, which would then have a chance to trap the signal and shut down cleanly.

Note

As you can see from the included table of standard signal numbers, the default signal that kill sends is “terminate” (signal 15), not “kill” (SIGKILL is 9). The kill program is not just for killing processes but also for sending any kind of signal. It’s really confusingly named and I’m sorry about that – it’s just one of those idiosyncrasies of Unix and Linux that you’ll get used to.

If you don’t want to send the default signal 15, you can specify the signal you’d like to send with a dash, either by number or by name (kill -HUP 2600 is equivalent to the command below); to send a SIGHUP to the same process by number, you’d run:

kill -1 2600

Running man 7 signal on Linux (or man signal on macOS) will give you a list of signals that you can send, and kill -l prints their names:

Figure 2.6: Example of output of the man signal command

It pays – sometimes quite literally, in engineering interviews – to be familiar with a few of these:

SIGHUP (1) – “hangup”: interpreted by many applications – for example, nginx – as “re-read your configuration because I’ve made changes to it.”
SIGINT (2) – “interrupt”: what your terminal sends when you press Ctrl+C. It’s often interpreted the same way as SIGTERM, as “please shut down cleanly.”
SIGTERM (15) – “terminate”: nicely asks a process to shut down.
SIGUSR1 (10) and SIGUSR2 (12) on x86 and ARM Linux (30 and 31 on macOS and BSD) are sometimes used for application-defined messaging. For example, SIGUSR1 asks nginx to re-open the log files it’s writing to, which is useful if you’ve just rotated them.
SIGKILL (9) – SIGKILL cannot be trapped and handled by processes. If this signal is sent to a program, the operating system will kill that program immediately. Any cleanup code, like flushing writes or safe shutdown, is not performed, so this is generally a last resort, since it could lead to data corruption.
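Because some signal numbers differ between platforms (SIGUSR1 and SIGUSR2 in particular), it’s usually safer to refer to signals by name: kill -s HUP 2600 or kill -HUP 2600 in the shell, and the named constants in most languages’ standard libraries. This short Python sketch prints the numbers your platform actually uses; the helper name signal_numbers is our own.

```python
import signal

def signal_numbers(names=("SIGHUP", "SIGINT", "SIGKILL", "SIGTERM", "SIGUSR1", "SIGUSR2")):
    """Map signal names to the numbers this platform uses for them."""
    return {name: int(getattr(signal, name)) for name in names}

for name, number in signal_numbers().items():
    print(f"{name:<8} {number}")
```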

If you want to explore Linux a bit deeper, feel free to poke around the /proc directory. That’s definitely beyond the basics, but it’s a directory that contains a filesystem subtree for every process, where live information about the processes is looked up as you read those files.

/proc

In practice, this knowledge can come in handy during troubleshooting when you’ve identified a misbehaving (or mysterious) process and want to know exactly what it’s doing in real time.

You can learn a lot about a process by poking around in its /proc subdirectory and casually googling.

Many of the tools we show you in this chapter actually use /proc to gather process information, and only show you a subset of what’s there. If you want to see everything and do the filtering yourself, /proc is the place to look.
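For a taste of what’s there, here is a sketch, assuming Linux and Python 3, that reads two files from a process’s /proc subdirectory: status, a human-readable list of fields such as Name, State, PPid, and Uid (whose four values are the real, effective, saved, and filesystem UIDs), and cmdline, which holds the process’s arguments separated by NUL bytes. The helper names are our own; pass a numeric PID instead of "self" to inspect another process, which may require root.

```python
import os

def proc_status(pid="self"):
    """Parse /proc/<pid>/status into a dict of field name -> raw value (Linux only)."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields

def proc_cmdline(pid="self"):
    """Return the command line as a list; arguments are NUL-separated in /proc."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    if not raw:  # kernel threads have an empty command line
        return []
    return [arg.decode(errors="replace") for arg in raw.rstrip(b"\0").split(b"\0")]

status = proc_status()
for key in ("Name", "State", "Pid", "PPid", "Uid", "Gid", "VmRSS"):
    print(f"{key}: {status.get(key)}")
print("cmdline:", proc_cmdline())
```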

lsof – show file handles that a process has open

The lsof command shows all files that a process has opened for reading and writing. This is useful because it only takes one small bug for a program to leak file handles (internal references to files that it has requested access to). This can lead to resource usage issues, file corruption, and a long list of strange behavior.

Thankfully, getting a list of files that a process has open is easy. Just run lsof and pass the -p flag with a PID (you’ll usually have to run this as root). This will return the list of files that the process (in this case, with PID 1589) has open:

lsof -p 1589

Figure 2.7: Example of list of files opened by the 1589 process using the lsof -p 1589 command

The above is the output for an nginx web server process. The first line shows you the current working directory for the process: in this case, the root directory (/). You can also see that it has file handles open on its own binary (/usr/sbin/nginx) and various libraries in /usr/lib/.

Further down, you might notice a few more interesting filepaths:

Figure 2.8: Further opened files of the 1589 process

This listing includes the log files nginx is writing to, and socket files (Unix, IPv4, and IPv6) that it’s reading and writing to. In Unix and Linux, network sockets are just a special kind of file, which makes it easy to use the same core toolset across a wide variety of use cases – tools that work with files are extremely powerful in an environment where almost everything is represented as a file.
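On Linux, lsof gathers much of this from /proc. As a rough sketch of the idea (not of how lsof itself is implemented), each entry in /proc/<PID>/fd is a symlink from a file descriptor number to whatever it refers to: a path for regular files, or something like socket:[12345] or pipe:[6789] for sockets and pipes. The open_files helper below is hypothetical, and reading another user’s process generally requires root.

```python
import os
import tempfile

def open_files(pid="self"):
    """Return {fd number: target} for a process's open file descriptors (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for fd in os.listdir(fd_dir):
        try:
            result[int(fd)] = os.readlink(os.path.join(fd_dir, fd))
        except FileNotFoundError:
            continue  # descriptor was closed while we were looking
    return result

# Open a file and watch it show up among this process's descriptors
with tempfile.NamedTemporaryFile() as tmp:
    for fd, target in sorted(open_files().items()):
        print(fd, target)
```

Calling open_files() periodically against a long-lived process and watching the count grow is a crude but effective way to spot a file handle leak.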

Inheritance

Except for the very first process, init (PID 1), all processes are created by a parent process, which essentially makes a copy of itself and then “forks” (splits) that copy off. When a process is forked, it typically inherits its parent’s permissions, environment variables, and other attributes.

This default behavior can be prevented or changed, but out of the box it’s a bit of a security risk: software that you run manually receives the permissions of your current user (or even root privileges, if you use sudo), and all child processes that it creates (for example, during installation or compilation) inherit those permissions.

Imagine a web server process that was started with root privileges (so it could bind to a network port) and environment variables containing cloud authentication keys (so it could grab data from the cloud). When this main process forks off a child process that needs neither root privileges nor sensitive environment variables, it’s an unnecessary security risk to pass those along to the child. As a result, dropping privileges and clearing environment variables is a common pattern in services spawning child processes.

From a security perspective, it is important to keep this in mind to prevent situations where information such as passwords or access to sensitive files could be leaked. While it is outside the scope of this book to go into details of how to avoid this, it’s important to be aware of this if you’re writing software that’s going to run on Linux systems.
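The inheritance itself is easy to observe. This sketch, assuming Python 3 on Linux (os.fork is Unix-only), gives the parent a “secret” in a hypothetical HYPOTHETICAL_CLOUD_KEY variable, forks, and has the child report whether it can see it: once with the default inheritance, and once after the child deletes the variable before doing any work. Dropping privileges (for example with os.setgid and os.setuid, which only work when starting as root) is the other half of the pattern described above and isn’t shown here.

```python
import os

SECRET_NAME = "HYPOTHETICAL_CLOUD_KEY"  # hypothetical variable, for illustration only

def child_sees_secret(scrub_before_work: bool) -> bool:
    """Fork a child and report whether it can see SECRET_NAME in its environment."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.close(read_end)
            if scrub_before_work:
                os.environ.pop(SECRET_NAME, None)  # drop what the child doesn't need
            seen = SECRET_NAME in os.environ
            os.write(write_end, b"1" if seen else b"0")
        finally:
            os._exit(0)  # leave immediately, skipping the parent's cleanup code
    # parent process
    os.close(write_end)
    answer = os.read(read_end, 1)
    os.close(read_end)
    os.waitpid(pid, 0)  # reap the child so it doesn't linger as a zombie
    return answer == b"1"

os.environ[SECRET_NAME] = "example-value"  # the parent holds a "secret"
print("inherited by default:", child_sees_secret(False))  # True
print("after scrubbing:", child_sees_secret(True))        # False
```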

Review – example troubleshooting session

Let’s look at an example troubleshooting session. All we know is that one specific Linux server is running extremely slowly.

To begin with, we want to see what’s happening on the system. You just learned that you can see a live view of processes running on a system by running the interactive top command. Let’s try that now.

Figure 2.9: Example of output of the top command

By default, the top command sorts processes by CPU usage, so we can simply look at the first listed process to find the offending one. Indeed, the top process is using 94% of one CPU’s available processing time.

As a result of running top, we’ve gotten a few useful pieces of information:

The problem is CPU usage, as opposed to some other kind of resource contention.
The offending process is PID 1763, and the command being run (listed in the COMMAND column) is bzip2, which is a compression program.

We determine that this bzip2 process doesn’t need to be running here, and we decide to stop it. Using the kill command, we ask the process to terminate:

kill 1763

After waiting a few seconds, we check to see if this (or any other) bzip2 process is running:

pgrep bzip2

Unfortunately, we see that the same PID is still running. It’s time to get serious:

kill -9 1763

This orders the operating system to kill the process without allowing the process to trap (and potentially ignore) the signal. A SIGKILL (signal #9) simply kills the process where it stands.

Now that you’ve killed the offending process, the server is running smoothly again and you can start tracking down the developer who thought it was a good idea to compress large source directories on this machine.

In this example, we followed the most common systems troubleshooting pattern in existence:

We looked at resource usage (via top in this example). This can be any of the other tools we discussed, depending on which resource is the one being exhausted.
We found a PID to investigate.
We acted on that process. In this example, no further investigation was necessary: we sent a signal asking it to shut down (15, SIGTERM) and, when it kept running, escalated to SIGKILL (9).
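The SIGTERM-then-SIGKILL escalation from this session is common enough to automate. This sketch, assuming Python 3 on Linux, sends SIGTERM to a child process, waits a grace period, and sends SIGKILL only if the process is still running. The stubborn child, which ignores SIGTERM, is a hypothetical stand-in for the bzip2 process above; Popen reports a process killed by a signal as a negative return code (-9 for SIGKILL).

```python
import signal
import subprocess
import sys

def terminate_then_kill(proc: subprocess.Popen, grace_seconds: float = 2.0) -> str:
    """Ask politely with SIGTERM; escalate to SIGKILL if the process is still alive."""
    proc.send_signal(signal.SIGTERM)      # like `kill <pid>`
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGKILL)  # like `kill -9 <pid>`; cannot be trapped
        proc.wait()
        return "killed"

# Hypothetical stand-in for the stuck process: it ignores SIGTERM, then "works" forever
STUBBORN = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

if __name__ == "__main__":
    proc = subprocess.Popen([sys.executable, "-c", STUBBORN],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # wait until the child is actually ignoring SIGTERM
    print(terminate_then_kill(proc), "exit status:", proc.returncode)
    proc.stdout.close()
```

Choosing the grace period is the real design decision: too short and well-behaved programs lose their chance to clean up, too long and a stuck process keeps hogging the machine.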

Key benefits

A practical, no-nonsense guide specifically written for developers (not sysadmins) who need to quickly learn command-line skills

Expand your practical skills and look like a wizard on the command line

Build practical skills to work effectively with the most common CLI tools on Unix-like systems

Description

Developers are always looking to raise their game to the next level, yet most are completely lost when it comes to the Linux command line. This book is the bridge that will take you to the next level in your software development career. Most of the skills in the book can be immediately put to work to make you a more efficient developer. It’s written specifically for software engineers, not Linux system administrators, so each chapter will equip you with just enough theory to understand what you’re doing before diving into practical commands that you can use in your day-to-day work as a software developer. As you work through the book, you’ll quickly absorb the basics of how Linux works while you get comfortable moving around the command line. Once you’ve got the core skills, you’ll see how to apply them in different contexts that you’ll come across as a software developer: building and working with Docker images, automating boring build tasks with shell scripts, and troubleshooting issues in production environments. By the end of the book, you’ll be able to use Linux and the command line comfortably and apply your newfound skills in your day-to-day work to save time, troubleshoot issues, and be the command-line wizard that your team turns to.

Who is this book for?

This book is for software developers who want to build practical Command-Line Interface (CLI) and Linux skills and who want to quickly fill the gap to advance their skills and their career. Basic knowledge of editing text, working with files and folders, having some idea of what “operating systems” are, installing software, and using a development environment is assumed.

What you will learn

Learn useful command-line tricks and tools that make software development, testing, and troubleshooting easy

Understand how Linux and command line environments actually work

Create powerful, customized tools and save thousands of lines of code with developer-centric Linux utilities

Gain hands-on experience with Docker, SSH, and shell scripting tasks that make you a more effective developer

Get comfortable searching logs and troubleshooting problems on Linux servers

Handle common command-line situations that stump other developers



Stefan Dec 14, 2023

Can't wait for this book to come out. I've already had lots of fun with find and exec and further reading. Hope you include a chapter on designing scripts for parallel execution.

Subscriber review

Matthew Sanabria Jan 29, 2024

This book is great for engineers looking to cover the Linux fundamentals that matter on the job without getting lost in the fluff provided by other books. The author makes liberal use of terminal screenshots combined with succinct explanations of every command and argument that's being executed. The best part for busy engineers is this book doesn't require you to read it linearly. You can jump to the chapters that interest you the most without worrying that you're missing context from other chapters. Granted, there are some basic skills you might want to cover first, but this book presents those skills in a easy to digest format that you can read them on the fly. As someone that has witnessed software engineers struggling with Linux, I finally have a resource to recommend them that'll help them hone their Linux skills. Thank you for a wonderful book!

Amazon Verified review

Omar Jan 29, 2024

I’ve been a Linux admin in the past, and I find this book to be a great refresher for anyone looking to brush up on their Linux command line and utilities. It’s straightforward, well-structured, and covers all the essentials. Whether you’re new to Linux or just need to dust off your skills, this book is perfect. I only wish I had this resource when I first started in Linux administration. I’ll definitely be sharing and recommending this book to friends and colleague.

Leroy Jenkins Jan 29, 2024

It's easy to read, providing approachable and useful examples for novice Linux users. If you're new to Linux CLI, this would be a great introductory book to gain a basic understanding of some of the more commonly used Linux commands. The table of contents makes it easy to look up the commands you're interested in, allowing you to learn what you need in the order you need, not necessarily in the order it was written making it a useful go-to reference when you can't remember how to do something on the Linux CLI.

Vaibhav Nanoti Jul 30, 2024

I recently started reading The Software Developer’s Guide to Linux by David Cohen and Christian Sturm, and I wanted to share my thoughts on this remarkable work.

Summary: This book is for software developers who are passionate about Linux and command line or who are out of practice and quickly dust off their skill.

Why I Recommend It: This book offers unique insights, practical advice. It has profoundly impacted my understanding of Containerizing Applications with Docker, and I believe it can be incredibly valuable for anyone interested in Linux development field.

The Software Developer's Guide to Linux: A practical, no-nonsense guide to using the Linux command line and utilities as a software developer
