Mastering Bash

5 (6 reviews total)
By Giorgio Zarrelli
    Advance your knowledge in tech with a Packt subscription

  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Let's Start Programming

About this book

System administration is an everyday effort that involves a lot of tedious tasks, and devious pits. Knowing your environment is the key to unleashing the most powerful solution that will make your life easy as an administrator, and show you the path to new heights. Bash is your Swiss army knife to set up your working or home environment as you want, when you want.

This book will enable you to customize your system step by step, making your own real, virtual, home out of it. The journey will take you swiftly through the basis of the shell programming in Bash to more interesting and challenging tasks. You will be introduced to one of the most famous open source monitoring systems—Nagios, and write complex programs with it in any languages. You’ll see how to perform checks on your sites and applications.

Moving on, you’ll discover how to write your own daemons so you can create your services and take advantage of inter-process communication to let your scripts talk to each other. So, despite these being everyday tasks, you’ll have a lot of fun on the way. By the end of the book, you will have gained advanced knowledge of Bash that will help you automate routine tasks and manage your systems.

Publication date:
June 2017
Publisher
Packt
Pages
502
ISBN
9781784396879

 

Let's Start Programming

Mastering Bash is the art of taking advantage of your environment to make the best out of it. It is not just a matter of dealing with boring routine tasks that can be automated. It is crafting your working space so that it becomes more efficient for your goals. Thus, even though Bash scripting is not as expressive as other more complex languages, such as Python or JavaScript, it is simple enough to be grabbed in a short time, and so flexible that it will suffice for most of your everyday tasks, even the trickiest ones.

But is Bash so plain and easy? Let's have a look at our first lines in Bash. Let's begin with something easy:

gzarrelli:~$ time echo $0
/bin/bash
real 0m0.000s
user 0m0.000s
sys 0m0.000s
gzarrelli:~$

Now, let us do it again in a slightly different way:

gzarrelli:~$ time /bin/echo $0/bin/bash
real 0m0.001s
user 0m0.000s
sys 0m0.000s

What is interesting here is that the value of real is slightly different between the two commands. OK, but why? Let's dig a bit further with the following commands:

gzarrelli:~$ type echo
echo is a shell builtin
gzarrelli:~$ type /bin/echo
/bin/echo is /bin/echo

Interestingly enough, the first seems to be a shell builtin, the second simply a system program, an external utility, and it is here that lies the difference. builtin is a command that is built into the shell, the opposite of a system program, which is invoked by the shell. An internal command, the opposite to an external command.

To understand the difference between internal and external shell commands that lead to such different timing, we have to understand how an external program is invoked by the shell. When an external program is to be executed, Bash creates a copy of itself with the same environment of the parent shell, giving birth to a new process with a different process ID number. So to speak, we just saw how forking is carried out. Inside the new address space, a system exec is called to load the new process data.

For the builtin commands, it is a different story, Bash executes them without any forks, and this leads to a couple of the following interesting outcomes:

  • The builtin execution is faster because there are no copies and no executables invoked. One side note is that this advantage is more evident with short-running programs because the overhead is before any executable is called: once the external program is invoked, the difference in the pure execution time between the builtin command and the program is negligible.
  • Being internal to Bash, the builtin commands can affect its internal state, and this is not possible with the external program. Let's take into account a classic example using builtincd. If cd were an external program, once invoked from shell as:
cd /this_dir  
  • The first operation would be our shell forking a process for cd, and this latter would change the current directory for its own process, not for the one we are inside and that was forked to give birth to the cd process. The parent shell would remain unaffected. So, we would not go anywhere.

Curious about which bulitins are available? You have some options, to either execute the following builtin:

compgen -b  

Or this other builtin:

enable -a | awk '{ print $2 }'  

To better understand why there is a difference between the execution of a builtin and an external program, we must see what happens when we invoke a command.

  • First, remember that the shell works from left to right and takes all the variable assignments and redirections and saves them in order to process later.
  • If nothing else is left, the shell takes the first word from the command line as the name of the command itself, while all the rest is considered as its arguments.
  • The next step is dealing with the required input and output redirection.
  • Finally, before being assigned to a variable, all the text following the sign = is subject to tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal.
  • If no command name comes out as a result of the last operation, the variable can then affect the environment. If an assignment fails, an error is raised and the command invoked exits with a non-zero status.
  • If no command name is the outcome of the operation seen before, all the redirections are applied, but differently from variables, they do not affect the current environment. Again, if any error occurs, there is a non-zero status exit.

Once the preceding operations are performed, the command is then executed and exited with a status, depending on whether one or more expansions contain command substitutions. The overall exit status will be the one from the last command substitution, and if no command substitution were performed, the exit status will be zero.

At this point, we are finally left with a command name and some optional arguments. It is at this point the roads of builtins and external programs divert.

  • At first, the shell looks at the command name, and if there are no slashes, it searches for its location
  • If there are no slashes, the shell tries to see if there is a function with that name and executes it
  • If no functions are found, the shell tries to hit builtin, and if there is anyone with that name, it is executed

OK, now if there is any builtin, it already got invoked. What about an external program?

  • Our Bash goes on, and if it finds no builtins by that name on the command line, there are three chances:
    • The full path of the command to execute is already contained into its internal hash table, which is a structure used to speed up the search
    • If the full path is not in the hash, the shell looks for it into the content of the environmental PATH variable, and if it finds it, it is added to the hash table
    • The full path is not available in the PATH variable, so the shell returns with an exit status of 127

Hash can even be invoked as follows:

gzarrelli:~$ hash
hits command
1 /usr/bin/which
1 /usr/bin/ld
24 /bin/sh
1 /bin/ps
1 /usr/bin/who
1 /usr/bin/man
1 /bin/ls
1 /usr/bin/top

The second column will then tell you not only which commands have been hashed, but also how many times each of them has been executed during the current session (hits).

Let's say that the search found the full path to the command we want to execute; now we have a full path, and we are in the same situation as if the Bash found one or more slashes into the command name. In either case, the shell thinks that it has a good path to invoke a command and executes the latter in a forked environment.

This is when we are lucky, but it can happen that the file invoked is not an executable, and in this case, given that our path does not point to a directory instead of a file, the Bash makes an educated guess and thinks to run a shell script. In this case, the script is executed in a subshell that is at all a new environment, which inherits the content of the hash table of the parent shell.

Before doing anything else, the shell looks at the first line of the script for an optional sha-bang (we will see later what this is) - after the sha-bang, there is the path to the interpreter used to manage the script and some optional arguments.

At this point, and only at this point, your external command, if it is a script, is executed. If it is an executable, it is invoked a bit before, but way after any builtin.

During these first paragraphs, we saw some commands and concepts that should sound familiar to you. The next paragraphs of this chapter will quickly deal with some basic elements of Bash, such as variables, expansions, and redirections. If you already know them, you will be able to use the next pages as a reference while working on your scripts. If, on the contrary, you are not so familiar with them, have a look at what comes next because all you will read will be fundamental in understanding what you can do in and with the shell.

 

I/O redirection

As we saw in the previous pages, redirection is one of the last operations undertaken by Bash to parse and prepare the command line that will lead to the execution of a command. But what is a redirection? You can easily guess from your everyday experience. It means taking a stream that goes from one point to another and making it go somewhere else, like changing the flow of a river and making it go somewhere else. In Linux and Unix, it is quite the same, just keep in mind the following two principles:

  • In Unix, each process, except for daemons, is supposed to be connected to a standard input, standard output, and standard error device
  • Every device in Unix is represented by a file

You can also think of these devices as streams:

  • Standard input, named stdin, is the intaking stream from which the process receives input data
  • Standard output, named stdout, is the outbound stream where the process writes its output data
  • Standard error, named stderr, is the stream where the process writes its error messages

These streams are also identified by a standard POSIX file descriptor, which is an integer used by the kernel as a handler to refer to them, as you can see in the following table:

Device

Mode

File descriptor

stdin

read

0

stdout

write

1

stderr

write

2

So, tinkering with the file descriptors for the three main streams means that we can redirect the flows between stdin and stdout, but also stderr, from one process to the other. So, we can make different processes communicate with each other, and this is actually a form of IPC, inter-process communication, which we will look at it in more detail later in this book.

How do we redirect the Input/Output (I/O), from one process to another? We can get to this goal making use of some special characters:

>  

Let's start stating that the default output of a process, usually, is the stdout. Whatever it returns is returned on the stdout which, again usually, is the monitor or the terminal. Using the > character, we can divert this flow and make it go to a file. If the file does not exist, it is created, and if it exists, it is flattened and its content is overwritten with the output stream of the process.

A simple example will clarify how the redirection to a file works:

gzarrelli:~$ echo "This is some content"
This is some content

We used the command echo to print a message on the stdout, and so we see the message written, in our case, to the text terminal that is usually connected to the shell:

gzarrelli:~$ ls -lah
total 0
drwxr-xr-x 2 zarrelli gzarrelli 68B 20 Jan 07:43 .
drwxr-xr-x+ 47 zarrelli gzarrelli 1.6K 20 Jan 07:43 ..

There is nothing on the filesystem, so the output went straight to the terminal, but the underlying directory was not affected. Now, time for a redirection:

gzarrelli:~$ echo "This is some content" > output_file.txt  

Well, nothing to the screen; no output at all:

gzarrelli:~$ ls -lah
total 8
drwxr-xr-x 3 gzarrelli gzarrelli 102B 20 Jan 07:44 .
drwxr-xr-x+ 47 gzarrelli gzarrelli 1.6K 20 Jan 07:43 ..
-rw-r--r-- 1 gzarrelli gzarrelli 21B 20 Jan 07:44
output_file.txt

Actually, as you can see, the output did not vanish; it was simply redirected to a file on the current directory which got created and filled in:

gzarrelli:~$ cat output_file.txt
This is some content

Here we have something interesting. The cat command takes the content of the output_file.txt and sends it on the stdout. What we can see is that the output from the former command was redirected from the terminal and written to a file.

>>  

This double mark answers a requirement we often face: How can we add more content coming from a process to a file without overwriting anything? Using this double character, which means no file is already in place, create a new one; if it already exists, just append the new data. Let's take the previous file and add some content to it:

gzarrelli:~$ echo "This is some other content" >> output_file.txt
gzarrelli:~$ cat output_file.txt
This is some content
This is some other content

Bingo, the file was not overwritten and the new content from the echo command was added to the old. Now, we know how to write to a file, but what about reading from somewhere else other than the stdin?

<  

If the text terminal is the stdin, the keyboard is the standard input for a process, where it expects some data from. Again, we can divert the flow or data reading and get the process read from a file. For our example, we start creating a file containing a set of unordered numbers:

gzarrelli:~$ echo -e '5\n9\n4\n1\n0\n6\n2' > to_sort  

And let us verify its content, as follows:

gzarrelli:~$ cat to_sort
5
9
4
1
0
6
2

Now we can have the sort command read this file into its stdin, as follows:

gzarrelli:~$ sort < to_sort
0
1
2
4
5
6
9

Nice, our numbers are now in sequence, but we can do something more interesting:

gzarrelli:~$ sort < to_sort > sorted  

What did we do? We simply gave the file to_sort to the command sort into its standard input, and at the same time, we concatenated a second redirection so that the output of sort is written into the file sorted:

gzarrelli:~$ cat sorted
0
1
2
4
5
6
9

So, we can concatenate multiple redirections and have some interesting results, but we can do something even trickier, that is, chaining together inputs and outputs, not on files but on processes, as we will see now.

|  

The pipe character does exactly what its name suggests, pipes the stream; could be the stdout or stderr, from one process to another, creating a simple interprocess communication facility:

gzarrelli:~$ 
ps aux | awk '{print $2, $3, $4}' | grep -v [A-Z] | sort -r -k 2
-g | head -n 3

95 0.0 0.0
94 0.0 0.0
93 0.0 0.0

In this example, we had a bit of fun, first getting a list of processes, then piping the output to the awk utility, which printed only the first, eleventh, and twelfth fields of the output of the first command, giving us the process ID, CPU percentage, and memory percentage columns. Then, we got rid of the heading PID %CPU %MEM, piping the awk output to the input of grep, which performed a reverse pattern matching on any strings containing a character, not a number. In the next stage, we piped the output to the sort command, which reverse-ordered the data based on the values in the second column. Finally, we wanted only the three lines, and so we got the PID of the first three heaviest processes relying on CPU occupation.

Redirection can also be used for some kind of fun or useful stuff, as you can see in the following screenshot:

As you can see, there are two users on the same machine on different terminals, and remember that each user has to be connected to a terminal. To be able to write to any user's terminal, you must be root or, as in this example, the same user on two different terminals. With the who command we can identify which terminal (ttys) the user is connected to, also known as reads from, and we simply redirect the output from an echo command to his terminal. Because its session is connected to the terminal, he will read what we send to the stdin of his terminal device (hence, /dev/ttysxxx).

Everything in Unix is represented by a file, be it a device, a terminal, or anything we need access to. We also have some special files, such as /dev/null, which is a sinkhole - whatever you send to it gets lost:

gzarrelli:~$ echo "Hello" > /dev/null
gzarrelli:~$

And have a look at the following example too:

root:~$ ls
output_file.txtsortedto_sort
root:~$ mv output_file.txt /dev/null
root:~$ ls
to_sort

Great, there is enough to have fun, but it is just the beginning. There is a whole lot more to do with the file descriptors.

 

Messing around with stdin, stdout, and stderr

Well, if we tinker a little bit with the file descriptors and special characters we can have some nice, really nice, outcomes; let's see what we can do.

  • x < filename: This opens a file in read mode and assigns the descriptor named a, whose value falls between 3 and 9. We can choose any name by the means of which we can easily access the file content through the stdin.
  • 1 > filename: This redirects the standard output to filename. If it does not exist, it gets created; if it exists, the pre-existing data is overwritten.
  • 1 >> filename: This redirects the standard output to filename. If it does not exist, it is created; otherwise, the contents get appended to the pre-existing data.
  • 2 > filename: This redirects the standard error to filename. If it does not exist, it gets created; if it exists, the pre-existing data is overwritten.
  • 2 >> filename: This redirects the standard error to filename. If it does not exist, it is created; otherwise, the contents get appended to the pre-existing data.
  • &> filename: This redirects both the stdout and the stderr to filename. This redirects the standard error to filename. If it does not exist, it gets created; if it exists, the pre-existing data is overwritten.
  • 2>&1: This redirects the stderr to the stdout. If you use this with a program, its error messages will be redirected to the stdout, that is, usually, the monitor.
  • y>&x: This redirects the file descriptor for y to x so that the output from the file pointed by descriptor y will be redirected to the file pointed by descriptor x.
  • >&x: This redirects the file descriptor 1 that is associated with the stdout to the file pointed by the descriptor x, so whatever hits the standard output will be written in the file pointed by x.
  • x<> filename: This opens a file in read/write mode and assigns the descriptor x to it. If the file does not exist, it is created, and if the descriptor is omitted, it defaults to 0, the stdin.
  • x<&-: This closes the file opened in read mode and associated with the descriptor x.
  • 0<&- or <&-: This closes the file opened in read mode and associated with the descriptor 0, the stdin , which is then closed.
  • x>&-: This closes the file opened in write mode and associated with the descriptor x.
  • 1>&- or >&-: This closes the file opened in write mode and associated with the descriptor 1, the stdout, which is then closed.

If you want to see which file descriptors are associated with a process, you can explore the /proc directory and point to the following:

/proc/pid/fd  

Under that path, change PID with the ID of the process you want to explore; you will find all the file descriptors associated with it, as in the following example:

gzarrelli:~$ ls -lah /proc/15820/fd
total 0
dr-x------ 2 postgres postgres 0 Jan 20 17:59 .
dr-xr-xr-x 9 postgres postgres 0 Jan 20 09:59 ..
lr-x------ 1 postgres postgres 64 Jan 20 17:59 0 -> /dev/null
(deleted)

l-wx------ 1 postgres postgres 64 Jan 20 17:59 1 -> /var/log/postgresql/postgresql-9.4-main.log
lrwx------ 1 postgres postgres 64 Jan 20 17:59 10 -> /var/lib/postgresql/9.4/main/base/16385/16587
lrwx------ 1 postgres postgres 64 Jan 20 17:59 11 -> socket:[13135]
lrwx------ 1 postgres postgres 64 Jan 20 17:59 12 -> socket:[1502010]
lrwx------ 1 postgres postgres 64 Jan 20 17:59 13 -> /var/lib/postgresql/9.4/main/base/16385/16591
lrwx------ 1 postgres postgres 64 Jan 20 17:59 14 -> /var/lib/postgresql/9.4/main/base/16385/16593
lrwx------ 1 postgres postgres 64 Jan 20 17:59 15 -> /var/lib/postgresql/9.4/main/base/16385/16634
lrwx------ 1 postgres postgres 64 Jan 20 17:59 16 -> /var/lib/postgresql/9.4/main/base/16385/16399
lrwx------ 1 postgres postgres 64 Jan 20 17:59 17 -> /var/lib/postgresql/9.4/main/base/16385/16406
lrwx------ 1 postgres postgres 64 Jan 20 17:59 18 -> /var/lib/postgresql/9.4/main/base/16385/16408
l-wx------ 1 postgres postgres 64 Jan 20 17:59 2 -> /var/log/postgresql/postgresql-9.4-main.log
lr-x------ 1 postgres postgres 64 Jan 20 17:59 3 -> /dev/urandom
l-wx------ 1 postgres postgres 64 Jan 20 17:59 4 -> /dev/null
(deleted)

l-wx------ 1 postgres postgres 64 Jan 20 17:59 5 -> /dev/null
(deleted)

lr-x------ 1 postgres postgres 64 Jan 20 17:59 6 -> pipe:[1502013]
l-wx------ 1 postgres postgres 64 Jan 20 17:59 7 -> pipe:[1502013]
lrwx------ 1 postgres postgres 64 Jan 20 17:59 8 -> /var/lib/postgresql/9.4/main/base/16385/11943
lr-x------ 1 postgres postgres 64 Jan 20 17:59 9 -> pipe:[13125]

Nice, isn't it? So, let us do something that is absolute fun:

First, let's open a socket in read/write mode to the web server of a virtual machine created for this book and assign the descriptor 9:

gzarrelli:~$ exec 9<> /dev/tcp/172.16.210.128/80 || exit 1  

Then, let us write something to it; nothing complex:

gzarrelli:~$ printf 'GET /index2.html HTTP/1.1\nHost: 172.16.210.128\nConnection: close\n\n' >&9

We just requested a simple HTML file created for this example.

And now let us read the file descriptor 9:

gzarrelli:~$ cat <&9
HTTP/1.1 200 OK
Date: Sat, 21 Jan 2017 17:57:33 GMT
Server: Apache/2.4.10 (Debian)
Last-Modified: Sat, 21 Jan 2017 17:57:12 GMT
ETag: "f3-5469e7ef9e35f"
Accept-Ranges: bytes
Content-Length: 243
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<TITLE>This is a test file</TITLE>
</HEAD>
<BODY>
<P>And we grabbed it through our descriptor!
</BODY>
</HTML>

That's it! We connected the file descriptor to a remote server through a socket, we could write to it and read the response, redirecting the streams over the network.

For dealing just with the command line, we have done a lot so far, but if we want to go further, we have to see how to script all these commands and make the most out of them. It is time for our first script!

 

Time for the interpreter: the sha-bang

When the game gets tougher, a few concatenations on the command line cannot be enough to perform the tasks we are meant to accomplish. Too many bits on single lines are too messy, and we lack clarity, so better to store our commands or builtins in a file and have it executed.

When a script is executed, the system loader parses the first line looking for what is named the sha-bang or shebang, a sequence of characters.

#!  

This will force the loader to treat the following characters as a path to the interpreter and its optional arguments to be used to further parse the script, which will then be passed as another argument to the interpreter itself. So, at the end, the interpreter will parse the script and, this time, we will ignore the sha-bang, since its first character is a hash, usually indicating a comment inside a script and comments do not get executed. To go a little further, the sha-bang is what we call a 2-bit magic number, a constant sequence of numbers or text values used in Unix to identify file or protocol types. So, 0x23 0x21 is actually the ASCII representation of #!.

So, let's make a little experiment and create a tiny one line script:

gzarrelli:~$ echo "echo \"This should go under the sha-bang\"" > test.sh  

Just one line. Let's have a look:

gzarrelli:~$ cat test.sh 
echo "This should go under the sha-bang"

Nice, everything is as we expected. Has Linux something to say about our script? Let's ask:

gzarrelli:~$ file test.sh 
test.sh: ASCII text

Well, the file utility says that it is a plain file, and this is a simple text file indeed. Time for a nice trick:

gzarrelli:~$ sed -i '1s/^/#!\/bin\/sh\n/' test.sh  

Nothing special; we just added a sha-bang pointing to /bin/sh:

gzarrelli:~$ cat test.sh 
#!/bin/sh
echo "This should go under the sha-bang"

As expected, the sha-bang is there at the beginning of our file:

gzarrelli:~$ file test.sh 
test.sh: POSIX shell script, ASCII text executable

No way, now it is a script! The file utility makes three different tests to identify the type of file it is dealing with. In order: file system tests, magic number tests, and language tests. In our case, it identified the magic numbers that represent the sha-bang, and thus a script, and this is what it told us: it is a script.

Now, a couple of final notes before moving on.

  • You can omit the sha-bang if your script is not using a shell builtins or shell internals
    • Pay attention to /bin/sh, not everything that looks like an innocent executable is what it seems:
gzarrelli:~$ ls -lah /bin/sh
lrwxrwxrwx 1 root root 4 Nov 8 2014 /bin/sh -> dash

In some systems, /bin/sh is a symbolic link to a different kind of interpreter, and if you are using some internals or builtins of Bash, your script could have unwanted or unexpected outcomes.

Calling your script

Well, we have our two-line script; time to see if it really does what we want it to do:

gzarrelli:~$ ./test.sh
-bash: ./test.sh: Permission denied

No way! It is not executing, and from the error message, it seems related to the file permissions:

gzarrelli:~$ ls -lah test.sh 
-rw-r--r-- 1 gzarrelli gzarrelli 41 Jan 21 18:56 test.sh

Interesting. Let us recap what the file permissions are. As you can see, the line describing the properties of a file starts with a series of letters and lines.

Type

User

Group

Others

-

rw-

r--

r--

For type, we can have two main values, d - this is actually a directory, or - and means this is a regular file. Then, we can see what permissions are set for the user owning the file, for the group owning the file, and for all other users. As you may guess, r stands for permission to read; w stands for being able to write; x stands for permission to execute; and - means no right. These are all in the same order, first r, then w, then x. So wherever you see a - instead of an r, w, or x , it means that particular right is not granted.

The same works for directory permission, except that x means you can traverse the directory; r means that you can enumerate the content of it; w means that you can modify the attributes of the directory and removes the entries that are eventually in it.

Indicator

File type

-

Regular file

b

Block file (disk or partition)

c

Character file, like the terminal under /dev

d

Directory

l

Symbolic link

p

Named pipe (FIFO)

s

Socket

So, going back to our file, we do not see any execution bit set. Why? Here, a shell builtin can help us:

gzarrelli:~$ umask
0022

Does it make any sense to you? Well, it should, once we see how the permissions on files can be represented in numeric form. Think of permissions as bits of metadata pertaining to a file, one bit for each grant; no grant is 0:

r-- = 100
-w- = 010
--x = 001

Now, let's convert from binary to decimal:

Permission

Binary

Decimal

r

100

4

w

010

2

x

001

1

Now, just combine the decimal values to obtain the final permission, but remember that you have to calculate read, write, and execution grants in triplets - one set for the user owning the file, one for the group, and one for the others.

Back again to our file, we can change its permissions in a couple of ways. Let's say we want it to be readable, writable, and executable by the user; readable and writable by the group; and only readable by the others. We can use the command chmod to accomplish this goal:

chmod u+rwx filename
chmod g+wfilename

So, + or - add or subtract the permissions to the file or directory pointed and u, g, w to define which of the three sets of attributes we are referring to.

But we can speed things up using the numeric values:

User - rwx: 4+2+1 =7
Group - rw: 4+2 = 6
Other - r = 4

So, the following command should do the trick in one line:

chmod  764 test.sh  

Time to verify:

gzarrelli:~$ ls -lah test.sh 
-rwxrw-r-- 1 gzarrelli gzarrelli 41 Jan 21 18:56 test.sh

Here we are. So we just need to see whether our user can execute the file, as the permissions granted suggest:

gzarrelli:~$ ./test.sh  

This should go under the sha-bang.

Great, it works. Well, the script is not that complex, but served our purposes. But we left one question behind: Why was the file created with that set of permissions? As a preliminary explanation, I ran the command umask, and the result was 0022 but did not go further.

Count the digits in umask, and those in the numeric modes for chmod. Four against three. What does that leading digit means? We have to introduce some special permission modes that enable some interesting features:

  • Sticky bit. Think of it as a user right assertion on a file or directory. If a sticky bit is set on a directory, the files inside it can be deleted or renamed only by the file owner, the owner of the directory the file is in, or by root. Really useful in a shared directory to prevent one user from deleting or renaming some other user's file. The sticky bit is represented by the t letter at the end of the of the list of permissions or by the octal digit 1 at the beginning. Let's see how it works:
gzarrelli:~$ chmod +t test.sh
gzarrelli:~$ ls -lah test.sh
-rwxrw-r-T 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh
  • Interestingly, the t is capital, not lower, as we were talking about. Maybe this sequence of commands will make everything clearer:
gzarrelli:~$ chmod +t test.sh 
gzarrelli:~$ ls -lah test.sh
-rwxrw-r-T 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh
gzarrelli:~$ chmod o+x test.sh
gzarrelli:~$ ls -lah test.sh
-rwxrw-r-t 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh
  • You probably got it: the t attribute is a capital when, on the file or directory, the execution bix (x) is not set for the others (o).
  • And now, back to the origins:
gzarrelli:~$ chmod 0764 test.sh 
gzarrelli:~$ ls -lah test.sh
-rwxrw-r-- 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh
  • We used the four-digit notations, and the leading 0 cleared out the 1 which referred to the sticky bit. Obviously, we could also use chmod -t to accomplish the same goal. One final note, if sticky bit and GUID are in conflicts, the sticky bit prevails in granting permissions.
    • Set UID: The Set User ID (SUID upon execution) marks an executable, so that when it runs, it will do so as the file owner, with his privileges, and not as the user invoking it. Another tricky use is that, if assigned to a directory, all the files created or moved to that directory will have the ownership changed to the owner of the directory and not to the user actually performing the operation. Visually, it is represented by an s in the position of the user execution rights. The octal number referring to it is 4:
gzarrelli:~$ chmod u+s test.sh
gzarrelli:~$ ls -lah test.sh
-rwsrw-r-- 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh
    • Set GID: The SGID (Set Group ID upon execution) marks an executable, so that when it is run, it does as the user invoking it was in the group that owns the file. If applied to a directory, every file created or moved to the directory will have the group set to the group owning the directory rather than the one the user performing the operation belongs to. Visually, it is represented by an s in the position of the group execution rights. The octal number referring to it is 2.
  • Let's reset the permissions on our test file:
gzarrelli:~$ chmod 0764 test.sh
gzarrelli:~$ ls -lah test.sh
-rwxrw-r-- 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh
  • Now we apply SGID using the octal digit referring to it:
gzarrelli:~$ chmod 2764 test.sh
gzarrelli:~$ ls -lah test.sh
-rwxrwSr-- 1 gzarrelli gzarrelli 41 Jan 22 09:05 test.sh

In this example, the s is capital because we do not have the execution permission granted on the group; the same applies for SUID.

So, now we can go back again to our umask, and at this point you probably already know what is the meaning of the four-digit notation is. It is a command that modifies the permissions on a file creation, denying the permission bits. Taking our default creation mask for directory:

0777  

We can think of umask of 0022 as:

0777 -
0022
------
0755

Do not pay attention to the first 0; it is the sticky bit and simply subtracts from the default grant mask for a directory, rwx for user, group, and others, the value of the umask. The remaining value is the current permission mask for file creation. If you are not comfortable with the numeric notation, you can see the umask values in the familiar rwx notation using:

gzarrelli:~$ umask -S
u=rwx,g=rx,o=rx

For the files, the default mask is 666, so:

0666 -
0022
--------
0644

It is actually a tad more complicated than this, but this rule of thumb will let you calculate the masks quickly. Let us try to create a new umask. First, let's reset the umask value:

gzarrelli:~$ umask
0000
gzarrelli:~$ umask -S
u=rwx,g=rwx,o=rwx

As we can see, nothing gets subtracted:

zarrelli:~$ touch test-file
gzarrelli:~$ mkdir test-dir
gzarrelli:~$ ls -lah test-*
-rw-rw-rw- 1 gzarrelli gzarrelli 0 Jan 22 18:01 test-file

test-dir:
total 8.0K
drwxrwxrwx 2 gzarrelli gzarrelli 4.0K Jan 22 18:01 .
drwxr-xr-x 4 gzarrelli gzarrelli 4.0K Jan 22 18:01 ..

The test file has 666 access rights and the directory 777. This is really way too much:

zarrelli:~$ umask o-rwx,g-w
gzarrelli:~$ umask -S
u=rwx,g=rx,o=

gzarrelli:~$ touch 2-test-file
gzarrelli:~$ mkdir 2-test-dir
gzarrelli:~$ ls -lah 2-test-*
-rw-r----- 1 gzarrelli gzarrelli 0 Jan 22 18:03 2-test-file

2-test-dir:
total 8.0K
drwxr-x--- 2 gzarrelli gzarrelli 4.0K Jan 22 18:03 .
drwxr-xr-x 5 gzarrelli gzarrelli 4.0K Jan 22 18:03 ..

As you can see, the permissions are 750 for directories and 640 for files. A bit of math will help:

0777 -
0750
--------
0027

You would get the same result from the umask command:

gzarrelli:~$ umask
0027

All these settings last as long as you are logged in to the session, so if you want to make them permanent, just add the umask call with the appropriate argument to/etc/bash.bashrc, or /etc/profile for a system-wide effect or, for a single user mask, add it to the .bashrc file inside the user home directory.

Something went wrong, let's trace it

So, we have a new tiny script named disk.sh:

gzarrelli:~$ cat disk.sh
#!/bin/bash
echo "The total disk allocation for this system is: "
echo -e "\n"
df -h
echo -e "\n
df -h | grep /$ | awk '{print "Space left on root partition: " $4}'

Nothing special, a shebang, a couple of echoes on a new line just to have some vertical spacing, the output of df -h and the same command but parsed by awk to give us a meaningful message. Let's run it:

zarrelli:~$ ./disk.sh  

The total disk allocation for this system is:

Filesystem      Size  Used Avail Use% Mounted on
/dev/dm-0 19G 15G 3.0G 84% /
udev 10M 0 10M 0% /dev
tmpfs 99M 9.1M 90M 10% /run
tmpfs 248M 80K 248M 1% /dev/shm
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs 248M 0 248M 0% /sys/fs/cgroup
/dev/sda1 236M 33M 191M 15% /boot
tmpfs 50M 12K 50M 1% /run/user/1000
tmpfs 50M 0 50M 0% /run/user/0
Space left on root partition: 3.0G

Nothing too complicated, a bunch of easy commands, which in case of failure print an error message on the standard output. However, let's think for a moment that we have a more flexible script, more lines, some variable assignments, loops, and other constructs, and something goes wrong, but the output does not tell us anything. In this case, be handy to see a method that is actually running inside our script so that we can see the output of the commands, the variable assignments, and so forth. In Bash, this is possible; thanks to the set command associated with the -x argument, which shows all the commands and arguments in the script printed to the stdout, after the commands have been expanded and before they are actually invoked. The same behavior can be obtained running a subshell with the -x argument. Let's see what would happen if it was used with our script:

gzarrelli:~$ bash -x disk.sh
+ echo 'The total disk allocation for this system is: '
The total disk allocation for this system is:
+ echo -e '\n'
+ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/dm-0 19G 15G 3.0G 84% /
udev 10M 0 10M 0% /dev
tmpfs 99M 9.1M 90M 10% /run
tmpfs 248M 80K 248M 1% /dev/shm
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs 248M 0 248M 0% /sys/fs/cgroup
/dev/sda1 236M 33M 191M 15% /boot
tmpfs 50M 12K 50M 1% /run/user/1000
tmpfs 50M 0 50M 0% /run/user/0
+ echo -e '\n'
+ awk '{print "Space left on root partition: " $4}'
+ grep /dm-0
+ df -h
Space left on root partition: 3.0G

Now it is quite easy to understand how the stream of data flows inside the script: all the lines beginning with a + sign are commands, and the following lines are outputs.

Let's think for a moment that we have longer scripts; for most parts, we are sure that things work fine. For some lines, we are not completely sure of the outcome. Debugging everything would be noisy. In this case, we can use set-x to enable the logging only for those lines we need to inspect, turning it off with set+x when it is no longer needed. Time to modify the script, as follows:

#!/bin/bash  
set -x
echo "The total disk allocation for this system is: "
echo -e "\n"
df -h
echo -e "\n"
set +x
df -h | grep /dm-0 | awk '{print "Space left on root partition: " $4}'

And now, time to run it again, as follows:

gzarrelli:~$ ./disk.sh
+ echo 'The total disk allocation for this system is: '
The total disk allocation for this system is:
+ echo -e '\n'
+ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/dm-0 19G 15G 3.0G 84% /
udev 10M 0 10M 0% /dev
tmpfs 99M 9.1M 90M 10% /run
tmpfs 248M 80K 248M 1% /dev/shm
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs 248M 0 248M 0% /sys/fs/cgroup
/dev/sda1 236M 33M 191M 15% /boot
tmpfs 50M 12K 50M 1% /run/user/1000
tmpfs 50M 0 50M 0% /run/user/0
+ echo -e '\n'
+ set +x
Space left on root partition: 3.0G

As you can see, we see the instructions given in the block marked by set-x, and we also see the set+x instruction given, but then, after this, the line with awk disappears and we see only its output, filtering out what was not so interesting for us and leaving only the part we want to focus on.

This is not a powerful debugging system typical of more complex programming languages, but it can be really helpful in scripts of hundreds of lines where we can lose track of sophisticated structures, such as evaluations, cycles, or variable assignments, which make the scripts more expressive but even more difficult to get hold of and master. So, now that we are clear on how to debug a file, which permissions are needed to make it safely executable, and how to shell parse the command line, we are ready to spice things up looking at how we can use variables to add more flexibility to our hand-crafted tools.

 

Variables

What is a variable? We could answer that it is something not constant; nice joke, but it would not help us so much. Better to think of it as a bucket where we can store some information for later processing: at a certain point of your script you get a value, a piece of info that you do not want to process at that very moment, so you fit it into a variable that you will recall later in the script. This is, in an intuitive way, the use of a variable, a way to allocate a part of the system memory to hold your data.

So far, we have seen that our scripts could retrieve some pieces of information from the system and had to process them straight away, since, without the use of a variable, we had no way to further process the information except for concatenating or redirecting the output to another program. This forced us to have a linear execution, no flexibility, no complexity: once you get some data, you process it straight away redirecting the file descriptors, one link in the chain after the other.

A variable is nothing really new; a lot of programming languages use them to store different types of data, integers, floating, strings, and you can see many different kinds of variables related to different kinds of data they hold. So, you have probably heard about casting a variable, which means, roughly, changing its type: you get a value as a string of numbers and you want to use it as an integer, so you cast it as an int and proceed processing it using some math functions.

Our shell is not so sophisticated, and it has only one type of variable or, better, it has none: whatever you store in it can be later processed without any casting. This can be nice because you do not have to pay attention to what type of data you are holding; you get a number as a string and can process it straight away as an integer. Nice and easy, but we must remember that restrictions are in place not just to prevent us from doing something, but also to help us not do something that would be unhealthy for our code, and this is exactly the risk in having flat variables, to write some piece of code that simply does not work, cannot work.

Assigning a variable

As we just saw, a variable is a way to store a value: we get a value, assign it to a variable and refer to the latter to access the former. The operation of retrieving the content of a variable is named variable substitution. A bit like, if you think about descriptors, the way that you use them to access files. The way you assign a variable is quite straightforward:

LABEL=value  

LABEL can be any string, can have upper and lowercase, start with or contain numbers and underscores, and it is case sensitive.

The assignment is performed by the = character, which, be wary, is not the same as the equal to == sign; they are two different things and are used in different contexts. Finally, whatever you put at the right of the assignment operator becomes the value of the variable. So, let's assign some value to our first variable:

gzarrelli:~$ FIRST_VARIABLE=amazing  

Now we can try to access the value trying to perform an action on the variable itself:

gzarrelli:~$ echo FIRST_VARIABLE
FIRST_VARIABLE

Not exactly what we expected. We want the content, not the name of the variable. Have a look at this:

gzarrelli:~$ echo $FIRST_VARIABLE
amazing

This is better. Using the $ character at the beginning of the variable name identified this as a variable and not a plain string, so we had access to the content. This means that, from now on, we can just use the variable with any commands instead of referring to the whole content of it. So, let us try again:

gzarrelli:~$ echo $first_variable    
gzarrelli:~$

The output is null, and not 0; we will see later on that zero is not the same as null, since null is no value but zero is indeed a value, an integer. What does the previous output mean? Simply that our labels are case sensitive, change one character from upper to lower or vice versa, and you will have a new variable which, since you did not assign any value to it, does not hold any value, hence the null you receive once you try to access it.

Keep the variable name safe

We just saw that $label is the way we reference the content of a variable, but if you have a look at some scripts, you can find another way of retrieving variable content:

${label}  

The two ways of referencing the content of a variable are both valid, and you can use the first, more compact, in any case except when concatenating the variable name to any characters, which could change the variable name itself. In this case, it becomes mandatory to use the extended version of the variable substitution, as the following example will make clear.

Let's start printing our variable again:

gzarrelli:~$ echo $FIRST_VARIABLE
amazing

Now, let's do it again using the extended version of substitution:

gzarrelli:~$ echo ${FIRST_VARIABLE}
amazing

Exactly the same output since, as we said, these two methods are equivalent. Now, let us add a string to our variable name:

gzarrelli:~$ echo $FIRST_VARIABLEngly    
gzarrelli:~$

Nothing, and we can understand why the name of the variable changed; so we have no content to access to. But now, let us try the extended way:

gzarrelli:~$ echo ${FIRST_VARIABLE}ly
amazingly

Bingo! The name of the variable has been preserved so that the shell was able to reference its value and then concatenated it to the ly string we added to the name.

Keep this difference in mind, because the graphs will be a handy way to concatenate strings to a variable to spice your scripts up and, as a good rule of thumb, refer to variables using the graphs. This will help you avoid unwanted hindrances.

Variables with limited scope

As we said before, variables have no type in shell, and this makes them somehow easy to use, but we must pay attention to some sorts of limits to their use.

  • First, the content of a variable is accessible only after the value has been assigned
  • An example will make everything clearer:
gzarrelli:~$ cat disk-space.sh 
#!/bin/bash
echo -e "\n"
echo "The space left is ${disk_space}"
disk_space=`df -h | grep /$ | awk '{print $4}'`
echo "The space left is ${disk_space}

We used the variable disk space to store the result of the df command and try to reference its value on the preceding and following lines. Let us run it in debug mode:

gzarrelli:~$ sh -x disk-space.sh 
+ echo -e \n
-e
+ echo The space left is
The space left is
+ awk {print $4}
+ grep /dm-0
+ df -h
+ disk_space=3.0G
+ echo The space left is 3.0G
The space left is 3.0G

As we can see, the flow of execution is sequential: you access the value of the variable only after it is instanced, not before. And bear in mind that the first line actually printed something: a null value. Well, now let us print the variable on the command line:

gzarrelli:~$ echo ${disk_space}    
gzarrelli:~$

The variable is instanced inside the script, and it is confined there, inside the shell spawned to invoke the command and nothing passed to our main shell.

We can ourselves impose some restrictions to a variable, as we will see with the next example. In this new case, we will introduce the use of a function, something that we are going to look at in more detail further in this book and the keyword local:

gzarrelli:~$ cat disk-space-function.sh
#!/bin/bash
echo -e "\n"
echo "The space left is ${disk_space}"
disk_space=`df -h | grep /dm-0 | awk '{print $4}'`
print () {
echo "The space left inside the function is ${disk_space}"
local available=yes
last=yes
echo "Is the available variable available inside the function? ${available}"
}
echo "Is the last variable available outside the function before it is invoked? ${last}"
print
echo "The space left outside is ${disk_space}"
echo "Is the available variable available outside the function? ${available}"
echo "Is the last variable available outside the function after it is invoked? ${last}"

Now let us run it:

gzarrelli:~$ cat di./pace-function.sh
The space left is
Is the last variable available outside the function before it is invoked?
The space left inside the function is 3.0G
Is the available variable available inside the function? yes
The space left outside is 3.0G
Is the available variable available outside the function?
Is the last variable available outside the function after it is invoked? yes

What can we see here?

The content of variable disk_space is not available before the variable itself is instanced. We already knew this.

The content of a variable instanced inside a function is not available when it is defined in the function, but when the function itself is invoked.

A variable marked by the keyword local and defined inside a function is available only inside the function and only when the function is invoked. Outside the block of code defined by the function itself; the local variable is not visible to the rest of the script. So, using local variables can be handy to write recursive code, even though not recommended.

So, we just saw a few ways to make a variable really limited in its scope, and we also noted that its content is not available outside the script it was instanced in. Wouldn't it be nice to have some variables with a broader scope, capable of influencing the execution of each and every script, something at environment level? It would, and from now on we are going to explore the environment variables.

Environment variables

As we discussed earlier, the shell comes with an environment, which dictates what it can do and what not, so let's just have a look at what these variables are using the env command:

zarrelli:~$ env    
...
LANG=en_GB.utf8
...
DISPLAY=:0.0
...
USER=zarrelli
...
DESKTOP_SESSION=xfce
...
PWD=/home/zarrelli/Documents
...
HOME=/home/zarrelli
...
SHELL=/bin/bash
...
LANGUAGE=en_GB:en
...
GDMSESSION=xfce
...
LOGNAME=zarrelli
...
PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
_=/usr/bin/env

Some of the variables have been omitted for the sake of clarity; otherwise, the output would have been too long, but still we can see something interesting. We can have a look at the PATH variable content, which influences where the shell will look for a program or script to execute. We can see which shell is being currently used, by which user, what the current directory is and the previous one.

But environment variables can not only be read; they can be instanced using the export command:

zarrelli:~$ export TEST_VAR=awesome  

Now, let us read it:

zarrelli:~/$ echo ${TEST_VAR}
awesome

That is it, but since this was just a test, it is better to unset the variable so that we do not leave unwanted values around the shell environment:

zarrelli:~$ unset TEST_VAR  

And now, let us try to get the content of the variable:

zarrelli:~/$ echo ${TEST_VAR}
zarrelli:~/$

No way! The variable content is no more, and as you will see now, the environment variables disappear once their shell is no more. Let's have a look at the following script:

zarrelli:~$ cat setting.sh 
#!/bin/bash
export MYTEST=NOWAY
env | grep MYTEST
echo ${MYTEST}

We simply instance a new variable, grep for it in the environment and then print its content to the stdout. What happens once invoked?

[email protected]:~$ ./setting.sh ; echo ${MYTEST}
MYTEST=NOWAY
NOWAY
zarrelli:~$

We can easily see that the variable was grepped on the env output, so this means that the variable is actually instanced at the environment level and we could access its content and print it. But then we executed the echo of the content of MYTEST outside the script again, and we could just print a blank line. If you remember, when we execute a script, the shell forks a new shell and passes to it its full environment, thus the command inside the program shell can manipulate the environment. But then, once the program is terminated, the related shell is terminated, and its environment variables are lost; the child shell inherits the environment from the parent, the parent does not inherit the environment from the child.

Now, let us go back to our shell, and let us see how we can manipulate the environment to our advantage. If you remember, when the shell has to invoke a program or a script, it looks inside the content of the PATH environment variable to see if it can find it in one of the paths listed. If it is not there, the executable or the script cannot be invoked just with their names, they have to be called passing the full path to it. But have a look at what this script is capable of doing:

#!/bin/bash    
echo "We are into the directory"
pwd

We print our current user directory:

echo "What is our PATH?"
echo ${PATH}

And now we print the content of the environment PATH variable:

echo "Now we expand the path for all the shell"
export PATH=${PATH}:~/tmp

This is a little tricky. Using the graphs, we preserve the content of the variable and add a, which is the delimiter for each path inside the list held by PATH, plus the ~/tmp, which literally means the tmp directory inside the home directory of the current user:

echo "And now our PATH is..."
echo ${PATH}
echo "We are looking for the setting.sh script!"
which setting.sh
echo "Found it!"

And we actually found it. Well, you could also add some evaluation to make the echo conditional, but we will see such a thing later on. Time for something funny:

echo "Time for magic!"
echo "We are looking for the setting.sh script!"
env PATH=/usr/bin which setting.sh
echo "BOOOO, nothing!"

Pay attention to the line starting with env; this command is able to overrun the PATH environment variable and to pass its own variable and related value. The same behavior can be obtained using export instead of env:

echo "Second try..."
env PATH=/usr/sbin which setting.sh
echo "No way..."

This last try is even worse. We modified the content of the $PATH variable which now points to a directory where we cannot find the script. So, not being in the $PATH, the script cannot be invoked by just its name:

zarrelli:~$ ./setenv.sh   

We are in the directory:

/home/zarrelli/Documents/My books/Mastering bash/Chapter 1/Scripts  

What is our PATH?

/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games  

Now we expand the path for all the shell.

And now our PATH is:

/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home
/zarrelli/tmp

We are looking for the setting.sh script!

/home/zarrelli/tmp/setting.sh  

Found it!

Time for magic!

We are looking for the setting.sh script!

BOOOO, nothing!

Second try...

env: 'which': No such file or directory

No way...

Environment variable

Use

BASH_VERSION

The version of the current Bash session

HOME

The home directory of the current user

HOSTNAME

The name of the host

LANG

The locale used to manage the data

PATH

The search path for the shell

PS1

The prompt configuration

PWD

The path to the current directory

USER

The name of the currently logged in user

LOGNAME

Same as user

We can also use env with the -i argument to strip down all the environment variables and just pass to the process what we want, as we can see in the following examples. Let's start with something easy:

zarrelli:~$ cat env-test.sh 
#!/bin/bash
env PATH=HELLO /usr/bin/env | grep -A1 -B1 ^PATH

Nothing too difficult, we modified the PATH variable passing a useless value because HELLO is not a searchable path, then we had to invoke env using the full path because PATH became useless. Finally, we piped everything to the input of grep, which will select all the rows (^) starting with the string PATH, printing that line and one line before and after:

zarrelli:~$ ./env-test.sh     
2705-XDG_CONFIG_DIRS=/etc/xdg
2730:PATH=HELLO
2741-SESSION_MANAGER=local/moveaway:@/tmp/.ICE-unix/888,unix/moveaway:/tmp/.ICE-unix/888

Now, let's modify the script, adding -i to the first env:

zarrelli:~$ cat env-test.sh 
#!/bin/bash
env -i PATH=HELLO /usr/bin/env | grep -A1 -B1 ^PATH

And now let us run it:

zarrelli:~/$ ./env-test.sh 
PATH=HELLO
zarrelli:~/$

Can you guess what happened? Another change will make everything clearer:

env -i PATH=HELLO /usr/bin/env   

No grep; we are able to see the complete output of the second env command:

zarrelli:~$ env -i PATH=HELLO /usr/bin/env
PATH=HELLO
zarrelli:~$

Just PATH=HELLO env with the argument -i passed to the second env process, a stripped down environment with only the variables specified on the command line:

zarrelli:~$ env -i PATH=HELLO LOGNAME=whoami/usr/bin/env
PATH=HELLO
LOGNAME=whoami/usr/bin/env
zarrelli:~$

Because we are engaged in stripping down, let us see how we can make a function disappear with the well-known unset -f command:

#!/bin/bash    
echo -e "\n"
echo "The space left is ${disk_space}"
disk_space=`df -h | grep vg-root | awk '{print $4}'`
print () {
echo "The space left inside the function is ${disk_space}"
local available=yes
last=yes
echo "Is the available variable available inside the function? ${available}"
}
echo "Is the last variable available outside the function before it is invoked? ${last}"
print
echo "The space left outside is ${disk_space}"
echo "Is the available variable available outside the function? ${available}"
echo "Is the last variable available outside the function after it is invoked? ${last}"
echo "What happens if we unset a variable, like last?"
unset last
echo "Has last a referrable value ${last}"
echo "And what happens if I try to unset a while print functions using unset -f"
t
print
unset -f print
echo "Unset done, now let us invoke the function"
print

Time to verify what happens with the unset command:

zarrelli:~$ ./disk-space-function-unavailable.sh   

The space left is:

Is the last variable available outside the function before it is invoked? 
The space left inside the function is 202G
Is the available variable available inside the function? yes
The space left outside is 202G
Is the available variable available outside the function?
Is the last variable available outside the function after it is invoked? yes
What happens if we unset a variable, like last?
Has last a referrable value
And what happens if I try to unset a while print functions using
unset -f

The space left inside the function is 202G
Is the available variable available inside the function? yes

Unset done, now let us invoke the function:

zarrelli:~$   

The print function works well, as expected before we unset it, and also the variable content becomes no longer available. Speaking about variables, we can actually unset some of them on the same row using the following:

unset -v variable1 variable2 variablen  

We saw how to modify an environment variable, but what if we want to make it read-only so to protect its content from an unwanted modification?

zarrelli:~$ cat readonly.sh 
#!/bin/bash
echo "What is our PATH?"
echo ${PATH}
echo "Now we make it readonly"
readonly PATH
echo "Now we expand the path for all the shell"
export PATH=${PATH}:~/tmp

Look at the line readonlyPATH, and now let's see what the execution of this script leads us to:

zarrelli:~$ ./readonly.sh 
What is our PATH?
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
Now we make it readonly
Now we expand the path for all the shell
./readonly.sh: line 10: PATH: readonly variable
zarrelli:~$

What happened is that our script tried to modify the PATH variable that was just made readonly a few lines before and failed. This failure then led us out of the screen with a failure, and this is confirmed by printing the value of the $? variable, which holds the exit state of the last command invoked:

zarrelli:~$ echo $?
1
zarrelli:~$ echo $?
0

We will see the use of such a kind of variable later, but now what interests us is to know what that 0 and 1 mean: the first time we issued the echo command, right after invoking the script, it gave us the exit code 1, which means failure, and this makes sense because the script exited abruptly with an error. The second time we ran echo, it showed 0, which means that the last command executed, the previous echo went well, without any errors.

Variable expansion

The variable expansion is the method we have to access and actually change the content of a variable or parameter. The simplest way to access or reference the variable value is as in the following example:

x=1 ; echo $x    
zarrelli:~$ x=1 ; echo $x
1

So, we assigned a value to the variable x and then referenced the value preceding the variable name with the dollar sign $. So, echo$x prints the content of x, 1, to the standard output. But we can do something even more subtle:

zarrelli:~$ x=1 ; y=$x; echo "x is $x" ; echo "y is $y"
x is 1
y is 1

So, we gave a value to the variable x, then we instanced the variable y referencing the content of the variable x. So, y got its assignment referencing the value of x through the $ character, not directly using a number after the = char. So far, we saw two different ways to reference a variable:

$x
${x}

The first one is terser, but it would be better to stick to the second way because it preserves the name of the variable and, as we saw a few pages before, it allows us to concatenate a string to the variable without losing the possibility of referencing it.

We just saw the simplest among different ways to manipulate the value held by a variable. What we are going to see now is how to thinker with a variable to have default values and messages, so we make the interaction with the variable more flexible. Before proceeding, just bear in mind that we can use two notations for our next example and they are equivalent:

${variable-default}
${variable:-default}

So, you could see either of the two in a script, and both are correct:

${variable:-default} ${variable-default}  

Simply, if a variable is not set, return a default value, as we can see in the following example:

#!/bin/bash
echo "Setting the variable x"
x=10
echo "Printing the value of x using a default fallback value"
echo "${x:-20}"
echo "Unsetting x"
unset -v x
echo "Printing the value of x using a default fallback value"
echo "${x:-20}"
echo "Setting the value of x to null"
x=
echo "Printing the value of x with x to null"
echo "${x:-30}

Now, let's execute it:

zarrelli:~$ ./variables.sh 
Setting the variable x
Printing the value of x using a default fallback value
10
Unsetting x
Printing the value of x using a default fallback value
20
Setting the value of x to null
Printing the value of x with x to null
30

As mentioned before, the two notations, with or without the colon, are quite the same. Let us see what happens if in the previous script we substitute ${x:-somenumber} with ${x-somenumber}.

Let's run the modified script:

Setting the variable x
Printing the value of x using a default fallback value
10
Unsetting x
Printing the value of x using a default fallback value
20
Setting the value of x to null
Printing the value of x with x to null
zarrelli:$

Everything is fine, but the last line. So what is the difference at play here? Simple:

  • *${x-30}: The notation with a colon forces a check on the existence of a value for the variable and this value may well be null. In case you have a value, it does print the value of the variable, ignoring the fallback.
    • unset -f x: It unsets the variable, so it has no value and we have a fallback value
    • x=: It gives a null to x; so the fallback does not come in to play, and we get back the variable value, for example, null
  • ${x:-30}: This forces a fallback value in case the value of a variable is null or nonexistent
    • unset -f x: It unsets the variable, so it has no value and we have a fallback value
    • x=: It gives a null to x, but the fallback comes in to play and we get a default value

Default values can be handy if you are writing a script which expects an input or the customer: if the customer does not provide a value, we can use a fallback default value and have our variable instanced with something meaningful:

#!/bin/bash        
echo "Hello user, please give me a number: "
read user_input
echo "The number is: ${user_input:-99}"

We ask the user for an input. If he gives us a value, we print it; otherwise, we fallback the value of the variable to 99 and print it:

zarrelli:~$ ./userinput.sh 
Hello user, please give me a number:
10
The number is: 10
zarrelli:~/$
zarrelli$ ./userinput.sh
Hello user, please give me a number:
The number is: 99
zarrelli:~/$
${variable:=default} ${variable=default}

If the variable has a value, it is returned; otherwise, the variable has a default value assigned. In the previous case, we got back a value if the variable had no value; or null, here the variable is actually assigned a value. Better to see an example:

#!/bin/bash    
#!/bin/bash
echo "Setting the variable x"
x=10
echo "Printing the value of x"
echo ${x}
echo "Unsetting x"
unset -v x
echo "Printing the value of x using a default fallback value"
echo "${x:-20}"
echo "Printing the value of x"
echo ${x}
echo "Setting the variable x with assignement"
echo "${x:=30}"
echo "Printing the value of x again"
echo ${x}

We set a variable and then print its value. Then, we unset it and print its value, but because it is unset, we get back a default value. So we try to print the value of x, but since the number we got in the preceding operation was not obtained by an assignment, x is still unset. Finally, we use echo "${x:=30}" and get the value 30 assigned to the variable x, and indeed, when we print the value of the variable, we get something. Let us see the script in action:

Setting the variable x
Printing the value of x
10
Unsetting x
Printing the value of x using a default fallback value
20
Printing the value of x
Setting the variable x with assignement
30
Printing the value of x again
30

Notice the blank line in the middle of the output: we just got a value from the preceding operation, not a real variable assignment:

${variable:+default} ${variable+default}  

Force a check on the existence of a non null value for a variable. If it exists, it returns the default value; otherwise it returns null:

#!/bin/bash    
#!/bin/bash
echo "Setting the variable x"
x=10
echo "Printing the value of x"
echo ${x}
echo "Printing the value of x with a default value on
assigned value"

echo "${x:+100}"
echo "Printing the value of x after default"
echo ${x}
echo "Unsetting x"
unset -v x
echo "Printing the value of x using a default fallback value"
echo "${x:+20}"
echo "Printing the value of x"
echo ${x}
echo "Setting the variable x with assignement"
echo "${x:+30}"
echo "Printing the value of x again"
echo ${x}

Now, let us run it and check, as follows:

Setting the variable x
Printing the value of x
10
Printing the value of x with a default value on assigned value
100
Printing the value of x after default
10
Unsetting x
Printing the value of x using a default fallback value
Printing the value of x
Setting the variable x with assignement
Printing the value of x again
zarrelli:~$

As you can see, when the variable is correctly instanced, instead of returning its value, it returns a default 100 and this is double-checked in the following rows where we print the value of x and it is still 10: the 100 we saw was not a value assignment but just a default returned instead of the real value:

${variable:?message} ${variable?message}
#!/bin/bash
x=10
y=
unset -v z
echo ${x:?"Should work"}
echo ${y:?"No way"}
echo ${y:?"Well"}

The results are quite straightforward:

zarrelli:~$ ./set-message.sh 
10
./set-message.sh: line 8: y: No way

As we tried to access a void variable, but for the unset would have been the same, the script exited with an error and the message we got from the variable expansion. All good with the first line, x has a value and we printed it but, as you can see, we cannot arrive to the third line, which remains unparsed: the script exited abruptly with a default message printed.

Nice stuff, isn't it? Well, there is a lot more, we have to look at the pattern matching against variables.

Pattern matching against variables

We have a few ways to fiddle with variables, and some of these have a really interesting use in scripts, as we will see later on in this book. Let's briefly recap what we can do with variables and how to do it, but remember we are dealing with values that are returned, not assigned back to the variable:

${#variable)  

It gives us the length of the variable, or if it is an array, the length of the first element of an array. Here is an example:

zarrelli:~$ my_variable=thisisaverylongvalue
zarrelli:~$ echo ${#my_variable}
20

And indeed thisisaverylongvalue is made up of 20 characters. Now, let us see an example with arrays:

zarrelli:~$ fruit=(apple pear banana)  

Here, we instantiated an array with three elements apple, pear, and banana. We will see later in this book how to work with arrays in detail:

[email protected]:~$ echo ${fruit[2]}
banana

We printed the third element of the array. Arrays start with an index of 0, so the third element is at index 2, and it is banana, a 6 characters long word:

[email protected]:~$ echo ${fruit[1]}
pear

We print the second element of the array: pear,a 4 characters long word:

[email protected]:~$ echo ${fruit[0]}
apple

And now, the first element, that is, apple is 5 characters long. Now, if the example we saw is true, the following command should return 5.

zarrelli:~$ echo ${#fruit}
5

And indeed, the length of the word apple is 5 characters:

${variable#pattern) 

If you need to tear out your variable, for a part of it you can use a pattern and remove the shortest occurrence of the pattern from the beginning of the variable and return the resulting value. It is not a variable assignment, not so easy to grasp, but an example will make it clear:

zarrelli:~$ shortest=1010201010
zarrelli:~$ echo ${shortest#10}
10201010
zarrelli:~$ echo ${shortest}
1010201010
${variable##pattern)

This form is like the preceding one but with a slight difference, the pattern is used to remove its largest occurrence in the variable:

zarrelli:~$ my_variable=10102010103  

We instanced the variable with a series of recurring digits:

zarrelli:~$ echo ${my_variable#1*1}
02010103

Then, we tried to match a pattern, which means any digit between a leading and ending 1, the shortest occurrence. So it took out 10102010103:

zarrelli:~$ echo ${my_variablet##1*1}
03

Now, we cut away the widest occurrence of the pattern, and so 10102010103, resulting in a meager 03 as the value returned:

${variable%pattern)  

Here, we cut away the shortest occurrence of the pattern but now from the end of the variable value:

zarrelli:~$ ending=10102010103
zarrelli:~$ echo ${ending%1*3}
10102010

So, the shortest occurrence of the 1*3 pattern counted from the end of the file is 10102010103 so we get 10102010 back:

${variable%%pattern)  

Similar to the previous example, with ##, in this case, we cut away the longest occurrence of the pattern from the end of the variable value:

zarrelli:~$ ending=10102010103
zarrelli:~$ echo ${ending}
10102010103
zarrelli:~$ echo ${ending%1*3}
10102010
zarrelli:~$ echo ${ending%%1*3}
zarrelli:~$

Quite clear, isn't it? The longest occurrence is 1*3 is 10102010103, so we tear away everything and we return nothing, as this example which makes use of the evaluation of -z (is empty) will show:

zarrelli:~$ my_var=${ending%1*3}
zarrelli:~$ [[ -z "$my_var" ]] && echo "Empty" || echo "Not empty"
Not empty
zarrelli:~$ my_var=${ending%%1*3}
zarrelli:~$ [[ -z "$my_var" ]] && echo "Empty" || echo "Not empty"
Empty
${variable/pattern/substitution}

The reader familiar with regular expressions probably already understood what the outcome is: replace the first occurrence of the pattern in the variable by substitution. If substitution does not exist, then delete the first occurrence of a pattern in variable:

zarrelli:~$ my_var="Give me a banana"
zarrelli:~$ echo ${my_var}
Give me a banana
zarrelli:~$ echo ${my_var/banana/pear}
Give me a pear
zarrelli:~$ fruit=${my_var/banana/pear}
zarrelli:~$ echo ${fruit}
Give me a pear

Not so nasty, and we were able to instance a variable with the output of our find and replace:

${variable//pattern/substitution}  

Similar to the preceding, in this case, we are going to replace the occurrences of a pattern in the variable:

[email protected]:~$ fruit="A pear is a pear and is not a banana"
[email protected]:~$ echo ${fruit//pear/watermelon}
A watermelon is a watermelon and is not a banana

Like the preceding example, if substitution is omitted, a pattern is deleted from the variable:

${variable/#pattern/substitution}  

If the prefix of the variable matches, then replace the pattern with substitution in variable, so this is similar to the preceding but matches only at the beginning of the variable:

zarrelli:~$ fruit="a pear is a pear and is not a banana"
zarrelli:~$ echo ${fruit/#"a pear"/}
is a pear and is not a banana
zarrelli:~$ echo ${fruit/#"a pear"/"an apple"}
an apple is a pear and is not a banana

As usual, omitting means deleting the occurrence of the pattern from the variable.

${variable/%pattern/substitution}  

Once again, a positional replacement, this time at the end of the variable value:

zarrelli:~$ fruit="a pear is not a banana even tough I would 
like to eat a banana"

zarrelli:~$ echo ${fruit/%"a banana"/"an apple"}
a pear is not a banana even though I would like to eat an apple

A lot of nonsense, but it makes sense:

${!prefix_variable*}
${[email protected]}

Match the name of the variable names starting with the highlighted prefix:

zarrelli:~$ firstvariable=1
zarrelli:~$ secondvariable=${!first*}
[email protected]:~$ echo ${secondvariable}
firstvariable
zarrelli:~$ thirdvariable=${secondvariable}
zarrelli:~$ echo ${thirdvariable}
firstvariable
${variable:position}

We can decide from which position we want to start the variable expansion, so determining what part of its value we want to get back:

zarrelli:~$ picnic="Either I eat an apple or I eat a raspberry"
zarrelli:~$ echo ${picnic:25}
I eat a raspberry

So, we just took a part of the variable, and we decided the starting point, but we can also define for how long cherry-picking is done:

${variable:position:offset}
zarrelli:~$ wheretogo="I start here, I go there, no further"
zarrelli:~$ echo ${wheretogo:14:10}
I go there

So we do not go further, start at a position and stop at the offset; this way, we can extract whatever consecutive characters/digits we want from the value of a variable.

So far, we have seen many different ways to access and modify the content of a variable or, at least, of what we get from a variable. There is a class of very special variables left to look at, and these will be really handy when writing a script.

Special variables

Let's see now some variables which have some spacial uses that we can benefit from:

${1}, ${n}

The first interesting variables we want to explore have a special role in our scripts because they will let us capture more than an argument on our first command-line execution. Have a look at this bunch of lines:

!/bin/bash    
fistvariable=${1}
secondvariable=${2}
thirdvariable=${3}
echo "The value of the first variable is ${1}, the second
is ${2}, the third is ${3}"

Pay attention to $1, $2, $3:

zarrelli:~$ ./positional.sh 
The value of the first variable is , the second is , the third is

First try, no arguments on the command line, we see nothing printed for the variables:

zarrelli:~$ ./positional.sh 1 2 3
The value of the first variable is 1, the second is 2,
the third is 3

Second try, we invoke the script and add three digits separated by spaces and, actually, we can see them printed. The first on the command line corresponds to $1, the second to $2, and the third to $3:

zarrelli:~$ ./positional.sh Green Yellow Red  

The value of the first variable is Green; the second is Yellow; and the third is Red.

Third try, we use words with the same results. But notice here:

zarrelli:~$ ./positional.sh "One sentence" "Another one" 
A third one

The value of the first variable is One sentence, the second
is Another one, the third is A

We used a double quote to prevent the space between one sentence and another being interpreted as a divider for the command-line bits, and in fact, the first and second sentences were added as a complete string to the variables, but the third came up just with an A because the subsequent spaces, not quoted, were considered to be separators and the following bits taken as $4, $5, and $n. Note that we could also mix the order of assignment, as follows:

thirdvariable=${3}
fistvariable=${1}
secondvariable=${2}

The result would be the same. What is important is not the position of the variable we declare, but what positional we associate with it.

As you saw, we used two different methods to represent a positional variable:

${1}
$1

Are they the same? Almost. Look here:

#!/bin/bash    
fistvariable=${1}
secondvariable=${2}
thirdvariable=${3}
eleventhvariable=$11
echo "The value of the first variable is ${fistvariable},
the second is ${secondvriable}, the third is ${thirdvariable},
the eleventh is ${eleventhvariable}"

Now, let's execute the script:

zarrelli:~$ ./positional.sh "One sentence" "Another one" A 
third one

The value of the first variable is One sentence, the second
is Another one, the third is A, the eleventh is One sentence1

Interesting, the eleventhvariable has been interpreted as it were the positional $1 and added a 1. Odd, let's rewrite the echo in the following way:

eleventhvariable=${11}  

And run the script again:

zarrelli$ ./positional.sh "One sentence" "Another one" A third one
The value of the first variable is One sentence, the second is
Another one, the third is A, the eleventh is

Now we are correct. We did not pass an eleventh positional value on the command line, so the eleventhvariable has not been instantiated and we do not see anything printed to the video. Be cautious, always use ${}; it will preserve the value of the variable in your complex scripts when having a grasp of every single detail would be really difficult:

${0}  

This expands to the full path to the script; it gives you a way to handle it in your script. So, let's add the following line at the end of the script and execute it:

echo "The full path to the script is $0"
zarrelli:~$ ./positional.sh 1 2 3
The value of the first variable is 1, the second is 2, the
third is 3, the eleventh is

The full path to the script is ./positional.sh

In our case, the path is local, since we called the script from inside the directory that is holding it:

${#}  

Expands into the number of the arguments passed to the script, showing us the number of arguments that have been passed on the command line to the script. So, let's add the following line to our script and let's see what comes out of it:

echo "We passed ${#} arguments to the script"    
zarrelli:~$ ./positional.sh 1 2 3 4 5 6 7
The value of the first variable is 1, the second is 2, the
third is 3, the eleventh is

The full path to the script is ./positional.sh
We passed 7 arguments to the script
${@}
${*}

Gives us the list of arguments passed on the command line to the script, with one difference: ${@} preserves the spaces, the second doesn't:

#!/bin/bash
fistvariable=${1}
secondvariable=${2}
thirdvariable=${3}
eleventhvariable=${11}
export IFS=*
echo "The value of the first variable is ${fistvariable},
the second is ${secondvariable}, the third is ${thirdvariable},
the eleventh is ${eleventhvariable}"

echo "The full path to the script is $0"
echo "We passed ${#} arguments to the script"
echo "This is the list of the arguments ${@}"
echo "This too is the list of the arguments ${*}"
IFS=
echo "This too is the list of the arguments ${*}"

We changed the characters used by the shell as a delimiter to identify single words. Now, let us execute the script:

zarrelli:~$ ./positional.sh 1 2 3
The value of the first variable is 1, the second is 2,
the third is 3, the eleventh is

The full path to the script is ./positional.sh
We passed 3 arguments to the script
This is the list of the arguments 1 2 3
This too is the list of the arguments 1*2*3
This too is the list of the arguments 123

Here, you can see the difference at play:

  • *: This expands to the positional parameters, starting from the first and when the expansion is within double quotes, it expands to a single word and separates each positional parameter using the first character of IFS. If the latter is null, a space is used, if it is null the words are concatenated without separators.
  • @: This expands to the positional parameter, starting from the first, and if the expansion occurs within a double quote, each positional parameter is expanded to a word on its own:
${?}  

This special variable expands to the exit value of the last command executed, as we have already seen:

zarrelli:~$ /bin/ls disk.sh ; echo ${?} ; tt ; echo ${?}
disk.sh
0
bash: tt: command not found
127

The first command was successful, so the exit code is 0 ; the second gave an error 127command not found, since such a command as tt does not exist.

${$} expands to the process number of the current shell and for a script is the shell in which it is running. Let us add the following line to our positional.sh script:

echo "The process id of this script is ${$}"  

Then let's run it:

zarrelli:~$ ./positional.sh 1 2 3
The value of the first variable is 1, the second is 2, the
third is 3, the eleventh is

The full path to the script is ./positional.sh
We passed 3 arguments to the script
This is the list of the arguments 1 2 3
This too is the list of the arguments 1*2*3
This too is the list of the arguments 123
The process id of this script is 13081

Step by step, our script is telling us more and more:

${!}  

This is tricky; it expands to the process number of the last command executed in the background. Time to add some other lines to our script:

echo "The background process id of this script is ${!}"
echo "Executing a ps in background"
nohup ps &
echo "The background process id of this script is ${!}"

And now execute it:

zarrelli:~$ ./positional.sh 1 2 3
The value of the first variable is 1, the second is 2,
the third is 3, the eleventh is

The full path to the script is ./positional.sh
We passed 3 arguments to the script
This is the list of the arguments 1 2 3
This too is the list of the arguments 1*2*3
This too is the list of the arguments 123
The process id of this script is 13129
The background process id of this script is
Executing a ps in background
The background process id of this script is 13130
nohup: appending output to 'nohup.out'

We used nohup ps & to send the ps in the background (&) and detach it from the current terminal (nohup). We will see later, in more details the use of background commands; it suffices now to see how, before sending the process in to the background, we had no value to print for ${!} ; it was instanced only after we sent ps in to the background.

Do you see that?

nohup: appending output to 'nohup.out'  

Well, for our purposes, it has no value, so how can we redirect this useless output and get rid of it during the execution of our script? You know what? It is a tiny exercise for you to do before you start reading the next chapter, which will deal with the operators and much more fun.

 

Summary

In this chapter, we touched on some of the very basics of the shell, such as things that you should know how to deal with in the correct way. Failing to preserve variable names can, for instance, lead us to unwanted results and, on a different side, knowing how to access environment variables will help us create a better environment for our day-to-day tasks. As we said, basic but important things that a Bash master should know by heart, because unmasks, file descriptors, and fiddling with variables are what let you play awesome tricks and are the building blocks to becoming more advanced. So, do not just overlook them; they will help you.

About the Author

  • Giorgio Zarrelli

    Giorgio Zarrelli is a passionate GNU/Linux system administrator and Debian user, but has worked over the years with Windows, Mac, and OpenBSD, writing scripts, programming, installing and configuring services--whatever is required from an IT guy. He started tinkering seriously with servers back in his university days, when he took part in the Computational Philosophy Laboratory and was introduced to the Prolog language. As a young guy, he had fun being paid for playing games and write about them in video game magazines. Then he grew up and worked as an IT journalist and Nagios architect, and recently moved over to the threat intelligence field, where a lot of interesting stuff is happening nowadays.

    Over the years, he has worked for start-ups and well-established companies, among them In3 incubator and Onebip as a database and systems administrator, IBM as QRadar support, and Anomali as CSO, trying to find the best ways to help companies make the best out of IT.

    Giorgio has written several books in Italian on different topics related to IT, from Windows security to Linux system administration, covering MySQL DB administration and Bash scripting.

    Browse publications by this author

Latest Reviews

(6 reviews total)
ce qu'il me manquait sur bash
BASH step by step, deep and details
As a *nix user (linux, pfsense and macOS) bash knowledge is more than needed, besides it's fun to script. The book is very well organized and good quality expert contents.

Recommended For You

Book Title
Unlock this book and the full library for FREE
Start free trial