Linux Shell Script: Monitoring Activities


Linux Shell Scripting Cookbook

Solve real-world shell scripting problems with over 110 simple but incredibly effective recipes

  • Master the art of crafting one-liner command sequences to perform tasks such as text processing, digging data from files, and a lot more
  • Practical problem-solving techniques that apply to the latest Linux platforms
  • Packed with easy-to-follow examples to exercise all the features of the Linux shell scripting language
  • Part of Packt's Cookbook series: Each recipe is a carefully organized sequence of instructions to complete the task as efficiently as possible



Disk usage hacks

Disk space is a limited resource. We frequently calculate disk usage on hard disks or other storage media to find out how much free space is available. When free space becomes scarce, we need to find large files that can be deleted or moved to free up space. Disk usage calculations are common in shell scripts. This recipe illustrates the commands used for these calculations and their most useful options.

Getting ready

df and du are the two significant commands that are used for calculating disk usage in Linux. The command df stands for disk free and du stands for disk usage. Let's see how we can use them to perform various tasks that involve disk usage calculation.

How to do it...

To find the disk space used by a file (or files), use:

$ du FILENAME1 FILENAME2 ...

For example:

$ du file.txt

By default, the result is shown in disk blocks rather than bytes; GNU du uses 1024-byte blocks unless told otherwise.

To obtain the disk usage for all files inside a directory, with the individual disk usage of each file shown on its own line, use:

$ du -a DIRECTORY

-a outputs results for all files in the specified directory or directories recursively.

Running du DIRECTORY produces similar output, but it shows only the disk usage of the directory and its subdirectories, not of each individual file. To print the disk usage of files, -a is mandatory.

For example:

$  du -a test
4  test/output.txt
4  test/
4  test/
16  test

An example of using du DIRECTORY is as follows:

$ du test
16  test
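As a quick way to see the difference between du DIRECTORY and du -a DIRECTORY, the following sketch builds a throwaway directory and counts the lines each command prints. The names demo, a.txt, and sub/b.txt are made up for illustration. The sizes that du reports depend on the filesystem, so only the line counts are compared.

```shell
#!/bin/bash
# Sketch: compare "du DIR" with "du -a DIR" on a scratch directory.
# All file and directory names below are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/demo/sub"
echo "hello" > "$workdir/demo/a.txt"
echo "world" > "$workdir/demo/sub/b.txt"

# du DIR prints one line per directory (demo/sub and demo).
dir_lines=$(du "$workdir/demo" | wc -l)

# du -a DIR prints one line per file as well as per directory.
all_lines=$(du -a "$workdir/demo" | wc -l)

echo "lines from du:    $dir_lines"
echo "lines from du -a: $all_lines"

# Clean up the scratch directory.
rm -rf "$workdir"
```

With two directories and two files, du lists two entries and du -a lists four.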

There's more...

Let's go through additional usage practices for the du command.

Displaying disk usage in KB, MB, or Blocks

By default, du reports disk usage in blocks, which is hard to read for large files. A more human-readable format expresses disk usage in standard units such as KB, MB, or GB. To print the disk usage in a display-friendly format, use -h as follows:

$ du -h FILENAME

For example:

$ du -sh test/
4.0K  test/
# Multiple file arguments are accepted


$ du -h hack/
16K  hack/

Finding the 10 largest files in a given directory

Finding large files is a regular task, since we often need to delete or move them. We can easily find them using the du and sort commands. The following one-line script does this:

$ du -ak SOURCE_DIR | sort -nrk 1 | head

Here, -a makes du report files as well as directories, so du traverses SOURCE_DIR and calculates the size of every file and directory in it. The first column of the output contains the size in kilobytes, since -k is specified, and the second column contains the file or directory name.

sort performs a numeric sort on column 1 in reverse (descending) order. head extracts the first 10 lines of the output.

For example:

$ du -ak /home/slynux | sort -nrk 1 | head -n 4
50220 /home/slynux
43296 /home/slynux/.mozilla
43284 /home/slynux/.mozilla/firefox
43276 /home/slynux/.mozilla/firefox/8c22khxc.default

One of the drawbacks of the above one-liner is that it includes directories in the result. However, when we need to find only the largest files and not directories we can improve the one-liner to output only the large-size files as follows:

$ find . -type f -exec du -k {} \; | sort -nrk 1 | head

We used find to pass only regular files to du, rather than letting du traverse the directory tree by itself.
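The find-based one-liner can be wrapped in a small function, sketched below. The function name largest_files and the scratch files big.bin and small.txt are hypothetical. The sketch uses -exec du -k {} + instead of \; so that find hands many files to each du process, which is usually faster on large trees.

```shell
#!/bin/bash
# Sketch: list the N largest regular files under a directory.
# largest_files is a hypothetical helper name, not a standard command.
largest_files() {
  local dir=${1:-.} count=${2:-10}
  # "-exec du -k {} +" batches many files into each du call, instead of
  # starting one du process per file as "\;" does. Sizes are in KB.
  find "$dir" -type f -exec du -k {} + | sort -nrk 1 | head -n "$count"
}

# Demo on a scratch directory with one large and one small file.
workdir=$(mktemp -d)
head -c 204800 /dev/urandom > "$workdir/big.bin"   # about 200 KB
echo "tiny" > "$workdir/small.txt"

# du separates the size and the name with a tab, so cut -f2 is the name.
top_file=$(largest_files "$workdir" 1 | cut -f2)
echo "largest: $top_file"

rm -rf "$workdir"
```

Calling largest_files /home/slynux 10 would correspond to the one-liner above, restricted to regular files.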

Calculating execution time for a command

While testing an application or comparing different algorithms for a given problem, the execution time taken by a program is critical. A good algorithm should execute in the minimum amount of time. There are several situations in which we need to measure the time a program takes to run. For example, while learning about sorting algorithms, how do you state in practice which algorithm is faster? The answer is to measure the execution time of each one on the same data set. Let's see how to do it.

How to do it...

time is a command that is available on any UNIX-like operating system. Prefix the command whose execution time you want to measure with time, for example:

$ time COMMAND

The command will execute and its output will be shown. Along with the output, the time command writes the time taken to stderr. An example is as follows:

$ time ls
real    0m0.008s
user    0m0.001s
sys     0m0.003s

It will show real, user, and system times for execution. The three different times can be defined as follows:

  • Real is wall clock time—the time from start to finish of the call. This is all elapsed time including time slices used by other processes and the time that the process spends when blocked (for example, if it is waiting for I/O to complete).
  • User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only the actual CPU time used in executing the process. Other processes and the time that the process spends when blocked do not count towards this figure.
  • Sys is the amount of CPU time spent in the kernel within the process. This is the CPU time spent in system calls within the kernel, as opposed to library code, which still runs in user space. Like 'user time', this is only the CPU time used by the process.

The time command exists both as an executable binary at /usr/bin/time and as a shell built-in named time. When we run time, the shell built-in is used by default. The built-in has limited options, so we should use the absolute path of the executable (/usr/bin/time) for additional functionality.
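Even with the shell built-in, the three times can be captured in a script. The following is a minimal sketch that assumes bash: it sets the TIMEFORMAT variable, which controls the built-in's report, and redirects the stderr of a command group to capture it. The sleep 0.2 command is only a stand-in for the program being timed.

```shell
#!/bin/bash
# Sketch: capture the timing report of bash's built-in time keyword.
# TIMEFORMAT is a bash variable; %R, %U and %S are the real, user and
# sys times in seconds.
TIMEFORMAT='%R %U %S'

# The report goes to the shell's stderr, so the timed command is wrapped
# in { ...; } and the stderr of the group is redirected. The command's
# own output is discarded so that only the timing line is captured.
timing=$( { time sleep 0.2 >/dev/null 2>&1; } 2>&1 )

# Split the single report line into its three fields.
read -r real_s user_s sys_s <<< "$timing"
echo "real=${real_s}s user=${user_s}s sys=${sys_s}s"
```

Because sleep spends its time blocked rather than computing, the real time should be far larger than the user and sys times.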

With /usr/bin/time, we can write the timing statistics to a file using the -o filename option as follows:

$ /usr/bin/time -o output.txt COMMAND

The filename should always appear after the -o flag.

In order to append the time statistics to a file without overwriting, use the -a flag along with the -o option as follows:

$ /usr/bin/time -a -o output.txt COMMAND

We can also format the time outputs using format strings with the -f option. A format string consists of parameters corresponding to specific options prefixed with %. The format strings for real time, user time, and sys time are as follows:

  • Real time - %e
  • User time - %U
  • Sys time - %S

By combining parameter strings, we can create formatted output as follows:

$ /usr/bin/time -f "FORMAT STRING" COMMAND

For example:

$ /usr/bin/time -f "Time: %U" -a -o timing.log uname

Here %U is the parameter for user time.

When the -f option is used, the output of the COMMAND being timed is still written to standard output, and the formatted timing information is written to standard error. We can therefore redirect the command output using the redirection operator (>) and the timing information using the error redirection operator (2>). For example:

$ /usr/bin/time -f "Time: %U" uname > command_output.txt 2> time.log
$ cat time.log
Time: 0.00
$ cat command_output.txt
Linux

Many details regarding a process can be collected using the time command. The important details include the exit status, the number of signals received, the number of context switches made, and so on. Each parameter can be displayed by using a suitable format string.

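Some of the interesting parameters that can be used are as follows:

  • %C - Name and command-line arguments of the command being timed
  • %E - Elapsed real (wall clock) time in [hours:]minutes:seconds
  • %x - Exit status of the command
  • %k - Number of signals delivered to the process
  • %w - Number of voluntary context switches (for example, while waiting for I/O)
  • %c - Number of involuntary context switches (because the time slice expired)
  • %Z - System's page size in bytes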

For example, the page size can be displayed using the %Z parameter as follows:

$ /usr/bin/time -f "Page size: %Z bytes" ls > /dev/null
Page size: 4096 bytes

Here the output of the timed command is not required and hence the standard output is directed to the /dev/null device in order to prevent it from writing to the terminal.
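The -f and -o options can be combined to log several statistics in one line. The sketch below assumes GNU time is installed at /usr/bin/time and skips the measurement when it is not. The timed command (true) and the temporary file are placeholders.

```shell
#!/bin/bash
# Sketch: write selected statistics to a file with /usr/bin/time.
# The -f and -o options are used as documented for GNU time, so the
# script checks for GNU time first and skips the measurement otherwise.
time_bin=/usr/bin/time
stats_file=$(mktemp)
status="skipped"
report=""

if [ -x "$time_bin" ] && "$time_bin" --version 2>&1 | grep -qi 'gnu'; then
  # %e elapsed seconds, %U user seconds, %S sys seconds, %x exit status
  "$time_bin" -f "elapsed=%e user=%U sys=%S exit=%x" -o "$stats_file" true
  status="measured"
  report=$(cat "$stats_file")
  echo "$report"
else
  echo "GNU time not found at $time_bin; nothing measured"
fi

rm -f "$stats_file"
```

Writing the statistics with -o keeps them apart from whatever the timed command prints on its own standard error.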




Printing the 10 most frequently-used commands

The terminal is the tool used to access the shell prompt, where we type and execute commands. Users run many commands in the shell, some far more often than others. Looking at the commands a user runs most frequently gives a quick picture of how they work. This recipe is a small exercise to find out the 10 most frequently used commands.

Getting ready

Bash keeps track of the commands previously typed by the user and stores them in the file ~/.bash_history. However, it keeps only a limited number (say 500) of the most recently executed commands. The history can be viewed with the history command or with cat ~/.bash_history. We will use this file to find the frequently used commands.

How to do it...

We can get the list of commands from ~/.bash_history, take only the command excluding the arguments, count the occurrence of each command, and find out the 10 commands with the highest count.

The following script can be used to find out frequently-used commands:

#!/bin/bash
#Description: Script to list top 10 used commands
printf "COMMAND\tCOUNT\n" ;
cat ~/.bash_history | awk '{ list[$1]++; }
END {
  for(i in list)
    printf("%s\t%d\n",i,list[i]);
}' | sort -nrk 2 | head

A sample output is as follows:

$ ./
ping    80
ls      56
cat     35
ps      34
sudo    26
du      26
cd      26
ssh     22
sftp    22
clear   21

How it works...

In the above script, the history file ~/.bash_history is the input. It is passed to awk through a pipe. Inside awk, we have an associative array named list. The array uses command names as indexes and stores the count of each command. For each occurrence of a command, the count is incremented by one (list[$1]++). $1, the first word of the input line, is used as the index. If $0 were used, it would contain all the arguments of the command as well. For example, if ssh HOST is a line from .bash_history, $0 is ssh HOST and $1 is ssh.

Once all the lines of the history file have been read, the array holds command names as indexes and their counts as values. The command names with the highest counts are the most frequently used commands. In the END{} block of awk, we traverse the indexes and print every command name with its count. sort -nrk 2 performs a numeric sort on the second column (COUNT) in reverse order, so the highest counts come first. The head command then extracts only the first 10 commands from the list. You can change the top 10 to top 5 or any other number by using head -n NUMBER.
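The counting logic can be tried on a small, made-up history file before pointing it at ~/.bash_history. In the sketch below, top_commands is a hypothetical helper name and the sample lines are invented. Compared with the recipe's script, it skips blank lines and breaks ties by command name so that the output order is stable.

```shell
#!/bin/bash
# Sketch: count the first word of each line in a history-style file.
# top_commands is a hypothetical helper; the sample history is made up.
top_commands() {
  local histfile=$1 count=${2:-10}
  # NF skips empty lines; list[] maps a command name to its count.
  awk 'NF { list[$1]++ }
       END { for (cmd in list) printf("%s\t%d\n", cmd, list[cmd]) }' "$histfile" |
    sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$count"
}

sample=$(mktemp)
printf '%s\n' 'ls -l' 'cd /tmp' 'ls' 'ssh HOST' 'ls -a' 'cd ..' > "$sample"

# Ask for the two most frequent commands in the sample.
result=$(top_commands "$sample" 2)
echo "$result"

rm -f "$sample"
```

In the sample data, ls occurs three times and cd twice, so those two lines are printed in that order.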

Listing the top 10 CPU-consuming processes in an hour

CPU time is a major resource, and sometimes we need to keep track of the processes that consume the most CPU cycles over a period of time. On regular desktops or laptops, heavy CPU consumption might not be an issue. However, for a server that handles numerous requests, the CPU is a critical resource. By monitoring CPU usage for a certain period, we can identify the processes that keep the CPU busy all the time and then optimize or debug them. This recipe is an exercise in process monitoring and logging.

Getting ready

ps is a command used for collecting details about the processes running on the system. It can be used to gather details such as CPU usage, commands under execution, memory usage, status of processes, and so on. The CPU usage of processes over one hour can be logged, and the top 10 consumers determined, by proper use of ps and text processing.

How to do it...

Let's go through the following shell script for monitoring and calculating CPU usages in one hour:

#!/bin/bash
#Description: Script to calculate cpu usage by processes for 1 hour
#Change the SECS to total seconds for which monitoring is to be performed
#UNIT_TIME is the interval in seconds between each sampling
SECS=3600
UNIT_TIME=60
STEPS=$(( SECS / UNIT_TIME ))
echo Watching CPU usage... ;
for (( i = 0; i < STEPS; i++ ))
do
  ps -eo comm,pcpu | tail -n +2 >> /tmp/cpu_usage.$$
  sleep $UNIT_TIME
done
echo CPU eaters :
cat /tmp/cpu_usage.$$ | \
awk '
{ process[$1]+=$2; }
END {
  for(i in process)
    printf("%-20s %s\n", i, process[i]);
}' | sort -nrk 2 | head
#Remove the temporary log file
rm /tmp/cpu_usage.$$

A sample output is as follows:

$ ./
Watching CPU usage...
CPU eaters :
Xorg        20
firefox-bin   15
bash        3
evince      2
pulseaudio    1.0         0.3
wpa_supplicant  0
wnck-applet     0
watchdog/0      0
usb-storage     0

How it works...

In the above script, the major input source is ps -eo comm,pcpu. comm stands for command name and pcpu stands for the CPU usage in percent. It outputs one line per process, containing the process name and its CPU usage in percent. Since we need to monitor the CPU usage for one hour, we repeatedly take usage statistics using ps -eo comm,pcpu | tail -n +2 and append them to the file /tmp/cpu_usage.$$. This runs inside a for loop with a 60-second wait in each iteration, provided by sleep with a UNIT_TIME of 60, so ps is executed once every minute.

tail -n +2 is used to strip off the header line (COMMAND %CPU) from the ps output.

$$ in cpu_usage.$$ expands to the process ID of the current script. If the PID is 1345, the file name becomes /tmp/cpu_usage.1345 during execution. We place this file in /tmp since it is a temporary file.

The statistics file will be ready after one hour and will contain 60 samples, one for each minute, each listing every process. Then awk is used to sum the CPU usage of each process. An associative array named process is used for the summation, with the process name as the array index. Finally, the result is sorted with a numeric reverse sort on the total CPU usage and passed through head to obtain the top 10 entries.
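The summation step can be checked without waiting an hour by feeding it a few invented samples. In the following sketch, sum_cpu is a hypothetical helper name and the sample values are made up. It assumes single-word command names; a name containing spaces would shift the CPU value out of the second column.

```shell
#!/bin/bash
# Sketch: sum per-process CPU percentages from "NAME PCPU" sample lines.
# sum_cpu is a hypothetical helper name; the sample data is invented.
sum_cpu() {
  # process[] maps a command name to the running total of its samples.
  awk '{ process[$1] += $2 }
       END { for (name in process) printf("%-20s %s\n", name, process[name]) }' |
    sort -nrk 2 | head
}

# Three fake sampling rounds, in the same two-column layout that
# "ps -eo comm,pcpu | tail -n +2" produces for single-word names.
samples='Xorg 5.0
bash 0.5
Xorg 7.5
bash 1.0
Xorg 2.5'

totals=$(printf '%s\n' "$samples" | sum_cpu)
echo "$totals"
```

Here the Xorg samples add up to 15 and the bash samples to 1.5, so Xorg is listed first.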

Monitoring command outputs with watch

We might need to continuously watch the output of a command at equal intervals over a period of time. For example, during a large file copy, we may want to watch the file size grow. To do that, newcomers often type the same command and press return again and again. Instead, we can use the watch command to view the output repeatedly. This recipe explains how to do that.

How to do it...

The watch command can be used to monitor the output of a command on the terminal at regular intervals. The syntax of the watch command is as follows:

$ watch COMMAND

For example:

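$ watch ls

To watch a pipeline or a sequence of commands, quote it so that it is passed to watch as a single argument:

$ watch 'COMMANDS'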

For example:

$ watch 'ls -l | grep "^d"'
# list only directories

This command will update the output at a default interval of two seconds.

We can also specify the time interval at which the output needs to be updated, by using -n SECONDS. For example:

$ watch -n 5 'ls -l'
#Monitor the output of ls -l at regular intervals of 5 seconds

There's more...

Let's explore an additional feature of the watch command.

Highlighting the differences in watch output

watch can highlight the differences between successive updates of the command output. Difference highlighting is enabled with the -d option as follows:

$ watch -d 'COMMANDS'
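watch takes over the whole terminal, so it is meant for interactive use. Inside a script or a log, a bounded polling loop can play a similar role. The sketch below is not how watch is implemented; poll_command is a hypothetical helper that runs a command a fixed number of times with a pause between runs.

```shell
#!/bin/bash
# Sketch: a bounded polling loop for use inside scripts, where the
# full-screen watch command is not suitable. This is NOT how watch is
# implemented; poll_command is a hypothetical helper name.
poll_command() {
  local rounds=$1 interval=$2
  shift 2
  local i
  for (( i = 1; i <= rounds; i++ )); do
    printf '[%s] run %d of %d\n' "$(date +%T)" "$i" "$rounds"
    "$@"                         # run the command with its arguments
    if (( i < rounds )); then
      sleep "$interval"          # pause between runs, not after the last
    fi
  done
}

# Run "echo tick" three times with no delay between runs.
output=$(poll_command 3 0 echo tick)
echo "$output"
```

For example, poll_command 12 5 ls -l would list the directory every 5 seconds for one minute and leave every run in the scrollback.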

Remote disk usage health monitor

A network consists of several machines with different users. The network requires centralized monitoring of disk usage of remote machines. The system administrator of the network needs to log the disk usage of all the machines in the network every day. Each log line should contain details such as the date, IP address of the machine, device, capacity of the device, used space, free space, percentage usage, and health status. If the disk usage of any of the partitions in any remote machine exceeds 80 percent, the health status should be set to ALERT, else it should be set to SAFE. This recipe will illustrate how to write a monitoring script that can collect details from remote machines in a network.

Getting ready

We need to collect the disk usage statistics from each machine on the network individually and write them to a log file on the central machine. A script that collects the details and writes the log can be scheduled to run every day at a particular time. SSH can be used to log in to the remote systems and collect the disk usage data.

How to do it…

First we have to set up a common user account on all the remote machines in the network. The disklog program uses this account to log in to each system, so auto-login with SSH must be configured for that user. We assume that a user called slynux exists on all remote machines with auto-login configured. Let's go through the shell script:

#!/bin/bash
#Description: Monitor disk usage health for remote systems
logfile="diskusage.log"   # default log file; this name is an assumption
if [[ -n $1 ]]
then
  logfile=$1
fi
if [ ! -e "$logfile" ]
then
  printf "%-8s %-14s %-9s %-8s %-6s %-6s %-6s %s\n" "Date" "IP address" "Device" "Capacity" "Used" "Free" "Percent" "Status" > "$logfile"
fi
#provide the list of remote machine IP addresses
IP_LIST="127.0.0.1"       # placeholder; replace with your machines
(
for ip in $IP_LIST;
do
  ssh slynux@$ip 'df -H' | grep ^/dev/ > /tmp/$$.df
  while read line;
  do
    cur_date=$(date +%D)
    printf "%-8s %-14s " $cur_date $ip
    echo $line | awk '{ printf("%-9s %-8s %-6s %-6s %-8s",$1,$2,$3,$4,$5); }'
    pusg=$(echo $line | egrep -o "[0-9]+%")
    pusg=${pusg/\%/}      # strip the % sign to get a number
    if [ $pusg -lt 80 ];
    then
      echo SAFE
    else
      echo ALERT
    fi
  done < /tmp/$$.df
done
) >> "$logfile"

We can use the cron utility to run the script at regular intervals. For example, to run the script every day at 10 am, write the following entry in the crontab:

00 10 * * * /home/path/ /home/user/diskusg.log

Run the command crontab -e, add the above line, and save the file in the text editor.

You can also run the script manually, optionally passing the logfile path as its first argument.

A sample output log for the above script is as follows:

How it works…

In the script, we can provide the logfile path as a command-line argument; otherwise the default logfile is used. If the logfile does not exist, the script writes the header text into a new file. The test -e $logfile checks whether the file exists. The IP addresses of the remote machines are stored in the variable IP_LIST, delimited by spaces. Make sure that every remote system listed in IP_LIST has the common user account that the script logs in with (slynux in the ssh command), with SSH auto-login configured.

A for loop iterates through the IP addresses. For each one, the remote command df -H is executed through ssh to get the disk usage data, and the lines for real devices (those starting with /dev/) are stored in a temporary file. A while loop reads the file line by line. The date and IP address are printed first, and then awk extracts and prints the device, capacity, used space, free space, and percentage usage. The percentage is extracted again with egrep, and the % sign is removed to get a numeric value. If the value is less than 80, the status is SAFE; if it is greater than or equal to 80, the status is ALERT.

All of the printed data has to go to the logfile. Hence this portion of the code is enclosed in a subshell () and its standard output is appended to the logfile.
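The SAFE/ALERT decision can be tested locally on canned df output before any remote machine is involved. In the sketch below, classify_usage is a hypothetical helper and the sample df lines are invented. It reads the Use% column directly with read instead of using egrep, which is a small variation on the approach described above.

```shell
#!/bin/bash
# Sketch: classify df output lines as SAFE or ALERT by usage percentage.
# classify_usage is a hypothetical helper; the sample df lines are made up.
classify_usage() {
  local threshold=${1:-80}
  local device size used free percent mount pusg status
  # Keep only real devices, then read the Use% column (fifth field).
  grep '^/dev/' | while read -r device size used free percent mount; do
    pusg=${percent%\%}          # strip the trailing % sign
    if [ "$pusg" -lt "$threshold" ]; then
      status=SAFE
    else
      status=ALERT
    fi
    printf '%-12s %-6s %s\n' "$device" "$percent" "$status"
  done
}

sample_df='Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda1   9.9G  2.4G  7.0G   25%   /
tmpfs       1.0G  0     1.0G   0%    /dev/shm
/dev/sda2   19G   17G   1.0G   95%   /home'

report=$(printf '%s\n' "$sample_df" | classify_usage 80)
echo "$report"
```

In a real run, the input would come from ssh USER@HOST 'df -H' rather than from the sample text, where USER and HOST are placeholders for your own account and machine.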


In this article we saw how to use different commands to monitor different activities in Linux Shell Scripting.

In the next article, Linux Shell Script: Logging Tasks, we will go through the logging techniques and their usages.

You've been reading an excerpt of:

Linux Shell Scripting Cookbook
