Penetration Testing with the Bash shell

By Keith Makan

About this book

This book teaches you to take your problem solving capabilities to the next level with the Bash shell, to assess network and application level security by leveraging the power of the command-line tools available with Kali Linux.

The book begins by introducing some of the fundamental bash scripting and information processing tools. Building on this, the next few chapters focus on detailing ways to customize your Bash shell using functionalities such as tab completion and rich text formatting. After the fundamental customization techniques and general purpose tools have been discussed, the book breaks into topics such as the command-line-based security tools in the Kali Linux operating system. The general approach in discussing these tools is to involve general purpose tools discussed in previous chapters to integrate security assessment tools. This is a one stop solution to learn Bash and solve information security problems.

Publication date:
May 2014


Chapter 1. Getting to Know Bash

The Bourne Again SHell (bash) is arguably one of the most important pieces of software in existence. Without bash shell's many utilities and the problem-solving potential it gives its users by integrating and interfacing system utilities in a programmable way (called bash scripting), many of the very important security-related problems of the modern world would be very tedious to solve. Utilities such as grep, wget, vi, and awk enable their users to do very powerful string processing, data mining, and information management. System administrators, developers, security engineers, and penetration testers all across the world for many years have sworn by its sheer problem-solving potential and effectiveness in enabling them to tackle their day-to-day technical challenges.

Why are we discussing the bash shell? Why is it so popular among system administrators, penetration testers, and developers? Well, there may be other reasons, but fundamentally the bash shell is the most standardized and is usually, with regard to most popular operating systems, implemented from a single code base: one source for the official source code. This means one can expect a certain base set of execution behaviors for a bash script or collection of commands regardless of the operating system hosting the bash implementation. By contrast, operating systems commonly ship their own unique implementations of the Korn Shell (ksh) and other shells.

The only disadvantage, if any, of the Linux or Unix environment that bash is native to is that for most people, especially those accustomed to the Graphical User Interface (GUI), the learning curve may be a little steep. This is mainly because of the way information is represented. The general Linux/Unix culture and conventions can often be difficult for newcomers to appreciate, possibly due to the lack of the tooltips, hints, rich graphical interaction design, and user experience engineering that GUIs often benefit from. This book, and especially this chapter, will introduce some of the witty but brilliant Linux/Unix culture and conventions so that you can get comfortable enough with the bash shell to eventually find your own way around and follow the more advanced topics later on in the book.

Throughout the book, the bash environment or the host operating system that will be discussed will be Kali Linux. Kali Linux is a distribution adapted from Debian, and it is packed with utilities focused purely on technical security problem solving and testing. Because knowing how to wield your terminal is strongly associated with knowing your operating system and its various nuances, this chapter and the following chapters will introduce some topics related to the Kali Linux operating system, its configuration setup, and default behavior to enable you to properly use your terminal utilities.

If you're already a seasoned "basher", feel free to skip this chapter and move on to the more security-focused topics in this book.


Getting help from the man pages

Bash shells typically come bundled with a very useful utility called man files, short for manual files. It's a utility that gives you a standardized format to document the purpose and usage of most of the utilities, libraries, and even system calls available to you in your Unix/Linux environment.

In the following sections, we will frequently make use of the conventions and descriptive style used in man files so that you can comfortably switch over to using the man pages to support what you've learnt in the following sections and chapters.

Using man files is pretty easy; all you need to do is fire off the following command from your terminal:

man [SECTION NUMBER] [MAN PAGE NAME]


In the previous command, [SECTION NUMBER] is the number of the man page section to be referenced and [MAN PAGE NAME] is, well, the name of the man page. Usually, it is the name of the command, system call, or library itself. For example, if you want to look up the man page for the man command itself, you would execute the following command from your terminal:

man 1 man

In the previous command, 1 tells man to use section 1 and the man argument suffixing the command is the name of the man page, which is also the name of the command to which the page is dedicated.

Man page sections are numbered according to a specification of their own. Here's how the numbers are appropriated:

  1. General commands: You usually use this section to look up the information about commands used on the command line. In a previous example in this section, we used it to look up information about the man file.

  2. System calls: This section documents the arguments and purpose of common system calls facilitated by the host operating system.

  3. C library functions: This section is very useful for C developers and developers who use languages implemented in C, such as Python. It will give you information about the arguments, defining header files, behavior, and purpose of certain fundamental C library function calls.

  4. Special files: This section documents special-purpose files, typically those in the /dev/ directory, for instance, character devices, pseudo terminals, and so on. Try picking a couple files in the /dev/ directory of your operating system and executing the following command:

    man 4 [FILENAME]

    For instance:

    man 4 pts
    man 4 tty
    man 4 urandom

  5. File formats and conventions: This section documents common file formats used to structure information about the system, for instance, logfile formats, the password file formats, and so on. Usually, these are the files used to record the information generated by common operating system utilities.

  6. Games and Screensavers: This section contains information about games and screensavers.

  7. Miscellanea: This section contains information about miscellaneous commands and other information. It is reserved for documentation of anything that does not fit into the other categories.

  8. System administration commands and daemons: This section is dedicated to administration commands and information about system daemons.

For a synopsis and full description of these sections, try checking out the intro man files for each of them. You can reach these files by executing the following command for each section number:

man [SECTION NUMBER] intro

I've documented all the man page section numbers and their traditional purpose here. Of course, it is up to developers to uphold these conventions, but generally all you will be interested in is section 1, and if you're going to do some reverse engineering, sections 2, 3, and 4 will also be of great help.

The man page layout is standardized to contain a certain collection of sections. Each section of the man page describes a given property of the command, system call, or library being discussed. The following list explains the purpose of the common sections in a man file:

  • Name: This is the name of the command, function, system call, or file format.

  • Synopsis: This is a formal description of the command, system call, file format, or what have you, describing the usage specification. The way the syntax or usage specifications for commands are written takes a little understanding to appreciate properly. You may notice the square brackets in the specification; these are not to be interpreted as literal parts of the command invocation. In fact, they indicate that whatever appears inside the brackets is an optional argument. Also, the "|" character indicates that either the symbols preceding it or following it can be specified as part of the command invocation but not both; think of it as a logical exclusive OR.

  • Description: This is an informal description and discussion of the man page topic, detailing its purpose and more information about the options and possible arguments mentioned in the Synopsis section.

  • Examples: This is a collection of examples for the usage of the man page topic.

  • See also: This is a collection of references, web pages, and other resources containing further information about the topic being discussed.

For more about the Linux manual pages, please see the Further reading section at the end of this chapter.


Navigating and searching the filesystem

Navigating and searching the Linux filesystem is one of the most essential skills that developers, system administrators, and penetration testers will need to master in order to realize the full potential of their bash consoles and utilities. To properly master this skill, you will need a good understanding of the organization of your host operating system, though a thorough discussion of the Kali Linux operating system's inner workings and organization is a little outside the scope of this book.

Navigating a filesystem requires the use of a small collection of tools and utilities. Here's a breakdown of these tools:

Command name    Common name                Description

cd              Change Directory           This changes your current working directory

ls              List                       This lists the contents of the current working directory

pwd             Print Working Directory    This displays the current working directory

find            Find                       This locates or verifies the existence of a file based on the values of certain attributes

Navigating directories

Navigating directories is popularly done by using the cd command, which is probably one of the simplest commands to use. All you need to do is supply the directory you wish to change to and cd will do the rest. It also has very useful shorthands to speed up the most common tasks users perform when navigating their filesystems.

The following is what the command usage specification looks like:

cd [ -L | -P ] [directory]

In the syntax specification, [directory] is the directory you wish to change your current working directory to and [-L|-P] may be any one of the following:

  • -L: When changing directory, symbolic links are not resolved to their targets. The current directory will be changed to include the name of the symbolic link and not its target. This is described in documentation as making the symbolic link logical, since it forces the name of the symbolic link to be treated as a logical element in the path being set as the working directory.


    Symbolic links are constructs on a filesystem that allow one file or directory to act purely as a reference to another file. These links affect the way path resolution occurs, since in some situations when a symbolic link is followed, it will allow one path to direct the current directory to a file represented by another name, as opposed to a pathname resolving strictly as it is named.

  • -P: This is the opposite of the -L option. It specifies that, should the file being set as the current directory be a symbolic link, it should be resolved completely before being set as the current directory. This means that if you visit a symbolic link, your current path will not reflect the name of the symbolic link you used to reach it, unless of course the link has the same name as its target.
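
The difference between the two options is easiest to see on a small scratch directory. The following is a minimal sketch, assuming a temporary directory created with mktemp; the names real and link are hypothetical and exist only for this illustration:

```shell
#!/usr/bin/env bash
# Hypothetical scratch layout: one real directory plus a symbolic link to it.
base=$(cd -P "$(mktemp -d)" && pwd -P)   # physical path of a fresh temp directory
mkdir "$base/real"
ln -s "$base/real" "$base/link"

cd -L "$base/link"
logical=$(pwd)      # the working directory keeps the name of the link

cd -P "$base/link"
physical=$(pwd)     # the working directory is the resolved target

echo "after cd -L: $logical"
echo "after cd -P: $physical"
```

With -L the reported path ends in link, whereas with -P it ends in real, even though both commands were given the same argument.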

The following is a typical usage example of the cd command:

cd /  

The preceding command will change your current directory to the root directory, which is named /; everything hosted on your filesystem is usually reachable from this directory.

The following are some more examples:

  • cd ~: This command is used to navigate to the current user's home directory

  • cd ../: This command is used to navigate to the directory directly above the current one

In the preceding command, one can have cd navigate an arbitrary number of directories above the current one, for instance, by supplying it a command as follows:

cd ../../../../../

The following are some other commands that can be used to navigate to different directories:

  • cd .: This command is used to navigate to the current directory

  • cd -: This command is used to navigate to the previous directory

  • cd --: Here, -- only marks the end of the options, so this command behaves like cd with no arguments and navigates to the current user's home directory

To see whether you have indeed changed your current working directory to the one you've specified, you can invoke the pwd command that will print your working directory. The syntax for the pwd command is as follows:

pwd [-L|-P] [--help] [--version]
pwd [--logical | --physical ] 

The -L or --logical and -P or --physical invocation options serve the same purpose as in the cd command.
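
As a rough sketch of how the cd shorthands and pwd fit together, the following walks around a throwaway directory tree. The tree itself (a/b under a temporary directory) is a made-up example, not anything found on a real system:

```shell
#!/usr/bin/env bash
# Hypothetical scratch tree used only for this walk-through.
top=$(cd -P "$(mktemp -d)" && pwd -P)
mkdir -p "$top/a/b"

cd "$top/a/b"
cd ../..                 # climb two levels, back to $top
after_up=$(pwd)

cd - > /dev/null         # return to the previous directory (cd - prints it, so silence that)
after_dash=$(pwd)

echo "after cd ../..: $after_up"
echo "after cd -:     $after_dash"
```

Running pwd after each step is a cheap way to confirm where a shorthand actually took you.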

Listing directory contents

It's not enough to just move between directories. You will eventually want to find out what's inside these directories. You can do this by using the ls command.

The following is the usage specification for the ls command—adapted from its man page:

ls [-aAlbBCdDfFghHiIklLmNopqQrRsStTuvwxXZ1] [FILE/DIRECTORY]

The previous command specification is another popular Linux/Unix convention. It's a shorthand to specify that any of the letters appearing in the brackets can be specified as part of the command invocation. Also, any number of them may be specified at the same time. For instance, consider the following commands:

ls -Ham
ls -and
ls -Rotti

According to the command specification, they are all acceptable ways to use the ls command. Whether or not any of these will actually do something useful depends on how each switch affects the ls command's behavior. As a general note when reading usage specifications such as the one for ls, you should keep in mind that some options may have opposing effects and certain combinations may have no effect.

The [FILE] or [DIRECTORY] argument would be any path or file at which you wish to fire ls. Without any arguments, ls will list the current working directory's entries.


A switch is popular jargon for an option, that is, anything directly following the hyphen, specified as part of the command invocation. For example, -l is a switch.

Here's what some of the switches do—we will only discuss some of the most important switches here for the sake of brevity. Keep in mind that the ls command lists directory contents, so all its options will be focused on organizing and presenting a given directory's contents in a specified way.

The following are some of the ls command's invocation options:

  • -a, --all: This displays all the directory entries and does not omit directories or files starting with "." in their names.

  • -d, --directory: This lists the directory entries and not their contents. This will also force ls not to dereference symbolic links.

  • -h: This prints sizes in human-readable format, for instance, instead of the number of bytes only it will display file sizes in gigabytes, kilobytes, or megabytes where applicable.

  • -i: This prints the inode number of each file.


    Inodes or i-nodes are data structures assigned to files that represent detailed information about their access rights, access times, sizes, owners, and the location of the file on the actual block devices—the physical medium hosting the file—as well as other important housekeeping-orientated details.

  • -l: This lists the entries in long format.

  • -R, --recursive: This recursively lists directory contents. This tells ls to descend through all the levels of the specified path and enumerate all the reachable file paths, instead of stopping once the specified directory is listed, as is the default.

  • -S: This lists the entries sorted by file size.

  • -X: This sorts entries alphabetically by extension, for example, all PDFs after MP3s. Note the capital letter; the lowercase -x lists entries by lines instead of by columns.
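
To get a feel for these switches without touching real data, they can be tried on a few throwaway files. This is only a sketch; the filenames and sizes are invented for the example:

```shell
#!/usr/bin/env bash
# Hypothetical scratch files of different sizes, plus one hidden file.
dir=$(mktemp -d)
printf 'aaaaaaaaaaaaaaaaaaaa' > "$dir/large.txt"   # 20 bytes
printf 'aaaaa' > "$dir/medium.txt"                 # 5 bytes
printf 'a' > "$dir/small.txt"                      # 1 byte
: > "$dir/.hidden"                                 # empty file whose name starts with "."

by_size=$(ls -S "$dir")       # largest first; .hidden is omitted without -a
all_names=$(ls -a "$dir")     # includes ., .., and .hidden

echo "$by_size"
ls -alSh "$dir"               # all entries, long format, sorted by size, human-readable sizes
```

Comparing the output of ls -S with that of ls -a on the same directory shows at a glance which entries the default listing leaves out.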

The following are some examples of these options in action. For instance, say you'd like to sort a bunch of files by their size while displaying human-readable file sizes along with all the access rights and modification times. This seems like a lot of work, but you would simply run the following command:

ls -alSh

The output lists every entry in long format, largest first, with sizes shown in human-readable units.

Another very useful example would be checking the volume of logins to the system. This can be done by looking at the output of the following command:

ls -alSh /var/log/auth*

Generally, keeping track of the contents of the /var/log/ directory will always be a good way to grab a good synopsis of the activity on a system.

Searching the filesystem

Another important skill is being able to find resources on your filesystem in a compact yet powerful way. One of the ways you can do this is by using the aptly named find command. The following is the usage specification for find:

find [-H] [-L] [-P] [-D debugopts] [-Olevel] [path…] [expression]

You can find out more about the find command by checking out the man file on it. This can be done by executing the following command:

man 1 find

This was discussed in the Getting help from the man pages section earlier in this chapter.

Moving on, the first three switches, namely, -H, -L, and -P, all control the way symbolic links are treated. The following list tells what they do:

  • -H: This tells find not to follow symbolic links, except for those given as command-line arguments. Symbolic links found while searching are treated as normal files and are not resolved to their targets. Putting it simply, if a directory contains a symbolic link, the symbolic link will be treated as any other file. Symbolic links named on the command line are the exception; these will be resolved.

  • -L: This forces find to follow symbolic links in the directories being processed.

  • -P: This forces find to treat symbolic links as normal files. If a symbolic link is encountered during execution, find will inspect the properties of the symbolic link itself and not its target.

The -D switch is used to allow find to print debug information if you need to know a little about what find is up to while it's searching for the files you want. -Olevel (a capital letter O, not a zero) controls how find optimizes tests, which includes allowing it to reorder some of them. The level part can be specified as any number between 0 and 3 (inclusive).

The [path...] part of the argument is used to tell find where to look for files. You can also use the . and .. shorthands to specify the current directory and the directory one level up respectively, as with the cd command.

The next argument, or rather group of arguments, is quite an important one: the [expression]. It consists of all the arguments that control the following:

  • Options: This tells what kind of files find should look for

  • Tests: This tells how to identify the files it is looking for

  • Actions: This tells what find should do with the files once they are found

The following is the structural breakdown of the find expression:

[expression] := [options][[test][OPERATOR][test][OPERATOR]...][actions]

[options] := [-d][-daystart][-depth][-follow][-help]...
[tests] := [-amin n][-atime n][-cmin n][-cnewer file]...
[OPERATOR] := [()][!][-not][-a][-and][-or]...
[actions] := [-delete][-exec command [;|{} +]][-execdir command]...


The previous code only serves as information about the structure of the expression, to let you know which options go where. Many of the switches for each section have been omitted for brevity. The := characters mean that whatever is on the left-hand side is defined by whatever is defined on the right-hand side.
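
As a rough illustration of how the three parts of an expression line up in practice, the sketch below combines one option, two tests joined by an operator, and one action. The filenames are invented, and the -name test used here is a standard find test that simply matches the file's base name against a shell pattern:

```shell
#!/usr/bin/env bash
# Hypothetical scratch files; the names are made up for illustration.
dir=$(mktemp -d)
touch "$dir/app.log" "$dir/notes.txt" "$dir/app.conf"

# option:  -maxdepth 1
# tests:   -name '*.log' OR -name '*.conf', grouped with escaped parentheses
# action:  -print
matches=$(find "$dir" -maxdepth 1 \( -name '*.log' -o -name '*.conf' \) -print | sort)

echo "$matches"
```

The parentheses are escaped so that the shell passes them through to find instead of interpreting them itself.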

So now that you know where everything goes, let's look at what some of these arguments do. The find command has quite a number of very powerful options and operational modes, and one could quite literally write an entire book about find itself. So to make sure you don't get short changed—buying a book about "command line hacking" and instead learning only about find—we will only discuss some of the most common options and arguments penetration testers, system administrators, and developers use. The rest of the find command's power can be learned from the Linux manual files.

The following is a summary of some of the find command's possible arguments for options, tests, and actions.

Directory traversal options

The following are some of the options arguments you can use with find:

  • -maxdepth n: This specifies that tests must only be applied to entries in directories at most n levels below the specified path. This option is useful if you're searching through directories that have a similar structure. For instance, if each directory below the one you're searching has something like a lib directory that contains uninteresting files, you can avoid descending into such directories by specifying this option.

  • -mindepth n: This specifies that tests should only be applied to files at a depth of at least n directories lower than the specified path.

  • -daystart: This forces any -amin, -atime, -cmin, -ctime, or equivalent time-related tests to measure time from the beginning of the current day, rather than from 24 hours ago, as is the default behavior.

  • -mount: This forbids find from traveling into other filesystems.
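
The depth options can be tried out on a small nested tree. The following is a sketch under the assumption of a three-level scratch directory; the directory and file names are hypothetical, and -type f is a standard find test that restricts matches to regular files:

```shell
#!/usr/bin/env bash
# Hypothetical three-level tree: $root/one/two, with one file at each level.
root=$(mktemp -d)
mkdir -p "$root/one/two"
touch "$root/top.txt" "$root/one/middle.txt" "$root/one/two/bottom.txt"

shallow=$(find "$root" -maxdepth 1 -type f)   # only files directly inside $root
deep=$(find "$root" -mindepth 3 -type f)      # only files at least three levels down

echo "maxdepth 1: $shallow"
echo "mindepth 3: $deep"
```

The starting path itself counts as depth 0, so top.txt sits at depth 1 and bottom.txt at depth 3.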


    The find command allows you to specify numeric arguments using convenient shorthands to indicate a "more than" or "less than" type comparison with the specified value:

  • +n: This indicates that the attribute is to be compared as greater than n

  • -n: This indicates that the attribute is to be compared as less than n

  • n: This forces find to compare n as is, and the attribute must have the exact value of n
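
A quick way to see the +n and -n shorthands at work is with the -mmin test (minutes since last modification), which is described in the next section. The sketch below assumes GNU touch, whose -d option accepts a relative time such as '2 hours ago'; the filenames are invented:

```shell
#!/usr/bin/env bash
# Hypothetical files: one modified just now, one back-dated by two hours.
dir=$(mktemp -d)
touch "$dir/fresh.txt"
touch -d '2 hours ago' "$dir/stale.txt"   # GNU touch syntax for a relative timestamp

newer=$(find "$dir" -type f -mmin -60)    # modified less than 60 minutes ago
older=$(find "$dir" -type f -mmin +60)    # modified more than 60 minutes ago

echo "less than 60 minutes old: $newer"
echo "more than 60 minutes old: $older"
```

A bare -mmin 60 would match neither file, since it demands a modification time of exactly 60 minutes ago.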

File testing options

Tests are applied to a file and either return true or false: either the file being tested has the desired attribute or it doesn't. More than one test can also be supplied, in which case a logical combination, which can also be specified, is applied. By default, if no Boolean operator is supplied to combine two tests, a logical AND is assumed. This means both tests must be true for the file to be found or reported. The following are some of the file testing options:

  • -amin n: This specifies that the last access time of the file should be n minutes ago. For example:

    • -amin 20: This means the file must have been accessed exactly 20 minutes ago

    • -amin +35: This means the file must have been last accessed more than 35 minutes ago

  • -atime n: This specifies that the file should have been accessed n*24 hours ago, meaning n days. Any fractional part of this number is ignored.

  • -mmin n: This specifies that the file should have been modified n minutes ago.

  • -mtime n: This is the same as -atime, except it matches against the file's modification time.

  • -executable | -readable | -writable: This matches any file that has access rights indicating that the file is executable, readable, or writable, respectively.

  • -perm mode: This specifies that the file's permission bits should match mode. The -perm option offers a myriad of different ways to specify the access mode being tested; here's how it works.


    The access mode bits can be prefixed with any one of the following:

    • mode: This means no prefix and the mode must be matched exactly.

    • -mode: This means the file's mode must have at least the specified bits set. This will match files with other bits set as long as the specified bits are set as well.

    • /mode: This means that any of the specified bits must be set for the file.

    The mode itself can also be specified in two different ways: symbolically, using characters to indicate user types and access modes, or numerically, using the octal mode specification.

  • -iname nAmE: This specifies that the name of the file should match nAmE if the case is ignored; in other words, case-insensitive name matching.

  • -regex pattern: This matches the specified pattern as a regular expression against the file's pathname. Your regular expression must describe the entire pathname.


    Regular expressions are merely ways to describe a set of strings with a specified number of properties in common. If you want to describe a string, you must be able to detail all the properties of the string from beginning to the end. If you don't describe a single character in some or other way, the regular expression won't match!

    Regular expressions are in themselves a language; for instance, you could write a regular expression to describe regular expressions! This means you will need to know how to speak this language in order to use regular expressions properly. To find out how to do this, see the Further reading section at the end of this chapter.

The following are a few simple examples of the -regex option's usage:

  • Find all the files directly under the /etc/ directory whose names start with the letter p and continue with any number of lowercase letters using the following command:

    find / -regex '^/etc/p[a-z]*$'

  • Find all the files on the filesystem whose names end in config, ignoring case and accommodating abbreviations such as confg, cnfg, and cnfig, using the following command:

    find / -regex '^[/a-z_]*[cC]+[oO]*[nN]+[fF]+[iI]*[gG]+$'

The regular expression used here must describe the file's entire path! For instance, consider the difference in results between the following two regular expressions:

find / -regex '^[/a-z_]*/$' #matches only the / directory
find / -regex '^[/a-z_]*/*$' #matches every path made up only of lowercase letters, underscores, and slashes
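
The whole-path rule is easy to trip over, so here is a small sketch that contrasts a pattern describing only the filename with one describing the full path. It assumes a scratch directory with invented filenames rather than the real /etc/ directory:

```shell
#!/usr/bin/env bash
# Hypothetical scratch files standing in for entries such as those under /etc/.
dir=$(mktemp -d)
touch "$dir/passwd" "$dir/profile" "$dir/hosts"

# Describes only the filename, so it can never match a full path.
name_only=$(find "$dir" -regex 'p[a-z]*')

# Describes the leading directories too, so the whole path is covered.
whole_path=$(find "$dir" -regex '.*/p[a-z]*' | sort)

echo "filename-only pattern matched: [$name_only]"
echo "whole-path pattern matched:"
echo "$whole_path"
```

The first pattern matches nothing because every path find tests begins with the directory it was told to search.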


Bash script comments

Any bash command or text fed to the bash interpreter and preceded by a hash character (#) is considered a comment, and it will not be interpreted.
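
Returning to the tests described in this section, the three -perm prefixes and the -iname test can be compared side by side on a few files with known modes. This is a sketch that assumes GNU find (the /mode form in particular); the filenames and modes are chosen purely for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical files with three different access modes.
dir=$(mktemp -d)
touch "$dir/exact" "$dir/extra" "$dir/readonly"
chmod 600 "$dir/exact"      # rw-------
chmod 644 "$dir/extra"      # rw-r--r--
chmod 400 "$dir/readonly"   # r--------

exact_only=$(find "$dir" -type f -perm 600)          # mode must be exactly 600
at_least=$(find "$dir" -type f -perm -600 | sort)    # all bits of 600 set, others allowed
any_write=$(find "$dir" -type f -perm /222 | sort)   # any write bit set
by_name=$(find "$dir" -type f -iname 'EXACT')        # case-insensitive name match

echo "$exact_only"
echo "$at_least"
echo "$any_write"
echo "$by_name"
```

Only the file with mode 600 satisfies the unprefixed test, while the -600 and /222 forms also accept the file with mode 644.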

File action options

The following are some of the action arguments you can use with find:

  • -delete: This action forces find to delete any file for which the specified test returns true. For instance, consider the following command:

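    find / -regex '^/[a-z_\-]*/[vV][iI][rR][uU][sS]$' -delete

    This command will find and delete anything reachable one level from the root that is named virus, matched case-insensitively. Because deletion cannot be undone, it is worth running the same command without -delete first to check what it matches.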

  • -exec: This allows you to specify an arbitrary command to execute on all files that match.

    The way this argument works is to build a command line (which is probably passed to some exec* type system call) using the results of the find operation for every result. The find command will use any argument after the -exec switch as a literal argument to the command being executed and any instance of the {} characters as a placeholder for the name of the file, until a ; character is encountered.

    For instance, consider the following use of the -exec argument:

    find /etc/ -maxdepth 1 -name passwd -exec stat {} \;

    The actual command line(s) that will be run will look something like the following command, since the only file that will match will be /etc/passwd:

    stat /etc/passwd

  • -execdir: This works the same way -exec does, except that it runs the specified command from the directory containing the matched file. This works great if you'd like to execute commands based on the contents of a directory that has certain files. For instance, you may want to edit all the .bashrc files for users that don't have a .vimrc, which is a configuration script for the Vim text editor. We will discuss more about the .bashrc file later.

  • -print0: This prints the file's full name to standard output. This argument also has the added benefit of terminating filenames with a NULL character, or 0x0 character, so as to allow filenames to contain newlines. It also helps make sure that any program interpreting the output of find will be able to determine the separation between filenames, as they will be strictly separated by NULL characters.


    NULL characters are traditionally used to mark the end of a character string. The NULL character itself is represented at memory level as a 0 value so that compilers and operating systems can clearly recognize the delimitation between strings appearing in memory.

  • -ls: This lists the current file in the format of ls -dils, and the output is printed to standard output. The -dils options make sure that directories themselves are listed rather than their contents (-d), that the inode number is printed (-i), that the entry appears in the ls command's long listing format (-l), and that the allocated size of the file is shown (-s).

There are a couple more actions you can specify. For the rest of them, please see the manual file on the find command, which you can access using the man find command.
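
To tie the -exec and -print0 actions together, here is a brief sketch run against two throwaway files, one of which has a space in its name. The filenames are hypothetical, and xargs -0 is used here as one common consumer of NULL-separated names:

```shell
#!/usr/bin/env bash
# Hypothetical files; one name contains a space to show why NULL separators help.
dir=$(mktemp -d)
touch "$dir/plain" "$dir/with space"

# -exec: a command line is built and run once for every match.
names=$(find "$dir" -type f -exec basename {} \; | sort)

# -print0: names are separated by NULL characters, so counting NULLs counts files.
count=$(find "$dir" -type f -print0 | tr -cd '\0' | wc -c)

# xargs -0 splits its input on the same NULL characters.
via_xargs=$(find "$dir" -type f -print0 | xargs -0 -n 1 basename | sort)

echo "$names"
echo "files found: $count"
```

Both routes report the same two names, and the name containing a space survives intact because it is never split on whitespace.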

So as far as searching your filesystem for files, directories, or generally any other interesting things, that's pretty much it. The next fundamental skill you'll need to master is redirecting output from one command to another.


Using I/O redirection

I/O redirection is one of the easiest things to master when it comes to bash scripting. It's as simple as knowing where you want your input to go and where it's coming from. It may not seem like a very interesting topic and you might not see why you need to know this, but redirecting output, if you truly get to understand what it's all about, will be what you're doing on your command line most of the time! It's essentially the one thing that allows you to combine different utilities and have them work together quite effectively on the command line in a compact and simple way. For instance, you may want to search through the output from nmap or tcpdump or a key-logger by feeding its output to another file or program to analyze.

Redirecting output

To redirect the output of one program that is invoked from the command line into a file, all you need to do is add a > symbol at the end of the command line for the said program and follow this with a filename.

For instance, using the most recent example, if you want to redirect the output of the find command to a file named something like writeable-files.txt, this is how it would be done:

find / -writable > writeable-files.txt

There is one small detail about this kind of I/O redirection though, as with many of the common bash shorthands: there's usually quite a bit going on under the hood. If used as demonstrated previously, the only output that will actually appear in the chosen file (for the previous example it is writeable-files.txt) would be the output actually printed to the standard output file, commonly referred to as file descriptor 1, which is the default destination for normal output. Anything printed to standard error, such as the permission errors find reports, will still appear on the terminal.


File descriptors are constructs in operating systems that represent access to an actual section of the physical storage mechanism or a file. File descriptors are nothing more than numbers that are associated to other data structures managed by the kernel that represent open files. Each process has its own "private" set of file descriptors.

Whenever you open a file using a text editor or generally perform any editing of a resource stored on a physical medium, a file descriptor representing the involved file is passed to the kernel through a system call. The kernel then uses this number to look up other details about the file in a data structure only the kernel should have access to.

The file descriptor's primary purpose is to help abstract and logically isolate details about the actual process involved with accessing the storage mechanism. After all, reading and writing to files is quite an essential operation to computer systems and it would be quite tedious—and error-prone—to do many things if writing to a file meant accommodating actions such as spinning/stopping the hard drive disk, interpreting different filesystems' organization, and handling read/write errors!

Output destined for or coming from any file descriptor can be redirected, provided that you have the correct access rights from your bash shell! Here's the code to do that:

[command line] a>&b > [output file]

In the previous command, a and b are both file descriptors. If a is not explicitly set, it defaults to 1, which is standard output.
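As a minimal sketch of the a>&b form, the following opens an extra file descriptor and points a command's standard output at it. The descriptor number 3 and the file name fd3.txt are arbitrary choices made for illustration only.

```shell
#!/usr/bin/env bash
# Scratch directory so the example never touches real files.
workdir=$(mktemp -d)

# Open file descriptor 3 for writing to a (hypothetical) file.
exec 3> "$workdir/fd3.txt"

# 1>&3 makes fd 1 (standard output) of echo a copy of fd 3.
echo "written via fd 3" 1>&3

# Leaving out the left-hand number means fd 1, so this is equivalent.
echo "also via fd 3" >&3

# Close fd 3 again.
exec 3>&-

fd3_content=$(cat "$workdir/fd3.txt")
echo "$fd3_content"
```

Both echo lines should end up in fd3.txt, because in each case standard output was made a copy of file descriptor 3 before the command ran.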

What about output destined for the standard error file? How do you redirect that? Well as it turns out this is pretty easy too, and here's the code to do it:

[command] 2> [output file]

As you can see in the previous example, we specified the redirection symbol as 2>, which simply means the following:

Redirect everything from file descriptor 2 to [output file].
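To see standard output and standard error being separated, here is a small sketch. The emit function and the file names out.txt and err.txt are made up for this illustration; emit simply writes one line to each stream.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one line to stdout (fd 1), one line to stderr (fd 2).
emit() {
    echo "normal output"
    echo "error output" >&2
}

# Scratch directory so nothing important is overwritten.
workdir=$(mktemp -d)

# > catches fd 1 and 2> catches fd 2, each in its own file.
emit > "$workdir/out.txt" 2> "$workdir/err.txt"

out_content=$(cat "$workdir/out.txt")
err_content=$(cat "$workdir/err.txt")
echo "out.txt: $out_content"
echo "err.txt: $err_content"
```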

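You can also combine the two standard output streams, namely send both standard output and standard error to a single file, if there is anything interesting being printed to the standard error output. It is done using the following command; note that the order matters, because 2>&1 makes file descriptor 2 point to wherever file descriptor 1 points at that moment, so the file redirection must come first:

[command line] > [output file] 2>&1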

There's also a simpler abbreviation for this and here's what it looks like:

[command line] &> [output file]

This means the following:

Redirect everything from file descriptor 1 to [output file] and then redirect everything from file descriptor 2 to wherever file descriptor 1 now points, which is [output file].
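When combining streams, bash processes redirections from left to right, so their order changes the result. The following sketch compares the orderings; the emit function and the file names are hypothetical, chosen only for this demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one line to stdout (fd 1), one line to stderr (fd 2).
emit() {
    echo "normal output"
    echo "error output" >&2
}
workdir=$(mktemp -d)

# File first, then 2>&1: both streams end up in the file.
emit > "$workdir/both.txt" 2>&1

# Bash shorthand with the same effect.
emit &> "$workdir/both-short.txt"

# Reversed order: fd 2 is copied from where fd 1 points *before* fd 1 is
# moved to the file, so the error line escapes the file. The command
# substitution captures whatever escaped.
escaped=$(emit 2>&1 > "$workdir/reversed.txt")

both_lines=$(grep -c '' "$workdir/both.txt")
short_lines=$(grep -c '' "$workdir/both-short.txt")
reversed_content=$(cat "$workdir/reversed.txt")
echo "both.txt lines: $both_lines, both-short.txt lines: $short_lines"
echo "reversed.txt: $reversed_content (escaped: $escaped)"
```

Only the first two forms collect both lines in the file; with the reversed order the file holds just the normal output.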

The previous redirection commands will all assume that the specified file does not exist; if it does, the output being directed will overwrite whatever is currently in the file. What will you do if you'd like to append text to a file? Well, the following command shows how that works:

[command line] [n]>> [filename.txt]

As before, n is an optional file descriptor number that defaults to 1 (standard output); for example, 2>> appends standard error to the file. Bash also accepts &>> to append both standard output and standard error.
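The difference between overwriting and appending can be sketched as follows. The log file name scan.log and the warn helper are assumptions made purely for illustration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
log="$workdir/scan.log"                      # hypothetical log file

# Hypothetical helper that writes a single line to stderr.
warn() { echo "warning: something odd" >&2; }

echo "first run"  > "$log"                   # creates the file
echo "second run" > "$log"                   # overwrites it
after_overwrite=$(cat "$log")

echo "third run" >> "$log"                   # appends stdout
warn 2>> "$log"                              # appends stderr
line_count=$(grep -c '' "$log")

echo "after overwrite: $after_overwrite"
echo "lines after appending: $line_count"
```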

Redirecting input

If you can redirect output, you should also be able to redirect input using the following command:

[command line] < [input file]

It's pretty straightforward really: if > means redirect output into the right operand, then < means read the contents of the right operand, which from the perspective of the left operand is input.

As with output redirection, you can also control which file descriptors you'd like to include in the redirection using the following command:

[command line] [n]< [input file]

In the previous command, [n] is the file descriptor number, as with output redirection. The following are a few examples you can test out on your terminal console:

  • cat < `tty` > keylogs.txt

    The preceding command redirects all the input written to the terminal into the file called keylogs.txt. It achieves this by getting the path of the current tty device associated with the terminal console using the tty command, feeding that device to cat as input, and redirecting the output of cat into the file.

  • wc -l < /etc/passwd

    The preceding command redirects input from the /etc/passwd file, which contains all the usernames and other user account-oriented details, to the wc command, which counts the lines, words, and bytes in its input. Using the -l switch causes the wc command to count only the lines, or more specifically all the newline characters it encounters, until the end of file (EOF) is reached.
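The following sketch tries input redirection on a small stand-in for /etc/passwd, so that the result does not depend on the accounts present on your machine. The sample file and its three entries are invented for illustration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
sample="$workdir/passwd.sample"              # hypothetical stand-in for /etc/passwd
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
    'alice:x:1000:1000::/home/alice:/bin/bash' > "$sample"

# Plain < feeds the file to standard input (fd 0) of wc.
line_count=$(wc -l < "$sample")

# Writing the descriptor explicitly: 0< is the same as <.
same_count=$(wc -l 0< "$sample")

# Any other descriptor works too: open the file on fd 3, read one line, close it.
exec 3< "$sample"
read -r first_line <&3
exec 3<&-

echo "lines: $line_count"
echo "first line: $first_line"
```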


Using pipes

All we've been discussing so far in this section is redirecting output from a command to a file; what about redirecting output from one command to another? Well, that's exactly what pipes are for.


Pipes are interprocess communication mechanisms, that is, mechanisms in operating systems that allow processes to communicate with one another, by funneling output from one process to another process as input. In other words, you can turn the standard output of one program into the standard input of another.

In fact, the shell builds a pipe exactly this way: it connects file descriptor 1 (standard output) of one process to the writing end of the pipe and file descriptor 0 (standard input) of the other process to the reading end.

The following command shows how to use a pipe in bash speak:

[command line] | [another command line]

Please note that this time the | character, referred to literally as a pipe if used this way, is an actual part of the command invocation. Of course, [command line] would be the command you would like to invoke. The pipe will feed output from the first command line as input to the second command line. You can actually specify as many pipes as your machine will accommodate, which would look something like the following syntax:

[command] | [command] | [command] | ... | [command]

The following are a few examples:

  • cat /etc/passwd | wc -l

    • This is equivalent to the following:

               wc -l < /etc/passwd
    • The following screenshot shows the output of the previous commands:

  • Count the number of files in the operating system's root directory using the following command:

    ls -al / | wc -l
  • List all available usernames using the following command:

    cat /etc/passwd | awk -F: '{print $1}'

    The following screenshot shows the output of the previous command:

  • List all the open services from an nmap scan using the following command:

    nmap -v [target] | grep -E '^[0-9]+/(udp|tcp)[ ]*open'
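The pipe examples above can be tried on a small stand-in for /etc/passwd, which keeps the results predictable. The sample file and its four entries below are invented for illustration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
sample="$workdir/passwd.sample"              # hypothetical stand-in for /etc/passwd
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
    'alice:x:1000:1000::/home/alice:/bin/bash' \
    'bob:x:1001:1001::/home/bob:/bin/bash' > "$sample"

# One pipe: print the first colon-separated field of every line.
usernames=$(cat "$sample" | awk -F: '{print $1}')

# Several pipes chained: keep bash users, extract names, sort, count.
bash_user_count=$(cat "$sample" | grep '/bin/bash$' | awk -F: '{print $1}' | sort | wc -l)

# The alphabetically last username.
last_user=$(awk -F: '{print $1}' "$sample" | sort | tail -n 1)

echo "$usernames"
echo "bash users: $bash_user_count, last user: $last_user"
```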

Getting to know grep

The Global Regular Expression Print (grep) utility is a staple for all command-line jockeys. The grep utility in its most basic functionality gives its users the ability to run regular expressions on a given input file or stream and prints the matching results. More advanced features of grep allow you to specify which attributes of the matching text you'd like to print, whether you'd like the output colorized, or even how many lines around the matching output you should print. It's packed with many very useful features, and once mastered they become an essential part of any penetration tester, developer, or system administrator's arsenal.


To properly make use of grep, you will need at least basic understanding and practice with regular expressions. Regular expressions will not be covered in their entirety here, though simple examples and basic elements of regular expression language will be covered. For more extensive reading on regular expressions and how they work, see the Further reading section at the end of the chapter.

Regular expression language – a crash course

Regular expressions are merely strings that describe a collection of strings using a special language; in formal language theory terms, any collection or set of strings is termed a language. Being able to wield this language is an invaluable skill. It will help you do many things, from static source code analysis, reverse engineering, and malware fingerprinting to larger vulnerability assessment and exploit development.

The regular expression language supported by grep is filled with useful shorthands to simplify the description of a set of common strings, for instance, describing a string consisting of any decimal number, any lowercase or uppercase alphabetic character or even any printable character. So given that any string or collection of strings must be composed of a collection of smaller strings, if you know how to match or describe any alphabetic character or any decimal number, you should be able to describe anything composed of characters from those character classes. A character class is simply a language composed of length 1 strings from a specific collection of characters.

First of all, we need to define some "control" characters. Given that you will be describing strings using other strings, there needs to be a way to designate special meaning to given characters or substrings in your regular expression. Otherwise, all you'd be able to do is compare one string to another, character by character. You can do that as follows:

  • ^: The following regular expression must be matched at the beginning of a line, for example, ^this is the start of the line.

  • $: The preceding regular expression must be matched at the end of a line, for example, this is the end of the line$.

  • []: The description of a character class, or a list of characters, is contained within the brackets, and strings that match contain characters in the specified list. Certain character classes can be described using shorthands. We will see some of them throughout the rest of the chapter.

  • (): This logically groups regular expressions together.

  • |: This is a logical OR of two regular expressions, for instance, ([expression]) | ([expression]).

  • ?: This matches the preceding regular expression zero or one time, that is, it makes it optional. For example, keith? will match any string that contains "keit", optionally followed by an "h".

  • +: This matches the preceding regular expression at least once.

  • {n}: This matches the preceding regular expression exactly n times.

  • {n,m}: This matches the preceding regular expression at least n times and at most m times. For example [0-9]{0,10} will match any decimal number containing between 0 and 10 digits.
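The control characters listed above can be tried out with grep -E on a handful of test lines. The lines in the sample file are made up for this sketch (they only loosely resemble port listings) and do not come from any real tool.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
lines="$workdir/lines.txt"                   # hypothetical test input
printf '%s\n' \
    '22/tcp open ssh' \
    '53/udp open domain' \
    '80/tcp closed http' \
    'port 22/tcp open' \
    'keit' 'keith' 'keithh' > "$lines"

# ^ anchors at line start, + repeats, () groups, | alternates.
open_ports=$(grep -E '^[0-9]+/(tcp|udp) open' "$lines")

# $ anchors at line end.
ends_with_open=$(grep -E 'open$' "$lines")

# ? makes the preceding h optional.
optional_h=$(grep -E '^keith?$' "$lines")

# {n} and {n,m} bound the number of repetitions.
two_digit=$(grep -E -c '^[0-9]{2}/' "$lines")
one_or_two_h=$(grep -E -c '^keith{1,2}$' "$lines")

echo "$open_ports"
echo "$ends_with_open"
echo "$optional_h"
echo "two-digit ports: $two_digit, keith with 1-2 h: $one_or_two_h"
```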

The following is a small collection of some of the shorthands grep supports as an extended regular expression language:

  • [:alnum:]: This matches alphanumeric characters, any decimal digit, or alphabetical character

  • [:alpha:]: This matches strictly alphabetical characters, a-z and A-Z

  • [:digit:]: This strictly matches decimal numbers 0-9

  • [:punct:]: Any punctuation character will be matched

There are a number of other character class shorthands available; see the manual page for grep for more information.
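One detail worth knowing when trying these shorthands is that they are only valid inside a bracket expression, so the digit class is written as [[:digit:]] in a pattern. Here is a small sketch; the test strings are arbitrary.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
f="$workdir/classes.txt"                     # hypothetical test input
printf '%s\n' 'abc' 'ABC' '12345' 'abc123' '!!!' 'hello, world' > "$f"

# Lines made up only of decimal digits.
digits_only=$(grep -E '^[[:digit:]]+$' "$f")

# Lines made up only of letters (both cases count).
alpha_only=$(grep -E -c '^[[:alpha:]]+$' "$f")

# Lines made up only of letters and digits.
alnum_only=$(grep -E -c '^[[:alnum:]]+$' "$f")

# Lines containing at least one punctuation character.
has_punct=$(grep -E -c '[[:punct:]]' "$f")

echo "digits only: $digits_only"
echo "alpha: $alpha_only, alnum: $alnum_only, with punctuation: $has_punct"
```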

Regular expressions are simply collections of these control characters and character classes. For example, you could combine them in any way you like as long as all the brackets, braces, and parentheses are balanced.

Now that you have some basic background in regular expressions, let's look at the grep utility's usage specification using the following command:

grep [options] PATTERN [file list]
[options] := [matcher selection][matching control][output control][file selection][other]
PATTERN := a pattern used to match with content in the file list.
[matcher selection] := [-E|--extended-regexp][-F|--fixed-strings]...
[matching control] := [-e|--regexp][-f|--file][-i|--ignore-case]... 
[output control] := [-c|--count][-L|--files-without-match]...
[file selection] := [-a | --text][--binary-files=TYPE][--exclude]...
[file list] := [file name] [file name] ... [file name]

Please remember this is a mere summary of the structure of the command and does not mention all possible options. For more information about the grep utility's regular expression syntax, please see the Further reading section at the end of this chapter, as well as the man page for Perl regular expressions, which can be reached by executing the command man 3 pcresyntax. Kali Linux might not have this man page installed; in that case, you can learn more about regular expressions from the man page on POSIX.2 regular expressions, which can be reached using the command man 7 regex.

Building on this specification, let's look at some of the options in detail.

Regular expression matcher selection options

Part of the invocation of grep requires you to let grep know what method you would like to use to match your pattern with the contents of the file. This is because grep is capable of more than just running regular expressions.

The following are the options for matcher selection:

  • -E or --extended-regexp: This interprets the PATTERN argument as an extended regular expression


    Extended regular expression language is pretty much what everyone uses today, but this wasn't always the case. Way back in Unix's heyday, regular expressions were represented using something called POSIX (Portable Operating System Interface) basic regular expression language. Some years later, Unix developers added some functionality to the regular expression language and a new standard for representing this new, more shorthand-laden language was created called the Extended Regular Expression (ERE) language standard.

  • -F or --fixed-strings: This tells grep to interpret PATTERN as a list of fixed strings separated by newlines to look for in the given file list

    For example, the following screenshot shows the output of this command:

  • -P or --perl-regexp: This allows grep to interpret PATTERN as a Perl regular expression
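To illustrate how the matcher selection changes the meaning of one and the same pattern, here is a minimal sketch. The four test strings are arbitrary, and -P is left out because not every grep build includes Perl regular expression support.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
f="$workdir/matchers.txt"                    # hypothetical test input
printf '%s\n' 'a.c' 'abc' 'a+c' 'aac' > "$f"

# Default (basic regular expression): . matches any single character.
bre_dot=$(grep -c 'a.c' "$f")

# -F: the pattern is a fixed string, so . is just a dot.
fixed_dot=$(grep -F 'a.c' "$f")

# Default (basic): + is an ordinary character.
bre_plus=$(grep 'a+c' "$f")

# -E: + means "one or more of the preceding expression".
ere_plus=$(grep -E 'a+c' "$f")

echo "basic 'a.c' matches $bre_dot lines, fixed 'a.c' matches: $fixed_dot"
echo "basic 'a+c' matches: $bre_plus, extended 'a+c' matches: $ere_plus"
```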

Regular expression matching control options

The following options allow you to control a little about how the data being matched should be treated, whether you'd like to match whole words in your input or whole lines or funnel in a number of patterns from a given file.

The following are the options for matching control:

  • -e PATTERN or --regexp=PATTERN: This uses the PATTERN argument supplied here as the pattern to match against the input files. It is useful for specifying multiple patterns or a pattern that begins with a hyphen.

    The following command is an example of the usage for the preceding option:

    cat /etc/passwd | grep -e '^root'

    The preceding example matches the lines that start with the string root.

  • -f FILE or --file=FILE: This grabs a list of patterns to use from the supplied file.

    For example, consider a file containing the following text:


    This file can be used with the -f option as follows:

    grep -f patterns.txt < /etc/passwd
  • -v or --invert-match: This inverts the matching, which means select or report only file contents that don't match.

  • -w or --word-regexp: This reports lines from the input files in which the match forms a whole word.

    For example, see the output of the following commands:

    # grep r -w < /etc/passwd
    # grep ro -w < /etc/passwd
    # grep root -w < /etc/passwd

    As you can see from the previous output, and maybe some of your own testing, the first two runs did not describe a complete word of the contents of the /etc/passwd file. However, the last run does; so it's the only one that actually produces output.

  • -x or --line-regexp: This reports or prints lines from the input file in which the match spans the whole line.
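The matching control options can be compared side by side on a small stand-in for /etc/passwd. The sample entries and the contents of patterns.txt below are assumptions made for this sketch, not the files used in the book.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
sample="$workdir/passwd.sample"              # hypothetical stand-in for /etc/passwd
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
    'rooter:x:1001:1001::/home/rooter:/bin/bash' \
    'alice:x:1000:1000::/home/alice:/bin/bash' > "$sample"

patterns="$workdir/patterns.txt"             # hypothetical pattern list
printf '%s\n' '^root:' '^alice:' > "$patterns"

# -e: lines starting with "root" (root and rooter).
starts_with_root=$(grep -c -e '^root' "$sample")

# -w: "root" must be a whole word, so rooter no longer counts.
word_root=$(grep -c -w 'root' "$sample")

# -w with a partial word finds nothing (grep exits non-zero, hence || true).
word_ro=$(grep -c -w 'ro' "$sample" || true)

# -v: every line that does not contain "nologin".
not_nologin=$(grep -c -v 'nologin' "$sample")

# -f: one pattern per line, read from a file.
from_file=$(grep -c -f "$patterns" < "$sample")

# -x: the pattern has to describe the entire line.
whole_line=$(grep -c -x 'alice.*' "$sample")
partial_line=$(grep -c -x 'alice' "$sample" || true)

echo "-e: $starts_with_root, -w root: $word_root, -w ro: $word_ro"
echo "-v: $not_nologin, -f: $from_file, -x: $whole_line / $partial_line"
```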

Output control options

The grep utility also allows you to control how it reports information about successful matches. You can also specify which attributes of the matches to report on.

The following are some of the output control options:

  • -c or --count: This doesn't report on the matched data; instead, it prints the number of matching lines.

  • -L or --files-without-match: This prints only the names of files that contain no matches.

  • -l or --files-with-matches: This prints only the names of files that contain matches.

  • -m NUM or --max-count=NUM: This stops processing input after NUM matching lines. If input comes from standard input or an input redirection, grep likewise stops reading once NUM matching lines have been found.

  • -o or --only-matching: This prints only the matching parts of the input data, each on a separate line.
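The following sketch exercises each of these output control options. The two input files and their contents are invented for the example.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Hypothetical input files.
printf '%s\n' 'port 22 open' 'port 80 open' 'port 443 closed' > "$workdir/ports.txt"
printf '%s\n' 'nothing to see' > "$workdir/empty.txt"

# -c: number of matching lines.
match_count=$(grep -c 'open' "$workdir/ports.txt")

# -m 1: stop after the first matching line.
first_match=$(grep -m 1 'open' "$workdir/ports.txt")

# -o: print only the matched text, one match per line.
numbers=$(grep -o -E '[0-9]+' "$workdir/ports.txt")

# -l / -L: names of files with and without matches.
with_match=$(cd "$workdir" && grep -l 'open' ports.txt empty.txt)
without_match=$(cd "$workdir" && grep -L 'open' ports.txt empty.txt)

echo "count: $match_count, first: $first_match"
echo "$numbers"
echo "with: $with_match, without: $without_match"
```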

File selection options

The following options allow you to specify where the input files should come from and also control some of the attributes of the input data as a whole.

The following are the options for the file selection:

  • -a or --text: This forces binary files to be processed as text. This allows you to operate grep much like the strings utility, which returns all the printable strings from a given file, with the added benefit of being able to match the strings using regular expressions.

    For example:

    grep 'printf' -m 1 --color --text `which echo`


    The which command

    The which command prints the canonical file path of the supplied argument. Here, it appears in back-ticks so that the bash shell will substitute this command for the value it produces, which effectively means grep will be running through the binary for the echo command.

    The output of the previous command is as shown in the following screenshot:

  • --binary-files=TYPE: This checks if a file supplied as input is a binary file. If yes, then it treats the file as the specified TYPE.

  • -D ACTION or --devices=ACTION: If an input file is a device, FIFO, or socket, this uses the ACTION parameter to decide how to process it. By default, ACTION is read.

  • --exclude=GLOB: This skips any files whose name matches GLOB; wild cards are honored in the matching.

  • -R, -r, or --recursive: This processes all the reachable file entries in nested directories, starting from the directories given on the command line (or the current directory if none is given).
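As a rough sketch of the file selection options, the following builds a small throwaway directory tree and searches it. All file names and contents are hypothetical; the file blob.bin contains a NUL byte so that grep treats it as binary.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
mkdir -p "$workdir/sub"
# Hypothetical files to search through.
echo 'secret token' > "$workdir/notes.txt"
echo 'secret token' > "$workdir/sub/deep.txt"
echo 'secret token' > "$workdir/sub/skip.log"
printf 'abc\0marker inside binary\n' > "$workdir/blob.bin"

# -r: descend into nested directories; -l lists the matching file names.
all_hits=$(cd "$workdir" && grep -r -l 'secret' . | wc -l)

# --exclude: skip files whose names match the glob.
no_logs=$(cd "$workdir" && grep -r -l --exclude='*.log' 'secret' . | wc -l)

# -a: treat the binary file as text so the match itself can be printed.
binary_hit=$(grep -a -o 'marker inside binary' "$workdir/blob.bin")

echo "recursive hits: $all_hits, without logs: $no_logs"
echo "from binary: $binary_hit"
```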

Well that's pretty much it as far as grep goes. Hopefully, you'll be able to make use of these options to find what you're looking for. It takes a little practice and getting used to but once mastered, grep is an invaluable utility.



In this chapter, we got to know some of the basics of the bash shell. We covered man pages, a very important resource for everyone, from seasoned system administrators and kernel developers to newbie penetration testers and security engineers. We also used powerful and efficient ways to find certain files using very descriptive attributes and regular expressions. We covered another very important tool called grep, which allowed us to make effective use of regular expressions to find files based on their content and also pinpoint them in fine detail.

The next chapter will focus on customizing your bash terminal and enabling powerful features to make using your terminal a more information-rich and convenient experience.


Further reading

The following references were accessed by the author on April 22, 2014:

About the Author

  • Keith Makan

    Keith Makan is the lead author of Android Security Cookbook, Packt Publishing. He is an avid computer security enthusiast and a passionate security researcher. Keith has published numerous vulnerabilities in Android applications, WordPress plugins, and popular browser security software such as Firefox's NoScript and Google Chrome's XSS Auditor. His research has also won him numerous listings on the Google Application Security Hall of Fame. Keith has been working as a professional security assessment specialist, penetration tester, and security advisory for over 2 years.
