Search icon
Subscription
0
Cart icon
Close icon
You have no products in your basket yet
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Python Digital Forensics Cookbook

You're reading from  Python Digital Forensics Cookbook

Product type Book
Published in Sep 2017
Publisher Packt
ISBN-13 9781783987467
Pages 412 pages
Edition 1st Edition
Languages
Concepts
Authors (2):
Chapin Bryce Chapin Bryce
Profile icon Chapin Bryce
Preston Miller Preston Miller
Profile icon Preston Miller
View More author details

Table of Contents (11) Chapters

Preface 1. Essential Scripting and File Information Recipes 2. Creating Artifact Report Recipes 3. A Deep Dive into Mobile Forensic Recipes 4. Extracting Embedded Metadata Recipes 5. Networking and Indicators of Compromise Recipes 6. Reading Emails and Taking Names Recipes 7. Log-Based Artifact Recipes 8. Working with Forensic Evidence Container Recipes 9. Exploring Windows Forensic Artifacts Recipes - Part I 10. Exploring Windows Forensic Artifacts Recipes - Part II

Iterating over loose files

Recipe Difficulty: Easy

Python Version: 2.7 or 3.5

Operating System: Any

Often it is necessary to iterate over a directory and its subdirectories to recursively process all files. In this recipe, we will illustrate how to use Python to walk through directories and access files within them. Understanding how you can recursively navigate a given input directory is key as we frequently perform this exercise in our scripts.

Getting started

All libraries used in this script are present in Python's standard library. The preferred library, in most situations, for handling file and folder iteration is the built-in os library. While this library supports many useful operations, we will focus on the os.path() and os.walk() functions. Let’s use the following folder hierarchy as an example to demonstrate how directory iteration works in Python:

SecretDocs/
|-- key.txt
|-- Plans
| |-- plans_0012b.txt
| |-- plans_0016.txt
| `-- Successful_Plans
| |-- plan_0001.txt
| |-- plan_0427.txt
| `-- plan_0630.txt
|-- Spreadsheets
| |-- costs.csv
| `-- profit.csv
`-- Team
|-- Contact18.vcf
|-- Contact1.vcf
`-- Contact6.vcf

4 directories, 11 files

How to do it…

The following steps are performed in this recipe:

  1. Create a positional argument for the input directory to scan.
  2. Iterate over all subdirectories and print file paths to the console.

How it works…

We create a very basic argument handler that accepts one positional input, DIR_PATH, the path of the input directory to iterate. As an example, we will use the ~/Desktop path, the parent of SecretDocs, as the input argument for the script. We parse the command-line arguments and assign the input directory to a local variable. We’re now ready to begin iterating over this input directory:

from __future__ import print_function
import argparse
import os

__authors__ = ["Chapin Bryce", "Preston Miller"]
__date__ = 20170815
__description__ = "Directory tree walker"

parser = argparse.ArgumentParser(
description=__description__,
epilog="Developed by {} on {}".format(
", ".join(__authors__), __date__)
)
parser.add_argument("DIR_PATH", help="Path to directory")
args = parser.parse_args()
path_to_scan = args.DIR_PATH

To iterate over a directory, we need to provide a string representing its path to os.walk(). This method returns three objects in each iteration, which we have captured in the root, directories, and files variables:

  • root: This value provides the relative path to the current directory as a string. Using the example directory structure, root would start as SecretDocs and eventually become SecretDocs/Team and SecretDocs/Plans/SuccessfulPlans.
  • directories: This value is a list of sub-directories located within the current root location. We can iterate through this list of directories, although the entries in this list will become part of the root value during successive os.walk() calls. For this reason, the value is not frequently used.
  • files: This value is a list of files in the current root location.
Be careful in naming the directory and file variables. In Python the dir and file names are reserved for other uses and should not be used as variable names.
# Iterate over the path_to_scan
for root, directories, files in os.walk(path_to_scan):

It is common to create a second for loop, as shown in the following code, to step through each of the files located in that directory and perform some action on them. Using the os.path.join() method, we can join the root and file_entry variables to obtain the file’s path. We then print this file path to the console. We may also, for example, append this file path to a list that we later iterate over to process each of the files:

    # Iterate over the files in the current "root"
for file_entry in files:
# create the relative path to the file
file_path = os.path.join(root, file_entry)
print(file_path)
We can also use root + os.sep() + file_entry to achieve the same effect, but it is not as Pythonic as the method we're using to join paths. Using os.path.join(), we can pass two or more strings to form a single path, such as directories, subdirectories, and files.

When we run the preceding script with our example input directory, we see the following output:

As seen, the os.walk() method iterates through a directory, then will descend into any discovered sub-directories, thereby scanning the entire directory tree.

There's more…

This script can be further improved. Here's a recommendation:

  • Check out and implement similar functionality using the glob library which, unlike the os module, allows for wildcard pattern recursive searches for files and directories
You have been reading a chapter from
Python Digital Forensics Cookbook
Published in: Sep 2017 Publisher: Packt ISBN-13: 9781783987467
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime}