Hashing files and data streams

Recipe Difficulty: Easy

Python Version: 2.7 or 3.5

Operating System: Any

File hashes are a widely accepted identifier for determining file integrity and authenticity. While some algorithms have become vulnerable to collision attacks, the process is still important in the field. In this recipe, we will cover the process of hashing a string of characters and a stream of file content.

Getting started

All libraries used in this script are present in Python's standard library. For generating hashes of files and other data sources, we use the hashlib library. This built-in library supports common algorithms, such as MD5, SHA-1, SHA-256, and more. As of this writing, many tools still rely on the MD5 and SHA-1 algorithms, though the current recommendation is to use SHA-256 at a minimum. Alternatively, one could compute multiple hashes of a file to further decrease the odds of a hash collision. While we showcase a few of these algorithms, there are other, less commonly used, algorithms available.
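
As a quick illustration of the library itself (separate from the recipe script), each hashlib constructor returns an object exposing update() and hexdigest() methods; computing several digests of the same data only requires creating several such objects. The sample bytes below are placeholders for demonstration:

from __future__ import print_function
import hashlib

data = b"example data"  # placeholder bytes for demonstration only

# hashlib.new() accepts an algorithm name and returns the matching hash object;
# update() feeds it data and hexdigest() returns the digest as a hex string.
for name in ("md5", "sha1", "sha256"):
    hash_obj = hashlib.new(name)
    hash_obj.update(data)
    print(name, hash_obj.hexdigest())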


To learn more about the hashlib library, visit https://docs.python.org/3/library/hashlib.html.

How to do it…

We hash files with the following steps:

  1. Hash the file's name (its absolute path) with the specified algorithm and print the result.
  2. Hash the file's contents with the specified algorithm and print the result.

How it works…

To begin, we must import hashlib as shown in the following code. For ease of use, we have defined a dictionary of algorithms that our script can use: MD5, SHA-1, SHA-256, and SHA-512. By updating this dictionary, we can support other hash functions that have update() and hexdigest() methods, including some from libraries other than hashlib:

from __future__ import print_function
import argparse
import hashlib
import os

__authors__ = ["Chapin Bryce", "Preston Miller"]
__date__ = 20170815
__description__ = "Script to hash a file's name and contents"

available_algorithms = {
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
    "sha512": hashlib.sha512
}

parser = argparse.ArgumentParser(
    description=__description__,
    epilog="Developed by {} on {}".format(", ".join(__authors__), __date__)
)
parser.add_argument("FILE_NAME", help="Path of file to hash")
parser.add_argument("ALGORITHM", help="Hash algorithm to use",
                    choices=sorted(available_algorithms.keys()))
args = parser.parse_args()

input_file = args.FILE_NAME
hash_alg = args.ALGORITHM

Notice how we define our hashing algorithm object using our dictionary and the algorithm name provided at the command line, followed by a pair of parentheses to instantiate the object. This approach provides additional flexibility when adding new hashing algorithms.
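
As a concrete illustration (not part of the recipe script), looking up an algorithm name in the dictionary returns the hashlib constructor, and the trailing parentheses create the hash object, exactly as if the constructor were called directly:

import hashlib

available_algorithms = {"sha256": hashlib.sha256}

# The lookup returns the constructor; the () call creates the hash object
hash_obj = available_algorithms["sha256"]()  # equivalent to hashlib.sha256()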

With our hash algorithms defined, we can now hash the file's absolute path, a method similar to the one used to name files in iTunes backups of iOS devices, by passing the string into the update() method. When we are ready to display the hex value of the calculated hash, we can call the hexdigest() method on our file_name object:

file_name = available_algorithms[hash_alg]()
abs_path = os.path.abspath(input_file)
file_name.update(abs_path.encode())

print("The {} of the filename is: {}".format(
    hash_alg, file_name.hexdigest()))

Let's move on to opening the file and hashing its contents. While we could read the entire file and pass it to the hash function at once, not all files are small enough to fit in memory. To ensure our code works on larger files, we will use the technique in the following example to read the file piecemeal and hash it in chunks.

By opening the file in rb mode, we ensure that we are reading the raw binary contents of the file rather than an interpreted string representation. With the file open, we define the buffer size for reading content and then read the first chunk of data.

Entering a while loop, we update our hashing object with new content for as long as there is content in the file. This works because the read() method accepts an integer specifying the number of bytes to read; if that integer is larger than the number of bytes remaining in the file, it simply returns the bytes that are left. Once the file is exhausted, read() returns an empty bytes object, which ends the loop.

Once the entire file is read, we call the hexdigest() method of our object to display the file hash to the examiner:

file_content = available_algorithms[hash_alg]()
with open(input_file, 'rb') as open_file:
    buff_size = 1024
    buff = open_file.read(buff_size)

    while buff:
        file_content.update(buff)
        buff = open_file.read(buff_size)

print("The {} of the content is: {}".format(
    hash_alg, file_content.hexdigest()))

When we execute the code, we see the output from the two print statements revealing the hash values of the file's absolute path and its content. We can generate additional hashes for the file by changing the algorithm at the command line.
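
For example, assuming the recipe has been saved as file_hasher.py (the script and input filenames here are arbitrary placeholders), invocations could look like the following, with the actual digests depending entirely on the file supplied:

python file_hasher.py evidence.bin sha256
The sha256 of the filename is: <hex digest of the absolute path>
The sha256 of the content is: <hex digest of the file contents>

python file_hasher.py evidence.bin md5
The md5 of the filename is: <hex digest of the absolute path>
The md5 of the content is: <hex digest of the file contents>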

There's more…

This script can be further improved. Here's a recommendation:

  • Add support for additional hashing algorithms and create the appropriate entry within the available_algorithms global variable, as sketched below
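
For example, a minimal sketch of such an entry, added alongside the existing definitions in the recipe script, could look like the following; sha224 and sha384 ship with hashlib on both supported Python versions:

# Additional entries for the available_algorithms dictionary
available_algorithms["sha224"] = hashlib.sha224
available_algorithms["sha384"] = hashlib.sha384

# Any other object exposing update() and hexdigest() methods could be
# registered the same way, including implementations from third-party libraries.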