Chapter 2. Forensic Algorithms
Forensic algorithms are the building blocks for a forensic investigator. Independent of any specific implementation, these algorithms describe the details of forensic procedures. In the first section of this chapter, we introduce the different algorithms that are used in forensic investigations, including their advantages and disadvantages.
In this section, we describe the main differences between MD5, SHA256, and SSDEEP, the most common algorithms used in forensic investigations. We will explain the use cases as well as the limitations and threats associated with these three algorithms. This should help you understand why using SHA256 is better than using MD5 and in which cases SSDEEP can help you in an investigation.
Before we dive into the different hash functions, we will give a short summary of what a cryptographic hash function is.
A hash function is a function that maps an arbitrarily large amount of data to a value of a fixed length. The hash function ensures that the same input always results in the same output, called the hash sum. Consequently, a hash sum is a characteristic of a specific piece of data.
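The two properties named above, fixed output length and determinism, can be observed with a few lines of Python. This is a minimal sketch using the standard library's hashlib module; the helper name hash_sum and the sample inputs are assumptions made for illustration and do not come from the chapter.

```python
import hashlib


def hash_sum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hexadecimal hash sum of data (hypothetical helper)."""
    return hashlib.new(algorithm, data).hexdigest()


short_input = b"forensics"
long_input = b"forensics" * 100_000  # roughly 900 KB of data

# Determinism: the same input always yields the same hash sum.
assert hash_sum(short_input) == hash_sum(short_input)

# Fixed length: the size of the input does not change the size of the output.
print(len(hash_sum(short_input)), len(hash_sum(long_input)))  # 64 64
print(len(hash_sum(short_input, "md5")))  # 32
```

The lengths are counted in hexadecimal characters: a SHA256 hash sum is 256 bits (64 characters) and an MD5 hash sum is 128 bits (32 characters), regardless of how much data was hashed.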
A cryptographic hash function is a hash function that is considered practically impossible to invert. This means that it is not possible to create input data that has a pre-defined hash sum value by any...
Supporting the chain of custody
The outcomes of forensic investigations can have a severe impact on organizations and individuals. Depending on your field of work, your investigation can become evidence in court.
Consequently, the integrity of forensic evidence has to be ensured not just when collecting the evidence, but also throughout the entire handling and analysis. Usually, the very first step in a forensic investigation is gathering the evidence. Normally, this is done using a bitwise copy of the original media. All the subsequent analysis is performed on this forensic copy.
Creating hash sums of full disk images
To ensure that a forensic copy is actually identical to the original media, hash sums of both the original media and the forensic copy are computed. These hash sums must match to prove that the copy is identical to the original data. Nowadays, it has become common to use at least two different cryptographic hash algorithms to minimize the risk of hash collisions and harden the overall...
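The comparison described above could be sketched as follows. This is an illustration under stated assumptions, not the chapter's own implementation: the function names hash_image and copies_match, the choice of MD5 plus SHA256 as the two algorithms, and the 1 MiB chunk size are all hypothetical. The image is read in chunks so that a full disk image never has to fit into memory, and both hash sums are computed in a single pass over the data.

```python
import hashlib


def hash_image(path, algorithms=("md5", "sha256"), chunk_size=1024 * 1024):
    """Read the file once, in chunks, and return {algorithm: hex hash sum}."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as image:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: image.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def copies_match(original_path, copy_path):
    """Return True only if every hash sum of the copy equals the original's."""
    return hash_image(original_path) == hash_image(copy_path)
```

A call such as copies_match("original.dd", "copy.dd"), with both file names being placeholders, returns True only when the MD5 and the SHA256 hash sums both agree. In practice, the hash sums returned by hash_image would also be recorded alongside the evidence so that the check can be repeated later.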
This section will demonstrate some use cases where the preceding algorithms and techniques are used to support the investigator. In this chapter, we use two common examples: mobile malware and the National Software Reference Library (NSRL).
In this example, we will check the installed applications on an Android smartphone against an online analysis system, Mobile-Sandbox. Mobile-Sandbox is a website (http://www.mobilesandbox.org) that provides free checking of Android files for viruses or suspicious behavior. It is connected to VirusTotal, which uses up to 56 different antivirus products and scan engines to check for viruses that the user's antivirus solution may have missed or to verify suspected false positives. Additionally, Mobile-Sandbox uses custom techniques to detect applications that act in a potentially malicious way. Antivirus software vendors, developers, and researchers behind Mobile-Sandbox can receive copies of the files to help in...
This chapter provided an overview of the domains of forensics and example algorithms for each of these domains. We also showed you how to compare applications installed on an Android device with web services such as Mobile-Sandbox. In the second real-world example, we demonstrated how to sort out benign and known files from a Windows system to reduce the amount of data that has to be analyzed manually. With NSRLquery, forensic investigators can focus on new or modified content and do not need to waste time on the widely known content of standard applications.
In the following chapters, these algorithms will be applied to a selection of device types, operating systems, and applications for use during forensic investigation.