In this first chapter, we will introduce reverse engineering and explain what it is for. We will begin by discussing some insights already being applied in various aspects that will help the reader understand what reverse engineering is. In this chapter, we will cover a brief introduction to the process and types of tools used in software reverse engineering. There are tips given here on the proper handling of malware. The last section of this chapter shows how easy it is to set up our initial analysis environment using tools that are readily available for download. The following topics will be covered:
- What reverse engineering is used for
- Applying reverse engineering
- Types of tools used in reverse engineering
- Guide to handling malware
- Setting up your reverse engineering environment
Breaking something down and putting it back together is a process that helps people understand how things were made. A person would be able to redo and reproduce an origami by unfolding it first. Knowing how cars work requires understanding each major and minor mechanical part and their purposes. The complex nature of the human anatomy requires people to understand each and every part of the body. How? By dissecting it. Reverse engineering is a way for us to understand how things were designed, why is it in its state, when it triggers, how it works, and what its purpose is. In effect, the information is used to redesign and improve for better performance and cost. It can even help fix defects.
However, reverse engineering entails ethical issues and is still a continuous debate. Similar to Frankenstein's case, there are existing issues that defy natural laws in a way that is not acceptable to humanity. Today, simple redesigning can raise copyright infringement if not thought through carefully. Some countries and states have laws governing against reverse engineering. However, in the software security industry, reverse engineering is a must and a common use case.
Imagine if the Trojan Horse was thoroughly inspected and torn down before it was allowed to enter the gates of a city. This would probably cause a few dead soldiers outside the gate fighting for the city. The next time the city is sent another Trojan Horse, archers would know where to point their arrows. And no dead soldiers this time. The same is true for malware analysis—by knowing the behaviors of a certain malware through reverse engineering, the analyst can recommend various safeguards for the network. Think of it as the Trojan Horse being the malware, the analyst being the soldier who initially inspected the horse, and the city being the network of computers.
Anyone seeking to become a reverse engineer or an analyst should have the trait of being resourceful. Searching the internet is part of reverse engineering. An analyst would not plainly rely on the tools and information we provide in this book. There are instances that an analysis would even require reverse engineer to develop their own tools.
Software auditing may require reverse engineering. Besides high-level code review processes, some software quality verification also involves implementing reverse engineering. The aim of these test activities is to ensure that vulnerabilities are found and fixed. There are a lot of factors that are not taken into consideration during the design and development of a piece of software. Most of these are random input and external factors that may cause leaks, leading to vulnerabilities. These vulnerabilities may be used for malicious intents that not only disrupt the software, but may cause damage and compromise the system environment it is installed in. System monitoring and fuzzing tools are commonly used when testing software. Today's operating systems have better safeguards to protect from crashing. Operating systems usually report any discrepancies found, such as memory or file corruption. Additional information, such as crash dumps, are also provided. From this information, a reverse engineer would be able to pinpoint where exactly in the software they have to inspect.
In the software security industry, one of the core skills required is reverse engineering. Every attack, usually in the form of malware, is reversed and analyzed. The first thing that is usually needed is to clean the network and systems from being compromised. An analyst determines how the malware installed itself and became persistent. Then, they develop steps for uninstalling the malware. In the anti-malware phase, these steps are used to develop the clean-up routine, once the anti-malware product is able to detect that the system has been compromised.
Some administrators are even advised to restructure their network infrastructure. Once a system gets compromised, the attackers may already have got all of the information about the network, and would easily be able to make another wave of the same attack. Making major changes will greatly help prevent the same attack from happening again.
Part of restructuring the infrastructure is education. The best way to prevent a system from being compromised is by educating its users about securing information, including their privacy. Knowing about social engineering and having experience of previous attacks makes users aware of security. It is important to know how attackers are able to compromise an institution and what damage they can cause. As a result, security policies are imposed, backups are set up, and continuous learning is implemented.
Going further, targeted companies can report the attack to authorities. Even a small piece of information can give authorities hints to help them hunt down the suspects and shut down malware communication servers.
Systems can be compromised by taking advantage of software vulnerabilities. After the attacker gets knowledge about the target, the attacker can craft code that exploits known software vulnerabilities. Besides making changes in the infrastructure, any software used should also be kept up to date with security features and patches. Reverse engineering is also needed to find vulnerable code. This helps pinpoint the vulnerable code by backtracking it to the source.
All of these activities are done based on the output of reverse engineering. The information gathered from reverse engineering affects how the infrastructure needs to be restructured.
We will work in an environment that will make use of virtualization software. It is recommended that we have a physical machine with virtualization enabled and a processor with at least four cores, 4 GB of RAM, and 250 GB of disk space. Pre-install this physical machine with either the Windows or Linux operating system.
We will be using VirtualBox in our setup. The host operating system version of Windows or Linux will depend on the requirements of VirtualBox. See the latest version of VirtualBox at https://www.virtualbox.org/ and look for the recommended requirements.
You may need to download virtual machines from Microsoft in advance, as these may take some time to download. See the developers' page at https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/. Windows 10 can be downloaded from the following link: https://www.microsoft.com/en-us/software-download/windows10
Ethics requires anyone carrying out reverse engineering of software to have approval from the owner of the software. However, there are a lot of instances where software shows its bugs upfront, while the operating system reports it. Some companies are more lenient about their software getting reversed without approval, but it is customary today that any vulnerabilities found should be reported directly to the owner and not publicized. It is up to the owner to decide when to report the vulnerability to the community. This prevents attackers from using a vulnerability before a software patch gets released.
It is a different story when malware or hacking is involved. Of course, reversing malware doesn't need approval from the malware author. Rather, one of the goals of malware analysis is to catch the author. If not sure, always consult a lawyer or a company's legal department.
Without any execution, viewing the file's binary and parsing each and every byte provides much of the information needed to continue further. Simply knowing the type of file sets the mindset of the analyst in a way that helps them to prepare specific sets of tools and references that may be used. Searching text strings can also give clues about the author of the program, where it came from, and, most likely, what it does.
This type of analysis is where the the object being analyzed gets executed. It requires an enclosed environment so that behaviors that may compromise production systems do not happen. Setting up enclosed environments are usually done using virtual machines, since they can then easily be controlled. Tools that monitor and log common environment actions are implemented during dynamic analysis.
There is some information that may be missed out during static and dynamic analyses. The flow of a program follows a path that depends of certain conditions. For example, a program will only create a file only if a specific process is running. Or, a program will create a registry entry in the
Wow6432Node key only if it were running in a 64-bit Windows operating system. Debugging tools are usually used to analyze a program in low-level analysis.
While doing analysis, every piece of information should be collected and documented. It is common practice to document a reverse engineered object to help future analysis. An analysis serves as a knowledge base for developers who want to secure their upcoming programs from flaws. For example, a simple input can now be secured by placing bounds validation, which is known about as a result of a prior reverse-engineered program that indicated possible buffer overflow.
- How a reversed engineered object works
- When specific behavior triggers
- Why specific codes were used in the program
- Where it was intended to work on
- What the whole program does
Doing reverse code engineering starts off with understanding the meaning of every bit and byte. Simply viewing the bytes contained requires developing tools that aid in the reading of files and objects. Parsing and adding meaning to every byte would require another tool. Reverse engineering has evolved with tools that are continuously updated when encountering new software technology. Here, we have categorized these tools into binary analysis tools, disassemblers, decompilers, debuggers, and monitoring tools.
Binary analysis tools are used to parse binary files and extract information about the file. An analyst would be able to identify which applications are able to read or execute the binary. File types are generally identified from their magic header bytes. These Magic Header bytes are usually located at the beginning of a file. For example, a Microsoft executable file, an
EXE file, begin with the MZ header (MZ is believed to be the initials of Mark Zbikowski, a developer from Microsoft during the DOS days). Microsoft Office Word documents, on the other hand, have these first four bytes as their Magic Header:
The hexadecimal bytes in the preceding screenshot read as
DOCFILE Other information such as text string also give hints. The following screenshot shows information indicating that the program was most likely built using Window Forms:
Disassemblers are used to view the low-level code of a program. Reading low-level code requires knowledge of assembly language. Analysis done with a disassembler gives information about the execution conditions and system interactions that a program will carry out when executed. However, the highlights when reading low-level code are when the program uses Application Program Interface (API) functions. The following screenshot shows a code snippet of a program module that uses the
GetJob() API. This API is used to get information about the printer job, as shown here:
Disassemblers can show the code tree, but the analyst can verify which branch the code flows to by using a debugger. A debugger does actual execution per line of code. The analyst can trace through codes such as loops, conditional statements, and API execution. Since debuggers are categorized under dynamic analysis and perform a step-wise execution of code, debugging is done in an enclosed environment. Various file types have different disassemblers. In a .NET compiled executable, it is best to instead disassemble the p-code and work out what each operator means.
Monitoring tools are used to monitor system behaviors regarding file, registry, memory, and network. These tools usually tap or hook on APIs or system calls, then log information such as newly created processes, updated files, new registry entries, and incoming SMB packets are generated by reporting tools.
Decompilers are similar to disassemblers. They are tools that attempt to restore the high-level source code of program unlike disassemblers that attempt to restore the low-level (assembly language) source code of a program.
These tools work hand in hand with each other. The logs generated from monitoring tools can be used to trace the actual code from the disassembled program. The same applies when debugging, where the analyst can see the overview of the low-level code from the disassembly, while being able to predict where to place breakpoints based on the monitoring tools' logs.
- Do your analysis in an enclosed environment such as a separate computer or in a virtual machine.
- If network access is not required, cut it off.
- If internet access is not required, cut it off.
- When copying files manually, rename the file to a filename that doesn't execute. For example, rename
A typical setup would require a system that can run malware without it being compromised externally. However, there are instances that may require external information from the internet. For starters, we're going to mimic an environment of a home user. Our setup will, as much as possible, use free and open source tools. The following diagram shows an ideal analysis environment setup:
The sandbox environment here is where we do analysis of a file.
MITM, mentioned on the right of the diagram, means the man in the middle environment, which is where we monitor incoming and outgoing network activities. The sandbox should be restored to its original state. This means that after every use, we should be able to revert or restore its unmodified state. The easiest way to set this up is to use virtualization technology, since it will then be easy to revert to cloned images. There are many virtualization programs to choose from, including VMware, VirtualBox, Virtual PC, and Bochs.
It should also be noted that there is software that can detect that it is being run, and doesn't like to be run in a virtualized environment. A physical machine setup may be needed for this case. Disk management software that can store images or re-image disks would be the best solution for us here. These programs include Fog, Clonezilla, DeepFreeze, and HDClone.
In our setup, we will be using VirtualBox, which can be downloaded from https://www.virtualbox.org/. The Windows OS we will be using is Windows 7 32-bit, which can be downloaded from https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/. In the following diagram, the system, which has an internet connection, is installed with two virtual machines, a guest sandbox and guest MITM:
- The image downloaded from the Microsoft website is zipped and should be extracted. In VirtualBox, click on
File|Import Appliance. You should be shown a dialog where we can import the Windows 7 32-bit image.
- Simply browse and select the OVA file that was extracted from the ZIP archive, then click on
Next, as shown here:
- Before continuing, the settings can be changed. The default RAM is set to 4096 MB. The more RAM allocated and the higher the number of CPU cores set, the better performance will be noticed when running or debugging. However, the more RAM added, the same amount of disk space gets consumed when storing snapshots of the image. This means that if we allocated 1 GB of RAM, creating a snapshot will also consume at least 1GB of disk space. We set our RAM to 2048 MB, which would be a reasonable amount for us to work on:
- Click on
Importand it should start generating the virtual disk image. Once it has completed, we need to create our first snapshot. It is recommended to create a snapshot in a powered-off state, since the amount of disk space consumed is minimal. Look for the
SnapShotstab, then click on
Take. Fill out the
Snapshot Descriptionfields, then click on the
OKbutton. This quickly creates your first snapshot.
In a power-on state, the amount of RAM plus the amount of modified disk space in the virtual machine is equal to the total disk space that a snapshot will consume.
- Click on
Startto begin running the Windows 7 image. You should end up with the following window. In case it asks for a password, the default password is
At this point, the network setup is set to NAT. This means that any network resources required by the virtual machine will use the host computer's IP address. The IP address of the virtual machine is taken from the VirtualBox's virtual DHCP service. Remember that any network communication in the virtual machine makes use of the host computer's IP address.
Since we can't prevent a certain malware from sending out information to the web in order to return information back to our virtual machine, it is important to note that some ISPs may monitor common malware behavior. It would be best to review your contract with them and make a call if needed.
Most of our reverse engineering deals with malware and, as of the time of writing, attackers usually target Windows systems. Our setup uses Microsoft Windows 7 32-bit. Feel free to use other versions. We recommend installing the 32-bit version of Microsoft Windows, as it will be easier to track virtual and physical addresses later on during low-level debugging.
We will be building our own programs to validate and understand how the low-level code behaves and what it looks like. The following list outlines the software we will be using to build our programs:
If you are interested in malware, the samples can be obtained from the following sites:
Reverse engineering has been around for years and has been a useful technique to understand how things work. In the software industry, reverse engineering helps validate and fix code flow and structures. The information from such tasks can improve the security of various aspects of software, network infrastructure, and human awareness. As a core skill requirement for the anti-malware industry, reverse engineering helps create detection and remediation information; the same information that is used to build safeguards for an institution's servers. It is also used by authorities and forensic experts to hunt down syndicates.
There are basic steps that help build reverse engineering information. Once an analyst has approval from the original author to carry out reverse engineering, they can begin with static analysis, dynamic analysis, and then low-level analysis. This is then followed by reporting the overview and details about the software.
When doing analysis, various types of tools are used, including static analysis tools, disassemblers, decompilers, debuggers, and system monitoring tools. When doing reverse engineering on malware, it is best to use these tools in an environment that has limited or no access to the network you use for personal purposes or work. This should prevent your infrastructure from being compromised. Malware should be handled properly, and we listed a couple of ways to prevent accidental double-clicks.
Malware analysis nonetheless requires the internet to get further information on how the malware works and what it does. There may be some legal issues that require you to consult the laws of your country and the policies of your local ISP, to ensure that you are not violating any of them.
The core requirement for the setup of an analysis lab is that the target operating system can be reverted back to its unmodified state.
Malware samples can be obtained from the following link: https://github.com/PacktPublishing/Mastering-Reverse-Engineering/tree/master/tools. These samples will be used throughout this book.
Now that we have our basic setup, let's embark on our journey through reverse engineering.