Chapter 8. Sandboxing - Virtualization as a Component for RE
In previous chapters, we used virtualization software, in particular VirtualBox or VMware, to set up the Linux and Windows environments in which we conducted our analysis. Virtualization worked fine because our targets were x86 programs, and the virtualization software we used supports only the x86 architecture. Virtualization is a very useful component of reverse engineering, and a large share of software is built for the x86 architecture. Under virtualization, guest code runs on the host machine's CPU, with the hypervisor managing access to the host's resources.
Unfortunately, there are other CPU architectures out there that these tools cannot virtualize. Neither VirtualBox nor VMware supports them. What if we were given a non-x86 executable to work with, and all we have is an operating system installed on an x86 machine? Well, this should not stop us from doing reverse engineering.
To work around this issue, we will use emulators. Emulators were around long before the hypervisor was even introduced. Emulators...
The beauty of emulation is that it can fool the operating system into thinking that it is running on a certain CPU architecture. The drawback is noticeably slower performance, since almost every guest instruction is interpreted in software instead of running directly on the host CPU. To explain CPUs briefly, there are two CPU architecture design philosophies: Complex Instruction Set Computing (CISC) and Reduced Instruction Set Computing (RISC). In assembly programming, a task on CISC typically requires only a few instructions, because a single instruction can perform several lower-level operations. For example, a single arithmetic instruction, such as MUL, carries out a series of lower-level steps internally. In RISC, each instruction performs one simple operation, so a low-level program needs more instructions and should be carefully optimized. In effect, CISC has the advantage of requiring less memory space for code, but a single instruction may require more time to execute. On the other hand, RISC can offer better performance, since its instructions are simple and each one executes quickly. However, if the code is not properly optimized, programs built for RISC may not perform as fast as they should and may consume more space. High-level compilers should have...
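To make the cost of interpretation concrete, here is a minimal sketch of an emulator's core loop. It is not how QEMU or any real emulator is implemented. The instruction set (MOV, ADD, SUB, JNZ, HALT), the register names, and the `run` function are all hypothetical, invented for this illustration. The sample program multiplies 6 by 7 using only simple instructions, which also hints at why a task expressed in reduced instructions takes more of them than a single MUL would.

```python
# Toy emulator for a made-up CPU. The opcodes (MOV, ADD, SUB, JNZ, HALT)
# and register names are hypothetical and chosen only for illustration.

def run(program):
    """Interpret a list of instruction tuples; return (registers, steps)."""
    regs = {}
    pc = 0       # program counter of the emulated CPU
    steps = 0    # number of guest instructions the host loop has handled
    while pc < len(program):
        op, *args = program[pc]          # fetch and decode
        steps += 1
        if op == "MOV":                  # MOV reg, immediate
            regs[args[0]] = args[1]
        elif op == "ADD":                # ADD dst, src  (dst = dst + src)
            regs[args[0]] += regs[args[1]]
        elif op == "SUB":                # SUB dst, src  (dst = dst - src)
            regs[args[0]] -= regs[args[1]]
        elif op == "JNZ":                # JNZ reg, target: jump if reg != 0
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return regs, steps


# Multiply 6 by 7 with repeated addition instead of a single MUL.
program = [
    ("MOV", "r0", 0),      # 0: accumulator
    ("MOV", "r1", 6),      # 1: value to add on each pass
    ("MOV", "r2", 7),      # 2: loop counter
    ("MOV", "r3", 1),      # 3: constant 1
    ("ADD", "r0", "r1"),   # 4: r0 += r1
    ("SUB", "r2", "r3"),   # 5: r2 -= 1
    ("JNZ", "r2", 4),      # 6: repeat until the counter reaches zero
    ("HALT",),             # 7: stop
]

regs, steps = run(program)
print(regs["r0"], steps)   # prints: 42 26
```

Every guest instruction costs the host a tuple unpack, a chain of comparisons, and a dictionary update, which is the overhead the paragraph above refers to. Real emulators reduce this cost with techniques such as translating blocks of guest code ahead of execution, but the basic fetch, decode, and execute cycle is the same idea.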
Analysis in unfamiliar environments
Here, the reverse engineering concepts are the same. However, the availability of tools is limited. Static analysis can still be done in an x86 environment, but executing the file requires an emulated sandbox.
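A first static step that works on any host is finding out which architecture a file was built for. The sketch below assumes the file is in ELF format and reads the `e_machine` field of its header. The function name `elf_architecture`, the `ELF_MACHINES` table, and the hand-built `sample` header are hypothetical helpers for this illustration, and the table lists only a handful of the machine values defined by the ELF specification.

```python
import struct

# A few e_machine values from the ELF specification (not a complete list).
ELF_MACHINES = {
    0x03: "x86",
    0x08: "MIPS",
    0x14: "PowerPC",
    0x28: "ARM",
    0x3E: "x86-64",
    0xB7: "AArch64",
    0xF3: "RISC-V",
}


def elf_architecture(header: bytes) -> str:
    """Return the target architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (offset 5) gives the byte order: 1 = little, 2 = big.
    if header[5] == 1:
        endian = "<"
    elif header[5] == 2:
        endian = ">"
    else:
        raise ValueError("unknown ELF data encoding")
    # e_machine is a 16-bit field at offset 18, after the 16-byte e_ident
    # array and the 2-byte e_type field.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown (0x{machine:x})")


# Hand-built header of a little-endian 32-bit ARM executable, for demonstration:
# magic, class, data, version, 9 remaining e_ident bytes, e_type, e_machine.
sample = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 2, 0x28)
print(elf_architecture(sample))   # prints: ARM
```

For a real file, the same function can be fed the result of `open(path, "rb").read(20)`. Knowing the architecture up front tells us which disassembler settings and which emulated environment we will need.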
It is still best to debug native executables locally in the emulated environment. But if the options for local debugging are slim, an alternative is remote debugging. For Windows, the most popular remote debugging tools are WinDbg and IDA Pro. For Linux, we usually use GDB.
Analyzing ARM-compiled executables is not far from the process that we perform with x86 executables. We follow the same steps as we did with x86:
- Study the ARM low-level language
- Do deadlisting using disassembly tools
- Debug the program in the operating system environment
Studying the ARM low-level language is done in the same way that we studied x86 instructions. We just need to understand the memory address space, general-purpose registers, special registers, stack...
In this chapter, we learned that we can still analyze an executable file even if it is not a Windows or Linux x86-native executable. With static analysis alone, we can analyze such a file without running it, although we still need references to understand the low-level language of the non-x86 architecture, whether it is categorized as RISC or CISC. Just as we learned x86 assembly language, languages such as ARM assembly can be learned with the same concepts.
However, the findings of a static analysis are best confirmed with actual code execution, using dynamic analysis. To do that, we need to set up an environment in which the executable can run natively. We introduced an emulation tool called QEMU that can do the job for us. It supports quite a number of architectures, including ARM. Today, one of the popular operating systems running on the ARM architecture is Arch Linux. This operating system is commonly deployed by Raspberry Pi enthusiasts.
We also learned about debugging...