Windows APT Warfare

From Source to Binaries – The Journey of a C Program

In this chapter, we will learn the basics of how compilers package EXE binaries from C code and techniques for system processes to execute. These basic concepts will build your understanding of how Windows compiles C into programs and links them across system components. You will also understand the program structure and workflow that malware analysis and evasion detection should follow.

In this chapter, we’re going to cover the following main topics:

The simplest Windows program in C
C compiler – assembly code generation
Assembler – transforming assembly code into machine code
Compiling code
Windows linker – packing binary data into Portable Executable (PE) format
Running compiled PE executable files as dynamic processes

C compiler – assembly code generation

What is intriguing to understand in the previous section is the reason the compiler understands this C code. First, the main task for the compiler is to convert the C code into assembly code according to the C/C++ calling convention, as shown in Figure 1.1:

Figure 1.1 – x86 calling convention

Important note

For convenience and practicability, the following examples will be presented with x86 instructions. However, the methods and principles described in this book are common to all Windows systems, and the compiler examples are based on GNU Compiler Collection (GCC) for Windows (MinGW).

As different system functions (and even third-party modules) have the expected in-memory access to the memory level of the assembly code, there are several mainstream application binary interface (ABI) calling conventions for ease of management. Interested readers can refer to Argument Passing and Naming Conventions by Microsoft (https://docs.microsoft.com/en-us/cpp/cpp/argument-passing-and-naming-conventions).

These calling conventions mainly deal with several issues:

The position where the parameters are placed in order (e.g., on a stack, in a register such as ECX, or mixed to speed up performance)
The memory space occupied by parameters if parameters are need to be stored
The occupied memory to be released by the caller or callee

When the compiler generates the assembly code, it will recognize the calling conventions of the system, arrange the parameters in memory according to its preference, and then call the memory address of the function with the call command. Therefore, when the thread jumps into the system instruction, it can correctly obtain the function parameter at its expected memory address.

Take Figure 1.1 as an example: we know that the USER32!MessageBoxA function prefers WINAPI calling conventions. In this calling convention, the parameter content is pushed into the stack from right to left, and the memory released for this calling convention is chosen by the callee. So after pushing 4 parameters into the stack to occupy 16 bytes in the stack (sizeof(uint32_t) x 4), it will be executed in USER32!MessageBoxA. After executing the function request, return to the next line of the Call MessageBoxA instruction with ret 0x10 and release 16 bytes of memory space from the stack (i.e., xor eax, eax).

Important note

The book here only focuses on the process of how the compiler generates single-chip instructions and encapsulates the program into an executable file. It does not include the important parts of advanced compiler theory, such as semantic tree generation and compiler optimization. These parts are reserved for readers to explore for further learning.

In this section, we learned about the C/C++ calling convention, how parameters are placed in memory in order, and how memory space is released when the program is finished.

Assembler – transforming assembly code into machine code

At this moment, you may notice that something is not quite right. The processor chips we use every day are not capable of executing text-based assembly code but are instead parsed into the machine code of the corresponding instruction set to perform the corresponding memory operations. Thus, during the compiling process, the previously mentioned assembly code is converted by the assembler into the machine code that can be understood by the chip.

Figure 1.2 shows the dynamic memory distribution of the 32-bit PE:

Figure 1.2 – 32-bit PE memory layout

Since the chip cannot directly parse strings such as hi there or info, data (such as global variables, static strings, global arrays, etc.) is first stored in a separate structure called a section. Each section is created with an offset address where it is expected to be placed. If the code later needs to extract resources identified during these compilation periods, the corresponding data can be obtained from the corresponding offset addresses. Here is an example:

The aforementioned info string can be expressed as \x69\x6E\x66\x6F\x00 in ASCII code (5 bytes in total with null at the end of the string). The binary data of this string can be stored at the beginning of the .rdata section. Similarly, the hi there string can be stored closely after the previous string at the address of the .rdata section at offset +5.
In fact, the aforementioned call MessageBoxA API is not understood by the chip. Therefore, the compiler will generate an Import Address Table struct, which is the .idata section, to hold the address of the system function that the current program wants to call. When needed by the program, the corresponding function address can be extracted from this table, enabling the thread to jump to the function address and continue executing the system function.
Generally speaking, it is the compiler’s practice to store the code content in the .text section.
Each individual running process does not have just one PE module. Either *.EXE or *.DLL mounted in the process memory is packaged in PE format.
In practice, each module loaded into the memory must be assigned an image base address to hold all contents of the module. In the case of a 32-bit *.EXE, the image base address would normally be 0x400000.
The absolute address of each piece of data in the dynamic memory will be the image base address of this module + the section offset + the offset of the data on the section. Take the 0x400000 image base address as an example. If we want to get the info string, the expected content will be placed at 0x402000 (0x400000 + 0x2000 + 0x00). Similarly, hi there would be at 0x402005, and the MessageBoxA pointer would be stored at 0x403018.

Important note

There is no guarantee that the compiler will generate .text, .rdata, and .idata sections in practice, or that their respective uses will be for these functions. Most compilers follow the previously mentioned principles to allocate memory. Visual Studio compilers, for example, do not produce executable programs with .idata sections to hold function pointer tables, but rather, in readable and writable .rdata sections.

What is here is only a rough understanding of the properties of block and absolute addressing in the dynamic memory; it is not necessary to be obsessed with understanding the content, attributes, and how to fill them correctly in practice. The following chapters will explain the meaning of each structure in detail and how to design it by yourself.

In this section, we learned about the transformation to machine code operations during program execution, as well as the various sections and offsets of data stored in memory that can be accessed later in the compiling process.

Compiling code

As mentioned earlier, if the code contains chip-incomprehensible strings or text-based functions, the compiler must first convert them to absolute addresses that the chip can understand and then store them in separate sections. It is also necessary to translate the textual script into native code or machine code that the chip can recognize. How does this work in practice?

In the case of Windows x86, the instructions executed on the assembly code are translated according to the x86 instruction set. The textual instructions are translated and encoded into machine code that the chip understands. Interested readers can search for x86 Instruction Set on Google to find the full instruction table or even encode it manually without relying on a compiler.

Once the compiler has completed the aforementioned block packaging, the next stage is to extract and encode the textual instructions from the script, one by one, according to the x86 instruction set, and write them into the .text section that is used to store the machine code.

As shown in Figure 1.3, the dashed box is the assembly code in the text type obtained from compiling the C/C++ code:

Figure 1.3 – Native code generation

You can see the first instruction is push 0, which pushes 1 byte of data onto the stack (saved as 4 bytes), and 6A 00 is used to represent this instruction. The push 0x402005 instruction pushes 4 bytes onto the stack at once, so push 68 50 20 40 00 is used to achieve a longer push. call ds:[0x403018] is the address of the 4 bytes, and the long call of machine code, FF 15 18 30 40 00, is used to represent this instruction.

Although Figure 1.3 shows the memory distribution of the dynamic msgbox.exe file, the file produced by the compiler is not yet an executable PE file. Rather, it is a file called a Common Object File Format (COFF) or an object file, as some people call it, which is a wrapper file specifically designed to record the various sections produced by the compiler. The following figure shows the COFF file obtained by compiling and assembling the source code with the gcc -c command, and viewing its structure with a well-known tool, PEview.

As shown in Figure 1.4, there is an IMAGE_FILE_HEADER structure at the beginning of the COFF file to record how many sections are included:

Figure 1.4 – COFF

At the end of this structure is a whole array of IMAGE_SECTION_HEADER to record the current location and size of the content of each section in the file. Closely attached at the end of this array is the substantive content of each section. In practice, the first section will usually be the content of the .text section.

In the next stage, the Linker is responsible for adding an extra piece of the COFF file to the application loader, which will become our common EXE program.

Important note

In the case of x86 chip systems, it is customary to reverse the pointer and digit per bit into the memory when encoding. This practice is called little-endian, as opposed to a string or array that should be arranged from lowest to highest address. The data arrangement of multiple bytes varies according to the chip architecture. Interested readers can refer to the article How to write endian-independent code in C (https://developer.ibm.com/articles/au-endianc/).

In this section, we learned about the COFF, which is used to record the contents in the memory of the various sections recorded by the compiler.

Windows linker – packing binary data into PE format

In the previous section, we assumed some memory distribution during the program's compilation. For example, the default EXE module image base should be at 0x400000 so that executable content should be placed. The .text section should be placed at 0x401000 above its image base. As we said, the .idata section is used to store the import address table, so the question is who or what is responsible for filling the import address table?

The answer is that every OS has an application loader, which is designed to fill all these tasks correctly when creating a process from a static program. However, there is a lot of information that will only be known at the compiling time and not by the system developer, such as the following:

Does the program want to enable Address Space Layout Randomization (ASLR) or Data Execution Prevention (DEP)?
Where is the main(int, char) function in the .text section written by the developer?
How much of the total memory is used by the execution module during the dynamic phase?

Microsoft has therefore introduced the PE format, which is essentially an extension to the COFF file, with an additional optional header structure to record the information required by the Windows program loader to correct the process. The following chapters will focus on playing with the various structures of the PE format so that you can write an executable file by hand on a whiteboard.

All you need to know now is that a PE executable has some key features:

Code content: Usually stored as machine code in the .text section
Import address tables: To allow the loader to fill in the function addresses and enable the program to get them correctly
Optional header: This structure allows the loader to read and know how to correct the current dynamic module

Here is an example in Figure 1.5:

Figure 1.5 – Minimalist architecture of the program

msgbox.exe is a minimalist Windows program with only three sections: .text, .rdata, and .idata. After dynamic execution, the system application loader sequentially extracts the content of the three sections and writes them each to the offset of 0x1000, 0x2000, and 0x3000 relative to the current PE module (msgbox.exe).

In this section, we learned that the application loader is responsible for correcting and filling the program content to create a static program file into a process.

Running static PE files as dynamic processes

At this point, you have a general idea of how a minimal program is generated, compiled, and packaged into an executable file by the compiler in the static phase. So, the next question is, What does the OS do to get a static program running?

Figure 1.6 shows the process structure of how an EXE program is transformed from a static to a dynamic process under the Windows system:

Figure 1.6 – Dynamic operation of the process hatching flow

Note that this is different from the process hatching process in the latest version of Windows. For the sake of explanation, we'll ignore the processes of privilege escalation, the patching mechanism, and kernel generation, and only talk about how a static program is correctly parsed and run from a single execution.

On Windows systems, all processes must be hatched by the parent process by interrupting the system function to jump to the kernel level. For example, a parent process is currently trying to run the cmd.exe /c whoami command, which is an attempt to hatch the cmd.exe static file into a dynamic process and assign its parameters to /c whoami.

So, what happens in the whole process? As shown in Figure 1.6, these are the steps:

The parent process makes a request to the kernel with CreateProcess, specifying to generate a new process (child process).
Next, the kernel will produce a new process container and fill the execution code into the container with file mapping. The kernel will create a thread to assign to this child process, which is commonly known as the main thread or GUI thread. At the same time, the kernel will also arrange a block of memory in Userland’s dynamic memory to store two structural blocks: a process environment block (PEB) for recording the current process environment information and a thread environment block (TEB) for recording the first thread environment information. The details of these two structures will be fully introduced in Chapter 2, Process Memory – File Mapping, PE Parser, tinyLinker, and Hollowing, and in Chapter 3, Dynamic API Calling – Thread, Process, and Environment Information.
The NtDLL export function, RtlUserThreadStart, is the main routing function for all threads and is responsible for the necessary initialization of each new thread, such as the generation of structured exception handling (SEH). The first thread of each process, that is, the main thread, will execute NtDLL!LdrInitializeThunk at the user level and enter the NtDLL!LdrpInitializeProcess function after the first execution. It is the executable program loader, responsible for the necessary correction of the PE module loaded into the memory.
After the execution loader completes its correction, it jumps back to the current execution entry (AddressOfEntryPoint), which is the developer’s main function.

Important note

From a code perspective, a thread can be thought of as a person responsible for executing code, and a process can be thought of as a container for loading code.

The kernel layer is responsible for file mapping, which is the process of placing the program content based on the preferred address during the compiling period. For example, if the image base address is 0x400000 and the .text offset is 0x1000, then the file mapping process is essentially a simple matter of requesting a block of memory at the 0x400000 address in the dynamic memory and writing the actual contents of .text to 0x401000.

In fact, the loader function (NtDLL! LdrpInitializeProcess) does not directly call AddressOfEntryPoint after execution; instead, the tasks corrected by the loader and the entry point are treated as two separate threads (in practice, two thread contexts will be opened). NtDLL!NtContinue will be called after the correction and will hand over the task to the entry to continue execution as a thread task schedule.

The entry point of the execution is recorded in NtHeaders→OptionalHeader.AddressOfEntryPoint of the PE structure, but it is not directly equivalent to the main function of the developer. This is for your understanding only. Generally speaking, AddressOfEntryPoint points to the CRTStartup function (C++ Runtime Startup), which is responsible for a series of C/C++ necessity initialization preparations (e.g., cutting arguments into developer-friendly argc and argv inputs, etc.) before calling the developer’s main function.

In this section, we learned how EXE files are incubated from static to dynamically running processes on the Windows system. With the process and thread, and the necessary initialization actions, the program is ready to run.

Windows APT Warfare: Identify and prevent Windows APT attacks effectively

What do you get with a Packt Subscription?

Product Details

Windows APT Warfare

From Source to Binaries – The Journey of a C Program

The simplest Windows program in C

C compiler – assembly code generation

Assembler – transforming assembly code into machine code

Compiling code

Windows linker – packing binary data into PE format

Running static PE files as dynamic processes

Summary

Page 1 of 8

Key benefits

Description

What you will learn

What do you get with a Packt Subscription?

Product Details

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

Authors (1)

FAQs

Windows APT Warfare: Identify and prevent Windows APT attacks effectively

What do you get with a Packt Subscription?

Product Details

Key benefits

Description

What you will learn

What do you get with a Packt Subscription?

Product Details

Packt Subscriptions

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

Authors (1)

FAQs