Translation of The Book Windows APT Warfare - Sudo Null IT News
I have finished translating the book Windows APT Warfare. It is, in its own way,
a very interesting read for those who work with assembly, malware, and
information security. I'll leave the first part here. The rest of the book can
be picked up on damage (now xss.is). If someone can lay it out as a PDF, I will
be very grateful. Thanks to all.
In this chapter, we will learn the basics of how compilers package C code into
EXE binaries and how system processes are executed. These basic concepts will
help you understand how Windows compiles C into programs and links them with
system components. You will also understand the program structure and workflow
that malware analysis should follow.
Here is the simplest possible C program for Windows. Its purpose is to call the
USER32!MessageBox() function at the entry point, the main() function, to open
a popup window with an informational title and a welcome message.
What is interesting about the previous section is why the compiler understands
this C code. First, the main job of the compiler is to convert the C code into
assembly code according to the C/C++ calling convention, as shown in Fig. 1.1:
For convenience and practicality, the following examples will be presented with
x86 instructions. However, the methods and principles described in this book
are common to all Windows systems, and the compiler examples are based on
the GNU Compiler Collection (GCC) for Windows (MinGW).
When the compiler generates assembly code, it takes the system's calling
conventions into account, arranges the parameters in memory as the convention
requires, and then calls the function's memory address using the call
instruction. Therefore, when the thread jumps into the system function, it can
correctly obtain each function parameter at the expected memory address.
Take Fig. 1.1 as an example. We know that the USER32!MessageBoxA function uses
the WINAPI (__stdcall) calling convention. In this calling convention, the
parameters are pushed onto the stack from right to left, and the callee is
responsible for freeing the stack memory they occupy. So, after the 4
parameters are pushed onto the stack, occupying 16 bytes (sizeof(uint32_t) x 4),
execution continues in USER32!MessageBoxA. When the function finishes, it
executes ret 0x10, which frees the 16 bytes from the stack and returns to the
instruction that follows call MessageBoxA (i.e. xor eax, eax).
The book focuses only on how the compiler generates chip-level instructions
and encapsulates the program into an executable file. It does not cover
important parts of advanced compiler theory such as semantic tree generation
and compiler optimization. These parts are left for readers to study on their
own.
In this section, we learned about the C/C++ calling convention, how parameters
are placed in memory in order, and how that memory is freed when the called
function returns.
At this point you may notice that something is off. The processor chips we
use every day cannot execute text-based assembly code; they execute machine
code from the appropriate instruction set, which performs the corresponding
memory operations. Thus, during the compilation process, the previously
mentioned assembly code is converted by the assembler into machine code that
the chip can understand.
Fig. 1.2 shows the dynamic memory layout of a 32-bit PE:
Since the chip cannot directly parse strings such as HELLO WORLD or INFO,
data (such as global variables, static strings, global arrays, etc.) is first
stored in a separate structure called a section. Each section is created with
an offset address where it should be placed. If the code later needs to
retrieve resources identified at compile time, the corresponding data can be
retrieved at the appropriate offset addresses. Here's an example:
In practice, there is no guarantee that the compiler will generate .text,
.rdata, and .idata sections or that they will be used for these purposes. Most
compilers follow the previously mentioned memory allocation principles, but
not all. Visual Studio compilers, for example, do not create executable
programs with .idata sections to store function pointer tables; they store
them in .rdata sections that are readable and writable.
As mentioned earlier, if the code contains strings or textual function names
that the chip does not understand, the compiler must first convert them to
absolute addresses that the chip can understand and then store them in separate
sections. It is also necessary to translate the textual assembly into native
code, that is, machine code that the chip can recognize. How does this work in
practice?
Once the compiler completes the section packaging described above, the next
step is to extract the textual instructions from the assembly one by one,
encode them according to the x86 instruction set, and write them into the
.text section, which is used to store the machine code.
As shown in Fig. 1.3, the dotted box is the textual assembly code resulting
from compiling the C/C++ code:
You can see that the first instruction is push 0, which pushes 1 byte of data
onto the stack (stored as 4 bytes), and 6A 00 is used to represent this
instruction. The push 0x402005 instruction pushes 4 bytes onto the stack at a
time, so the longer encoding 68 05 20 40 00 is used. call ds:[0x403018] takes
a 4-byte address, so the long form of the call machine code, FF 15 18 30 40 00,
is used to represent this instruction.
Although Fig. 1.3 shows the memory layout of the running msgbox.exe, the file
created by the compiler is not yet an executable PE file. Rather, it is a
Common Object File Format (COFF) file, or an object file as some people call
it, which is a wrapper file specifically designed to record the various
sections produced by the compiler. The following figure shows a COFF file
obtained by compiling and assembling the source code using the gcc -c command
and viewing its structure using the well-known PEview tool.
In the next step, the linker is responsible for adding an extra piece, the
application loader, to the COFF file, which then becomes the familiar EXE
program.
On systems with an x86 chip, it is customary to store pointers and numbers in
memory with their bytes reversed, least significant byte first. This practice
is called little-endian. A string or array, by contrast, is laid out element by
element from the lowest to the highest address. The layout of multi-byte data
depends on the chip architecture. Interested readers may refer to the article
How to Write Endian-Independent Code in C
(https://developer.ibm.com/articles/au-endianc/).
In this section, we learned about COFF, which is used to record the contents
of the various sections produced by the compiler.
Windows Linker - Packing binary data into PE format
All you need to know now is that the PE executable has some key features:
In this section, we learned that the application loader is responsible for
patching and populating the program content to turn a static program file into
a process.
So what happens in the whole process? As shown in Figure 1.6, these are the
steps:
2. The kernel will then create a new process container and populate the
container with the executable code through file mapping. The kernel will create
a thread and assign it to this child process; this thread is usually called the
main thread or GUI thread. At the same time, the kernel also sets up a memory
block in the userland heap to store two building blocks: a process environment
block (PEB) to record current process environment information, and a thread
environment block (TEB) to record thread environment information. Details of
these two
structures will be fully presented in Chapter 2, “Process Memory—File Mapping,
PE Parser, tinyLinker, and Hollowing,” and Chapter 3, “Dynamic API Call—
Thread, Process, and Environment Information.”
4. After the execution loader completes its fixes, control passes to the
execution entry point (AddressOfEntryPoint), which is the developer's main
function.
The kernel level is responsible for file mapping, which is the process of
placing the program content based on the preferred address chosen at compile
time. For example, if the image base address is 0x400000 and the .text offset
is 0x1000, then the file mapping process is essentially just requesting a block
of memory at address 0x400000 on the heap and writing the actual contents of
.text to address 0x401000.
In this section, we learned how EXE files are transformed from static files
into dynamically running processes in a Windows system. With a process, a
thread, and the necessary initialization steps, the program is ready to run.
Summary
In this chapter, we have explained how the OS converts C code into assembly
code using a compiler and into executable programs using a linker.
The next chapter will build on this framework and take you through hands-on
experience with the entire flowchart in several C/C++ labs. In the following
chapters, you'll learn the intricacies of PE format design by creating a compact
program loader and writing an executable program yourself.