• The compilation process is summarized in Figure 5.11.
• Compilation begins with high-level language code such as C and generally produces assembly code.
• The high-level language program is parsed to break it into statements and expressions.
• In addition, a symbol table is generated, which includes all the named objects in the program.
• Some compilers may then perform higher-level optimizations that can be viewed as modifying the high-level language program input without reference to instructions.
• For example, consider the following array access code:
    x[i] = c*x[i];
• A simple code generator would generate the address for x[i] twice, once for each appearance in the statement.
• The later optimization phases can recognize this as an example of common expressions that need not be duplicated.
• While in this simple case it would be possible to create a code generator that never generated the redundant expression, taking into account every such optimization at code generation time is very difficult.
• We get better code and more reliable compilers by generating simple code first and then optimizing it.
Statement Translation Procedures
• Another major code generation problem is the creation of procedures.
• Generating code for procedures is relatively straightforward once the procedure linkage appropriate for the CPU is known.
• At the procedure definition, we generate the code to handle the procedure call and return.
• At each call of the procedure, we set up the procedure parameters and make the call from compiled code.
• Procedure stacks are typically built to grow down from high addresses.
• A stack pointer (sp) defines the end of the current frame, while a frame pointer (fp) defines the end of the last frame.
• The ARM Procedure Call Standard (APCS) is a good illustration of a typical procedure linkage mechanism.
• r0 - r3 are used to pass parameters into the procedure.
• r0 is also used to hold the return value. If more than four parameters are required, they are put on the stack frame.
• r4 - r7 hold register variables.
• r11 is the frame pointer and r13 is the stack pointer.
• r10 holds the limiting address on stack size, which is used to check for stack overflows.
• Other registers have additional uses in the protocol.
Data Structures
• The compiler must also translate references to data structures into references to raw memory. In general, this requires address computations.
• Some of these computations can be done at compile time while others must be done at run time.
• Arrays are interesting because the address of an array element must in general be computed at run time, since the array index may change.
• Let us first consider one-dimensional arrays: a[i]
• The layout of the array in memory is shown in Figure 5.13.
• The zeroth element is stored as the first element of the array, the first element directly below, and so on.
• We can create a pointer for the array that points to the array's head, namely, a[0].
• If we call that pointer aptr for convenience, then we can rewrite the reading of a[i] as *(aptr + i).
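The pointer form of a one-dimensional array read can be sketched in C as follows. This is only an illustration: the helper name read_element and the int element type are assumptions, not part of the original material.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: read a[i] through a pointer to the array's head.
 * aptr is assumed to point at a[0]; C scales the index by the element
 * size, so aptr + i is the address of element i. */
static int read_element(const int *aptr, size_t i)
{
    assert(aptr != NULL);
    return *(aptr + i);   /* same access as aptr[i] */
}
```

Because the index i is generally known only at run time, the addition inside read_element is the kind of address computation that must be performed during execution.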
• There are multiple possible ways to lay out a two-dimensional array in memory, as shown in Figure 5.14.
• In the form known as row major, the inner variable of the array (j in a[i,j]) varies most quickly. (Fortran uses a different organization known as column major.)
• Let us consider the row-major form. If the a[] array is of size N × M, then we can turn the two-dimensional array access into a one-dimensional array access.
• Thus, a[i,j] becomes a[i*M + j], where the maximum value for j is M - 1.
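A minimal sketch of the row-major index calculation in C, assuming the array is stored as one flat block of int values. The function names and the n_rows / n_cols parameters (standing for N and M) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: flatten the index pair (i, j) of an
 * n_rows x n_cols row-major array into a one-dimensional index. */
static size_t row_major_index(size_t i, size_t j, size_t n_rows, size_t n_cols)
{
    assert(i < n_rows && j < n_cols);   /* j can be at most n_cols - 1 */
    return i * n_cols + j;
}

/* Read element (i, j) from flat row-major storage. */
static int read_2d(const int *flat, size_t i, size_t j,
                   size_t n_rows, size_t n_cols)
{
    return flat[row_major_index(i, j, n_rows, n_cols)];
}
```

For a 2 × 3 array, element (1, 2) maps to flat index 1*3 + 2 = 5, the last element of the block.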
• A C structure is easier to address. As shown in Figure 5.15, a structure is implemented as a contiguous block of memory.
• Fields in the structure can be accessed using constant offsets to the base address of the structure.
• In this example, if field1 is four bytes long, then field2 can be accessed as *(aptr + 4), with the offset counted in bytes.
• This addition can usually be done at compile time, requiring only the indirection itself to fetch the memory location during execution.
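The constant-offset access can be sketched in C as below. The structure layout is an assumption chosen to match the example (two four-byte fields with no padding between them), and the names mystruct and read_field2 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical structure: field1 is four bytes long, so field2 is
 * assumed to start at byte offset 4 from the base address. */
struct mystruct {
    int32_t field1;
    int32_t field2;
};

/* Read field2 as base address plus a constant byte offset.
 * offsetof is evaluated at compile time, so only the load itself
 * remains to be done at run time. */
static int32_t read_field2(const struct mystruct *s)
{
    assert(s != NULL);
    const unsigned char *base = (const unsigned char *)s;
    int32_t value;
    memcpy(&value, base + offsetof(struct mystruct, field2), sizeof value);
    return value;
}
```

In ordinary C one would simply write s->field2; the explicit byte arithmetic here only makes visible the address computation the compiler performs.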