Peephole Optimization: Replacing Slow Instructions With Faster Ones
In compiler theory, peephole optimization is a kind of optimization performed over a very small set
of instructions in a segment of generated code. The set is called a "peephole" or a "window".
Peephole optimization works by recognising instruction sequences that can be replaced by shorter or
faster ones.
Common techniques applied in peephole optimization: [1]
Special case instructions: use instructions designed for special operand cases.
Other peephole optimizations simplify the target machine instructions themselves, provided the
target machine is known in advance. In that case the optimizer can exploit the strengths of a given
architecture and instruction set and avoid its weaknesses.
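As a rough illustration of the "special case instructions" technique, the sketch below rewrites single instructions whose operand has a cheaper dedicated form. The instruction set is hypothetical: the `INC`, `DEC` and `CLR` opcodes, the `#1` immediate notation and the `(opcode, source, destination)` tuple layout are assumptions made for this example, and a real target may differ, for instance in how such instructions affect flags.

```python
# Hypothetical instruction format: (opcode, source, destination).
# The special-case table is an assumption for illustration only.
SPECIAL_CASES = {
    ("ADD", "#1"): "INC",  # add the constant 1     -> increment
    ("SUB", "#1"): "DEC",  # subtract the constant 1 -> decrement
    ("MOV", "#0"): "CLR",  # move the constant 0     -> clear
}


def use_special_cases(code):
    """Replace instructions that have a cheaper special-operand form."""
    out = []
    for op, src, dst in code:
        special = SPECIAL_CASES.get((op, src))
        if special is not None:
            # The special form takes no source operand.
            out.append((special, None, dst))
        else:
            out.append((op, src, dst))
    return out
```

Instructions without an entry in the table pass through unchanged, so the pass is safe to run on code that contains no special cases at all.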
For example, the stack-machine bytecode
...
aload 1
aload 1
mul
...
can be replaced by
...
aload 1
dup
mul
...
This kind of optimization, like most peephole optimizations, makes certain assumptions about the
efficiency of instructions. For instance, in this case, it is assumed that the dup operation (which
duplicates and pushes the top of the stack) is more efficient than the aload X operation (which
loads a local variable identified as X and pushes it on the stack).
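The replacement above can be sketched as a single pass over the instruction list. This is a minimal illustration, assuming instructions are plain strings such as `"aload 1"` and that no jump targets the second load; the function name `replace_repeated_load` is made up for this example.

```python
def replace_repeated_load(code):
    """Replace an `aload X` that directly repeats the previous
    instruction with `dup`."""
    out = []
    prev = None  # previous instruction of the original code
    for instr in code:
        if instr.startswith("aload ") and instr == prev:
            # The value is already on top of the stack: duplicate it.
            out.append("dup")
        else:
            out.append(instr)
        prev = instr
    return out
```

Whether this is actually a gain depends on the assumption stated above, namely that `dup` is cheaper than `aload X` on the target.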
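The source code
a = b + c;
d = a + e;
is straightforwardly implemented as
MOV b, R0  # Copy b to the register
ADD c, R0  # Add c to the register; the register now holds b+c
MOV R0, a  # Copy the register to a
MOV a, R0  # Copy a to the register
ADD e, R0  # Add e to the register; the register now holds a+e
MOV R0, d  # Copy the register to d
Here the MOV a, R0 instruction is redundant, because R0 already holds the value that was just
stored to a, so a peephole optimizer can remove it.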
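As a minimal sketch of how a pass might spot a reload such as `MOV a, R0` directly after `MOV R0, a`, the function below compares each `MOV` with the instruction emitted just before it. The `(opcode, source, destination)` tuple layout and the name `drop_redundant_loads` are assumptions for this example; the sketch also assumes that the memory location is not volatile and that no jump targets the second instruction.

```python
def drop_redundant_loads(code):
    """Remove a MOV that copies a value straight back to where the
    previous MOV just took it from."""
    out = []
    for instr in code:
        if out and instr[0] == "MOV" and out[-1][0] == "MOV":
            _, src, dst = instr
            _, prev_src, prev_dst = out[-1]
            if src == prev_dst and dst == prev_src:
                # Both locations already hold the same value.
                continue
        out.append(instr)
    return out


code = [
    ("MOV", "b", "R0"),
    ("ADD", "c", "R0"),
    ("MOV", "R0", "a"),
    ("MOV", "a", "R0"),  # reloads what R0 already holds
    ("ADD", "e", "R0"),
    ("MOV", "R0", "d"),
]
optimized = drop_redundant_loads(code)
```

After the pass, `optimized` contains the original sequence without the fourth instruction.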
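A compiler that saves registers on the stack before calling a subroutine and restores them on
return might emit the following for each call:
PUSH AF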
PUSH BC
PUSH DE
PUSH HL
CALL _ADDR
POP HL
POP DE
POP BC
POP AF
If there were two consecutive subroutine calls, they would look like this:
PUSH AF
PUSH BC
PUSH DE
PUSH HL
CALL _ADDR1
POP HL
POP DE
POP BC
POP AF
PUSH AF
PUSH BC
PUSH DE
PUSH HL
CALL _ADDR2
POP HL
POP DE
POP BC
POP AF
The sequence POP regs followed by PUSH for the same registers is generally redundant. In cases
where it is redundant, a peephole optimization would remove these instructions. In the example, this
would cause another redundant POP/PUSH pair to appear in the peephole, and these would be
removed in turn. Removing all of the redundant code in the example above would eventually leave
the following code:
PUSH AF
PUSH BC
PUSH DE
PUSH HL
CALL _ADDR1
CALL _ADDR2
POP HL
POP DE
POP BC
POP AF
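The repeated removal described above can be sketched with a single pass that treats the output as a stack: each `PUSH r` that directly follows a `POP r` in the output cancels it, which in turn exposes the next `POP`/`PUSH` pair. This is an illustrative sketch, assuming instructions are strings of the form `"OP operand"` and that the removal has already been judged safe (for example, the second subroutine does not depend on the register values restored by the `POP`s); the name `cancel_pop_push` is made up for this example.

```python
def cancel_pop_push(code):
    """Remove every `POP r` that is immediately followed by `PUSH r`,
    repeating until no such adjacent pair remains."""
    out = []
    for instr in code:
        op, operand = instr.split()
        if op == "PUSH" and out and out[-1] == f"POP {operand}":
            # Drop the POP already emitted and skip this PUSH.
            out.pop()
        else:
            out.append(instr)
    return out


SAVE = ["PUSH AF", "PUSH BC", "PUSH DE", "PUSH HL"]
RESTORE = ["POP HL", "POP DE", "POP BC", "POP AF"]
two_calls = SAVE + ["CALL _ADDR1"] + RESTORE + SAVE + ["CALL _ADDR2"] + RESTORE
optimized = cancel_pop_push(two_calls)
```

Because a cancelled pair is removed from the end of `out`, the next incoming `PUSH` is compared with the `POP` that preceded it, so the cascade happens without rescanning the code.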