Professional Documents
Culture Documents
Stratified Synthesis: Automatically Learning The x86-64 Instruction Set
Stratified Synthesis: Automatically Learning The x86-64 Instruction Set
Stratified Synthesis: Automatically Learning The x86-64 Instruction Set
Alex Aiken
Stanford University, USA
aiken@cs.stanford.edu
# learn a formula for instr # generalize a learned formula for an instruction to a set
15: procedure S TRATAO NE(instr , baseset, tests) of formulas
16: prog instr ← I NSTANTIATE I NSTR(instr ) 17: procedure G ENERALIZE(instr , formula)
17: t = S YNTHESIZE(prog instr , baseset, tests) 18: result ← ∅
18: if t is timeout then return hfailurei 19: for instr 0 ← A LL I NSTRUCTIONS() do
# the set of equivalence classes of learned programs 20: if instr 0 has same mnemonic as instr then
19: eqclasses ← {{t.prog}} 21: continue
20: while true do 22: candidate ← N ONE
21: t = S YNTHESIZE(prog instr , baseset, tests) 23: if operands of instr 0 have same size then
22: if t is timeout then 24: candidate ← rename operands in formula
23: break 25: if operands of instr 0 have smaller size then
24: else 26: candidate ← rename and sign-extend operands
25: t ← C LASSIFY(t.prog, eqclasses, tests) 27: if operands of instr 0 have larger size then
26: if t is failure then return hfailurei 28: candidate ← rename and select operands
27: if |eqclasses| > threshold then 29: if candidate 6= N ONE then
28: break 30: test candidate with random constant operands
29: bestclass ← C HOOSE C LASS(eqclasses) and on random test cases
30: bestprog ← C HOOSE P ROG(bestclass) 31: add to result if all tests pass
31: formula ← B UILD F ORMULA(instr , bestprog)
32: return result
32: return hsuccess, formulai
Number of formulas
140
paring the learned formulas with the manual. However, over
120
the course of this project we found several inconsistencies 100
when manually comparing the two. This is unsurprising given 80
the over 3,800 pages of intricate technical content, the vast 60
majority of which is not machine-checked. Examples of these 40
errors include: 20
• The upper 128 bits of the output register of pshufb xmm0, xmm1 0
9x smaller 3x smaller equal 3x larger 9x larger
are kept unmodified on Intel hardware, but the pseudo- Relative size increase for Strata formulas
code in the manual incorrectly says they are cleared.
• The Op/En column for the vextracti128 instruction Figure 1. Distribution of the relative increase in size of
the formulas learned by S TRATA compared to the hand-
should be MRI, but it was RMI. This causes a wrong
written formulas. Negative numbers indicate that the learned
operand ordering when assembling this instruction.
formulas were smaller.
• One possible encoding for xchgl eax, eax is 0x90
according to the manual, and its semantics are to clear the
upper 32 bits of rax. However, 0x90 is also the encoding (RQ4). We found that the automatically synthesized formulas
of nop, which does nothing according to the manual (and have the same number of non-linear operations, except for 9
hardware). instruction variants, where they contain fewer such operations
(the hand-written formulas for some rotate instructions con-
• The textual description of roundpd claims to produce a tain a modulo operation, whereas the automatically learned
single-precision floating point value, when it produces a ones do not). In all of these cases the non-linear operation is
double-precision floating point value. a modulo by a constant, and is likely not a big problem for
We have reported all of these items and they were confirmed SMT solvers (unlike non-linear operations by variables).
by Intel as errors in the manual [8]. The number of uninterpreted functions (UIF) is the same
for almost all instruction variants, with the exception of 4
5.4 Formula Size formulas, where the automatically learned formulas contain
Large formulas can cause problems for SMT solvers and per- redundant UIFs: Instead of using a single UIF, the learned
form less well, especially if formulas are further combined formula uses 2 UIFs that combined perform the same op-
(e.g., to prove equivalence between two x86-64 programs). eration as the single UIF. In all cases a simplification step
For this reason, we answer (RQ3) by comparing the number that knows about floating point operations should be able to
of nodes in the AST of the formula representation in the au- remove these redundant UIFs.
tomatically learned formulas and the hand-written formulas. In summary, the automatically learned formulas are com-
For every instruction variant where we have both a learned parable in size, and with the exception of 4 instruction vari-
and hand-written formula, we compute ants are equally precise and contain at most as many non-
number of nodes in learned formula linear arithmetic operations compared to the hand-written
number of nodes in hand-written formula formulas.
We plot a histogram of this distribution in logarithmic space
5.6 Simplification
(such that increases and decreases in size are shown sym-
metrically) in Figure 1. The center of the figure consists of To address (RQ5), we measure the size of the learned formu-
formulas that have the same size, with instructions where las before and after simplification and compute the relative
S TRATA’s formulas are smaller to the left and instructions decrease in size (as before when comparing with handwrit-
where the hand-written formulas are smaller to the right. The ten formulas). At the median of this distribution, simplified
median of this distribution is one (i.e., formulas are equally formulas are 8.45 times smaller, and in the extreme they are
large). The smallest formula has size 0 (for the nop instruc- smaller by up to a factor of 1,583. A small number of formu-
tion), and the largest formula has 211 and 196 nodes for las are larger after simplification (due to using a distributive
S TRATA and the hand-written formulas, respectively. law), but at most by a factor of 1.33. The full distribution is
shown in Figure 2.
5.5 Formula Precision
Two further important aspects of the quality of formulas is 5.7 Stratification
the number of uninterpreted function applications (a measure To answer (RQ6), we can investigate if instructions could
of preciseness) and the number of non-linear arithmetic oper- have been learned sooner. To this end, we define the stratum
ations (which tend to perform poorly in queries), as stated by of a formula that corresponds to an instruction variant i induc-
250 0.14
95
0.12 89 90
Fraction of formulas
Number of formulas
200 78
0.10
63
150 0.08 59
55
51
46 47
100 0.06
0.04
21
50 16
0.02 13
9
6
3 2
0 0.00
equal 10x smaller 100x smaller 1000x smaller 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Relative size decrease simplified formulas Stratum
Figure 2. Distribution of the relative decrease in size of the Figure 3. Distribution of the strata for all register-only
formulas learned by S TRATA due to the simplification step. instruction variants. Formulas at stratum 0 are from the base
set, and all others are learned formulas.
tively as follows. Let M (i) represent the set of instructions
that the synthesized program for i uses, and let B be the set 5.8 Experimental Details
of instructions in the base set. Then, We performed all experiments on an Intel Xeon E5-2697
0 machine with 28 physical cores, running at 2.6 GHz. For en-
if i ∈ B
stratum(i) = 1 + max stratum(i0 ) otherwise gineering reasons, we split the problem into learning register-
0 i ∈M (i)
only variants and learning formulas for instructions with
an 8-bit constant. Finally, we generalized these formulas to
Intuitively, this captures the earliest point at which a particular other similar instruction variants with memory or immediate
formula could have been learned with the given program. We operands.
show the distribution of strata in Figure 3, which shows that Our algorithm is embarrassingly parallel and we run it
stratified synthesis is in fact necessary for our approach to on all available cores. We let the register-only experiment
learning the semantics of x86-64: only 13.73% of instructions run for 268.86 hours, and spent 159.12 hours for the 8-bit
are learned at stratum 1, the 50th percentile of all learned in- constant experiment. Total runtime therefore is 427.98 hours,
structions is reached only at stratum 4, and the 90th percentile or 11983.37 CPU core hours. Figure 4 shows the progress
is not reached until stratum 10. during the learning of register-only instructions, both for
In S TRATA, formulas are added to the base set as soon S TRATA as presented and S TRATA with stratification turned
as a specification has been learned (by learning multiple off. S TRATA is able to quickly learn formulas for the simple
programs). So when an instruction i at stratum n is added instructions, with progress tapering off as more difficult
to the base set and another instruction i0 makes use of instructions are attempted.
i during the search for its specification, we say i0 was During the process of learning formulas, we invoked an
learned at stratum (at least) n + 1. However, there might SMT solver to decide what equivalence class a program
be another (equivalent) specification where the instruction belongs to (procedure C LASSIFY in Algorithm 2). Out of
i was not used, and thus the stratum for i0 then might be the 7,075 such decisions S TRATA had to make when learning
different (and lower). Therefore, to definitively answer (RQ6), register-only variants, we found an equivalent program in
we run S TRATA where new instructions are not added to 6,669 cases (94.26%) and created a new equivalence class in
the base set (for the same amount of time, on the same 356 cases (5.03%) as the solver could not prove equivalence
machine as the main experiment). This effectively turns with any existing class. In only 3 out of these 356 cases a
off stratification and determines the maximum number of solver timeout was the cause, and in all other cases the solver
instructions this approach can learn at stratum 1. We find did not give a valid counterexample.
that for the register-only variants, we can learn only 426 We also found a (valid) counterexample in 50 cases. This
formulas without stratification, compared to the total 692 highlights the importance of learning many programs, as the
formulas learned with stratification (an increase of 62.4%). first one we learn can actually be wrong. It also shows that
In the experiment with stratification, only 95 formulas were for the most part the testcases work well.
learned at stratum 1, showing that indeed, some formulas
at higher strata could have been learned at lower strata if 5.9 Implementation
we had given more time to the program synthesis at those We make S TRATA, our implementation of stratified synthesis
lower strata. However, overall stratification was essential to available online, together with the results of the experiments
learning. described in this paper including all learned formulas.
https://stefanheule.com/strata/
complete specification than the existing ones. For instance,
Number of formulas learned