Cheat Sheet Final Final


LANGUAGES, GRAMMARS AND AUTOMATA

Multiple matches: What if multiple DFAs recognize the same string? Two options:
1. Prioritize the first regular expression in the specification of the scanner -> keep the DFAs ordered and choose the first DFA which accepts.
2. Disallow multiple matches: the specification must be unambiguous -> requires language checking when generating the scanner -> for all pairs of regular expressions with languages L1, L2, check whether L1 ∩ L2 ≠ ∅.

Grammar/Language types: regular grammars (regular languages) ⊂ context-free grammars (context-free languages) ⊂ context-sensitive grammars (context-sensitive languages) ⊂ all grammars (recursively enumerable languages).

Grammar definition: A grammar is a tuple G = (V, T, P, S) where
- V is a finite set of variables,
- T is a finite set of terminals,
- P is a finite set of production rules of the form α → β with α ∈ (V ∪ T)* V (V ∪ T)* and β ∈ (V ∪ T)*,
- S ∈ V is a variable called the start symbol.

Context-sensitive grammars: either α = S and β = ε, or |α| ≤ |β| and S does not appear in β.

Context-free grammars: α ∈ V.

Regular grammars: α ∈ V and
1. left-regular: β ∈ T* ∪ (V · T*)
2. right-regular: β ∈ T* ∪ (T* · V)

Ambiguous grammar: there is more than one derivation tree for the same word and grammar (these HAVE NO PARSER!).

Chomsky Normal Form: A CFG is in CNF if all its production rules are of the form:
1. A → BC (B and C cannot be the start symbol)
2. A → a
3. S → ε

Nondeterministic Pushdown Automata: An NPDA A is a tuple (Q, Σ, Γ, δ, q0, Z0, F) such that
- Q is a finite set of states;
- Σ is a finite input alphabet;
- Γ is a finite stack alphabet;
- δ : Q × (Σ ∪ {ε}) × Γ → 2^{Q×Γ*} is the transition function;
- q0 ∈ Q is the initial state;
- Z0 ∈ Γ is the initial symbol on the stack; and
- F ⊆ Q is the set of accepting states.

NPDA ≡ CFG: each can be transformed into the other.

Synthesized vs inherited attributes: an attribute is synthesized if its value depends on the values of its children in some derivation tree. Otherwise, it is inherited.

Static Single Assignment: every register can be assigned at most once
- registers are not variables
- Advantages: easier to optimize because 1. registers correspond to a single value 2. instruction results go to output registers, and stay there (so we can very easily determine, e.g., if a register is never used)

Control Flow Graph: a graph with all the paths that might be traversed by a program; each node is a basic block.

Phi-function: assigns the defined value to a variable based on which predecessor block was executed (i.e. on a condition)
- e.g. %d = phi i32 [%c1, %iftru], [%c2, %iffls]
- only necessary if not doing the lazy solution

CFG to PDA: one state
- for all rules A → B add a transition that pops A and pushes B on empty input: ε, A/B
- for all terminals t ∈ T add a transition that pops t when reading t: t, t/ε

k-look-ahead PDA: δ : Q × (Σ ∪ {ε}) × Γ × Σ^{≤k} → 2^{Q×Γ*} where Σ^{≤k} = ∪_{i=0}^{k} Σ^i

First and Follow:
First_k(α) := {w ∈ T* : α ⇒* wx ∧ (|w| = k or (|w| < k ∧ x = ε))}
Follow_k(α) := {w ∈ T* | ∃β, γ : S ⇒* βαγ ∧ w ∈ First_k(γ)}

First/First conflict: two rules for the same variable have intersecting First sets
- solutions: left factoring, left recursion removal

First/Follow conflict: the First and Follow of a variable intersect
- solutions: substitution, left recursion removal

Strong LL(k): First_k(α1 · Follow_k(A)) ∩ First_k(α2 · Follow_k(A)) = ∅

Error recovery vs repair:
- Recovery: ignore the current error and have the parser reach a correct configuration
- Repair: the parser attempts to fix the input to continue

Panic Mode: the parser skips input tokens until it finds a frequently occurring delimiter (e.g. ;)

Error detection in LL(1): if there is no entry in the action table -> error. We can output a message with what the parser expected.

Panic Mode Recovery in LL(1): every call to a non-terminal function is given as argument some set of potential end-of-block lexical units (e.g. ;)
- if an error occurs, the function discards tokens until one of those tokens is matched

Undecidable problems: type checking; line number reachability.

Static Type Checking: apply inference rules to deduce types of expressions
- conservative approach, since some programs will be marked as incorrect even though the type-incorrect instructions may never be reached

Argue that type checking and reachability analysis are undecidable: Deciding reachability and type checking amounts to deciding the halting problem.
1. In the case of type checking, determining whether a program satisfies the type rules of a programming language involves analyzing the compatibility of operations and assignments with types. By reducing the halting problem to type checking, one can show that type checking is undecidable.
2. Similarly, reachability analysis involves determining if a particular program state or instruction is reachable during program execution. By constructing programs that exhibit the same behavior as programs in the halting problem, one can demonstrate that reachability is also undecidable.
Henceforth, we write type checking for static type checking: apply inference rules to deduce types of expressions (without caring about reachability of instructions).

PARSERS

Top-down:
- produce: use a rule A → α to pop A from the stack and push α.
- match: pop the terminal a from the stack while simultaneously reading a
- stack: start with S and accept by empty stack

LL(1) easy proof: if you can construct a DFA, it is LL(0) (and therefore LL(1), LL(2), ... too of course).

LL(1) action table: terminals in columns, non-terminals in rows
- T(A, a) contains the rule A → β iff a is in First_1(β), or ε is in First_1(β) and a is in Follow_1(A); that is, iff a is in First_1(β · Follow_1(A))

LL(k) grammar: A CFG is LL(k) iff for all pairs of derivations S ⇒* wAγ ⇒ wα1γ ⇒* wx1 and S ⇒* wAγ ⇒ wα2γ ⇒* wx2 with w, x1, x2 ∈ T*, A ∈ V, and γ ∈ (V ∪ T)*: if First_k(x1) = First_k(x2) then α1 = α2.

Bottom-up:
- shift: read a terminal a and push it onto the stack
- reduce: use a rule A → α to pop α^R from the stack and push A
- stack: start with empty stack and accept if the stack contains S

Reduce-reduce conflicts: the top of the stack corresponds to the handle for two different rules. The parser cannot deterministically choose one.

Shift-reduce conflicts: the top of the stack corresponds to some handle, but the parser can continue to shift too. It cannot deterministically choose whether to reduce now or do so later.

Sentential-form prefixes on the stack: Let (q, w, γZ0) be a configuration reached along an accepting run of the bottom-up parser with q not of the form (A, ...). Then S ⇒* γ^R · w along a rightmost derivation. This does not mean, however, that if we make sure the stack always has prefixes of valid sentential forms, then we will get an accepting run.

Viable prefix: A prefix π is viable if it is the prefix of a right-sentential form that can occur (in reverse) on the top of the stack along an accepting run of the bottom-up parser.

Canonical finite-state machine: the CFSM is an automaton that recognizes as its language exactly the set of all viable prefixes of a given CFG.

Finite language: every finite language is regular and therefore LL(1) (THIS DOES NOT MEAN THE GRAMMAR IS LL(1)!!)

Item: An item of a grammar is a rule where we have inserted • as a marker somewhere in the RHS, e.g. A → α1 • α2

Closure of an item: The closure of an item A → α1 • Bα2 with B ∈ V is the set of all items B → •β where B → β is a rule of the grammar.
- The closure of a set I is the minimal closure-closed set that includes I.

CFSM definition: The CFSM is the DFA (Q, V ∪ T, δ, q0) where
- Q is the set of all subsets of items,
- q0 is the closure of {S → •α | S → α is a rule},
- δ(q, a) maps to the closure of the a-successor of q.
- (All states are accepting! Except for the empty set, i.e. a bad sink.)

LR parsers: Left-to-right scanning, Rightmost derivation (in reverse).

LR(0) simulation: shift from left to right; if we recognize a body, reduce. Keep going until we only have S.

LR(0) action table: Index grammar rules 1 ≤ j ≤ n and states from the CFSM 0 ≤ i ≤ k. The table T maps each i to a set of actions:
- T(i) contains (a Reduce) j if state i has the item A → α• with A → α the j-th rule.
- T(i) contains Shift if state i has an item A → α1 • α2.
- T(i) contains Accept if state i has an item S → α•
- T(∅) contains an Error action only.
- Note that every cell of the action table has at least one action because all states from the CFSM contain some item (except ∅).

Action table for SLR: Index grammar rules 1 ≤ j ≤ n and states from the CFSM 0 ≤ i ≤ k. The table T maps each i to a set of actions:
- T(i) contains j if state i has the item A → α• with A → α the j-th rule.
- T(i) contains (a Shift) a if state i has an item A → α1 • aα2.
- T(i) contains Accept if state i has an item S → α•.
- T(∅) contains an Error action only.

Action table with look-ahead:
1. Push the initial state 0 of the CFSM onto the stack S.
2. As long as we can't accept or get an error:
2.1 If j ∈ T(top(S)), A → α is the j-th rule, and the look-ahead ℓ is in Follow_k(A), then
2.1.1 pop |α| times from S
2.1.2 and put A in variable c.
2.2 Otherwise, if the look-ahead starts with a, put a into the variable c.
2.3 Let q′ be the next state δ(top(S), c) of the CFSM after reading c.
2.4 Push q′ onto S.

Look-ahead LR(k) parsers: similar to SLR(k), except they use ItemFollow instead of the general Follow. Use the LR(k) CFSM, and merge the states with the same items (and different look-aheads).

LR(k)-items: These are item-look-ahead pairs (A → α1 • α2, w) where w ∈ Σ^{≤k}.

LR(k)-closure: The
closure of an LR(k)-item (A → α1 • Bα2, w) is the set of all LR(k)-items (B → •β, y) where B → β is a rule of the grammar, and y ∈ First_k(α2 w). As before, the closure of a set I is the minimal closure-closed set including I.

LR(k)-CFSM: States of the CFSM are now subsets of LR(k)-items. The a-successor of a state I, for a ∈ T ∪ V, is the closure of the set {(A → α1 a • α2, w) | (A → α1 • aα2, w) ∈ I}. The LR(k)-CFSM is the DFA (Q, V ∪ T, δ, q0) where
- Q is the set of all subsets of LR(k)-items,
- q0 is the closure of {(S → •α, ε) | S → α is a rule},
- δ(q, a) maps to the closure of the a-successor of q.
- (All states are accepting! Except for the empty set, i.e. a bad sink.)

Action table for LR(k): Index grammar rules 1 ≤ j ≤ n and states from the CFSM 0 ≤ i ≤ k. The table T maps each i and look-ahead ℓ to a set of actions:
- T(i, ℓ) contains (a Reduce) j if state i has the item (A → α•, ℓ) with A → α the j-th rule.
- T(i, ℓ) contains Shift if state i has an item (A → α1 • α2, y) and ℓ ∈ First_k(α2 y).
- T(i, ε) contains Accept if state i has an item S → α•.
- T(∅, ℓ) contains an Error action only.

GOOD CODE

Usage and liveness algorithm: go through the instructions in the block from last to first. For every instruction i : x := y op z we do the following:
1. Attach to i the usage and liveness information of x, y, z
2. Set x to not live and no next use
3. Set y, z to live and the next uses of y, z to i

Basic block directed acyclic graph:
1. There is a node in G for each initial value of the variables in the block.
2. There is a node N associated to each instruction i in the block. Its children are the nodes corresponding to instructions where the operands of i were last defined. Nodes with the same children are "merged".
3. A node is labelled by the operator applied at its corresponding instructions and with the list of variables for which it is the last definition within the block.
4. Nodes with values that can be used by successor blocks are marked as live on exit. E.g. return instructions.

Usages of the DAG:
1. elimination of local common sub-expressions
2. dead-code elimination
3. instruction reordering
4. reordering operands of an instruction

Register descriptors: variable names whose current value is in a register.

Address descriptors: all locations where the current value of a variable is stored.

Register allocation for operands: Ry given by getReg(x := y op z) can be chosen as follows.
1. If y is currently in a register r then Ry = r.
2. If y is not in a register but the register r is currently empty then Ry = r.
3. The remaining case is the difficult one. Let r be a candidate register:
3.1 If all variables whose descriptor says their value is in r also have another location then return r.
3.2 If the only variable whose descriptor says its value is in r is x, and x ∉ {y, z}, then return r. (We will overwrite it anyway!)
3.3 If all variables ... are not used after the instruction (i.e. they are not live), again return r.
3.4 Otherwise spill the variables into memory.

Register allocation for result x = y OP z: Since the value of x is being computed, a register r currently holding the value of x is always a safe choice. Otherwise, if y or z are not live after the current instruction, then Ry or Rz can be used.

Interference graph:
1. Nodes correspond to the (infinitely many) registers used in the generated code (i.e. the variables).
2. An edge connects two nodes if they are simultaneously live (i.e. they are both live at some instruction).

Ershov Number: Let us label a binary expression tree as follows.
- Leaves are labelled with 1.
- Inner nodes with one child have the same label as their child.
- Inner nodes with two children have the maximal label amongst their children's labels if they are different; 1 + the label of their children if they are the same.

Ershov Theorem: the number of the root corresponds to the minimal number of registers required to evaluate the expression without using additional memory or algebraic properties.

Sethi-Ullman algorithm: Consider registers R_b, R_{b+1}, ..., R_{b+k−1} for k the Ershov Number of the given node and b ≥ 1 a base number. We will leave the result of the computation in register R_{b+k−1}.
- Algorithm for equal-children nodes (children labelled k − 1):
1. Recursively generate code for the right child using base b + 1. The result of the right child appears in R_{b+k−1}.
2. Recursively generate code for the left child using base b; its result appears in R_{b+k−2}.
3. Generate the instruction R_{b+k−1} = R_{b+k−2} op R_{b+k−1}.
- Algorithm for different-children nodes (small child labelled m < k):
1. Recursively generate code for the big child, using base b; the result appears in register R_{b+k−1}.
2. Recursively generate code for the small child, using base b; the result appears in register R_{b+m−1}. (Note R_{b+m}, ..., R_{b+k−1} are not used in the computation!)
3. Generate the instruction R_{b+k−1} = R_{b+m−1} op R_{b+k−1} (or vice versa if the right child is the small child).

Expression evaluation with spilling: Modifications to the Sethi-Ullman algorithm. If the input node has label k > r with r the number of registers:
- Pick a "big child" with label at least r.
- Recursively generate code for the big child using b = 1. The result will appear in register R_r.
- Store the value of R_r in memory.
- Generate code for the little child: if it has a label of at least r then use base b = 1. If the label is j < r then pick b = r − j. Recursively apply this algorithm to the little child and the result appears in R_r.
- Load into R_{r−1} the value computed for the big child.
- Issue the instruction to compute into R_r the value of the node.

MIPS:
ackermann: addi $sp, $sp, -8; sw $ra, 4($sp); sw $s0, 0($sp); li $v0, -1; blt $a0, 0, exit; blt $a1, 0, exit; addi $v0, $a1, 1; beq $a0, 0, exit; beq $a1, 0, rec_call; addi $a1, $a1, -1; move $s0, $a0; jal ackermann; addi $a0, $s0, -1; move $a1, $v0; jal ackermann; j exit;
rec_call: addi $a0, $a0, -1; li $a1, 1; jal ackermann; j exit;
exit: lw $ra, 4($sp); lw $s0, 0($sp); addi $sp, $sp, 8; jr $ra;

LLVM IR:
define dso_local i32 @ackermann(i32 noundef %0, i32 noundef %1) {
%3 = alloca i32, align 4; %4 = alloca i32, align 4; %5 = alloca i32, align 4; store i32 %0, ptr %4, align 4; store i32 %1, ptr %5, align 4; %6 = load i32, ptr %4, align 4; %7 = icmp slt i32 %6, 0; br i1 %7, label %11, label %8;
8: %9 = load i32, ptr %5, align 4; %10 = icmp slt i32 %9, 0; br i1 %10, label %11, label %12;
11: store i32 -1, ptr %3, align 4; br label %33;
12: %13 = load i32, ptr %4, align 4; %14 = icmp eq i32 %13, 0; br i1 %14, label %15, label %18;
15: %16 = load i32, ptr %5, align 4; %17 = add nsw i32 %16, 1; store i32 %17, ptr %3, align 4; br label %33;
18: %19 = load i32, ptr %5, align 4; %20 = icmp eq i32 %19, 0; br i1 %20, label %21, label %25;
21: %22 = load i32, ptr %4, align 4; %23 = sub nsw i32 %22, 1; %24 = call i32 @ackermann(i32 noundef %23, i32 noundef 1); store i32 %24, ptr %3, align 4; br label %33;
25: %26 = load i32, ptr %4, align 4; %27 = sub nsw i32 %26, 1; %28 = load i32, ptr %4, align 4; %29 = load i32, ptr %5, align 4; %30 = sub nsw i32 %29, 1; %31 = call i32 @ackermann(i32 noundef %28, i32 noundef %30); %32 = call i32 @ackermann(i32 noundef %27, i32 noundef %31); store i32 %32, ptr %3, align 4; br label %33;
33: %34 = load i32, ptr %3, align 4; ret i32 %34;
}

C:
int ackermann(int x, int y) { if (x < 0 || y < 0) return -1; if (x == 0) return y+1; if (y == 0) return ackermann(x-1, 1); return ackermann(x-1, ackermann(x, y-1)); }

Array indexing: getelementptr inbounds i32, ptr %18, i64 1 (1 is the index here)
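The Ershov Number labelling rules from the GOOD CODE section can be sketched in C, the language of the ackermann example. This is a minimal sketch under assumptions, not a definitive implementation: the `Node` struct, the `ershov` function and the `ershov_example` helper are hypothetical names introduced here for illustration, and the example tree (a - b) + e * (c + d) is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression-tree node (an assumption of this sketch):
   a leaf has no children, a one-child node uses only one pointer,
   a binary node uses both. */
typedef struct Node {
    char op;                    /* operator character, 0 for a leaf */
    const struct Node *left;
    const struct Node *right;
} Node;

/* Ershov number of the subtree rooted at n, following the three
   labelling rules: leaf = 1, one child = child's label,
   two children = max if different, label + 1 if equal. */
int ershov(const Node *n) {
    assert(n != NULL);
    if (n->left == NULL && n->right == NULL)
        return 1;                           /* leaf */
    if (n->right == NULL)
        return ershov(n->left);             /* single (left) child */
    if (n->left == NULL)
        return ershov(n->right);            /* single (right) child */
    int l = ershov(n->left);
    int r = ershov(n->right);
    if (l == r)
        return l + 1;                       /* equal labels */
    return l > r ? l : r;                   /* different labels */
}

/* Builds (a - b) + e * (c + d) and returns the label of its root. */
int ershov_example(void) {
    Node a = {0, NULL, NULL}, b = {0, NULL, NULL}, c = {0, NULL, NULL};
    Node d = {0, NULL, NULL}, e = {0, NULL, NULL};
    Node sub  = {'-', &a, &b};      /* children 1 and 1 -> 2 */
    Node add  = {'+', &c, &d};      /* children 1 and 1 -> 2 */
    Node mul  = {'*', &e, &add};    /* children 1 and 2 -> 2 */
    Node root = {'+', &sub, &mul};  /* children 2 and 2 -> 3 */
    return ershov(&root);
}
```

Calling `ershov_example()` returns 3, so by the Ershov Theorem this expression needs three registers when no spilling and no algebraic rewriting are allowed. The recursion visits each node once; a code generator would more likely store the label in the node so the Sethi-Ullman pass can reuse it.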
