Class 08 A
Syntax analysis
1. Introduction
2. Parsing techniques and derivations
3. Verifying the language generated by a grammar
4. Preprocessing grammars
5. Problems with grammars
6. Top-down parsing
7. Automatic top-down parsing
8. Table-driven parser
9. Bottom-up parsing
10. Operator precedence
11. LR(0) parsing
12. From SLR to LR(k)
13. LR(1) parsers
14. LALR parsers
TECHNISCHE HOCHSCHULE DEGGENDORF
2
www.th-deg.de
Example: LL vs. LR
Syntax tree:
    +
   / \
  1   *
     / \
    2   3
E → E + E | E * E | - E | (E) | id
Infix: 1 + (2 * 3)
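A minimal sketch of the difference on this tree. It assumes the syntax tree of 1 + (2 * 3) is encoded as nested Python tuples, and the names `TREE`, `preorder` and `postorder` are illustrative only. A top-down (LL) parser creates the nodes in preorder, starting at the root. A bottom-up (LR) parser completes them in postorder, starting at the leaves.

```python
# Syntax tree of 1 + (2 * 3): (operator, left, right) tuples, leaves are strings.
TREE = ("+", "1", ("*", "2", "3"))

def preorder(node):
    """Order in which a top-down (LL) parser creates the nodes: root first."""
    if isinstance(node, str):
        return [node]
    op, left, right = node
    return [op] + preorder(left) + preorder(right)

def postorder(node):
    """Order in which a bottom-up (LR) parser completes the nodes: leaves first."""
    if isinstance(node, str):
        return [node]
    op, left, right = node
    return postorder(left) + postorder(right) + [op]

print(" ".join(preorder(TREE)))   # + 1 * 2 3
print(" ".join(postorder(TREE)))  # 1 2 3 * +
```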
Derivation
• View productions as rewriting rules
• Start with the start symbol
• Each rewriting step replaces a nonterminal by the body of one of its productions (input αAβ, production A → γ):
– αAβ => αγβ: rewrite αAβ to αγβ in one step
– notation: =>^k rewrites in k steps; =>* rewrites in finitely many (zero or more) steps
• Several nonterminals may be candidates for rewriting
• Leftmost derivation: rewrite the leftmost nonterminal in each sentential form (=>lm); rightmost derivation analogously (=>rm)
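A single rewriting step can be sketched as follows. This assumes sentential forms are lists of single-character symbols and that "E" is the only nonterminal, as in the expression grammar E → E + E | E * E | - E | (E) | id. The names `step` and `derive` are hypothetical helpers, not part of any library.

```python
NONTERMINALS = {"E"}  # assumption: the only nonterminal of the example grammar

def step(form, production, leftmost=True):
    """One derivation step: rewrite the leftmost (=>lm) or rightmost (=>rm)
    nonterminal of `form` with `production` = (head, body)."""
    head, body = production
    positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
    if not positions:
        raise ValueError("sentential form contains no nonterminal")
    i = positions[0] if leftmost else positions[-1]
    if form[i] != head:
        raise ValueError(f"production for {head} cannot rewrite {form[i]}")
    return form[:i] + list(body) + form[i + 1:]

def derive(start, productions, leftmost=True):
    """Apply the productions in order (=>*); return every sentential form."""
    forms = [list(start)]
    for production in productions:
        forms.append(step(forms[-1], production, leftmost))
    return forms

# Bodies are strings of single-character symbols, e.g. "E*E" for E * E.
lm = derive("E", [("E", "E*E"), ("E", "E+E"), ("E", "1"), ("E", "2"), ("E", "3")])
print(" =>lm ".join("".join(f) for f in lm))
# E =>lm E*E =>lm E+E*E =>lm 1+E*E =>lm 1+2*E =>lm 1+2*3

rm = derive("E", [("E", "E+E"), ("E", "E*E"), ("E", "3"), ("E", "2"), ("E", "1")],
            leftmost=False)
print(" =>rm ".join("".join(f) for f in rm))
# E =>rm E+E =>rm E+E*E =>rm E+E*3 =>rm E+2*3 =>rm 1+2*3
```

The only difference between the two derivation modes is which nonterminal position is selected; the production applied is chosen by the caller, just as the parser has to choose it.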
Derivation
• Example:
– E → E + E | E * E | - E | (E) | id
– "1+2*3":
– E =>lm E1 * E2 =>lm E3 + E4 * E2 =>lm 1 + E4 * E2 =>lm 1 + 2 * E2 =>lm 1 + 2 * 3
– E =>rm E1 + E2 =>rm E1 + E3 * E4 =>rm E1 + E3 * 3 =>rm E1 + 2 * 3 =>rm 1 + 2 * 3
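– Parse trees (bracket notation), first from the =>lm derivation, second from the =>rm derivation:
  [E [E1 [E3 1] + [E4 2]] * [E2 3]], i.e. (1 + 2) * 3
  [E [E1 1] + [E2 [E3 2] * [E4 3]]], i.e. 1 + (2 * 3)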
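Both trees have the same frontier 1+2*3, so the grammar is ambiguous, and the two trees give different values. The difference comes from the first production chosen (E → E * E versus E → E + E), not from leftmost versus rightmost order: every parse tree has both a leftmost and a rightmost derivation. A small sketch, assuming the trees are simplified to (operator, left, right) tuples with integer leaves; `evaluate` and `frontier` are illustrative names.

```python
LM_TREE = ("*", ("+", 1, 2), 3)   # tree of the =>lm derivation: (1 + 2) * 3
RM_TREE = ("+", 1, ("*", 2, 3))   # tree of the =>rm derivation: 1 + (2 * 3)

def evaluate(node):
    """Evaluate a tree; only the operators + and * occur here."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b

def frontier(node):
    """Read the leaves and operators left to right (the derived string)."""
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    return frontier(left) + op + frontier(right)

print(frontier(LM_TREE), frontier(RM_TREE))  # 1+2*3 1+2*3
print(evaluate(LM_TREE), evaluate(RM_TREE))  # 9 7
```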
[Figure: parse tree → rest of frontend → IR]
• Tasks of a parser:
– Generate a parse tree
– Create entries in the symbol table
– Error handling:
• lexical level (misspelled keywords, …)
• syntactic level (unbalanced parentheses, …)
• semantic level (type mismatches, …)
• logic level (infinite loops, …)
Error recovery
• Syntactic errors can be immediately detected:
– As soon as the token stream cannot be parsed further according to the
grammar
– Viable-prefix property: An error is detected as soon as a prefix of the
input has been read that cannot be completed to form a string in the
language
– LL and LR parsing methods have the viable-prefix property
• Some semantic errors can also be detected efficiently (e.g., type mismatches in simple type systems)
• Other semantic or logic errors cannot be detected efficiently, or cannot be detected at all
• Goals:
– Report error clearly and accurately
– Recover from an error quickly enough to detect subsequent errors
– Add minimal overhead to the processing of correct programs
• Panic mode:
– Discard input symbols until a synchronizing token is found
– Synchronizing tokens: usually delimiters ("}", "{", ";", …)
– Advantage: Termination guaranteed
– Disadvantage: Possible removal of large code segments
• Phrase-level recovery:
– Perform a local correction: replace a prefix of the remaining input by some string that allows the parser to continue (e.g., insert a missing ";" or delete a stray token)
– cf. LaTeX: "Missing $ inserted"
– Disadvantage: may defer detection of the actual error
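Both recovery strategies can be compared on a toy language that is assumed here only for illustration: a sequence of statements ID = ID ; with ";" and "}" as synchronizing tokens. The functions `panic_mode` and `phrase_level` are hypothetical sketches, not a general parser. For this grammar, the first mismatching token is exactly where the prefix read so far stops being completable, so the reported position reflects the viable-prefix property. The phrase-level sketch knows only two local corrections: insert a missing ";" or delete an unexpected token.

```python
STATEMENT = ["ID", "=", "ID", ";"]  # toy statement: ID = ID ;
SYNC = {";", "}"}                   # assumed synchronizing tokens

def kind(tok: str) -> str:
    """Map a token to its terminal: identifiers become 'ID'."""
    return "ID" if tok.isidentifier() else tok

def panic_mode(tokens):
    """Return (statements accepted, error positions). On an error, discard
    input up to and including the next synchronizing token."""
    accepted, errors, i = 0, [], 0
    while i < len(tokens):
        j = 0
        while j < len(STATEMENT) and i < len(tokens) and kind(tokens[i]) == STATEMENT[j]:
            i, j = i + 1, j + 1
        if j == len(STATEMENT):
            accepted += 1
            continue
        errors.append(i)  # first position where the prefix is no longer viable
        while i < len(tokens) and tokens[i] not in SYNC:
            i += 1        # discard input symbols
        i += 1            # consume the synchronizing token itself
    return accepted, errors

def phrase_level(tokens):
    """Return (repaired tokens, log) using local corrections only."""
    out, log, i, j = [], [], 0, 0  # i: input position, j: position in STATEMENT
    while i < len(tokens):
        tok, want = tokens[i], STATEMENT[j]
        if kind(tok) == want:
            out.append(tok)
            i += 1
        elif want == ";":
            out.append(";")  # replace the empty prefix by ';'
            log.append((i, "inserted ';'"))
        else:
            log.append((i, f"deleted {tok!r}"))  # replace tok by the empty string
            i += 1
            continue         # still expecting the same symbol
        j = (j + 1) % len(STATEMENT)
    if j != 0:
        log.append((i, "unexpected end of input"))
    return out, log

src = "a = b c = d ; e = f ;".split()  # ';' missing after 'b'
print(panic_mode(src))                  # (1, [3]): "c = d" is discarded
out, log = phrase_level(src)
print(" ".join(out), log)               # a = b ; c = d ; e = f ; [(3, "inserted ';'")]
```

On this input, panic mode terminates but throws away the whole statement c = d, while the local correction keeps it. Every loop iteration in `panic_mode` consumes at least one token, which is why termination is guaranteed.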