
www.th-deg.de

Syntax analysis
1. Introduction
2. Parsing techniques and derivations
3. Verifying the language generated by a grammar
4. Preprocessing grammars
5. Problems with grammars
6. Top-down parsing
7. Automatic top-down parsing
8. Table-driven parser
9. Bottom-up parsing
10. Operator precedence
11. LR(0) parsing
12. From SLR to LR(k)
13. LR(1) parsers
14. LALR parsers
TECHNISCHE HOCHSCHULE DEGGENDORF

Syntax analysis – introduction


• Programming languages need precise rules describing the syntax
structure (ambiguous grammars may lead to unexpected results):
– The most famous soldiers? Private property, major mistake, and
kernel panic!
["major" etc. may be adjective or part of a name – ignoring case]
– Who's General Failure, and why's he reading my hard disk?
– I knew a woman who owned a taser, man was she stunning!
[stunning: adjective or verb?]
– Did you hear about the crime that happened in a parking garage? It
was wrong on so many levels.
["wrong": adjective or a noun?]
– I wondered why the baseball was getting bigger. Then it hit me.
["it" may be referring to "ball" or may be a syntactic expletive]
• In some cases, ambiguity is necessary or desirable => additional rules!
• For some grammars, automatic parser generation is possible

Principles of parsing techniques


• Universal:
– Can parse any grammar
– Inefficient
• Bottom-up:
– LR parser (left-to-right, rightmost derivation)
– Hard to understand, automatic generation possible
• Top-down:
– LL Parser (left-to-right, leftmost derivation)
– Easier to understand, hand coding possible


Example: LL vs. LR
[Figure: expression tree with root +, whose children are 1 and *; the * node has children 2 and 3]

E → E + E | E * E | - E | (E) | id

Infix: 1 + (2 * 3)

LL = Prefix (Polish): +(1, *(2, 3))

LR = Postfix (reverse Polish): (1, (2, 3)*) +
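
The correspondence above can be illustrated with two tree traversals. This is a minimal sketch, not part of the slides: the `Node` class and the function names are assumptions made for illustration. A pre-order walk emits the operator before its operands (as a top-down parser commits to a production before seeing the children), while a post-order walk emits it afterwards (as a bottom-up parser reduces only after the operands are complete).

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Binary operator node; leaves are plain integers (hypothetical helper)."""
    op: str
    left: object
    right: object


def prefix(tree):
    """Pre-order walk: operator first, then both operands."""
    if isinstance(tree, Node):
        return f"{tree.op}({prefix(tree.left)}, {prefix(tree.right)})"
    return str(tree)


def postfix(tree):
    """Post-order walk: both operands first, then the operator."""
    if isinstance(tree, Node):
        return f"{postfix(tree.left)} {postfix(tree.right)} {tree.op}"
    return str(tree)


# Expression tree for the infix expression 1 + (2 * 3)
tree = Node("+", 1, Node("*", 2, 3))
print(prefix(tree))   # +(1, *(2, 3))
print(postfix(tree))  # 1 2 3 * +
```

The postfix output uses the usual parenthesis-free reverse Polish form, which denotes the same evaluation order as the bracketed version on the slide.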




Derivation
• View productions as rewriting rules
• Start with start symbol
• Each rewriting step replaces a nonterminal by the body of one of its productions (input
αAβ, production A → γ)
– αAβ => αγβ – rewrite αAβ to αγβ in one step
– notation: αAβ =>k αγβ – rewrite in k steps;
αAβ =>* αγβ – rewrite in finitely many steps
• Several nonterminals may be candidates
• Leftmost derivation: Replace the leftmost nonterminal in each sentential form (=>lm)

• Rightmost derivation: Replace the rightmost nonterminal in each sentential form (=>rm)


• If (for the start symbol S of a grammar G with L = L(G)) S =>* α:
– α is a sentential form
– a sentence of L is a sentential form w/o nonterminals
– α is a left/right-sentential form, if S =>lm* α / S =>rm* α


Derivation and parse tree


• Parse tree: graphical representation of derivation removing info
about order of production application
• The leaves of the parse tree, read from left to right, form a sentential form: the yield or frontier
• A grammar that produces more than 1 parse tree for some
sentence is called ambiguous
• The grammar
E → E + E | E * E | - E | (E) | id
is ambiguous
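
The ambiguity can be checked mechanically by counting parse trees. The following is a small sketch for this one grammar only, with a hypothetical function name; it tries every production as the root of a token span and multiplies the counts of the sub-spans.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` for E -> E + E | E * E | - E | (E) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from E
        if j <= i:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "id":        # E -> id
            total += 1
        if tokens[i] == "-":                        # E -> - E
            total += count(i + 1, j)
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)            # E -> ( E )
        for k in range(i + 1, j - 1):               # E -> E + E | E * E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id + id * id".split()))      # 2
print(count_parse_trees("( id + id ) * id".split()))  # 1
```

A result greater than 1 for any sentence shows that the grammar is ambiguous; adding parentheses to the input removes the choice in this example.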


Derivation
• Example:
– E → E + E | E * E | - E | (E) | id
– "1+2*3":
– E =>lm E1 * E2 =>lm E3 + E4 * E2 =>lm 1 + E4 * E2 =>lm 1 + 2 * E2 =>lm 1 + 2 * 3
– E =>rm E1 + E2 =>rm E1 + E3 * E4 =>rm E1 + E3 * 3 =>rm E1 + 2 * 3 =>rm 1 + 2 * 3

– Parse tree of the leftmost derivation: E(E1(E3(1) + E4(2)) * E2(3))
– Parse tree of the rightmost derivation: E(E1(1) + E2(E3(2) * E4(3)))


Principles of parsing techniques


[Figure: source program → lexical analyzer → token → parser → parse tree → rest of frontend → IR; the parser requests each token via getNextToken(); the phases share a symbol table]
• Tasks of a parser:
– Generate parse tree
– Create entries in symbol table
– Error handling:
• lexical level (wrong keywords,...)
• syntactic level (non-matching parentheses,...)
• semantic level (wrong types,...)
• logic level (infinite loop,...)
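
The getNextToken() interface between lexical analyzer and parser might look as follows. This is a rough sketch under assumptions: the class name `Lexer`, the token kinds `num`, `id` and `EOF`, and the token pattern are all made up for illustration, and the symbol table is left out.

```python
import re

# Assumed token classes: numbers, identifiers, and single-character operators
TOKEN_RE = re.compile(r"(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>.)")


class Lexer:
    """Hands out one token per call, as requested by the parser."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def get_next_token(self):
        # Skip whitespace between tokens
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos == len(self.text):
            return ("EOF", "")
        match = TOKEN_RE.match(self.text, self.pos)
        self.pos = match.end()
        kind, lexeme = match.lastgroup, match.group()
        # Operators and parentheses use the character itself as token kind
        return (lexeme if kind == "op" else kind, lexeme)


# A parser would pull tokens on demand; here they are simply collected
lexer = Lexer("1 + (2 * x)")
tokens = []
while (token := lexer.get_next_token())[0] != "EOF":
    tokens.append(token)
print(tokens)
```

Pulling tokens on demand means the parser never needs the whole token stream in memory and can stop reading as soon as it detects an error.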

Error recovery
• Syntactic errors can be immediately detected:
– As soon as the token stream cannot be parsed further according to the
grammar
– Viable-prefix property: An error is detected as soon as a prefix of the
input has been read that cannot be completed to form a string in the
language
– LL and LR parsing methods have the viable-prefix property
• Some semantic errors can also be detected efficiently (type mismatch for
simpler type systems)
• Other semantic / logical errors cannot be detected (efficiently / at all)
• Goals:
– Report error clearly and accurately
– Recover from an error quickly enough to detect subsequent errors
– Add minimal overhead to the processing of correct programs


Error recovery strategies

• Panic mode:
– Discard input symbols until a synchronizing token is found
– Synchronizing tokens: usually delimiters ("}", "{", ";", …)
– Advantage: Termination guaranteed
– Disadvantage: Possible removal of large code segments
• Phrase-level recovery:
– Perform local correction: Replace prefix of remaining input by
some string that allows recognition
– cf. LaTeX, which inserts a missing "$" and continues
– Disadvantage: May defer actual error detections
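
Panic mode can be sketched for a toy language. Everything in this example is an assumption made for illustration: statements have the fixed form `id = id ;`, the synchronizing tokens are ";" and "}", and the function name is hypothetical. On the first mismatch the parser reports the error, discards input up to and including the next synchronizing token, and resumes with the next statement.

```python
SYNC_TOKENS = {";", "}"}               # assumed synchronizing tokens
STATEMENT = ["id", "=", "id", ";"]     # hypothetical statement form: id = id ;


def parse_statements(tokens):
    """Parse a sequence of statements, recovering from errors in panic mode."""
    errors = []
    parsed = 0
    pos = 0
    while pos < len(tokens):
        ok = True
        for want in STATEMENT:
            if pos < len(tokens) and tokens[pos] == want:
                pos += 1
                continue
            found = tokens[pos] if pos < len(tokens) else "end of input"
            errors.append(f"token {pos}: expected {want}, found {found}")
            # Panic mode: discard input symbols until a synchronizing token
            while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
                pos += 1
            pos += 1                   # consume the synchronizing token itself
            ok = False
            break
        if ok:
            parsed += 1
    return parsed, errors


parsed, errors = parse_statements("id = id ; id id = ; id = id ;".split())
print(parsed, errors)
```

Each pass through the outer loop consumes at least one token, which is why termination is guaranteed; the price is that the whole faulty statement is skipped without further analysis.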


Error recovery strategies


• Error productions:
– Common errors seen as part of the "correct" language
– Rules included into grammar
– These error productions do not produce code, but an
appropriate error message
• Global corrections:
– For input x, grammar G, find a string y ∈ L(G) that differs from x
in the least number of insertions/deletions/changes of tokens
– Disadvantage: Very complex, "corrected" program not
necessarily what the programmer intended
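
The distance measure behind global correction is the edit distance on token sequences. The sketch below only computes that distance and picks the closest string from a small, hand-written candidate list; it does not search all of L(G), which is the expensive part. The function name and the candidate list are assumptions for illustration.

```python
def token_edit_distance(x, y):
    """Minimum number of token insertions, deletions and changes turning x into y."""
    prev = list(range(len(y) + 1))          # distances from the empty prefix of x
    for i, a in enumerate(x, 1):
        cur = [i]
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j] + 1,             # delete a
                           cur[j - 1] + 1,          # insert b
                           prev[j - 1] + (a != b))) # keep or change a
        prev = cur
    return prev[-1]


x = "id + * id".split()                     # erroneous input
candidates = [s.split() for s in ("id + id", "id * id", "( id )")]
best = min(candidates, key=lambda y: token_edit_distance(x, y))
print(" ".join(best), token_edit_distance(x, best))
```

Two candidates are equally close here ("id + id" and "id * id", one deletion each), which illustrates the disadvantage named above: the minimal correction need not be what the programmer intended.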
