Class 08 A

www.th-deg.de
Syntax analysis
1. Introduction
2. Parsing techniques and derivations
3. Verifying the language generated by a grammar
4. Preprocessing grammars
5. Problems with grammars
6. Top-down parsing
7. Automatic top-down parsing
8. Table-driven parser
9. Bottom-up parsing
10. Operator precedence
11. LR(0) parsing
12. From SLR to LR(k)
13. LR(1) parsers
14. LALR parsers
TECHNISCHE HOCHSCHULE DEGGENDORF
Syntax analysis – introduction

• Programming languages need precise rules describing the syntax
structure (ambiguous grammars may lead to unexpected results):
– The most famous soldiers? Private property, major mistake, and
kernel panic!
["major" etc. may be adjective or part of a name – ignoring case]
– Who's General Failure, and why's he reading my hard disk?
– I knew a woman who owned a taser, man was she stunning!
[stunning: adjective or verb?]
– Did you hear about the crime that happened in a parking garage? It
was wrong on so many levels.
["wrong": adjective or a noun?]
– I wondered why the baseball was getting bigger. Then it hit me.
["it" may be referring to "ball" or may be a syntactic expletive]
• In some cases, ambiguity is necessary or desirable => additional rules!
• For some grammars, automatic parser generation is possible
Principles of parsing techniques

• Universal:
– Can parse any grammar
– Inefficient
• Bottom-up:
– LR parser (left-to-right, rightmost derivation)
– Hard to understand, automatic generation possible
• Top-down:
– LL parser (left-to-right, leftmost derivation)
– Easier to understand, hand coding possible

Example: LL vs. LR
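[Figure: syntax tree for 1 + (2 * 3): root +, with left child 1 and right child *, whose children are 2 and 3]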
E → E + E | E * E | - E | (E) | id
Infix: 1 + (2 * 3)
LL = Prefix (Polish): +(1, *(2, 3))
LR = Postfix (reverse Polish): (1, (2, 3)*) +

Derivation
• View productions as rewriting rules
• Start with start symbol
• Each rewriting step replaces a nonterminal by the body of one of its productions (input
αAβ, production A → γ)
– αAβ => αγβ: rewrite αAβ to αγβ in one step
– notation: αAβ =>k αγβ: rewrite in k steps;
αAβ =>* αγβ: rewrite in finitely many steps
• Several nonterminals may be candidates
• Leftmost derivation: Use the leftmost nonterminal in each sentential form (=>lm)
• Rightmost derivation: Use the rightmost nonterminal in each sentential form (=>rm)

• If (for start symbol S of L) S =>* α:
– α is a sentential form
– a sentence of L is a sentential form without nonterminals
– α is a left-/right-sentential form if S =>lm* α / S =>rm* α
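
The rewriting steps above can be traced mechanically. The sketch below assumes the expression grammar E → E + E | E * E | id used elsewhere in these slides; the helper names `rewrite` and `derive` and the list-of-symbols representation are hypothetical. Each step replaces either the leftmost or the rightmost nonterminal of the current sentential form by a production body.

```python
NONTERMINALS = {"E"}  # assumption: the expression grammar from these slides

def rewrite(form, body, leftmost=True):
    """Replace the leftmost (or rightmost) nonterminal in `form` by `body`."""
    positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
    if not positions:
        raise ValueError("no nonterminal left: the form is already a sentence")
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + body + form[i + 1:]

def derive(bodies, leftmost=True):
    """Apply the production bodies in order, starting from the start symbol E."""
    form = ["E"]
    steps = [form]
    for body in bodies:
        form = rewrite(form, body, leftmost)
        steps.append(form)
    return steps

PLUS, TIMES, ID = ["E", "+", "E"], ["E", "*", "E"], ["id"]
leftmost_steps = derive([PLUS, ID, TIMES, ID, ID], leftmost=True)
rightmost_steps = derive([PLUS, TIMES, ID, ID, ID], leftmost=False)

for form in leftmost_steps:
    print(" ".join(form))
```

Both derivations end in the same sentence id + id * id; only the intermediate sentential forms differ.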

Derivation and parse tree

• Parse tree: graphical representation of derivation removing info
about order of production application
• Leaves of the parse tree, read from left to right, form a sentential form, called the yield or frontier of the tree
• A grammar that produces more than 1 parse tree for some
sentence is called ambiguous
• The grammar
E → E + E | E * E | - E | (E) | id
is ambiguous
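
One way to make the ambiguity concrete is to count the parse trees of a sentence. The following sketch is tailored to this one grammar and is not a general parsing algorithm; the function name `count_parse_trees` and the token-list input are assumptions. It tries every production of E on every span of the input and multiplies the counts of the sub-spans.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for E -> E + E | E * E | - E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving tokens[i:j] from E."""
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1                                   # E -> id
        if tokens[i] == "-":
            total += count(i + 1, j)                     # E -> - E
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                 # E -> ( E )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)   # E -> E op E
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id * id".split()))      # 2
print(count_parse_trees("( id + id ) * id".split()))  # 1
```

A count greater than 1 for any sentence shows that the grammar is ambiguous; parentheses remove the ambiguity for that particular sentence only.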

Derivation
• Example:
– E → E + E | E * E | - E | (E) | id
– "1+2*3":
– E =>lm E1 * E2 =>lm E3 + E4 * E2 =>lm 1 + E4 * E2 =>lm 1 + 2 * E2 =>lm 1 + 2 * 3
– E =>rm E1 + E2 =>rm E1 + E3 * E4 =>rm E1 + E3 * 3 =>rm E1 + 2 * 3 =>rm 1 + 2 * 3
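[Figure: two parse trees for "1+2*3". Left tree (leftmost derivation above): root E with children E1 * E2, where E1 expands to E3 + E4 with leaves 1 and 2, and E2 is the leaf 3. Right tree (rightmost derivation above): root E with children E1 + E2, where E1 is the leaf 1, and E2 expands to E3 * E4 with leaves 2 and 3.]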

Principles of parsing techniques

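[Figure: source program → lexical analyzer → (token) → parser → (parse tree) → rest of frontend → IR. The parser requests each token from the lexical analyzer via getNextToken(); the symbol table is shown as a shared data structure.]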
• Tasks of a parser:
– Generate parse tree
– Create entries in symbol table
– Error handling:
• lexical level (wrong keywords,...)
• syntactic level (non-matching parentheses,...)
• semantic level (wrong types,...)
• logic level (infinite loop,...)
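
The interface between lexical analyzer and parser can be sketched as a lexer object that hands out one token per call. This is a minimal sketch under assumptions: the class name `Lexer`, the token kinds `num`, `op`, and `eof`, and the tiny token set are invented for illustration; only the idea of a getNextToken() call comes from the slide. It also shows an error at the lexical level, raised as soon as no token pattern matches.

```python
import re

# Assumed token set: integers, a few operators and parentheses, whitespace.
TOKEN = re.compile(r"(?P<num>\d+)|(?P<op>[-+*()])|(?P<ws>\s+)")

class Lexer:
    def __init__(self, source):
        self.source = source
        self.pos = 0

    def get_next_token(self):
        """Return the next (kind, text) pair, or ("eof", "") at end of input."""
        while self.pos < len(self.source):
            m = TOKEN.match(self.source, self.pos)
            if m is None:
                raise SyntaxError(f"lexical error at position {self.pos}")
            self.pos = m.end()
            if m.lastgroup != "ws":          # whitespace is skipped, not returned
                return (m.lastgroup, m.group())
        return ("eof", "")

def all_tokens(source):
    """Pull tokens one by one, the way a parser would."""
    lexer = Lexer(source)
    tokens = []
    while True:
        token = lexer.get_next_token()
        if token[0] == "eof":
            return tokens
        tokens.append(token)

print(all_tokens("1 + 2*3"))
```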
Error recovery
• Syntactic errors can be immediately detected:
– As soon as the token stream cannot be parsed further according to the
grammar
– Viable-prefix property: An error is detected as soon as a prefix of the
input has been read that cannot be completed to form a string in the
language
– LL and LR parsing methods have the viable-prefix property
• Some semantic errors can also be detected efficiently (type mismatch for
simpler type systems)
• Other semantic / logical errors cannot be detected (efficiently / at all)
• Goals:
– Report error clearly and accurately
– Recover from an error quickly enough to detect subsequent errors
– Add minimal overhead to the processing of correct programs
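
The viable-prefix property can be illustrated on a very small language. The sketch below assumes the balanced-parentheses grammar S → ( S ) S | ε, which is not taken from these slides, and a hypothetical function `first_error_position`. It reports the position at which the prefix read so far can no longer be completed to a string of the language, which is the earliest point at which an error can be detected.

```python
def first_error_position(text):
    """Index at which an error is detected, or None if text is in the language.

    Assumed grammar: S -> ( S ) S | ε  (balanced parentheses).
    """
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return i          # text[:i + 1] cannot be completed any more
        else:
            return i              # symbol outside the alphabet
    # Unclosed parentheses are only detectable at the end of the input.
    return len(text) if depth > 0 else None

print(first_error_position("(()())"))  # None
print(first_error_position("())("))    # 2
```

In the second call the error is reported at index 2, before the rest of the input is read: no continuation of "())" is balanced.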

Error recovery strategies
• Panic mode:
– Discard input symbols until a synchronizing token is found
– Synchronizing tokens: usually delimiters ("}", "{", ";", …)
– Advantage: Termination guaranteed
– Disadvantage: Possible removal of large code segments
• Phrase-level recovery:
– Perform local correction: Replace prefix of remaining input by
some string that allows recognition
– cf. LaTeX, which inserts a missing "$" and continues
– Disadvantage: May defer actual error detections
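
Panic mode can be sketched for a toy statement language. Everything specific here is an assumption for illustration: the statement form id = num ;, the function name `parse_statements`, and ";" as the only synchronizing token. On an error the parser records a message, discards tokens up to and including the next ";", and resumes with the following statement.

```python
SYNC = {";"}  # assumed synchronizing tokens

def parse_statements(tokens):
    """Parse statements of the hypothetical form  id = num ;  and collect errors."""
    parsed, errors = [], []
    i = 0
    while i < len(tokens):
        stmt = tokens[i:i + 4]
        if (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
                and stmt[2].isdigit() and stmt[3] == ";"):
            parsed.append((stmt[0], int(stmt[2])))
            i += 4
            continue
        errors.append(f"syntax error near token {i}")
        # Panic mode: discard tokens until a synchronizing token is found.
        while i < len(tokens) and tokens[i] not in SYNC:
            i += 1
        i += 1  # consume the synchronizing token itself
    return parsed, errors

parsed, errors = parse_statements("a = 1 ; b = = 2 ; c = 3 ;".split())
print(parsed)  # [('a', 1), ('c', 3)]
print(errors)  # ['syntax error near token 4']
```

The index i increases on every path through the loop, which is why termination is guaranteed; the whole faulty statement is dropped, which is the disadvantage named above.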

Error recovery strategies

• Error productions:
– Common errors seen as part of the "correct" language
– Rules included into grammar
– These error productions do not produce code, but an
appropriate error message
• Global corrections:
– For input x, grammar G, find a string y ∈ L(G) that differs from x
in the least number of insertions/deletions/changes of tokens
– Disadvantage: Very complex, "corrected" program not
necessarily what the programmer intended
