Syntax Analyzer
Week 03
1
Overview
2
Syntax Analysis Overview
Goal – determine if the input token stream
satisfies the syntax of the program
What do we need to do this?
An expressive way to describe the syntax
A mechanism that determines if the input token
stream satisfies the syntax description
For lexical analysis
Regular expressions describe tokens
Finite automata = mechanisms to generate tokens
from input stream
3
Syntax Analysis Overview
Methods commonly used in compilers to
create parsers for grammars are classified as
Top down
Build parse trees from the top (root) to the bottom (leaves)
Bottom up
Start from the leaves and work up to the root
Both methods work only on subclasses of
grammars, but several of these subclasses
are expressive enough to describe most
syntactic constructs in programming
languages.
4
How it works..
5
Example
6
Example (cont)
if (x == y) { a=1; }
7
Syntax Tree
8
Example (cont)
9
Syntax Rules
Just like the lexical analyzer, the syntax analyzer
also requires rules to check the
structure of the program.
We use a context-free grammar (CFG) to express
these syntax rules and check the structure of the program.
Since we already know how to write a CFG,
understanding how a CFG is implemented
helps us understand the parsing process.
10
Problems in CFG
11
Implementation of CFG
13
Top Down Parsers
They get their name from the way they
construct the tree: from the root down to the leaves.
They use a leftmost derivation to construct the tree.
Well suited to hand-written parsers.
LL parsers belong to this class.
Ambiguity, left recursion, and FIRST-FIRST conflicts
create problems for these parsers.
14
Bottom Up Parsers
15
THE ROLE OF THE PARSER
[Figure: the parser within the compiler; the only surviving label is the symbol table]
16
Where is Syntax Analysis Performed?
Source: if (b == 0) a = b;
Token stream: if ( b == 0 ) a = b ;
Abstract syntax tree (or parse tree):
if
  ==
    b
    0
  =
    a
    b
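The first step of the example, turning the source text into the token stream that the parser consumes, can be sketched with regular expressions. This is a minimal illustration, not the lexer used in the course: the token class names (`KEYWORD`, `ID`, `NUM`, `OP`, `PUNCT`) and the `tokenize` function are assumptions made for this one statement.

```python
import re

# Hypothetical token classes covering only this example statement.
TOKEN_SPEC = [
    ("KEYWORD", r"\bif\b"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("NUM",     r"\d+"),
    ("OP",      r"==|="),
    ("PUNCT",   r"[();]"),
    ("SKIP",    r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Split `source` into (token_class, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

tokens = tokenize("if (b == 0) a = b;")
print(" ".join(text for _, text in tokens))  # if ( b == 0 ) a = b ;
```

The parser never sees the raw characters; it works only on the sequence of tokens produced here.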
17
Parsing Analogy
• Syntax analysis for natural languages
• Recognize whether a sentence is grammatically correct
• Identify the function of each word
[Figure: parse tree of the sentence “I gave him the book”, in which “the book” is labeled article noun]
18
Syntax Error Handling
Programs can contain errors at many different
levels. For example:
Lexical: misspelling an identifier,
keyword, or operator
Syntactic: an arithmetic expression
with unbalanced parentheses
Semantic: an operator applied to an
incompatible operand
Logical: an infinitely recursive call
20
How should an error handler report the
presence of an error?
At a minimum, it should report the place in the
source program where the error is detected, because
there is a good chance that the actual error occurred
within the previous few tokens.
A common strategy employed by many
compilers is to print the offending line with
a pointer to the position at which the error
is detected.
If there is a reasonable likelihood of knowing what the
error actually is, an informative,
understandable diagnostic message is
also included, e.g., “semicolon missing at
this position”.
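The reporting strategy described above (print the offending line with a pointer to the position of the error, plus a diagnostic message) can be sketched as follows. The function name `format_syntax_error` and its 1-based `line_no` and `column` parameters are assumptions for illustration; real compilers differ in the exact layout.

```python
def format_syntax_error(source, line_no, column, message):
    """Return a diagnostic showing the offending line with a caret under `column`.

    `line_no` and `column` are 1-based (an assumption of this sketch).
    """
    line = source.splitlines()[line_no - 1]
    pointer = " " * (column - 1) + "^"
    return f"line {line_no}, column {column}: {message}\n{line}\n{pointer}"

program = "a = b + 1\nc = a * 2;"
print(format_syntax_error(program, 1, 10, "semicolon missing at this position"))
```

Here the caret lands just past the end of the first line, which is where the parser would notice that the statement was not terminated.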
25
Formal Method for Describing Syntax
29
Context Free Grammar
32
Example of a grammar in CFG
In a grammar for a complete programming
language, the start symbol represents a
complete program and is usually named
<program>
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C | D
<expression> → <var> + <var> | <var> - <var> | <var>
33
Notational Conventions
These symbols are nonterminals:
1. Upper-case letters early in the alphabet such as
A, B, C
2. The letter S, which, when it appears, is usually
the start symbol.
3. Lower-case italic names such as expr or stmt
Example:
E → E A E | ( E ) | - E | id
A → + | - | * | / | ^
35
Grammars and Derivation
A grammar is a generative device for
defining a language.
The sentences of the language are
generated through a sequence of
applications of the rules, beginning with a
special nonterminal of the grammar called
the start symbol.
The generation of a sentence is called a
derivation.
36
Derivation
37
Grammars and Derivation
The process of generating a sentence
begin A = B - C end
Derivation: <program> (start symbol)
=> begin <stmt_list> end
=> begin <stmt> end
=> begin <var> = <expression> end
=> begin A = <expression> end
=> begin A = <var> - <var> end
=> begin A = B - <var> end
=> begin A = B - C end
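The derivation above can be replayed mechanically. The sketch below encodes the example grammar as a Python dictionary and, at each step, replaces the leftmost nonterminal with a chosen production. The names `GRAMMAR`, `leftmost_step`, and `derive` are illustrative assumptions, and the list `choices` is just the sequence of productions used in the derivation shown.

```python
# The example grammar; each nonterminal maps to its list of productions.
GRAMMAR = {
    "<program>":    [["begin", "<stmt_list>", "end"]],
    "<stmt_list>":  [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>":       [["<var>", "=", "<expression>"]],
    "<var>":        [["A"], ["B"], ["C"], ["D"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

def leftmost_step(form, production):
    """Replace the leftmost nonterminal in `form` with `production`."""
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            if production not in GRAMMAR[symbol]:
                raise ValueError(f"{symbol} has no production {production}")
            return form[:i] + production + form[i + 1:]
    raise ValueError("no nonterminal left to expand")

def derive(choices, start="<program>"):
    """Return every sentential form, starting from the start symbol."""
    forms = [[start]]
    for production in choices:
        forms.append(leftmost_step(forms[-1], production))
    return forms

choices = [
    ["begin", "<stmt_list>", "end"],
    ["<stmt>"],
    ["<var>", "=", "<expression>"],
    ["A"],
    ["<var>", "-", "<var>"],
    ["B"],
    ["C"],
]
for form in derive(choices):
    print("=>", " ".join(form))
```

Every list produced along the way is a sentential form; the last one contains only terminals and is therefore a sentence of the language.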
38
Grammars and Derivation
Leftmost derivation:
the replaced non-terminal is always the leftmost
non-terminal
Rightmost derivation:
the replaced non-terminal is always the rightmost
non-terminal
Sentential forms
Each string in the derivation, including
<program>
39
Grammars and Derivation
When we construct a derivation, there are
two choices at each step: which nonterminal
to expand, and which production to use for
the given nonterminal. We will sometimes be
concerned with a leftmost derivation, which
eliminates that first degree of freedom at
each step. We can write, if we need to be
clear or emphasize this, $S \overset{*}{\underset{lm}{\Rightarrow}} \alpha$; $\alpha$ is then
called a left-sentential form of the grammar.
40
Parse Tree and Derivations
A = B * ( A + C )
<assign>
=> <id> = <expr>
=> A = <expr>
=> A = <id> * <expr>
=> A = B * <expr>
=> A = B * ( <expr> )
=> A = B * ( <id> + <expr> )
=> A = B * ( A + <expr> )
=> A = B * ( A + <id> )
=> A = B * ( A + C )
43
44
Ambiguity in Grammar
A grammar is ambiguous if some string has
more than one parse tree (equivalently, more
than one leftmost derivation)
45
Ambiguity Example
Two parse trees for 2-1+1
Tree corresponding to 2-(1+1):
Start( Expr( Expr(Int 2), Op(-), Expr( Expr(Int 1), Op(+), Expr(Int 1) ) ) )
Tree corresponding to (2-1)+1:
Start( Expr( Expr( Expr(Int 2), Op(-), Expr(Int 1) ), Op(+), Expr(Int 1) ) )
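To make the ambiguity concrete, the sketch below enumerates every parse tree of 2-1+1 and evaluates each one. It assumes the grammar behind the figure is Expr → Expr Op Expr | Int with Op → + | -, which is inferred from the tree labels rather than stated in the slides. The function names are illustrative.

```python
def parse_trees(tokens):
    """Return every parse tree of `tokens` under Expr -> Expr Op Expr | Int."""
    if len(tokens) == 1 and tokens[0].isdigit():
        return [tokens[0]]
    trees = []
    # Try every operator as the root of the tree.
    for i, token in enumerate(tokens):
        if token in ("+", "-"):
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((token, left, right))
    return trees

def evaluate(tree):
    """Compute the value a parse tree denotes."""
    if isinstance(tree, str):
        return int(tree)
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a - b

for tree in parse_trees(["2", "-", "1", "+", "1"]):
    print(tree, "=", evaluate(tree))
```

The two trees give different values (0 and 2), which is why an ambiguous grammar is a practical problem and not only a formal one.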
46
Eliminating Ambiguity
Solution: hack the grammar
49
Draw the two parse trees. Which one
conforms to the interpretation used by most
languages? ("match else with the most recent
unmatched if")
50
Two Parse Trees
[Figure: two parse trees for a nested if statement; surviving labels: Stat, Expr, e2, s1, s2]
51
Eliminating Ambiguity
52
A Closer Look at Eliminating Ambiguity
Precedence is enforced as follows:
Introduce a distinct non-terminal for each
precedence level
Operators of a given precedence level appear
on the RHS of that level's production
Higher-precedence operators are reached by
referencing the next-higher precedence non-terminal
53
Operator Precedence
A=B+C*A
How to force “*” to have higher precedence
than “+”?
Add more non-terminal symbols
Observe that higher-precedence operators
reside at “deeper” levels of the tree
54
Operator Precedence
A=B+C*A
Before:
<assign> → <id> = <expr>
<id> → A | B | C | D
<expr> → <expr> + <expr>
| <expr> * <expr>
| ( <expr> )
| <id>
55
Operator Precedence
After:
<assign> → <id> = <expr>
<id> → A | B | C | D
<expr> → <expr> + <term>
| <term>
<term> → <term> * <factor>
| <factor>
<factor> → ( <expr> )
| <id>
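The effect of the layered grammar can be seen in a small hand-written parser with one method per non-terminal. This is a sketch under assumptions: the class and method names are made up, tokens are single characters, and the left-recursive rules for <expr> and <term> are written as loops, since a recursive-descent method cannot call itself first without looping forever.

```python
IDS = ("A", "B", "C", "D")

def tokenize(text):
    """Single-character tokens; whitespace is ignored."""
    return [ch for ch in text if not ch.isspace()]

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def assign(self):
        # <assign> -> <id> = <expr>
        target = self.peek()
        if target not in IDS:
            raise SyntaxError(f"expected an identifier, found {target!r}")
        self.pos += 1
        self.expect("=")
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return ("=", target, value)

    def expr(self):
        # <expr> -> <expr> + <term> | <term>, with the recursion as a loop
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        # <term> -> <term> * <factor> | <factor>
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        # <factor> -> ( <expr> ) | <id>
        token = self.peek()
        if token == "(":
            self.pos += 1
            node = self.expr()
            self.expect(")")
            return node
        if token in IDS:
            self.pos += 1
            return token
        raise SyntaxError(f"unexpected token {token!r}")

print(Parser(tokenize("A=B+C*A")).assign())
# ('=', 'A', ('+', 'B', ('*', 'C', 'A')))
```

Because `*` is only reachable through `term`, which `expr` calls before it looks for `+`, the multiplication ends up deeper in the tree and is therefore evaluated first.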
56
Operator Precedence
A=B+C*A
57
Associativity
An operator is left-associative, right-associative, or non-associative
Left: a + b + c = (a + b) + c
Right: a ^ b ^ c = a ^ (b ^ c)
Non: a < b < c is illegal (thus undefined)
The position of the recursion relative to the
operator dictates the associativity
Left (right) recursion → left (right)
associativity
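The link between the direction of the recursion and associativity can be checked on two tiny rules. In this sketch the rule names <sub> and <pow>, the function names, and the token-list input format are all assumptions; the input is taken to be well formed (operand, operator, operand, ...).

```python
def parse_left(tokens):
    """<sub> -> <sub> - <num> | <num>; the left recursion is unrolled into a loop."""
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        tree = (tokens[i], tree, tokens[i + 1])
    return tree

def parse_right(tokens):
    """<pow> -> <num> ^ <pow> | <num>; right recursion maps to a recursive call."""
    if len(tokens) == 1:
        return tokens[0]
    return (tokens[1], tokens[0], parse_right(tokens[2:]))

def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a - b if op == "-" else a ** b

left_tree = parse_left([10, "-", 4, "-", 3])
right_tree = parse_right([2, "^", 3, "^", 2])
print(left_tree, evaluate(left_tree))    # ('-', ('-', 10, 4), 3) 3
print(right_tree, evaluate(right_tree))  # ('^', 2, ('^', 3, 2)) 512
```

With the recursion on the left, the tree groups as (10 - 4) - 3; with the recursion on the right, it groups as 2 ^ (3 ^ 2).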
58
Associativity of Operators
A=B+C-D*F/G
Left-associative
Operators of the same precedence are evaluated from
left to right
C++/Java: +, -, *, /, %
Right-associative
Operators of the same precedence are evaluated from
right to left
C++/Java: unary -, unary +, ! (logical negation)
How to enforce operator associativity using BNF?
59
Eliminating Left Recursion
A -> Aa | b will be rewritten as
A -> bA'
A' -> aA' | ε
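The standard rewrite for immediate left recursion moves the recursion to the right by introducing a fresh nonterminal. The sketch below applies it to the productions of a single nonterminal; the function name, the list-of-lists representation, and the `'` suffix for the new nonterminal are assumptions for illustration. It does not handle indirect left recursion.

```python
EPSILON = "ε"

def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b1 A' | ..., A' -> a1 A' | ... | ε."""
    # Tails of the left-recursive productions (the part after the leading A).
    recursive = [p[1:] for p in productions if p and p[0] == nonterminal]
    # Productions that do not start with A.
    others = [p for p in productions if not p or p[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}
    fresh = nonterminal + "'"
    return {
        nonterminal: [p + [fresh] for p in others],
        fresh: [p + [fresh] for p in recursive] + [[EPSILON]],
    }

print(eliminate_immediate_left_recursion("A", [["A", "a"], ["b"]]))
# {'A': [['b', "A'"]], "A'": [['a', "A'"], ['ε']]}
```

Both grammars generate b followed by any number of a's, but the rewritten one can be used by a top-down parser because no production starts with its own left-hand side.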
65