SYNTAX Analyzer

Syntax Analysis
Context-Free Grammars
• Accurate syntactic specifications of a

programming language.
• we can construct automatically an
efficient parser.
• Allows a language to be develop.
The Parser
The Parser
Three general types of parsers
Universal parsing methods:

• can parse any grammars
• too inefficient to use in production compilers
The Parser
Top-down methods:
• Parse-trees built from root to leaves.
• Input to parser scanned from left to right one symbol at a time
The Parser
Bottom-up methods:
• Start from leaves and work their way up to the root.
• Input to parser scanned from left to right one symbol at a time
Dealing With Errors
If compiler had to process only correct programs, its
design and implementation would be simplified greatly!
• Few languages have been designed with

error handling in mind.
• Error handling is left to compiler designer.
• Bugs caused about 50% of the total cost,
Common Programming Errors
• Lexical errors: misspellings of

identifiers, keywords, or operators
• Syntactic errors: misplaced
semicolons, extra or missing braces,
case without switch, … .
• Semantic errors: type
mismatches between operators
and operands
• Logical errors: anything else!
Wish List
• Report the occurrence of errors clearly

and accurately
• Recover from each error quickly enough
to detect subsequent errors
• Add minimal overhead to the processing
of correct programs
Easier said than done!

Error-Recovery Strategies
• Simplest: quit with an informative error
message when detecting the first error
• Panic-mode Recovery: discards input
symbols one at a time until a designated
synchronizing tokens is found.
• Phrase-level Recovery: perform local
correction on the remaining input. The
choice of local correction is left to the
compiler designer.
• Error Production: production rules for
common errors.
Context-Free Grammar
Terminals Nonterminals
(token name)
Example:
Start Productions
Symbol
Derivations
• Starting with start symbol

• At each step: a nonterminal replaced
with the body of a production
Example:
Deriving: -(id + id)

More on Derivations
Leftmost derivations, the leftmost nonterminal in each statement is always

chosen.
Rightmost derivations, the rightmost nonterminal in each statement is

always chosen.
Parse Trees
• What is the relationship between a

parse-tree and derivations?
– Parse tree is the graphical representation
of derivations
– Filters out order of nonterminal
replacement
– many-to-one relationship between
derivations and parse-tree
1 Expr Expr O
pxEpr
2  numbe
r
3  id
4 Op +
5  –
6  *
7  /
The Two Derivations for x – 2 * y
Rule Sentent
ialForm Rule Sentent
ialForm
— Expr — Expr
1 Expr O
pExpr 1 Expr O
pExpr
3 <id,x> OpExpr 3 Expr O
p<id,y>
5 <id,x> –Expr 6 Expr* <id,y>
1 <id,x> –ExprOpExpr 1 Expr O
pExpr* <id,y>
2> OpExpr
2 <id,x> –<num, 2 Expr O
p<num
,2>* <id,y>
2> *Expr
6 <id,x> –<num, 5 Expr–<num
,2>* <id,y>
3 <id,x> –<num,
2> *<id,y> 3 <id,x>–<num
,2>*<id,y>
Leftmost derivation Rightmost derivation

Derivations and Parse Trees
Leftmost derivation
Rule Sentent
ialForm
— Expr
1 Expr O
pExpr
3 <id,x> OpExpr E
5 <id,x> –Expr
1 <id,x> –ExprOpExpr
E Op E
2> OpExpr
2 <id,x> –<num,
2> *Expr
6 <id,x> –<num,
3 <id,x> –<num,
2> *<id,y>
x – Op E
E
This evaluates as x – ( 2 * y ) 2 y
*
Derivations and Parse Trees
Rightmost derivation
Rule Sentent
ialForm
— Expr
1 Expr O pE xpr
3 Expr O p<id,y> E
6 Expr* <id,y>
1 Expr O pE xpr* <id,y>
2 Expr O p<num ,2>* <id,y> E Op E
5 Expr–<num ,2>* <id,y>
3 <id,x>–<num ,2>*<id,y>
E Op E * y
This evaluates as ( x – 2 ) * y x – 2
Derivations and Precedence
These two derivations point out a problem with the grammar:
It has no idea of precedence, or implied order of evaluation
To add precedence
• Create a non-terminal for each level of precedence
• Isolate the corresponding part of the grammar
• Force the parser to recognize high precedence sub-expressions first
For algebraic expressions

• Multiplication and division, first (level one)
• Subtraction and addition, next (level two)
Adding the standard algebraic precedence
produces:
1 Go
al  Expr
level 2 Expr  Expr + Term
two 3 | Expr – Term
4 | Term
level 5 Term  Term*Factor
one 6 | Term/Factor
7 | Factor
8 Factor  number
9 | id
Rule SententialForm G
— Goal
1 Expr E
3 Expr – Term
5 Expr – Term*Factor
E – T
9 Expr –Term*<id,y>
T T * F
7 Expr –Factor*<id,y>
8 Expr –<num ,2> *<id,y> F F <id,y>
4 Term–<num ,2> *<id,y>
7 Factor –<num ,2> *<id,y> <id,x> <num,2>
9 <id,x> –<num, 2> *<id,y>
Its parse tree
The rightmost derivation
This produces x – ( 2 * y ), along with an appropriate parse tree.

Both the leftmost and rightmost derivations give the same expression, because
the grammar directly encodes the desired precedence.
Ambiguous Grammars
Our original expression grammar had other
Rule Sen
tenti
al Form
problems
1 ExprExprO
pExpr — Expr
2  nu
mber 1 ExprOpExpr
3  id
1 ExprOpExprOpExpr
4 Op + 3 <id,x>Op ExprOp Expr
5 –
5 <id,x>–ExprOpExpr
6 *
2 <id,x>–<num,2>Op Expr
7 /
6 <id,x>–<num,2>*Ex pr
3 <id,x>–<num,2>*<id,y>
choice different from

the first time
Two Leftmost Derivations for x – 2 * y
The Difference:
 Different productions chosen on the second step
Rule Sen
tenti
al Form Rule Sen
tenti
al Form
— Expr — Expr
1 ExprOpExpr 1 ExprOpExpr
3 <id,x>Op Expr 1 ExprOpExprOpExpr
5 <id,x>–Expr 3 <id,x>Op Expr Op Expr
1 <id,x>–ExprOpExpr 5 <id,x>–ExprOpExpr
2 <id,x>–<num ,2>Op Expr 2 <id,x>–<num ,2>Op Expr
6 <id,x>–<num ,2>*Ex pr 6 <id,x>–<num ,2>*Ex pr
3 <id,x>–<num ,2>*<id,y> 3 <id,x>–<num ,2>*<id,y>
Original choice New choice

Ambiguous Grammars
Definitions
• If a grammar has more than one leftmost derivation for a single
sentential form, the grammar is ambiguous
• If a grammar has more than one rightmost derivation for a single
sentential form, the grammar is ambiguous
• The leftmost and rightmost derivations for a sentential form may
differ, even in an unambiguous grammar
Classic example — the if-then-else problem
Stmt  if Expr then Stmt
| if Expr then Stmt else Stmt
| … other stmts …
Ambiguity
This sentential form has two derivations
if Expr1 then if Expr2 then Stmt1 else Stmt2
if if
E1 then else E1 then
if S2 if
E2 then E2 then else
S1 S1 S2
Context-Free Grammar Vs
Regular Expressions
• Grammars are more powerful notations than
regular expressions
– Every construct that can be described by a
regular
expression can be described by a grammar, but not
vice-versa
Regular expression -> NFA then:
(a|b)*abb
Question Worth Asking
If grammars are much powerful than regular
expressions, why not using them in lexical
analysis too?
• Lexical rules are quite simple and do not
need notation as powerful as grammars
• Regular expressions are more concise and
easier to understand for tokens
• More efficient lexical analyzers can be
generated from regular expressions than
from grammars

SYNTAX Analyzer

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

SYNTAX Analyzer

Uploaded by

Copyright:

Available Formats

Syntax Analysis

• Accurate syntactic specifications of a

Three general types of parsers

Universal parsing methods:

Three general types of parsers

Three general types of parsers

• Few languages have been designed with

• Lexical errors: misspellings of

• Report the occurrence of errors clearly

Easier said than done!

• Starting with start symbol

Deriving: -(id + id)

Leftmost derivations, the leftmost nonterminal in each statement is always

Rightmost derivations, the rightmost nonterminal in each statement is

• What is the relationship between a

Leftmost derivation Rightmost derivation

For algebraic expressions

This produces x – ( 2 * y ), along with an appropriate parse tree.

choice different from

Original choice New choice

E1 then else E1 then

E2 then E2 then else

You might also like