
Syntax Analysis

Context-Free Grammars

• Give an accurate syntactic specification of a programming language.
• Allow an efficient parser to be constructed automatically.
• Allow a language to be developed.
The Parser

Three general types of parsers

Universal parsing methods:


• Can parse any grammar
• Too inefficient to use in production compilers

Top-down methods:
• Parse trees are built from the root to the leaves.
• The input to the parser is scanned from left to right, one symbol at a time.

Bottom-up methods:
• Parse trees are built from the leaves up to the root.
• The input to the parser is scanned from left to right, one symbol at a time.
Dealing With Errors
If a compiler had to process only correct programs, its
design and implementation would be greatly simplified!

• Few languages have been designed with error handling in mind.
• Error handling is left to the compiler designer.
• Bugs caused about 50% of the total cost,
Common Programming Errors

• Lexical errors: misspellings of identifiers, keywords, or operators
• Syntactic errors: misplaced semicolons, extra or missing braces, case without switch, …
• Semantic errors: type mismatches between operators and operands
• Logical errors: anything else!
Wish List

• Report the occurrence of errors clearly and accurately
• Recover from each error quickly enough to detect subsequent errors
• Add minimal overhead to the processing of correct programs

Easier said than done!


Error-Recovery Strategies
• Simplest: quit with an informative error message upon detecting the first error.
• Panic-mode recovery: discard input symbols one at a time until one of a designated set of synchronizing tokens is found.
• Phrase-level recovery: perform a local correction on the remaining input. The choice of local correction is left to the compiler designer.
• Error productions: augment the grammar with production rules for common errors.
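
Panic-mode recovery can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production technique: the toy statement form `id = num ;`, the token spellings, and the choice of `;` as the only synchronizing token are all invented for the example.

```python
# Hypothetical toy language: a program is a sequence of statements "id = num ;".
SYNC_TOKENS = {";"}  # assumed set of synchronizing tokens


def parse_statements(tokens):
    """Parse 'id = num ;' statements, recovering in panic mode on errors.

    Returns (number of well-formed statements, list of error messages).
    """
    pos, ok, errors = 0, 0, []
    expected = ["id", "=", "num", ";"]
    while pos < len(tokens):
        for want in expected:
            if pos < len(tokens) and tokens[pos] == want:
                pos += 1
            else:
                found = tokens[pos] if pos < len(tokens) else "end of input"
                errors.append(f"token {pos}: expected '{want}', found '{found}'")
                # Panic mode: discard input one token at a time until a
                # synchronizing token is found, then resume after it.
                while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
                    pos += 1
                pos += 1  # step past the synchronizing token itself
                break
        else:
            ok += 1  # all four tokens matched
    return ok, errors


ok, errors = parse_statements(
    ["id", "=", "num", ";", "id", "num", ";", "id", "=", "num", ";"]
)
```

In this run the second statement is missing its `=`. The parser reports one error, skips to the next `;`, and still recognizes the third statement, which is the point of recovering rather than quitting.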
Context-Free Grammar

[Figure: an example grammar with its four components labeled: terminals (token names), nonterminals, start symbol, and productions.]
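
A context-free grammar has four components: terminals (token names), nonterminals, a start symbol, and productions. They can be written down as plain data. The sketch below uses a small arithmetic grammar chosen only for illustration, since the slide's own example is not recoverable from the text. The `Grammar` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # token names
    nonterminals: frozenset
    start: str                # start symbol; must be a nonterminal
    productions: tuple        # (head, body) pairs; body is a tuple of symbols

    def check(self):
        """Return True if the four components are mutually consistent."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and not (self.terminals & self.nonterminals)
            and all(head in self.nonterminals and set(body) <= symbols
                    for head, body in self.productions)
        )


# Assumed example grammar: E -> E + E | E * E | - E | ( E ) | id
EXPR = Grammar(
    terminals=frozenset({"+", "*", "-", "(", ")", "id"}),
    nonterminals=frozenset({"E"}),
    start="E",
    productions=(
        ("E", ("E", "+", "E")),
        ("E", ("E", "*", "E")),
        ("E", ("-", "E")),
        ("E", ("(", "E", ")")),
        ("E", ("id",)),
    ),
)
```

Keeping each production as a (head, body) pair mirrors the way derivations use it: the head is the nonterminal being replaced, and the body is what replaces it.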
Derivations

• Start with the start symbol.
• At each step, a nonterminal is replaced with the body of one of its productions.

Example:

Deriving: -(id + id)
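
The derivation itself did not survive in the text, so here is a sketch of how it could go. It assumes the grammar E → E + E | E * E | - E | ( E ) | id, which is not shown in the extracted slide. The rule numbers and the `derive` helper are hypothetical names for this example.

```python
# Assumed grammar (not shown in the extracted text):
#   E -> E + E | E * E | - E | ( E ) | id
PRODUCTIONS = {
    1: ("E", ["E", "+", "E"]),
    2: ("E", ["E", "*", "E"]),
    3: ("E", ["-", "E"]),
    4: ("E", ["(", "E", ")"]),
    5: ("E", ["id"]),
}
NONTERMINALS = {"E"}


def derive(start, rule_numbers):
    """Apply each rule to the leftmost nonterminal; return all sentential forms."""
    form = [start]
    forms = [" ".join(form)]
    for n in rule_numbers:
        head, body = PRODUCTIONS[n]
        # Find the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != head:
            raise ValueError(f"rule {n} does not apply to {form[i]}")
        # Replace that nonterminal with the body of the production.
        form = form[:i] + body + form[i + 1:]
        forms.append(" ".join(form))
    return forms


steps = derive("E", [3, 4, 1, 5, 5])
print(" => ".join(steps))
# E => - E => - ( E ) => - ( E + E ) => - ( id + E ) => - ( id + id )
```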


More on Derivations

Leftmost derivation: the leftmost nonterminal in each sentential form is always chosen for replacement.

Rightmost derivation: the rightmost nonterminal in each sentential form is always chosen for replacement.
Parse Trees

• What is the relationship between a parse tree and derivations?
– A parse tree is a graphical representation of a derivation.
– It filters out the order in which nonterminals are replaced.
– There is a many-to-one relationship between derivations and parse trees.
1  Expr → Expr Op Expr
2       | number
3       | id
4  Op   → +
5       | –
6       | *
7       | /
The Two Derivations for x – 2 * y
Leftmost derivation                      Rightmost derivation

Rule  Sentential Form                    Rule  Sentential Form
—     Expr                               —     Expr
1     Expr Op Expr                       1     Expr Op Expr
3     <id,x> Op Expr                     3     Expr Op <id,y>
5     <id,x> – Expr                      6     Expr * <id,y>
1     <id,x> – Expr Op Expr              1     Expr Op Expr * <id,y>
2     <id,x> – <num,2> Op Expr           2     Expr Op <num,2> * <id,y>
6     <id,x> – <num,2> * Expr            5     Expr – <num,2> * <id,y>
3     <id,x> – <num,2> * <id,y>          3     <id,x> – <num,2> * <id,y>


Derivations and Parse Trees
Leftmost derivation

Rule  Sentential Form
—     Expr
1     Expr Op Expr
3     <id,x> Op Expr
5     <id,x> – Expr
1     <id,x> – Expr Op Expr
2     <id,x> – <num,2> Op Expr
6     <id,x> – <num,2> * Expr
3     <id,x> – <num,2> * <id,y>

Parse tree:

      E
   /  |  \
  E   Op   E
  |   |  / | \
  x   – E  Op  E
        |  |   |
        2  *   y

This evaluates as x – ( 2 * y )
Derivations and Parse Trees
Rightmost derivation

Rule  Sentential Form
—     Expr
1     Expr Op Expr
3     Expr Op <id,y>
6     Expr * <id,y>
1     Expr Op Expr * <id,y>
2     Expr Op <num,2> * <id,y>
5     Expr – <num,2> * <id,y>
3     <id,x> – <num,2> * <id,y>

Parse tree:

          E
       /  |  \
      E   Op  E
    / | \  |  |
   E  Op E *  y
   |  |  |
   x  –  2

This evaluates as ( x – 2 ) * y
Derivations and Precedence
These two derivations point out a problem with the grammar:
it has no notion of precedence, or implied order of evaluation.

To add precedence
• Create a non-terminal for each level of precedence
• Isolate the corresponding part of the grammar
• Force the parser to recognize high precedence sub-expressions first

For algebraic expressions:
• Multiplication and division first (level one)
• Subtraction and addition next (level two)
Derivations and Precedence
Adding the standard algebraic precedence produces:

            1  Goal   → Expr
level two   2  Expr   → Expr + Term
            3         | Expr – Term
            4         | Term
level one   5  Term   → Term * Factor
            6         | Term / Factor
            7         | Factor
            8  Factor → number
            9         | id
Derivations and Precedence
The rightmost derivation

Rule  Sentential Form
—     Goal
1     Expr
3     Expr – Term
5     Expr – Term * Factor
9     Expr – Term * <id,y>
7     Expr – Factor * <id,y>
8     Expr – <num,2> * <id,y>
4     Term – <num,2> * <id,y>
7     Factor – <num,2> * <id,y>
9     <id,x> – <num,2> * <id,y>

Its parse tree

            G
            |
            E
         /  |  \
        E   –   T
        |     / | \
        T    T  *  F
        |    |     |
        F    F   <id,y>
        |    |
     <id,x> <num,2>

This produces x – ( 2 * y ), along with an appropriate parse tree.


Both the leftmost and rightmost derivations give the same expression, because
the grammar directly encodes the desired precedence.
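
A small parser makes the effect of the precedence grammar concrete. The sketch below is an illustration, not the method used in the slides: the grammar is left-recursive, so this recursive-descent version replaces the left-recursive rules with loops, which produces the same left-associative grouping. Tokens are assumed to be plain strings, with ASCII `-` standing in for the minus sign. The function name `parse_expr` is hypothetical.

```python
OPERATORS = {"+", "-", "*", "/"}


def parse_expr(tokens):
    """Parse a token list with the precedence grammar; return a nested tuple.

    Expr   -> Expr + Term | Expr - Term | Term        (level two)
    Term   -> Term * Factor | Term / Factor | Factor  (level one)
    Factor -> number | id
    """
    pos = 0

    def factor():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in OPERATORS:
            raise SyntaxError(f"operand expected at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def term():
        nonlocal pos
        node = factor()
        # Loop in place of the left-recursive rules Term -> Term * Factor, etc.
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            node = (op, node, factor())
        return node

    def expr():
        nonlocal pos
        node = term()
        # Loop in place of the left-recursive rules Expr -> Expr + Term, etc.
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            node = (op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_expr(["x", "-", "2", "*", "y"]))
# ('-', 'x', ('*', '2', 'y'))
```

Because `term` is called from inside `expr`, every multiplication is grouped before the surrounding subtraction sees it, which is the "recognize high-precedence sub-expressions first" idea in code form.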
Ambiguous Grammars
Our original expression grammar had other problems.

1  Expr → Expr Op Expr
2       | number
3       | id
4  Op   → +
5       | –
6       | *
7       | /

Rule  Sentential Form
—     Expr
1     Expr Op Expr
1     Expr Op Expr Op Expr          (choice different from the first time)
3     <id,x> Op Expr Op Expr
5     <id,x> – Expr Op Expr
2     <id,x> – <num,2> Op Expr
6     <id,x> – <num,2> * Expr
3     <id,x> – <num,2> * <id,y>
Two Leftmost Derivations for x – 2 * y
The Difference:
• Different productions chosen on the second step

Original choice                          New choice

Rule  Sentential Form                    Rule  Sentential Form
—     Expr                               —     Expr
1     Expr Op Expr                       1     Expr Op Expr
3     <id,x> Op Expr                     1     Expr Op Expr Op Expr
5     <id,x> – Expr                      3     <id,x> Op Expr Op Expr
1     <id,x> – Expr Op Expr              5     <id,x> – Expr Op Expr
2     <id,x> – <num,2> Op Expr           2     <id,x> – <num,2> Op Expr
6     <id,x> – <num,2> * Expr            6     <id,x> – <num,2> * Expr
3     <id,x> – <num,2> * <id,y>          3     <id,x> – <num,2> * <id,y>


Ambiguous Grammars
Definitions
• If a grammar has more than one leftmost derivation for a single
sentential form, the grammar is ambiguous
• If a grammar has more than one rightmost derivation for a single
sentential form, the grammar is ambiguous
• The leftmost and rightmost derivations for a sentential form may
differ, even in an unambiguous grammar
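
Ambiguity can be checked mechanically for a small input by enumerating parse trees, since each distinct parse tree corresponds to exactly one leftmost derivation. The sketch below does this for the original expression grammar (Expr → Expr Op Expr | number | id). The function name and the nested-tuple tree encoding are assumptions made for the example, and ASCII `-` stands in for the minus sign.

```python
from functools import lru_cache

OPS = {"+", "-", "*", "/"}


def parse_trees(tokens):
    """Return every parse tree of tokens under Expr -> Expr Op Expr | number | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(lo, hi):
        # All trees that derive tokens[lo:hi] from Expr.
        if hi - lo == 1:
            return [] if tokens[lo] in OPS else [tokens[lo]]
        found = []
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in OPS:  # try this operator as the root Op
                for left in trees(lo, mid):
                    for right in trees(mid + 1, hi):
                        found.append((left, tokens[mid], right))
        return found

    return trees(0, len(tokens))


all_trees = parse_trees(["x", "-", "2", "*", "y"])
for t in all_trees:
    print(t)
```

For `x - 2 * y` this yields two trees, one grouping as x – ( 2 * y ) and one as ( x – 2 ) * y, so the grammar is ambiguous.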
Classic example — the if-then-else problem
Stmt → if Expr then Stmt
     | if Expr then Stmt else Stmt
     | … other stmts …
Ambiguity
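This sentential form has two derivations:
if Expr1 then if Expr2 then Stmt1 else Stmt2

First parse tree (the else belongs to the outer if):

          if
     /    |     \
   E1   then    else
         |       |
         if      S2
       /    \
     E2    then
            |
            S1

Second parse tree (the else belongs to the inner if):

        if
     /      \
   E1      then
            |
            if
        /   |    \
      E2  then   else
           |      |
           S1     S2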
Context-Free Grammar Vs
Regular Expressions
• Grammars are a more powerful notation than regular expressions
– Every construct that can be described by a regular expression can be described by a grammar, but not vice versa
Example (regular expression → NFA → grammar):
(a|b)*abb
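
One way to see that a grammar can describe this regular expression is to read a right-linear grammar off the usual four-state NFA for (a|b)*abb and check that it accepts the same strings. The sketch below is an illustration under assumptions: the nonterminal names A0 to A3 and the `derives` helper are hypothetical, and the check only covers strings of length up to 6.

```python
import re
from itertools import product

# Right-linear grammar read off the NFA for (a|b)*abb (states 0..3 become A0..A3):
#   A0 -> a A0 | b A0 | a A1
#   A1 -> b A2
#   A2 -> b A3
#   A3 -> (empty)
GRAMMAR = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [("", None)],  # None marks the empty production
}


def derives(nonterminal, s):
    """Return True if `nonterminal` derives the string s in GRAMMAR."""
    for terminal, rest in GRAMMAR[nonterminal]:
        if rest is None:
            if s == "":
                return True
        elif s.startswith(terminal) and derives(rest, s[len(terminal):]):
            return True
    return False


PATTERN = re.compile(r"(a|b)*abb")
strings = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
# Compare the grammar with the regular expression on every short string.
agree = all(derives("A0", s) == bool(PATTERN.fullmatch(s)) for s in strings)
print(agree)
```

The converse fails in general: a grammar such as S → ( S ) | empty, which describes nested parentheses, has no equivalent regular expression.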
Question Worth Asking
If grammars are so much more powerful than regular
expressions, why not use them for lexical
analysis too?
• Lexical rules are quite simple and do not
need notation as powerful as grammars
• Regular expressions are more concise and
easier to understand for tokens
• More efficient lexical analyzers can be
generated from regular expressions than
from grammars
