
Chapter 3

Describing Syntax
Analysis
Program execution

Introduction

• Syntax: the form or structure of the
expressions, statements, and program
units
• Semantics: the meaning of the expressions,
statements, and program units
• Syntax and semantics provide a language’s
definition
– Users of a language definition
• Other language designers
• Implementers
• Programmers (the users of the language)

Chomsky hierarchy

According to the Chomsky hierarchy, grammars are
divided into four types:

• Type 0 is known as unrestricted grammar.
• Type 1 is known as context-sensitive
grammar.
• Type 2 is known as context-free grammar.
• Type 3 is known as regular grammar.

BNF and Context-Free Grammars

• Context-Free Grammars
– Developed by Noam Chomsky in the mid-1950s
– Language generators, meant to describe the
syntax of natural languages
– Define a class of languages called context-free
languages

• Backus-Naur Form (1959)
– Invented by John Backus to describe the syntax
of Algol 58
– BNF is equivalent to context-free grammars
BNF Fundamentals (continued)

• Nonterminals are often enclosed in angle brackets
– Examples of BNF rules:
<ident_list> → identifier | identifier, <ident_list>
<if_stmt> → if <logic_expr> then <stmt>

• Grammar: a finite non-empty set of rules

• A start symbol is a special element of the
nonterminals of a grammar

Describing Lists

• Syntactic lists are described using
recursion
<ident_list> → ident
| ident, <ident_list>

• A derivation is a repeated application of
rules, starting with the start symbol and
ending with a sentence (all terminal
symbols)

An Example Grammar

<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const

<program> => <stmts> => <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const
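
The derivation above can be replayed mechanically. The following Python sketch rewrites the leftmost nonterminal at each step. The dictionary layout, the `leftmost_derivation` helper, and the list of rule indices are illustrative assumptions, not part of the slides.

```python
# Example grammar from the slide; each nonterminal maps to its alternatives.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}

def leftmost_derivation(start, choices):
    """Apply one rule per step to the leftmost nonterminal.

    `choices` lists, for each step, the index of the alternative to use.
    Returns every sentential form, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Find the leftmost symbol that is a nonterminal.
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:pos] + GRAMMAR[form[pos]][choice] + form[pos + 1:]
        steps.append(" ".join(form))
    return steps

# Hand-picked rule choices that reproduce the derivation of "a = b + const".
steps = leftmost_derivation("<program>", [0, 0, 0, 0, 0, 0, 1, 1])
print("\n=> ".join(steps))
```

The last sentential form contains only terminal symbols, so it is a sentence of the language.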
Recursive Grammars

1) S -> SaS
S -> b
The language (set of strings) generated by the above
grammar is {b, bab, babab, …}, which is infinite.

2) S -> Aa
A -> Ab | c
The language generated by the above grammar is {ca,
cba, cbba, …}, which is infinite.

Note: A recursive context-free grammar that contains no
useless rules necessarily produces an infinite language.
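
To see the first few strings of each language, one can enumerate derivations up to a length bound. This is a minimal sketch: the `generate` function and the convention that uppercase letters are nonterminals are assumptions made for illustration.

```python
from collections import deque

def generate(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    Uppercase letters are nonterminals, everything else is a terminal.
    Assumes no rule has an empty right-hand side, so sentential forms
    never shrink and can be pruned once they exceed max_len.
    """
    seen = {start}
    queue = deque([start])
    language = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None for a sentence.
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None:
            language.add(form)
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(language, key=lambda s: (len(s), s))

g1 = {"S": ["SaS", "b"]}
g2 = {"S": ["Aa"], "A": ["Ab", "c"]}
print(generate(g1, "S", 5))  # ['b', 'bab', 'babab']
print(generate(g2, "S", 4))  # ['ca', 'cba', 'cbba']
```

Raising `max_len` keeps producing longer strings for both grammars, which is what an infinite language looks like under a bounded enumeration.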
Non-Recursive Grammars

S -> Aa
A -> b | c
The language generated by the above grammar is {ba,
ca}, which is finite.

Types of Recursive Grammars

Based on the nature of the recursion, a recursive CFG
can be further divided into the following:
• Left Recursive Grammar (having left recursion)
• Right Recursive Grammar (having right recursion)
• General Recursive Grammar (having general recursion)
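
The slide does not define the three kinds, so the sketch below assumes the usual reading: a production is left recursive if its left-hand side reappears as the first symbol of the right-hand side, right recursive if it reappears as the last symbol, and general recursive if it reappears strictly in the middle. The function name is hypothetical, and only direct recursion is checked.

```python
def classify_direct_recursion(rules):
    """Label each directly recursive production as left, right, or general.

    Uppercase letters are nonterminals. Only direct recursion is checked
    (A -> ...A...); indirect recursion through other nonterminals is ignored.
    """
    kinds = {}
    for lhs, alternatives in rules.items():
        for rhs in alternatives:
            if lhs not in rhs:
                continue  # this production is not directly recursive
            found = set()
            if rhs.startswith(lhs):
                found.add("left")
            if rhs.endswith(lhs):
                found.add("right")
            if lhs in rhs[1:-1]:
                found.add("general")
            kinds[f"{lhs}->{rhs}"] = sorted(found)
    return kinds

print(classify_direct_recursion({"S": ["SaS", "b"]}))
# {'S->SaS': ['left', 'right']}
print(classify_direct_recursion({"S": ["Aa"], "A": ["Ab", "c"]}))
# {'A->Ab': ['left']}
print(classify_direct_recursion({"S": ["Aa"], "A": ["b", "c"]}))
# {}
```

An empty result for the last grammar matches the slide: it has no recursion, and its language is finite.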

Parse Tree
• A hierarchical representation of a derivation

<program>
└─ <stmts>
   └─ <stmt>
      ├─ <var>
      │  └─ a
      ├─ =
      └─ <expr>
         ├─ <term>
         │  └─ <var>
         │     └─ b
         ├─ +
         └─ <term>
            └─ const
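
A parse tree can be stored as nested data, and reading its leaves from left to right gives back the derived sentence. The tuple encoding and the `leaves` helper below are assumptions chosen for this sketch.

```python
# Parse tree for "a = b + const" as (label, children...) tuples; leaves are strings.
TREE = ("<program>",
        ("<stmts>",
         ("<stmt>",
          ("<var>", "a"),
          "=",
          ("<expr>",
           ("<term>", ("<var>", "b")),
           "+",
           ("<term>", "const")))))

def leaves(node):
    """Return the terminal symbols of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:  # node[0] is the nonterminal label
        result.extend(leaves(child))
    return result

print(" ".join(leaves(TREE)))  # a = b + const
```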
An Ambiguous Expression Grammar

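<expr> → <expr> <op> <expr> | const
<op> → / | -

Two distinct parse trees for const - const / const:

<expr>
├─ <expr>
│  └─ const
├─ <op>
│  └─ -
└─ <expr>
   ├─ <expr>
   │  └─ const
   ├─ <op>
   │  └─ /
   └─ <expr>
      └─ const

<expr>
├─ <expr>
│  ├─ <expr>
│  │  └─ const
│  ├─ <op>
│  │  └─ -
│  └─ <expr>
│     └─ const
├─ <op>
│  └─ /
└─ <expr>
   └─ const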
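
One way to check ambiguity for a given sentence is to count its parse trees. The sketch below does this for the grammar above by trying every operator as the root of the tree. The function name and token representation are assumptions.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees under <expr> -> <expr> <op> <expr> | const."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <expr>.
        if hi - lo == 1:
            return 1 if tokens[lo] == "const" else 0
        total = 0
        # Try each operator position as the <op> at the root.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("/", "-"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["const", "-", "const", "/", "const"]))  # 2
```

A count greater than one for any sentence means the grammar is ambiguous.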

An Unambiguous Expression Grammar

• If we use the parse tree to indicate
precedence levels of the operators, we
cannot have ambiguity
<expr> → <expr> - <term> | <term>
<term> → <term> / const | const

Parse tree for const - const / const:

<expr>
├─ <expr>
│  └─ <term>
│     └─ const
├─ -
└─ <term>
   ├─ <term>
   │  └─ const
   ├─ /
   └─ const
Unambiguous Grammar for Selector

• if-then-else grammar
<if_stmt> -> if (<logic_expr>) <stmt>
| if (<logic_expr>) <stmt> else <stmt>
Ambiguous! In if (e1) if (e2) s1 else s2, the else can pair with either if.
• An unambiguous grammar for if-then-else
<stmt> -> <matched> | <unmatched>
<matched> -> if (<logic_expr>) <matched> else <matched>
| a non-if statement
<unmatched> -> if (<logic_expr>) <stmt>
| if (<logic_expr>) <matched> else <unmatched>

Removal of Ambiguity in Grammar

An ambiguous grammar:
S -> aSbS | bSaS | ε

An unambiguous grammar:
S -> AB
A -> Aa | a
B -> b

Removal of Ambiguity in Grammar

We can remove ambiguity solely on the basis of the following two
properties:
1. Precedence
If different operators are used, we consider the precedence of
the operators. The three important characteristics are:
• The level at which a production appears denotes the priority
of the operator used.
• Productions at higher levels have operators with lower
priority. In the parse tree, the nodes at the top levels, close
to the root node, contain the lower-priority operators.
• Productions at lower levels have operators with higher
priority. In the parse tree, the nodes at the lower levels,
close to the leaf nodes, contain the higher-priority operators.

Removal of Ambiguity in Grammar

2. Associativity
If operators of the same precedence appear in a production, we
have to consider associativity.
• If the associativity is left to right, we introduce left
recursion in the production. The parse tree is then also left
recursive and grows on the left side.
+, -, *, / are left-associative operators.
• If the associativity is right to left, we introduce right
recursion in the productions. The parse tree is then also right
recursive and grows on the right side.
^ is a right-associative operator.

Removal of Ambiguity in Grammar

Example 1: Consider the ambiguous grammar

E -> E-E | id

The language generated by the grammar contains { id, id-id, id-id-id, … }.
Suppose we want to derive the string id-id-id. To get more insight,
let every id have the value 3. The result should be:

3-3-3 = (3-3)-3 = -3

All operators have the same priority, so we need to consider
associativity, which is left to right for subtraction.
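
The two parse trees that the ambiguous grammar allows for id-id-id correspond to two different groupings. A short sketch (both helper names are hypothetical) shows that only the left-to-right grouping gives the expected value:

```python
from functools import reduce

def eval_left(values):
    """Left-associative subtraction: ((v0 - v1) - v2) - ..."""
    return reduce(lambda acc, v: acc - v, values)

def eval_right(values):
    """Right-associative subtraction: v0 - (v1 - (v2 - ...))"""
    result = values[-1]
    for v in reversed(values[:-1]):
        result = v - result
    return result

print(eval_left([3, 3, 3]))   # -3, i.e. (3-3)-3
print(eval_right([3, 3, 3]))  # 3, i.e. 3-(3-3)
```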

Removal of Ambiguity in Grammar

So, to make the above grammar unambiguous, simply make the
grammar left recursive by replacing the rightmost non-terminal
E on the right side of the production with a new
non-terminal, say P.

E -> E - P | P
P -> id

Another Operator

Similarly, the unambiguous grammar for the
expression 2^3^2 will be:

E -> P ^ E | P // Right recursive, as ^ is right associative.
P -> id
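
A right-recursive rule maps directly onto a recursive function. The sketch below parses a token list with the rule E -> P ^ E | P and evaluates the resulting tree. The function names and the tuple form of the tree are assumptions, and P is taken to be a plain number in place of id.

```python
def parse_power(tokens):
    """Parse E -> P ^ E | P and return a nested tuple as the parse result."""
    def parse_e(pos):
        left, pos = tokens[pos], pos + 1      # P -> id (a number here)
        if pos < len(tokens) and tokens[pos] == "^":
            right, pos = parse_e(pos + 1)     # right recursion: E on the right
            return ("^", left, right), pos
        return left, pos

    tree, pos = parse_e(0)
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree

def evaluate(tree):
    """Evaluate a tree of ("^", base, exponent) tuples and numbers."""
    if isinstance(tree, tuple):
        _, base, exponent = tree
        return evaluate(base) ** evaluate(exponent)
    return tree

tree = parse_power([2, "^", 3, "^", 2])
print(tree)            # ('^', 2, ('^', 3, 2))
print(evaluate(tree))  # 512
```

The tree grows to the right, so 3^2 is computed first and the result is 2^9 = 512 instead of (2^3)^2 = 64.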

Task

Consider the grammar shown below, which has two
different operators:

E -> E + E | E * E | id

Clearly, the above grammar is ambiguous, as we can
draw two parse trees for the string "id+id*id".
Consider the expression:

3+2*5 // "*" has higher priority than "+"

The correct answer is: (3+(2*5)) = 13
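
The two parse trees for 3+2*5 differ in which operator sits at the root. The sketch below encodes both as nested tuples (an assumed representation) and evaluates them, showing that only the tree with + at the root respects precedence.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a parse tree given as (op, left, right) tuples or numbers."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree

plus_at_root = ("+", 3, ("*", 2, 5))   # 3 + (2 * 5)
times_at_root = ("*", ("+", 3, 2), 5)  # (3 + 2) * 5

print(evaluate(plus_at_root))   # 13
print(evaluate(times_at_root))  # 25
```

The higher-priority operator * has to appear lower in the tree, which is the property an unambiguous version of the grammar must enforce.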
Extended BNF

• Optional parts are placed in brackets [ ]
<proc_call> → ident [(<expr_list>)]
• Alternative parts of RHSs are placed
inside parentheses and separated via
vertical bars
<term> → <term> (+|-) const
• Repetitions (0 or more) are placed inside
braces { }
<ident> → letter {letter|digit}
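
The repetition rule for <ident> corresponds closely to a regular expression: the braces become the `*` operator. A minimal sketch, assuming that letters and digits mean the ASCII ones (the slide does not say):

```python
import re

# <ident> → letter {letter|digit}; ASCII letters and digits are an assumption.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_ident(text):
    """Return True if the whole string matches the <ident> rule."""
    return IDENT.fullmatch(text) is not None

print(is_ident("x1"))   # True
print(is_ident("1x"))   # False
```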

BNF and EBNF
• BNF
<expr> → <expr> + <term>
| <expr> - <term>
| <term>
<term> → <term> * <factor>
| <term> / <factor>
| <factor>
• EBNF
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
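
The EBNF form translates almost line for line into a recursive-descent evaluator, with each `{ ... }` repetition becoming a loop. This is a sketch under assumptions: the slide does not define <factor>, so it is treated here as a single number, and the input is an already tokenized list.

```python
def evaluate(tokens):
    """Evaluate a token list using the EBNF rules for <expr> and <term>.

    <factor> is assumed to be a single number, since the slide does not define it.
    """
    pos = 0

    def factor():
        nonlocal pos
        value = tokens[pos]
        pos += 1
        return value

    def term():
        # <term> → <factor> {(* | /) <factor>}
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():
        # <expr> → <term> {(+ | -) <term>}
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result

print(evaluate([3, "+", 2, "*", 5]))  # 13
print(evaluate([3, "-", 3, "-", 3]))  # -3
```

Because each loop folds its operands from left to right, the evaluator keeps the left associativity that the left-recursive BNF rules express, while <term> sitting below <expr> keeps * and / at higher precedence.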
