Kashif Sharif School of Computer Science & Technology

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 62

Kashif Sharif

School of Computer Science & Technology


 Parser: The big picture
 Grammar introduction
 Top-down parsing
 Bottom-up parsing
 Yacc/Bison/ANTLR

Slides adapted from Perkins@UW, Krolic@McGill, Weixing@BIT


Compiler P & D - Sharif K 2
Character Stream
Scanner(Lexical analysis)
Token Stream
Parser(Syntax analysis)
Parse Tree
Semantic analysis &
Intermediate code gen.
Intermediate form
Machine-independent
optimization
Intermediate form
Target code generation
Machine language

Compiler P & D - Sharif K 3


Character Stream
Scanner(Lexical analysis)
Token Stream
Parser(Syntax analysis)
Parse Tree
Semantic analysis &
Intermediate code gen.
Intermediate form
Machine-independent
Syntax Analysis: Discovery of Program Structure
optimization
Intermediate form
Turn the stream of individual input tokens intogeneration
Target code a complete,
Machine
hierarchical language of the program(or compilation unit).
representation

Compiler P & D - Sharif K 4


 Syntax Specification and Parsing

Syntax Specification Syntax Recognition


How can we succinctly How can a compiler discover
describe the structure of legal if a program conforms to the
programs specification

Context-free Grammars LL and LR Parsers

Compiler P & D - Sharif K 5


 Takes a string of tokens generated by the
scanner as input and builds a parse tree using a
grammar.
 Can be generated by bison (or yacc), CUP,
ANTLR, SableCC, Beaver, JavaCC, ...
 Internally it corresponds to a deterministic push-
down automaton.

Compiler P & D - Sharif K 6


 Why push-down automaton?
▪ Regular languages (equivalently regular exps/ DFAs /
NFAs) are not sufficiently powerful
▪ A pushdown automaton
▪ Is an FSM + an unbounded stack
▪ Has a stack that can be viewed/manipulated by transitions
▪ Is used to recognize a context-free language

Compiler P & D - Sharif K 7


Compiler P & D - Sharif K 8
 A context-free grammar is a notation for
describing languages.

 It is more powerful than finite automata or


RE’s, but still cannot define all possible
languages.

 Useful for nested structures, e.g.,


parentheses in programming languages.
Compiler P & D - Sharif K 9
 Basic idea is to use “variables” to stand
for sets of strings (i.e., languages).

 These variables are defined recursively, in


terms of one another.

 Recursive rules (“productions”) involve


only concatenation.

 Alternative rules for a variable allow union.


Compiler P & D - Sharif K 10
 Productions:
S → 01
S → 0S1

Basis: 01 is in the language.


Induction: if w is in the language, then so is
0w1.

Compiler P & D - Sharif K 11


Compiler P & D - Sharif K 12
 Terminals = symbols of the alphabet of
the language being defined.

 Variables = nonterminals = a finite set of


other symbols, each of which represents a
language.

 Start symbol = the variable whose


language is the one being defined.
Compiler P & D - Sharif K 13
A production has the form:
variable → string of variables & terminals
Convention:
▪ A, B, C,… are variables.
▪ a, b, c,… are terminals.
▪ …, X, Y, Z are either terminals or variables.
▪ …, w, x, y, z are strings of terminals only.
▪ , , ,… are strings of terminals and/or
variables.

Compiler P & D - Sharif K 14


 Here is a formal CFG for { 0n1n | n > 1}.
 Terminals = {0, 1}.
 Variables = {S}.
 Start symbol = S.
 Productions:
S → 01
S → 0S1

Compiler P & D - Sharif K 15


 We derive strings in the language of a
CFG by starting with the start symbol, and
repeatedly replacing some variable A by
the right side of one of its productions.
▪ That is, the “productions for A” are those that
have A on the left side of the →.

Compiler P & D - Sharif K 16


Example: S → 01; S → 0S1.
S => 0S1 => 00S11 => 000111.

 We now have 2 symbols:


▪→ for productions
▪ => for derivations

Compiler P & D - Sharif K 17


Compiler P & D - Sharif K 18
Compiler P & D - Sharif K 19
Compiler P & D - Sharif K 20
Compiler P & D - Sharif K 21
Compiler P & D - Sharif K 22
 Derivations allow us to replace any of the
variables in a string.
 Leads to many different derivations of the
same string.
 By forcing the leftmost variable (or
alternatively, the rightmost variable) to be
replaced, we avoid these “distinctions
without a difference.”

Compiler P & D - Sharif K 23


=> =>
=> =>
=> =>
=> =>
=> =>
=> =>

Compiler P & D - Sharif K 24


 Balanced-parentheses grammar:
S → SS | (S) | ()
 S =>lm SS =>lm (S)S =>lm (())S =>lm (())()

 Thus, S =>*lm (())()

 S => SS => S() => (S)() => (())() is a derivation,


but not a left most derivation.

Compiler P & D - Sharif K 25


 Balanced-parentheses grammar:
S → SS | (S) | ()
 S =>rm SS =>rm S() =>rm (S)() =>rm (())()

 Thus, S =>*rm (())()

 S => SS => SSS => S()S => ()()S => ()()()


▪ The above derivation is neither left most, not right
most.

Compiler P & D - Sharif K 26


=>
=>
=>
=>
=>
=>
=>
=>
=>

Compiler P & D - Sharif K 27


Compiler P & D - Sharif K 28
Compiler P & D - Sharif K 29
Compiler P & D - Sharif K 30
Compiler P & D - Sharif K 31
 Grammars for programming languages
are often written in BNF (Backus-Naur
Form).
 Variables are words in < >
Example: <statement>

 Terminals are often multicharacter strings


indicated by boldface or underline;
Example: while or WHILE.
Compiler P & D - Sharif K 32
 Symbol ::= is often used for →
 Symbol | is used for “or”
▪ A shorthand for a list of productions with the
same left side.

Example: S → 0S1 | 01 is shorthand for


S → 0S1 and S → 01.

Compiler P & D - Sharif K 33


Compiler P & D - Sharif K 34
Compiler P & D - Sharif K 35
Compiler P & D - Sharif K 36
 Surround one or more symbols by [ ] to
make them optional.

Example: <statement> ::= if <condition>


then <statement> [else <statement>]

Translation: replace [] by a new variable A


with productions A →  | ε.

Compiler P & D - Sharif K 37


Compiler P & D - Sharif K 38
Compiler P & D - Sharif K 39
Compiler P & D - Sharif K 40
Compiler P & D - Sharif K 41
Compiler P & D - Sharif K 42
Compiler P & D - Sharif K 43
Compiler P & D - Sharif K 44
Compiler P & D - Sharif K 45
Compiler P & D - Sharif K 46
Compiler P & D - Sharif K 47
Compiler P & D - Sharif K 48
Compiler P & D - Sharif K 49
A grammar is ambiguous if the input can have more than one parse trees.

Compiler P & D - Sharif K 50


 Ambiguous grammars can have potentially
severe consequences.
 Not all context-free languages have
unambiguous grammar
 DFA require an unambiguous grammar
 We either rewrite the grammar to be
unambiguous, or use precedence rules to
resolve ambiguities.

Compiler P & D - Sharif K 51


Compiler P & D - Sharif K 52
Compiler P & D - Sharif K 53
Compiler P & D - Sharif K 54
Compiler P & D - Sharif K 55
Compiler P & D - Sharif K 56
Compiler P & D - Sharif K 57
Compiler P & D - Sharif K 58
Compiler P & D - Sharif K 59
Compiler P & D - Sharif K 60
Compiler P & D - Sharif K 61
Compiler P & D - Sharif K 62

You might also like