This document discusses parsing in compilers. It begins with an overview of parsing and introduces context-free grammars which are used to describe the structure of programming languages. It then discusses top-down and bottom-up parsing approaches as well as tools like Yacc and Bison that are used to generate parsers. The document also covers grammar notation and ambiguity as well as leftmost and rightmost derivations during parsing.
This document discusses parsing in compilers. It begins with an overview of parsing and introduces context-free grammars which are used to describe the structure of programming languages. It then discusses top-down and bottom-up parsing approaches as well as tools like Yacc and Bison that are used to generate parsers. The document also covers grammar notation and ambiguity as well as leftmost and rightmost derivations during parsing.
This document discusses parsing in compilers. It begins with an overview of parsing and introduces context-free grammars which are used to describe the structure of programming languages. It then discusses top-down and bottom-up parsing approaches as well as tools like Yacc and Bison that are used to generate parsers. The document also covers grammar notation and ambiguity as well as leftmost and rightmost derivations during parsing.
This document discusses parsing in compilers. It begins with an overview of parsing and introduces context-free grammars which are used to describe the structure of programming languages. It then discusses top-down and bottom-up parsing approaches as well as tools like Yacc and Bison that are used to generate parsers. The document also covers grammar notation and ambiguity as well as leftmost and rightmost derivations during parsing.
Parser: The big picture Grammar introduction Top-down parsing Bottom-up parsing Yacc/Bison/ANTLR
Slides adapted from Perkins@UW, Krolic@McGill, Weixing@BIT
Compiler P & D - Sharif K 2 Character Stream Scanner(Lexical analysis) Token Stream Parser(Syntax analysis) Parse Tree Semantic analysis & Intermediate code gen. Intermediate form Machine-independent optimization Intermediate form Target code generation Machine language
Compiler P & D - Sharif K 3
Character Stream Scanner(Lexical analysis) Token Stream Parser(Syntax analysis) Parse Tree Semantic analysis & Intermediate code gen. Intermediate form Machine-independent Syntax Analysis: Discovery of Program Structure optimization Intermediate form Turn the stream of individual input tokens intogeneration Target code a complete, Machine hierarchical language of the program(or compilation unit). representation
Compiler P & D - Sharif K 4
Syntax Specification and Parsing
Syntax Specification Syntax Recognition
How can we succinctly How can a compiler discover describe the structure of legal if a program conforms to the programs specification
Context-free Grammars LL and LR Parsers
Compiler P & D - Sharif K 5
Takes a string of tokens generated by the scanner as input and builds a parse tree using a grammar. Can be generated by bison (or yacc), CUP, ANTLR, SableCC, Beaver, JavaCC, ... Internally it corresponds to a deterministic push- down automaton.
Compiler P & D - Sharif K 6
Why push-down automaton? ▪ Regular languages (equivalently regular exps/ DFAs / NFAs) are not sufficiently powerful ▪ A pushdown automaton ▪ Is an FSM + an unbounded stack ▪ Has a stack that can be viewed/manipulated by transitions ▪ Is used to recognize a context-free language
Compiler P & D - Sharif K 7
Compiler P & D - Sharif K 8 A context-free grammar is a notation for describing languages.
It is more powerful than finite automata or
RE’s, but still cannot define all possible languages.
Useful for nested structures, e.g.,
parentheses in programming languages. Compiler P & D - Sharif K 9 Basic idea is to use “variables” to stand for sets of strings (i.e., languages).
These variables are defined recursively, in
terms of one another.
Recursive rules (“productions”) involve
only concatenation.
Alternative rules for a variable allow union.
Compiler P & D - Sharif K 10 Productions: S → 01 S → 0S1
Basis: 01 is in the language.
Induction: if w is in the language, then so is 0w1.
Compiler P & D - Sharif K 11
Compiler P & D - Sharif K 12 Terminals = symbols of the alphabet of the language being defined.
Variables = nonterminals = a finite set of
other symbols, each of which represents a language.
Start symbol = the variable whose
language is the one being defined. Compiler P & D - Sharif K 13 A production has the form: variable → string of variables & terminals Convention: ▪ A, B, C,… are variables. ▪ a, b, c,… are terminals. ▪ …, X, Y, Z are either terminals or variables. ▪ …, w, x, y, z are strings of terminals only. ▪ , , ,… are strings of terminals and/or variables.
Compiler P & D - Sharif K 14
Here is a formal CFG for { 0n1n | n > 1}. Terminals = {0, 1}. Variables = {S}. Start symbol = S. Productions: S → 01 S → 0S1
Compiler P & D - Sharif K 15
We derive strings in the language of a CFG by starting with the start symbol, and repeatedly replacing some variable A by the right side of one of its productions. ▪ That is, the “productions for A” are those that have A on the left side of the →.
Compiler P & D - Sharif K 16
Example: S → 01; S → 0S1. S => 0S1 => 00S11 => 000111.
We now have 2 symbols:
▪→ for productions ▪ => for derivations
Compiler P & D - Sharif K 17
Compiler P & D - Sharif K 18 Compiler P & D - Sharif K 19 Compiler P & D - Sharif K 20 Compiler P & D - Sharif K 21 Compiler P & D - Sharif K 22 Derivations allow us to replace any of the variables in a string. Leads to many different derivations of the same string. By forcing the leftmost variable (or alternatively, the rightmost variable) to be replaced, we avoid these “distinctions without a difference.”
Compiler P & D - Sharif K 23
=> => => => => => => => => => => =>
Compiler P & D - Sharif K 24
Balanced-parentheses grammar: S → SS | (S) | () S =>lm SS =>lm (S)S =>lm (())S =>lm (())()
Thus, S =>*lm (())()
S => SS => S() => (S)() => (())() is a derivation,
but not a left most derivation.
Compiler P & D - Sharif K 25
Balanced-parentheses grammar: S → SS | (S) | () S =>rm SS =>rm S() =>rm (S)() =>rm (())()
Thus, S =>*rm (())()
S => SS => SSS => S()S => ()()S => ()()()
▪ The above derivation is neither left most, not right most.
Compiler P & D - Sharif K 26
=> => => => => => => => =>
Compiler P & D - Sharif K 27
Compiler P & D - Sharif K 28 Compiler P & D - Sharif K 29 Compiler P & D - Sharif K 30 Compiler P & D - Sharif K 31 Grammars for programming languages are often written in BNF (Backus-Naur Form). Variables are words in < > Example: <statement>
Terminals are often multicharacter strings
indicated by boldface or underline; Example: while or WHILE. Compiler P & D - Sharif K 32 Symbol ::= is often used for → Symbol | is used for “or” ▪ A shorthand for a list of productions with the same left side.
Example: S → 0S1 | 01 is shorthand for
S → 0S1 and S → 01.
Compiler P & D - Sharif K 33
Compiler P & D - Sharif K 34 Compiler P & D - Sharif K 35 Compiler P & D - Sharif K 36 Surround one or more symbols by [ ] to make them optional.
Example: <statement> ::= if <condition>
then <statement> [else <statement>]
Translation: replace [] by a new variable A
with productions A → | ε.
Compiler P & D - Sharif K 37
Compiler P & D - Sharif K 38 Compiler P & D - Sharif K 39 Compiler P & D - Sharif K 40 Compiler P & D - Sharif K 41 Compiler P & D - Sharif K 42 Compiler P & D - Sharif K 43 Compiler P & D - Sharif K 44 Compiler P & D - Sharif K 45 Compiler P & D - Sharif K 46 Compiler P & D - Sharif K 47 Compiler P & D - Sharif K 48 Compiler P & D - Sharif K 49 A grammar is ambiguous if the input can have more than one parse trees.
Compiler P & D - Sharif K 50
Ambiguous grammars can have potentially severe consequences. Not all context-free languages have unambiguous grammar DFA require an unambiguous grammar We either rewrite the grammar to be unambiguous, or use precedence rules to resolve ambiguities.
Compiler P & D - Sharif K 51
Compiler P & D - Sharif K 52 Compiler P & D - Sharif K 53 Compiler P & D - Sharif K 54 Compiler P & D - Sharif K 55 Compiler P & D - Sharif K 56 Compiler P & D - Sharif K 57 Compiler P & D - Sharif K 58 Compiler P & D - Sharif K 59 Compiler P & D - Sharif K 60 Compiler P & D - Sharif K 61 Compiler P & D - Sharif K 62