
Course Overview

PART I: overview material


1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler
PART II: inside a compiler
4 Syntax analysis
5 Contextual analysis
6 Runtime organization
7 Code generation
PART III: conclusion
8 Interpretation
9 Review
Syntax Analysis (Chapter 4) 1
Systematic Development of a Recursive Descent Parser
(1) Express grammar in EBNF
(2) Grammar Transformations:
Left factorization and Left recursion elimination
(3) Create a parser class with
– private variable currentToken
– methods to call the scanner: accept and acceptIt
(4) Implement a public method for the main function to call:
– public parse method that
• fetches the first token from the scanner
• calls parseS (where S is the start symbol of the grammar)
• verifies that the scanner next produces the end-of-file token
(5) Implement private parsing methods:
– add a private parseN method for each non-terminal N
Developing RD Parser for Mini Triangle
Before we begin:
• The following non-terminals are recognized by the scanner
• They will be returned as tokens by the scanner
Identifier ::= Letter (Letter|Digit)*
Integer-Literal ::= Digit Digit*
Operator ::= + | - | * | / | < | > | =
Comment ::= ! Graphic* eol
Assume scanner returns instances of this class:
public class Token {
byte kind; String spelling;
final static byte
IDENTIFIER = 0,
INTLITERAL = 1;
...
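The four lexical rules above can be turned into a scanner along the following lines. This is only a minimal sketch: the names `Tok` and `MiniScanner` are hypothetical, token kinds are plain strings rather than the byte constants of the Token class above, and `Character.isLetter` / `Character.isDigit` are assumed to be an acceptable stand-in for Letter and Digit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type; the slides' Token class uses byte kinds instead.
final class Tok {
    final String kind;      // "IDENTIFIER", "INTLITERAL", "OPERATOR" or "EOT"
    final String spelling;
    Tok(String kind, String spelling) { this.kind = kind; this.spelling = spelling; }
    @Override public String toString() { return kind + "(" + spelling + ")"; }
}

final class MiniScanner {
    // Scans the whole input and returns its tokens, ending with an EOT token.
    static List<Tok> scanAll(String src) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                       // separators are skipped
            } else if (c == '!') {                         // Comment ::= ! Graphic* eol
                while (i < src.length() && src.charAt(i) != '\n') i++;
            } else if (Character.isLetter(c)) {            // Identifier ::= Letter (Letter|Digit)*
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                out.add(new Tok("IDENTIFIER", src.substring(start, i)));
            } else if (Character.isDigit(c)) {             // Integer-Literal ::= Digit Digit*
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add(new Tok("INTLITERAL", src.substring(start, i)));
            } else if ("+-*/<>=".indexOf(c) >= 0) {        // Operator ::= + | - | * | / | < | > | =
                out.add(new Tok("OPERATOR", String.valueOf(c)));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
            }
        }
        out.add(new Tok("EOT", ""));
        return out;
    }
}
```

Comments produce no token at all, which is why the parser never has to mention them. Keywords and punctuation such as `:=` or `(` are left out of this sketch.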
(1)&(2) Developing RD Parser for Mini Triangle

Program ::= single-Command


Command ::= single-Command
| Command ; single-Command          (left recursion elimination needed)
single-Command                      (left factorization needed)
::= V-name := Expression
| Identifier ( Expression )
| if Expression then single-Command
else single-Command
| while Expression do single-Command
| let Declaration in single-Command
| begin Command end
V-name ::= Identifier
...



(1)&(2) Express grammar in EBNF and transform

After factorization etc. we get:


Program ::= single-Command
Command ::= single-Command (; single-Command)*
single-Command
::= Identifier
( := Expression | ( Expression ) )
| if Expression then single-Command
else single-Command
| while Expression do single-Command
| let Declaration in single-Command
| begin Command end
V-name ::= Identifier
...



(1)&(2) Developing RD Parser for Mini Triangle
Expression
::= primary-Expression
| Expression Operator primary-Expression      (left recursion elimination needed)
primary-Expression
::= Integer-Literal
| V-name
| Operator primary-Expression
| ( Expression )
Declaration
::= single-Declaration
| Declaration ; single-Declaration            (left recursion elimination needed)
single-Declaration
::= const Identifier ~ Expression
| var Identifier : Type-denoter
Type-denoter ::= Identifier
(1)&(2) Express grammar in EBNF and transform
After factorization and recursion elimination :
Expression
::= primary-Expression
( Operator primary-Expression )*
primary-Expression
::= Integer-Literal
| Identifier
| Operator primary-Expression
| ( Expression )
Declaration
::= single-Declaration (; single-Declaration)*
single-Declaration
::= const Identifier ~ Expression
| var Identifier : Type-denoter
Type-denoter ::= Identifier
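To see why the transformed Expression rules suit a recursive descent parser, here is a hedged sketch of a recognizer for just those two rules. The class name `ExprRecognizer`, the string-based token list, and the helper predicates are assumptions made for illustration; keywords are not distinguished from identifiers here.

```java
import java.util.List;

// Recognizer for the transformed Expression rules only (hypothetical token handling).
final class ExprRecognizer {
    private final List<String> tokens;   // pre-scanned token spellings
    private int pos = 0;

    ExprRecognizer(List<String> tokens) { this.tokens = tokens; }

    private String current() { return pos < tokens.size() ? tokens.get(pos) : "<eot>"; }

    private static boolean isOperator(String t) { return t.length() == 1 && "+-*/<>=".contains(t); }
    private static boolean isIntLiteral(String t) {
        return !t.isEmpty() && t.chars().allMatch(Character::isDigit);
    }
    private static boolean isIdentifier(String t) {
        return !t.isEmpty() && Character.isLetter(t.charAt(0))
                && t.chars().allMatch(Character::isLetterOrDigit);
    }

    private void accept(String expected) {
        if (!current().equals(expected))
            throw new IllegalStateException("expected " + expected + " but found " + current());
        pos++;
    }

    // Expression ::= primary-Expression ( Operator primary-Expression )*
    void parseExpression() {
        parsePrimaryExpression();
        while (isOperator(current())) {   // starters of ( Operator primary-Expression )
            pos++;                        // acceptIt: the operator has just been checked
            parsePrimaryExpression();
        }
    }

    // primary-Expression ::= Integer-Literal | Identifier
    //                      | Operator primary-Expression | ( Expression )
    void parsePrimaryExpression() {
        String t = current();
        if (isIntLiteral(t) || isIdentifier(t)) {
            pos++;
        } else if (isOperator(t)) {
            pos++;
            parsePrimaryExpression();
        } else if (t.equals("(")) {
            pos++;
            parseExpression();
            accept(")");
        } else {
            throw new IllegalStateException("unexpected token " + t);
        }
    }

    // Returns true when the whole token list is one Expression.
    static boolean recognizes(List<String> tokens) {
        ExprRecognizer p = new ExprRecognizer(tokens);
        try {
            p.parseExpression();
            return p.pos == tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

The left-recursive alternative has become a `while` loop, so every call to `parseExpression` consumes at least one token before it can recurse.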
(3)&(4) Create a parser class and public parse method
public class Parser {
private Token currentToken;
private void accept (byte expectedKind) {
if (currentToken.kind == expectedKind)
currentToken = scanner.scan( );
else
report syntax error
}
private void acceptIt( ) {
currentToken = scanner.scan( );
}
public void parse( ) {
acceptIt( ); // get the first token
parseProgram( ); // Program is the start symbol
if (currentToken.kind != Token.EOT)
report syntax error
}
...
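The class outline above leaves "report syntax error" and the scanner unspecified. A runnable sketch of the same skeleton might look as follows, under clearly stated assumptions: `SyntaxError` and `SkeletonParser` are hypothetical names, the scanner is replaced by an iterator over pre-computed token kinds, the numeric value of `EOT` is invented, and `parseProgram` is a placeholder that accepts a single identifier.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical exception used in place of "report syntax error".
final class SyntaxError extends RuntimeException {
    SyntaxError(String message) { super(message); }
}

final class SkeletonParser {
    // Token kinds mirroring the byte constants on the slides; EOT's value is an assumption.
    static final byte IDENTIFIER = 0, INTLITERAL = 1, EOT = 2;

    private final Iterator<Byte> scanner;   // stand-in for a real scanner
    private byte currentToken;

    SkeletonParser(List<Byte> kinds) { this.scanner = kinds.iterator(); }

    // Stand-in for scanner.scan(): returns EOT once the input is exhausted.
    private byte scan() { return scanner.hasNext() ? scanner.next() : EOT; }

    // Checks the current token against the expected kind, then advances.
    private void accept(byte expectedKind) {
        if (currentToken == expectedKind) currentToken = scan();
        else throw new SyntaxError("expected kind " + expectedKind + " but found " + currentToken);
    }

    // Advances unconditionally; used when the kind has already been checked.
    private void acceptIt() { currentToken = scan(); }

    public void parse() {
        acceptIt();                          // get the first token
        parseProgram();                      // Program is the start symbol
        if (currentToken != EOT) throw new SyntaxError("trailing input after program");
    }

    // Placeholder only: the real method would call parseSingleCommand().
    private void parseProgram() { accept(IDENTIFIER); }
}
```

For example, `new SkeletonParser(List.of(SkeletonParser.IDENTIFIER)).parse()` returns normally, while a list of two identifiers raises `SyntaxError` at the end-of-text check.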
(5) Implement private parsing methods
Program ::= single-Command

private void parseProgram( ) {


parseSingleCommand( );
}



(5) Implement private parsing methods
single-Command
::= Identifier
( := Expression | ( Expression ) )
| if Expression then single-Command
else single-Command
| ... other alternatives ...

private void parseSingleCommand( ) {


switch (currentToken.kind) {
case Token.IDENTIFIER : ...
case Token.IF : ...
... other cases ...
default: report a syntax error
}
}



Algorithm to convert EBNF into a RD parser
• The conversion of an EBNF specification into a Java or C++
implementation for a recursive descent parser is so “mechanical”
that it could easily be automated (such tools exist, but we won’t
use them in this course)
• We can describe the algorithm by a set of mechanical rewrite
rules
N ::= α
private void parseN( ) {
parse α // as explained on next two slides
}



Algorithm to convert EBNF into a RD parser

parse t where t is a terminal


accept(t);

parse N where N is a non-terminal


parseN( );

parse ε
// a dummy statement

parse X Y

parse X
parse Y



Algorithm to convert EBNF into a RD parser
parse X*
while (currentToken.kind is in starters[X]) {
parse X
}

parse X | Y
switch (currentToken.kind) {
cases in starters[X]:
parse X
break;
cases in starters[Y]:
parse Y
break;
default:
if neither X nor Y generates ε then report syntax error
}
Example: “Generation” of parseCommand

Command ::= single-Command ( ; single-Command )*

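Step 1: start from the rule body.

private void parseCommand( ) {
    parse single-Command ( ; single-Command )*
}

Step 2: apply the rewrite rules for sequence and repetition.

private void parseCommand( ) {
    parseSingleCommand( );
    while (currentToken.kind == Token.SEMICOLON) {
        parse ; single-Command
    }
}

Step 3: expand the loop body.

private void parseCommand( ) {
    parseSingleCommand( );
    while (currentToken.kind == Token.SEMICOLON) {
        acceptIt( ); // because SEMICOLON has just been checked
        parseSingleCommand( );
    }
}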
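The generated parseCommand can be exercised in isolation with a cut-down recognizer. The sketch below is illustrative only: `CommandRecognizer` is a hypothetical name, tokens are plain strings, and single-Command is reduced to `begin Command end` plus a bare name standing in for all the other alternatives.

```java
import java.util.List;

// Simplified recognizer: single-Command is reduced to "begin Command end" or a bare name.
final class CommandRecognizer {
    private final List<String> tokens;
    private int pos = 0;

    CommandRecognizer(List<String> tokens) { this.tokens = tokens; }

    private String current() { return pos < tokens.size() ? tokens.get(pos) : "<eot>"; }
    private void acceptIt() { pos++; }
    private void accept(String expected) {
        if (current().equals(expected)) acceptIt();
        else throw new IllegalStateException("expected " + expected + " but found " + current());
    }

    // Command ::= single-Command ( ; single-Command )*
    void parseCommand() {
        parseSingleCommand();
        while (current().equals(";")) {
            acceptIt();                 // SEMICOLON has just been checked
            parseSingleCommand();
        }
    }

    void parseSingleCommand() {
        if (current().equals("begin")) {
            acceptIt();
            parseCommand();
            accept("end");
        } else if (isName(current())) {
            acceptIt();                 // stand-in for the remaining alternatives
        } else {
            throw new IllegalStateException("unexpected token " + current());
        }
    }

    private static boolean isName(String t) {
        return !t.isEmpty() && Character.isLetter(t.charAt(0))
                && !t.equals("begin") && !t.equals("end");
    }

    // Returns true when the whole token list is one Command.
    static boolean recognizes(List<String> tokens) {
        CommandRecognizer p = new CommandRecognizer(tokens);
        try {
            p.parseCommand();
            return p.pos == tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

The loop exits on `end` because `end` is not a starter of `; single-Command`, which is the disjointness condition that repetition relies on.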



Example: Generation of parseSingleDeclaration
single-Declaration
::= const Identifier ~ Expression
| var Identifier : Type-denoter

Step 1: start from the rule body.

private void parseSingleDeclaration( ) {
    parse const Identifier ~ Expression
        | var Identifier : Type-denoter
}

Step 2: apply the rewrite rule for alternatives.

private void parseSingleDeclaration( ) {
    switch (currentToken.kind) {
    case Token.CONST:
        parse const Identifier ~ Expression
        break;
    case Token.VAR:
        parse var Identifier : Type-denoter
        break;
    default: report syntax error
    }
}

Step 3: expand each alternative.

private void parseSingleDeclaration( ) {
    switch (currentToken.kind) {
    case Token.CONST:
        acceptIt( );
        parseIdentifier( );
        accept(Token.IS);
        parseExpression( );
        break;
    case Token.VAR:
        acceptIt( );
        parseIdentifier( );
        accept(Token.COLON);
        parseTypeDenoter( );
        break;
    default: report syntax error
    }
}
LL(1) Grammars
• The presented algorithm to convert EBNF into a parser
does not work for all possible grammars.
• It only works for so-called “LL(1)” grammars.
• Basically, an LL(1) grammar is a grammar which can
be parsed with a top-down parser with a lookahead (in
the input stream of tokens) of one token.
• What grammars are LL(1)?
How can we recognize that a grammar is (or is not) LL(1)?
=> We can deduce the necessary conditions from the
parser generation algorithm.



LL(1) Grammars
parse X*
while (currentToken.kind is in starters[X]) {
    parse X
}
Condition: starters[X] must be disjoint from the set of tokens that
can immediately follow X*

parse X | Y
switch (currentToken.kind) {
cases in starters[X]:
    parse X
    break;
cases in starters[Y]:
    parse Y
    break;
default: if neither X nor Y generates ε then report syntax error
}
Conditions: starters[X] and starters[Y] must be disjoint sets, and if
either X or Y generates ε then they must also be disjoint from the set
of tokens that can immediately follow X | Y
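The two conditions above are plain set-disjointness tests, so they can be checked mechanically once the starter and follower sets are known. The following is a small sketch, assuming token kinds are represented as strings and that the sets have already been computed by hand; `Ll1Check`, `repetitionOk` and `choiceOk` are hypothetical names.

```java
import java.util.Collections;
import java.util.Set;

final class Ll1Check {
    // Condition for X*: starters[X] must share no token with the followers of X*.
    static boolean repetitionOk(Set<String> startersX, Set<String> followers) {
        return Collections.disjoint(startersX, followers);
    }

    // Conditions for X | Y as stated on the slide: the starter sets must be disjoint,
    // and if either alternative can generate the empty string they must also be
    // disjoint from the followers of X | Y.
    static boolean choiceOk(Set<String> startersX, Set<String> startersY,
                            boolean eitherGeneratesEmpty, Set<String> followers) {
        if (!Collections.disjoint(startersX, startersY)) return false;
        if (!eitherGeneratesEmpty) return true;
        return Collections.disjoint(startersX, followers)
                && Collections.disjoint(startersY, followers);
    }
}
```

For instance, `choiceOk(Set.of("Identifier"), Set.of("Identifier"), false, Set.of())` is false, which is the situation of two alternatives that both start with an identifier.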



LL(1) grammars and left factorization

The original Mini-Triangle grammar is not LL(1):

For example:
single-Command
::= V-name := Expression
| Identifier ( Expression )
| ...
V-name ::= Identifier

Starters[V-name := Expression]
= Starters[V-name] = Starters[Identifier]
Starters[Identifier ( Expression )]
= Starters[Identifier]
The two starter sets are NOT DISJOINT!
LL(1) grammars: left factorization
What happens when we generate an RD parser from a non-LL(1) grammar?

single-Command
::= V-name := Expression
| Identifier ( Expression )
| ...

private void parseSingleCommand( ) {


switch (currentToken.kind) {
case Token.IDENTIFIER:   // wrong: overlapping cases
parse V-name := Expression
case Token.IDENTIFIER:
parse Identifier ( Expression )
...other cases...
default: report syntax error
}
}
LL(1) grammars: left factorization

single-Command
::= V-name := Expression
| Identifier ( Expression )
| ...

Left factorization (and substitution of V-name)

single-Command
::= Identifier
( := Expression | ( Expression ) )
| ...
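After left factorization the parser commits to Identifier first and only then looks at the next token to choose between assignment and call. A hedged sketch of that alternative alone is given below; `FactoredCommandRecognizer` is a hypothetical name, tokens are plain strings, and Expression is simplified to a single identifier or integer literal.

```java
import java.util.List;

// Recognizes only: single-Command ::= Identifier ( := Expression | ( Expression ) )
final class FactoredCommandRecognizer {
    private final List<String> tokens;
    private int pos = 0;

    FactoredCommandRecognizer(List<String> tokens) { this.tokens = tokens; }

    private String current() { return pos < tokens.size() ? tokens.get(pos) : "<eot>"; }
    private void acceptIt() { pos++; }
    private void accept(String expected) {
        if (current().equals(expected)) acceptIt();
        else throw new IllegalStateException("expected " + expected + " but found " + current());
    }

    private static boolean isIdentifier(String t) {
        return !t.isEmpty() && Character.isLetter(t.charAt(0))
                && t.chars().allMatch(Character::isLetterOrDigit);
    }
    private static boolean isIntLiteral(String t) {
        return !t.isEmpty() && t.chars().allMatch(Character::isDigit);
    }

    void parseSingleCommand() {
        if (!isIdentifier(current()))
            throw new IllegalStateException("expected identifier but found " + current());
        acceptIt();                      // the common prefix is consumed once
        switch (current()) {             // one token of lookahead now decides
            case ":=":
                acceptIt();
                parseExpression();
                break;
            case "(":
                acceptIt();
                parseExpression();
                accept(")");
                break;
            default:
                throw new IllegalStateException("expected := or ( but found " + current());
        }
    }

    // Simplification for this sketch: an Expression is a single identifier or literal.
    void parseExpression() {
        String t = current();
        if (isIdentifier(t) || isIntLiteral(t)) acceptIt();
        else throw new IllegalStateException("expected expression but found " + t);
    }

    // Returns true when the whole token list is one such single-Command.
    static boolean recognizes(List<String> tokens) {
        FactoredCommandRecognizer p = new FactoredCommandRecognizer(tokens);
        try {
            p.parseSingleCommand();
            return p.pos == tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

The inner `switch` has the disjoint starter sets `:=` and `(`, so the overlapping `case Token.IDENTIFIER` labels of the unfactored version no longer arise.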



LL(1) Grammars: left recursion elimination

Command ::= single-Command


| Command ; single-Command
What happens if we don’t perform left-recursion elimination?
public void parseCommand( ) {
switch (currentToken.kind) {
case in starters[single-Command]   // wrong: overlapping cases
parseSingleCommand( );
case in starters[Command]
parseCommand( );
accept(Token.SEMICOLON);
parseSingleCommand( );
default: report syntax error
}
}
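Besides the overlapping cases, a direct translation of the left-recursive alternative has a second problem: parseCommand calls itself before consuming any token. The sketch below illustrates this with an artificial depth guard so that the demonstration terminates; `LeftRecursionDemo`, `DEPTH_LIMIT` and the method names are hypothetical, and the guard is not part of any real parser.

```java
// Demonstrates that a naive translation of "Command ::= Command ; single-Command"
// recurses without ever advancing in the input.
final class LeftRecursionDemo {
    private static final int DEPTH_LIMIT = 1000;   // arbitrary guard for this demo
    private int pos = 0;     // index of the current token; never advanced before recursing
    private int depth = 0;   // number of nested parseCommandNaive calls

    void parseCommandNaive() {
        if (++depth > DEPTH_LIMIT) {
            throw new IllegalStateException(
                "recursed " + DEPTH_LIMIT + " times at token index " + pos);
        }
        parseCommandNaive();   // the alternative starts with Command itself
        pos++;                 // stands for accept(SEMICOLON); never reached
    }

    // Returns true if the guard fired while the input position was still 0.
    static boolean recursesWithoutProgress() {
        LeftRecursionDemo p = new LeftRecursionDemo();
        try {
            p.parseCommandNaive();
            return false;
        } catch (IllegalStateException e) {
            return p.pos == 0;
        }
    }
}
```

Without the guard the method would recurse until the call stack is exhausted, regardless of the input.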



LL(1) Grammars: left recursion elimination

Command ::= single-Command


| Command ; single-Command

Left recursion elimination


Command
::= single-Command (; single-Command)*



Abstract Syntax Trees
• So far we have talked about how to build a recursive
descent parser which recognizes a given language
described by an LL(1) EBNF grammar.
• Next we will look at
– how to represent ASTs as data structures.
– how to modify the parser to construct an AST data structure.
• We make heavy use of Object-Oriented Programming!
(classes, inheritance, dynamic method binding)
