PART I: Overview Material: 2 Language Processors (Tombstone Diagrams, Bootstrapping) 3 Architecture of A Compiler

Course Overview
PART I: overview material

1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler
PART II: inside a compiler
4 Syntax analysis
5 Contextual analysis
6 Runtime organization
7 Code generation
PART III: conclusion
8 Interpretation
9 Review
Syntax Analysis (Chapter 4) 1
Systematic Development of Rec. Descent Parser
(1) Express grammar in EBNF
(2) Grammar Transformations:
Left factorization and Left recursion elimination
(3) Create a parser class with
– private variable currentToken
– methods to call the scanner: accept and acceptIt
(4) Implement a public method for main function to call:
– public parse method that
• fetches the first token from the scanner
• calls parseS (where S is start symbol of the grammar)
• verifies that the scanner next produces the end-of-file token
(5) Implement private parsing methods:
– add private parseN method for each non terminal N
Developing RD Parser for Mini Triangle
Before we begin:
• The following non-terminals are recognized by the scanner
• They will be returned as tokens by the scanner
Identifier ::= Letter (Letter|Digit)*
Integer-Literal ::= Digit Digit*
Operator ::= + | - | * | / | < | > | =
Comment ::= ! Graphic* eol
Assume scanner returns instances of this class:
public class Token {
byte kind; String spelling;
final static byte
IDENTIFIER = 0,
INTLITERAL = 1;
...
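The lexical grammar above can be turned into a scanner in a few lines. The following is only a minimal sketch under stated assumptions: the class names SketchToken and SketchScanner, the OPERATOR kind, and the use of a String as input are all hypothetical and not part of the course's Token class. It ignores keywords and punctuation such as := and ; to stay short.

```java
// Hypothetical sketch: only kind, spelling, IDENTIFIER and INTLITERAL come from the slides.
final class SketchToken {
    static final byte IDENTIFIER = 0, INTLITERAL = 1, OPERATOR = 2, EOT = 3;
    final byte kind;
    final String spelling;

    SketchToken(byte kind, String spelling) {
        this.kind = kind;
        this.spelling = spelling;
    }
}

final class SketchScanner {
    private final String source;
    private int pos = 0;

    SketchScanner(String source) {
        this.source = source;
    }

    // Skips blanks and comments (! Graphic* eol), then returns the next token.
    SketchToken scan() {
        while (pos < source.length()) {
            char c = source.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c == '!') {
                while (pos < source.length() && source.charAt(pos) != '\n') pos++;
            } else {
                break;
            }
        }
        if (pos >= source.length()) return new SketchToken(SketchToken.EOT, "");

        int start = pos;
        char c = source.charAt(pos);
        if (Character.isLetter(c)) {            // Identifier ::= Letter (Letter|Digit)*
            while (pos < source.length() && Character.isLetterOrDigit(source.charAt(pos))) pos++;
            return new SketchToken(SketchToken.IDENTIFIER, source.substring(start, pos));
        }
        if (Character.isDigit(c)) {             // Integer-Literal ::= Digit Digit*
            while (pos < source.length() && Character.isDigit(source.charAt(pos))) pos++;
            return new SketchToken(SketchToken.INTLITERAL, source.substring(start, pos));
        }
        if ("+-*/<>=".indexOf(c) >= 0) {        // Operator ::= + | - | * | / | < | > | =
            pos++;
            return new SketchToken(SketchToken.OPERATOR, String.valueOf(c));
        }
        throw new IllegalArgumentException("unexpected character '" + c + "' at position " + pos);
    }
}
```

Each call to scan() returns one token, so a parser can hold exactly one token of lookahead in currentToken.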
(1)&(2) Developing RD Parser for Mini Triangle
Program ::= single-Command

Command (left recursion elimination needed)
::= single-Command
| Command ; single-Command
single-Command (left factorization needed)
::= V-name := Expression
| Identifier ( Expression )
| if Expression then single-Command
else single-Command
| while Expression do single-Command
| let Declaration in single-Command
| begin Command end
V-name ::= Identifier
...

(1)&(2) Express grammar in EBNF and transform
After factorization etc. we get:

Command ::= single-Command (; single-Command)*
single-Command
::= Identifier
( := Expression | ( Expression ) )
| if Expression then single-Command
else single-Command
| while Expression do single-Command
| let Declaration in single-Command
| begin Command end
...

(1)&(2) Developing RD Parser for Mini Triangle
Expression (left recursion elimination needed)
::= primary-Expression
| Expression Operator primary-Expression
primary-Expression
::= Integer-Literal
| V-name
| Operator primary-Expression
| ( Expression )
Declaration (left recursion elimination needed)
::= single-Declaration
| Declaration ; single-Declaration
single-Declaration
::= const Identifier ~ Expression
| var Identifier : Type-denoter
Type-denoter ::= Identifier
(1)&(2) Express grammar in EBNF and transform
After factorization and recursion elimination :
Expression
::= primary-Expression
( Operator primary-Expression )*
primary-Expression
::= Integer-Literal
| Identifier
| Operator primary-Expression
| ( Expression )
Declaration
::= single-Declaration (; single-Declaration)*
single-Declaration
::= const Identifier ~ Expression
| var Identifier : Type-denoter
Type-denoter ::= Identifier
(3)&(4) Create a parser class and public parse method
public class Parser {
private Token currentToken;
private void accept (byte expectedKind) {
if (currentToken.kind == expectedKind)
currentToken = scanner.scan( );
else
report syntax error
}
private void acceptIt( ) {
currentToken = scanner.scan( );
}
public void parse( ) {
acceptIt( ); // get the first token
parseProgram( ); // Program is the start symbol
if (currentToken.kind != Token.EOT)
report syntax error
}
...
(5) Implement private parsing methods
private void parseProgram( ) {

parseSingleCommand( );
}

(5) Implement private parsing methods
single-Command
::= Identifier ( := Expression | ( Expression ) )
| if Expression then single-Command
else single-Command
| ... other alternatives ...
private void parseSingleCommand( ) {

switch (currentToken.kind) {
case Token.IDENTIFIER : ...
case Token.IF : ...
... other cases ...
default: report a syntax error
}
}

Algorithm to convert EBNF into a RD parser
• The conversion of an EBNF specification into a Java or C++
implementation for a recursive descent parser is so “mechanical”
that it could easily be automated (such tools exist, but we won’t
use them in this course)
• We can describe the algorithm by a set of mechanical rewrite
rules
N ::= α
private void parseN( ) {
parse α // as explained by the rules below
}

parse t where t is a terminal

accept(t);
parse N where N is a non-terminal

parseN( );
parse ε
; // a dummy (empty) statement
parse X Y
parse X
parse Y

parse X*
while (currentToken.kind is in starters[X]) {
parse X
}
parse X | Y
switch (currentToken.kind) {
cases in starters[X]:
parse X
break;
cases in starters[Y]:
parse Y
break;
default:
if neither X nor Y generates ε then report syntax error
}
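As one possible illustration of these rewrite rules, the sketch below applies them by hand to the transformed Expression and primary-Expression productions. It is a recognizer only (it builds no tree), and several things are assumptions made to keep it self-contained: the class name ExprRecognizer, the Kind enum, reading tokens from a List instead of a scanner, and throwing IllegalStateException for "report syntax error".

```java
import java.util.List;

final class ExprRecognizer {
    enum Kind { IDENTIFIER, INTLITERAL, OPERATOR, LPAREN, RPAREN, EOT }

    private final List<Kind> tokens;   // stands in for the scanner
    private int next = 0;
    private Kind currentToken;

    private ExprRecognizer(List<Kind> tokens) {
        this.tokens = tokens;
    }

    private void acceptIt() {
        currentToken = next < tokens.size() ? tokens.get(next++) : Kind.EOT;
    }

    private void accept(Kind expected) {
        if (currentToken != expected)
            throw new IllegalStateException("expected " + expected + " but found " + currentToken);
        acceptIt();
    }

    // Expression ::= primary-Expression ( Operator primary-Expression )*
    private void parseExpression() {
        parsePrimaryExpression();                  // rule: parse N
        while (currentToken == Kind.OPERATOR) {    // rule: parse X*, starters[X] = { Operator }
            acceptIt();
            parsePrimaryExpression();
        }
    }

    // primary-Expression ::= Integer-Literal | Identifier
    //                      | Operator primary-Expression | ( Expression )
    private void parsePrimaryExpression() {
        switch (currentToken) {                    // rule: parse X | Y
            case INTLITERAL:
            case IDENTIFIER:
                acceptIt();
                break;
            case OPERATOR:
                acceptIt();
                parsePrimaryExpression();
                break;
            case LPAREN:
                acceptIt();
                parseExpression();
                accept(Kind.RPAREN);               // rule: parse t
                break;
            default:
                throw new IllegalStateException("syntax error at " + currentToken);
        }
    }

    // Returns true if the whole token list is a valid Expression.
    static boolean recognizes(List<Kind> tokens) {
        ExprRecognizer p = new ExprRecognizer(tokens);
        try {
            p.acceptIt();                          // get the first token
            p.parseExpression();
            return p.currentToken == Kind.EOT;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Every method body corresponds line by line to the right-hand side of its production, which is why the conversion can be automated.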
Example: “Generation” of parseCommand
Command ::= single-Command ( ; single-Command )*
private void parseCommand( ) {
parseSingleCommand( );
while (currentToken.kind == Token.SEMICOLON) {
acceptIt( ); // because SEMICOLON has just been checked
parseSingleCommand( );
}
}

Example: Generation of parseSingleDeclaration
single-Declaration
::= const Identifier ~ Expression
| var Identifier : Type-denoter
private void parseSingleDeclaration( ) {
switch (currentToken.kind) {
case Token.CONST:
acceptIt( );
parseIdentifier( );
accept(Token.IS);
parseExpression( );
break;
case Token.VAR:
acceptIt( );
parseIdentifier( );
accept(Token.COLON);
parseTypeDenoter( );
break;
default: report syntax error
}
}
LL 1 Grammars
• The presented algorithm to convert EBNF into a parser
does not work for all possible grammars.
• It only works for so called “LL 1” grammars.
• Basically, an LL 1 grammar is a grammar which can
be parsed with a top-down parser with a lookahead (in
the input stream of tokens) of one token.
• What grammars are LL 1?
How can we recognize that a grammar is (or is not) LL 1?
=> We can deduce the necessary conditions from the
parser generation algorithm.

LL 1 Grammars
parse X*
while (currentToken.kind is in starters[X]) {
parse X
}
Condition: starters[X] must be disjoint from the set of tokens that can immediately follow X*

parse X | Y
switch (currentToken.kind) {
cases in starters[X]:
parse X
break;
cases in starters[Y]:
parse Y
break;
default: if neither X nor Y generates ε then report syntax error
}
Conditions: starters[X] and starters[Y] must be disjoint sets, and if either X or Y generates ε then they must also be disjoint from the set of tokens that can immediately follow X | Y

LL 1 grammars and left factorization
The original Mini-Triangle grammar is not LL 1:
For example:
single-Command
::= V-name := Expression
| Identifier ( Expression )
| ...
Starters[V-name := Expression]
= Starters[V-name] = Starters[Identifier]
Starters[Identifier ( Expression )]
= Starters[Identifier] NOT DISJOINT!
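The disjointness test itself is just a set intersection. The sketch below shows this for the two alternatives above; the class name Ll1Check, the Kind enum and the overlap method are hypothetical names chosen for illustration, and the starter sets are written out by hand rather than computed from the grammar.

```java
import java.util.EnumSet;
import java.util.Set;

final class Ll1Check {
    enum Kind { IDENTIFIER, IF, WHILE, LET, BEGIN, SEMICOLON, END }

    // Returns the tokens common to both sets; an empty result means the sets are disjoint.
    static Set<Kind> overlap(Set<Kind> first, Set<Kind> second) {
        EnumSet<Kind> common = EnumSet.noneOf(Kind.class);
        common.addAll(first);
        common.retainAll(second);
        return common;
    }

    public static void main(String[] args) {
        // starters[V-name := Expression] and starters[Identifier ( Expression )]
        Set<Kind> assignment = EnumSet.of(Kind.IDENTIFIER);
        Set<Kind> call = EnumSet.of(Kind.IDENTIFIER);
        System.out.println("X | Y conflict: " + overlap(assignment, call));

        // starters[; single-Command] against the tokens that can follow
        // ( ; single-Command )*, assumed here to be just 'end'.
        Set<Kind> repeated = EnumSet.of(Kind.SEMICOLON);
        Set<Kind> followers = EnumSet.of(Kind.END);
        System.out.println("X* conflict: " + overlap(repeated, followers));
    }
}
```

A non-empty result for any X | Y or X* in the grammar means the generated parser cannot choose with one token of lookahead.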
LL 1 grammars: left factorization
What happens when we generate a RD parser from a non LL 1 grammar?
single-Command
::= V-name := Expression
| Identifier ( Expression )
| ...
private void parseSingleCommand( ) {
switch (currentToken.kind) {
case Token.IDENTIFIER: // wrong: overlapping cases
parse V-name := Expression
case Token.IDENTIFIER:
parse Identifier ( Expression )
...other cases...
default: report syntax error
}
}
LL 1 grammars: left factorization
single-Command
::= V-name := Expression
| Identifier ( Expression )
| ...
Left factorization (and substitution of V-name)
single-Command
::= Identifier ( := Expression | ( Expression ) )
| ...
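After left factorization the parser consumes the shared Identifier first and only then decides between assignment and call, using the next token. The sketch below shows this for the two Identifier alternatives only. Its names (FactoredCommandParser, Kind, BECOMES, classify) are assumptions for illustration, and Expression is cut down to a single Identifier or Integer-Literal to keep the example short.

```java
import java.util.List;

final class FactoredCommandParser {
    enum Kind { IDENTIFIER, INTLITERAL, BECOMES, LPAREN, RPAREN, EOT }

    private final List<Kind> tokens;   // stands in for the scanner
    private int next = 0;
    private Kind currentToken;

    private FactoredCommandParser(List<Kind> tokens) {
        this.tokens = tokens;
    }

    private void acceptIt() {
        currentToken = next < tokens.size() ? tokens.get(next++) : Kind.EOT;
    }

    private void accept(Kind expected) {
        if (currentToken != expected)
            throw new IllegalStateException("expected " + expected + " but found " + currentToken);
        acceptIt();
    }

    // single-Command ::= Identifier ( := Expression | ( Expression ) )
    private String parseSingleCommand() {
        accept(Kind.IDENTIFIER);           // common prefix, parsed once
        switch (currentToken) {            // starters { := } and { ( } are disjoint
            case BECOMES:
                acceptIt();
                parseExpression();
                return "assignment";
            case LPAREN:
                acceptIt();
                parseExpression();
                accept(Kind.RPAREN);
                return "call";
            default:
                throw new IllegalStateException("syntax error at " + currentToken);
        }
    }

    // Simplifying assumption: an Expression is one Identifier or Integer-Literal.
    private void parseExpression() {
        if (currentToken == Kind.IDENTIFIER || currentToken == Kind.INTLITERAL) acceptIt();
        else throw new IllegalStateException("syntax error at " + currentToken);
    }

    // Returns "assignment", "call", or "syntax error" for the given token list.
    static String classify(List<Kind> tokens) {
        FactoredCommandParser p = new FactoredCommandParser(tokens);
        try {
            p.acceptIt();
            String result = p.parseSingleCommand();
            p.accept(Kind.EOT);
            return result;
        } catch (IllegalStateException e) {
            return "syntax error";
        }
    }
}
```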

LL 1 Grammars: left recursion elimination

What happens if we don’t perform left-recursion elimination?
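Command ::= single-Command
| Command ; single-Command
public void parseCommand( ) {
switch (currentToken.kind) {
case in starters[single-Command]: // wrong: overlapping cases,
parseSingleCommand( ); // since starters[Command] = starters[single-Command]
case in starters[Command]:
parseCommand( ); // calls itself without consuming a token
accept(Token.SEMICOLON);
parseSingleCommand( );
default: report syntax error
}
}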
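Overlapping cases are not the only problem. If the left-recursive alternative were ever chosen, parseCommand would call itself before consuming any token, so the recursion could never terminate. The sketch below demonstrates only that effect; the class LeftRecursionDemo and its methods are hypothetical, and it catches StackOverflowError purely to make the outcome observable.

```java
final class LeftRecursionDemo {
    private int depth = 0;

    // Command ::= Command ; single-Command, transcribed naively:
    // the first step is a call to parseCommand with the input unchanged.
    private void parseCommand() {
        depth++;
        parseCommand();
        // accept(Token.SEMICOLON) and parseSingleCommand() would follow,
        // but control never reaches this point.
    }

    // Returns true if the naive parser exhausts the call stack.
    static boolean overflows() {
        LeftRecursionDemo parser = new LeftRecursionDemo();
        try {
            parser.parseCommand();
            return false;
        } catch (StackOverflowError e) {
            return parser.depth > 0;
        }
    }

    public static void main(String[] args) {
        System.out.println("left recursion overflows the stack: " + overflows());
    }
}
```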

LL 1 Grammars: left recursion elimination

Command ::= single-Command
| Command ; single-Command
Left recursion elimination
Command
::= single-Command (; single-Command)*

Abstract Syntax Trees
• So far we have talked about how to build a recursive
descent parser which recognizes a given language
described by an (LL 1) EBNF grammar.
• Next we will look at
– how to represent ASTs as data structures.
– how to modify the parser to construct an AST data structure.
• We make heavy use of Object–Oriented Programming!
(classes, inheritance, dynamic method binding)
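As a rough preview of that approach, the sketch below represents AST nodes as subclasses of one abstract class and lets a parsing method return the node it has built. Everything here is an assumption for illustration: the class names (Ast, IntegerExpression, BinaryExpression, AstBuildingParser), the show method, the use of token spellings in a List instead of a scanner, and the reduced grammar Expression ::= Integer-Literal ( Operator Integer-Literal )*.

```java
import java.util.List;

abstract class Ast {
    abstract String show();   // dynamically bound: each subclass prints itself
}

final class IntegerExpression extends Ast {
    final String spelling;

    IntegerExpression(String spelling) {
        this.spelling = spelling;
    }

    @Override
    String show() {
        return spelling;
    }
}

final class BinaryExpression extends Ast {
    final Ast left;
    final String operator;
    final Ast right;

    BinaryExpression(Ast left, String operator, Ast right) {
        this.left = left;
        this.operator = operator;
        this.right = right;
    }

    @Override
    String show() {
        return "(" + left.show() + " " + operator + " " + right.show() + ")";
    }
}

final class AstBuildingParser {
    private final List<String> tokens;   // token spellings; null marks end of input
    private int next = 0;
    private String currentToken;

    AstBuildingParser(List<String> tokens) {
        this.tokens = tokens;
        acceptIt();                      // get the first token
    }

    private void acceptIt() {
        currentToken = next < tokens.size() ? tokens.get(next++) : null;
    }

    private boolean atOperator() {
        return currentToken != null && currentToken.length() == 1
                && "+-*/<>=".indexOf(currentToken.charAt(0)) >= 0;
    }

    // Same shape as the recognizer, but each method returns the subtree it parsed.
    Ast parseExpression() {
        Ast result = parseIntegerLiteral();
        while (atOperator()) {
            String operator = currentToken;
            acceptIt();
            Ast right = parseIntegerLiteral();
            result = new BinaryExpression(result, operator, right);
        }
        return result;
    }

    private Ast parseIntegerLiteral() {
        if (currentToken == null || !currentToken.matches("[0-9]+"))
            throw new IllegalStateException("syntax error: expected an integer literal");
        Ast literal = new IntegerExpression(currentToken);
        acceptIt();
        return literal;
    }
}
```

For the tokens 1 + 2 * 3 this builds a left-leaning tree whose show() result is ((1 + 2) * 3), since the loop combines operands strictly from left to right.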
