Cs 160

CS 160
Translation of Programming Languages
Discussion 2
William Eiers & Seemanta Saha
Project 2
• Build scanner & recursive-descent parser in C++
• Goal is to become familiar with top-down parsing
• Requires grammar modification
• Due in one week on Monday, October 16th

Project 2 Steps
• Step 1: Modify grammar so that it is predictive
• Step 2: Implement scanner
• Step 3: Implement recursive-descent parser
• Step 4: Implement evaluation of expressions

Step 1: Modifying the Grammar
Language Grammar
Start → ExprList
ExprList → ExprList ; Expression
         | Expression
Expression Grammar
Expression → Expression + Expression
           | Expression - Expression
           | Expression * Expression
           | Expression / Expression
           | Expression % Expression
           | Expression mod Expression
           | ( Expression )
           | num
Grammar Issues
● The grammar has several issues that complicate top-down parsing:
○ Ambiguous
○ Not LL(1) / Predictive
○ No operator precedence
Grammar Techniques
● Techniques for modifying the grammar:
○ Precedence introduction
○ Left-recursion elimination
○ Left factoring
Precedence Introduction
• Parent nodes in the parse tree need information from their child nodes before they can be processed
• For example, an addition needs the values of its left- and right-hand sides before it can add them
• This means that nodes deeper in the parse tree are processed first
• So higher-precedence operators should appear deeper in the tree
Precedence Introduction
Consider the input:
1 + 2 * 3
Compare the possible parse trees for the above input using each of the following grammars for expressions:
First grammar:
Expression → Expression + Expression
           | Expression * Expression
           | num
Second grammar:
Expression → Expression + Expression
           | Term
Term       → Term * Term
           | num
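The sketch below is a minimal C++ illustration of why depth matters. It uses hypothetical helpers `leaf`, `binary` and `eval`, which build a tree and evaluate it bottom-up, so children are evaluated before their parent. The first grammar admits two trees for 1 + 2 * 3. The second grammar admits only the tree with * below +.

```cpp
#include <memory>
#include <utility>

// A tiny expression-tree node: a number leaf (op == 0) or a binary operator.
struct Node {
    char op;
    int value;
    std::unique_ptr<Node> left, right;
};

std::unique_ptr<Node> leaf(int v) {
    return std::unique_ptr<Node>(new Node{0, v, nullptr, nullptr});
}

std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    return std::unique_ptr<Node>(new Node{op, 0, std::move(l), std::move(r)});
}

// Post-order evaluation: both children are evaluated before their parent,
// so the operator deepest in the tree is applied first.
int eval(const Node& n) {
    if (n.op == 0) return n.value;
    int l = eval(*n.left);
    int r = eval(*n.right);
    return n.op == '+' ? l + r : l * r;
}
```

For example, `eval(*binary('+', leaf(1), binary('*', leaf(2), leaf(3))))` gives 7, which comes from the second grammar's only tree. By contrast, `eval(*binary('*', binary('+', leaf(1), leaf(2)), leaf(3)))` gives 9, the wrong answer that the first grammar also permits.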
Operator Precedence
• ( ) has the highest precedence
• Then mod, *, / (all the same precedence)
• +, - have the lowest precedence (both the same)

Top-down Recursive Descent Parsing
• Goal is to build the parse tree in a top-down fashion
• Start at the root of the parse tree (the start symbol) and grow towards the leaves
• Pick a production and try to match the input
• Caveat: top-down parsers cannot handle left-recursive grammars, and they must backtrack when they pick a ‘bad’ production unless the grammar is predictive (LL(1))
Why Do We Need Left Recursion Elimination for a Recursive Descent Parser?
Need: Left Recursion Elimination
Expression → Expression + Term
void Expression() {
    Expression();  // calls itself before consuming any input
    match('+');
    Term();
}
• Executing this would fall into infinite recursion, because Expression() calls itself again without consuming a single token (typically a bad property for parsers to have)
Left Recursion Elimination
Expression → Expression + num
           | num
becomes
Expression  → num Expression’
Expression’ → + num Expression’
            | ε
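Below is a minimal C++ sketch of a recursive-descent parser for the rewritten grammar. It assumes the input has already been split into string tokens. The `SumParser` class and its token format are hypothetical and are not the project's scanner interface. Each call consumes a num before recursing, so the infinite recursion shown earlier cannot happen. The ε alternative is just a return that consumes nothing.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Sketch of a recursive-descent parser for the grammar after left-recursion
// elimination:
//   Expression  -> num Expression'
//   Expression' -> + num Expression' | epsilon
// Input is a hypothetical pre-split token list such as {"1", "+", "2"}.
class SumParser {
public:
    explicit SumParser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // Parse the whole input and return the sum of its numbers.
    int parse() {
        int value = expression();
        if (pos_ != toks_.size()) throw std::runtime_error("unexpected token");
        return value;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    // Lookahead: the current token, or "" once the input is exhausted.
    std::string peek() const { return pos_ < toks_.size() ? toks_[pos_] : ""; }

    int num() {
        std::string t = peek();
        if (t.empty() || !std::isdigit(static_cast<unsigned char>(t[0])))
            throw std::runtime_error("expected num");
        ++pos_;
        return std::stoi(t);
    }

    // A num is consumed before Expression' recurses, so every recursive call
    // makes progress and the recursion terminates.
    int expression() { return expressionPrime(num()); }

    // `acc` carries the value of everything to the left of this point.
    int expressionPrime(int acc) {
        if (peek() == "+") {
            ++pos_;                                 // match '+'
            return expressionPrime(acc + num());
        }
        return acc;                                 // epsilon: consume nothing
    }
};
```

`SumParser({"1", "+", "2", "+", "3"}).parse()` returns 6. An input that ends in a dangling `+` throws. Passing the running total down as `acc` keeps evaluation left to right, which starts to matter once `-` is part of the grammar.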
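Why Do We Need Left Factoring?
Need: Left Factoring
Expression → Expression + Expression
           | Expression - Expression
• This grammar is nondeterministic: both rules start with the same prefix
• We cannot predict which rule to choose
• We may choose the wrong one and need to backtrack
• We factor out the common prefix
Left Factoring
The grammar above becomes
Expression  → Expression Expression’
Expression’ → + Expression
            | - Expression
• The factored grammar is still left-recursive, so left recursion must also be eliminated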
Step 2 : Scanner
• Reads input from the standard input stream (STDIN)
• Identifies tokens and keeps track of line numbers
• Makes tokens available and useful to the parser
• nextToken function: returns the next token (the lookahead)
• eatToken function: consumes a given token
Scanner Example
Scanner Tips
● Whitespace is ignored between tokens
● Whitespace includes space, tab, newline, etc.
● Whitespace is not allowed inside a single token
● Easiest to examine one character at a time
○ Use a switch-case statement
● nextToken and eatToken should be consistent: be careful to distinguish fetching (peeking at) a token from consuming it.
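Below is a minimal C++ scanner sketch that follows these tips:

- It skips whitespace and counts newlines.
- It examines one character at a time in a switch.
- It caches the lookahead, so nextToken only peeks while eatToken consumes.

Several parts are assumptions: the `Tok` names, the `lexeme` accessor, and the use of `std::runtime_error` in place of the provided scanError and mismatchError functions. Keywords such as mod and print are left out to keep it short.

```cpp
#include <cctype>
#include <cstdio>    // EOF
#include <istream>
#include <sstream>   // std::istringstream, handy for testing on a string
#include <stdexcept>
#include <string>

// Hypothetical token kinds; the project skeleton defines its own names.
enum class Tok { Plus, Minus, Mult, Div, Mod, LParen, RParen, Semi, Num, End };

class Scanner {
public:
    explicit Scanner(std::istream& in) : in_(in) {}

    // Return the lookahead token WITHOUT consuming it: repeated calls
    // return the same token until eatToken consumes it.
    Tok nextToken() {
        if (!haveTok_) {
            tok_ = scan();
            haveTok_ = true;
        }
        return tok_;
    }

    // Consume the lookahead, which must be `expected`; otherwise this is
    // where a mismatch error would be reported.
    void eatToken(Tok expected) {
        if (nextToken() != expected)
            throw std::runtime_error("mismatch error on line " + std::to_string(line_));
        haveTok_ = false;
    }

    int lineNumber() const { return line_; }
    const std::string& lexeme() const { return lexeme_; }  // digits of the last Num

private:
    std::istream& in_;
    Tok tok_ = Tok::End;
    bool haveTok_ = false;
    int line_ = 1;
    std::string lexeme_;

    // Read one token, examining one character at a time.
    Tok scan() {
        int c = in_.get();
        while (c != EOF && std::isspace(c)) {  // whitespace separates tokens
            if (c == '\n') ++line_;
            c = in_.get();
        }
        switch (c) {
            case EOF: return Tok::End;
            case '+': return Tok::Plus;
            case '-': return Tok::Minus;
            case '*': return Tok::Mult;
            case '/': return Tok::Div;
            case '%': return Tok::Mod;
            case '(': return Tok::LParen;
            case ')': return Tok::RParen;
            case ';': return Tok::Semi;
            default:
                if (std::isdigit(c)) {
                    // A number is a maximal run of digits; whitespace ends it.
                    lexeme_.assign(1, static_cast<char>(c));
                    while (std::isdigit(in_.peek()))
                        lexeme_ += static_cast<char>(in_.get());
                    return Tok::Num;
                }
                throw std::runtime_error("scan error on line " + std::to_string(line_) +
                                         ": bad character '" + static_cast<char>(c) + "'");
        }
    }
};
```

Because the lookahead is cached, the parser can call nextToken as often as it likes when choosing a rule without losing input. Only eatToken advances.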
Step 3: Recursive Descent Parser
• Each nonterminal has a corresponding function
• Each terminal in a rule translates to a call to eatToken
• Applying a rule turns into a sequence of calls
• Decide which rule to apply using FIRST/FOLLOW sets

Recursive Descent Example
• Consider the following nonterminal and rule:
E → T + T * F
• This might translate to something like:
E() {
    T()
    Eat plus token
    T()
    Eat multiplication token
    F()
}
• Consider the following nonterminal and rules:
E → number
  | ( E )
• This might translate to something like:
E():
    Switch on next token:
        case Number:
            Eat number token
        case Open Parenthesis:
            Eat open parenthesis token
            E()
            Eat close parenthesis token
        default:
            Parse error (no rule starts with this token)
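Here is the same E → number | ( E ) example as runnable C++. It assumes a hypothetical `Tok` enum and pre-tokenized input, with `std::runtime_error` standing in for the provided error functions. Two details matter:

- C++ needs a `break` after each case to avoid falling through.
- The `default` case reports a parse error when the lookahead starts neither rule.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical token kinds for the two-rule grammar  E -> number | ( E )
enum class Tok { Number, Open, Close, End };

class Parser {
public:
    explicit Parser(std::vector<Tok> toks) : toks_(std::move(toks)) {
        toks_.push_back(Tok::End);  // sentinel so the lookahead is always valid
    }

    // A whole input is one E followed by the end of input.
    void parse() {
        E();
        eatToken(Tok::End);
    }

private:
    std::vector<Tok> toks_;
    std::size_t pos_ = 0;

    Tok nextToken() const { return toks_[pos_]; }  // peek, do not consume

    void eatToken(Tok expected) {
        if (nextToken() != expected) throw std::runtime_error("mismatch error");
        ++pos_;
    }

    // One function per nonterminal; one token of lookahead picks the rule.
    void E() {
        switch (nextToken()) {
            case Tok::Number:          // E -> number
                eatToken(Tok::Number);
                break;
            case Tok::Open:            // E -> ( E )
                eatToken(Tok::Open);
                E();
                eatToken(Tok::Close);
                break;
            default:                   // no rule starts with this token
                throw std::runtime_error("parse error");
        }
    }
};
```

With this sketch:

- `Parser({Tok::Open, Tok::Number, Tok::Close}).parse()` succeeds.
- `{Tok::Open, Tok::Number}` fails with a mismatch in eatToken, because the closing parenthesis is missing.
- `{Tok::Close}` fails with a parse error in E.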
Choosing Rules
● If there are multiple rules, how does the parser choose?
○ The parser can use a single token of lookahead
○ Using one token of lookahead is what makes the parser LL(1)
● The solution is FIRST and FOLLOW sets

Using FIRST Sets
● For a string of grammar symbols α, define FIRST(α) as the set of tokens that appear as the first symbol in some string derived from α
● Check whether the lookahead is in the FIRST set of any of the rule choices
○ If yes, the parser must choose that rule to apply
○ If no, there is a parse error (unless the nonterminal has an epsilon rule; see below)
● What if the token is in multiple FIRST sets?
○ This means that the grammar is not predictive

Epsilon Rules
• The FIRST set method does not work for epsilon rules
• The FIRST set of epsilon is just {ε}, but ε is not a token that can arrive as the lookahead
• This is where FOLLOW sets are necessary
• When an epsilon rule is applied, the next token we see must come from the FOLLOW set of the nonterminal
N.B. If you do not apply epsilon rules this way, your program may report different errors than expected
Epsilon Rules
• To apply an epsilon rule, the lookahead must be in the FOLLOW set of the nonterminal
• Otherwise, it is a parse error as before
• Applying an epsilon rule means making no calls
• Can be as simple as an empty case statement

Epsilon Rule Example
• What lookahead tokens can we see when we want to apply the epsilon rule in the grammar below?
S  → E
E  → F E’
E’ → + F E’ | - F E’ | ε
F  → m | ( E ) | num
Let’s compute the FIRST and FOLLOW sets for the grammar above
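Working the sets out by hand, with $ standing for the end of input, gives the results below. The epsilon rule of E’ applies exactly when the lookahead is in FOLLOW(E’) = { ), $ }. Any lookahead that is not +, -, ) or $ is a parse error while parsing E’.

```latex
\begin{aligned}
\mathrm{FIRST}(F) &= \{\, m,\ (,\ \mathit{num} \,\} \\
\mathrm{FIRST}(E) &= \mathrm{FIRST}(S) = \mathrm{FIRST}(F) = \{\, m,\ (,\ \mathit{num} \,\} \\
\mathrm{FIRST}(E') &= \{\, +,\ -,\ \varepsilon \,\} \\[4pt]
\mathrm{FOLLOW}(S) &= \{\, \$ \,\} \\
\mathrm{FOLLOW}(E) &= \mathrm{FOLLOW}(S) \cup \{\, ) \,\} = \{\, ),\ \$ \,\} \\
\mathrm{FOLLOW}(E') &= \mathrm{FOLLOW}(E) = \{\, ),\ \$ \,\} \\
\mathrm{FOLLOW}(F) &= \big(\mathrm{FIRST}(E') \setminus \{\varepsilon\}\big) \cup \mathrm{FOLLOW}(E) \cup \mathrm{FOLLOW}(E') = \{\, +,\ -,\ ),\ \$ \,\}
\end{aligned}
```

FOLLOW(F) picks up FOLLOW(E) and FOLLOW(E’) because E’ can derive ε. Whatever follows E or E’ can therefore appear directly after F.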
Step 4: Expression Evaluation
● Activated with the -e command-line flag
● Should output correct values for each print
● Evaluation must have correct precedence and associativity
● All the operators in the given grammar are left-associative
Expression Evaluation
• Correctly handling evaluation may require:
• Passing information into child rules
• Returning information to parent rules
• This can be implemented as function parameters and return values
• You can also use a stack
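Below is a minimal C++ sketch of the parameter-and-return-value approach. It assumes the epsilon-example grammar without its m alternative and reads characters straight from a string. The `Evaluator` class is hypothetical, and `std::runtime_error` stands in for the provided error functions. The running value is passed down into E’ as a parameter and the result is returned up, which makes - left-associative. The epsilon case fires only for lookaheads in FOLLOW(E’).

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Evaluator sketch for the epsilon-example grammar, minus the m alternative:
//   E  -> F E'
//   E' -> + F E' | - F E' | epsilon
//   F  -> ( E ) | num
// The value computed so far is passed DOWN into E' as `acc`, and the final
// value is returned UP, so "a - b - c" is computed as (a - b) - c.
// Overflow checks are omitted here.
class Evaluator {
public:
    explicit Evaluator(std::string src) : s_(std::move(src)) {}

    int run() {
        int v = E();
        if (peek() != '\0') throw std::runtime_error("parse error");
        return v;
    }

private:
    std::string s_;
    std::size_t i_ = 0;

    // Lookahead character after skipping whitespace; '\0' means end of input.
    char peek() {
        while (i_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[i_]))) ++i_;
        return i_ < s_.size() ? s_[i_] : '\0';
    }

    void eat(char c) {
        if (peek() != c) throw std::runtime_error("mismatch error");
        ++i_;
    }

    int E() { return EPrime(F()); }

    int EPrime(int acc) {
        switch (peek()) {
            case '+': eat('+'); return EPrime(acc + F());
            case '-': eat('-'); return EPrime(acc - F());
            case ')':   // lookahead in FOLLOW(E') = { ), end of input }:
            case '\0':  // apply epsilon by making no calls
                return acc;
            default:
                throw std::runtime_error("parse error");
        }
    }

    int F() {
        char c = peek();
        if (c == '(') {
            eat('(');
            int v = E();
            eat(')');
            return v;
        }
        if (std::isdigit(static_cast<unsigned char>(c))) {
            int v = 0;
            while (i_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[i_])))
                v = v * 10 + (s_[i_++] - '0');
            return v;
        }
        throw std::runtime_error("parse error");
    }
};
```

`Evaluator("10 - 3 - 2").run()` returns 5, that is (10 - 3) - 2. `Evaluator("10 - (3 - 2)").run()` returns 9. A stack-based evaluator is an equally valid alternative.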

Error Handling
● Four types of errors:
○ outOfBoundsError: thrown only when evaluating expressions (-e flag)
○ scanError: thrown for bad characters in the input
○ mismatchError: thrown when the parser expects a token that is not present in the input
○ parseError: thrown when the parser encounters an invalid token
Error Handling
● Use only the provided error functions to throw errors
○ This allows us to grade your project accurately
● All error functions require a line number
● Some error functions require more information
○ For example, scanError takes the bad character
Out of Bounds Error
• This error is only thrown when evaluating
• Can be thrown by either the scanner or the parser
• Whenever an input integer is out of range
• Whenever a calculation overflows for any component of a vector
• The range is INT_MIN to INT_MAX
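Below is a minimal C++ sketch of these range checks, assuming a 32-bit int. Literals and intermediate results are computed in `long long` and compared against INT_MIN and INT_MAX, which avoids signed overflow in `int` itself. The `outOfBounds` helper is a hypothetical stand-in for the provided out-of-bounds error function, whose real signature may differ. Division has its own edge case not shown here: INT_MIN / -1 also overflows.

```cpp
#include <climits>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the provided out-of-bounds error function.
[[noreturn]] void outOfBounds(int line) {
    throw std::out_of_range("out of bounds on line " + std::to_string(line));
}

// Convert a scanned run of decimal digits, rejecting values above INT_MAX.
// The check runs after every digit, so the long long accumulator itself
// can never overflow, however long the input is.
int checkedLiteral(const std::string& digits, int line) {
    long long v = 0;
    for (char c : digits) {
        v = v * 10 + (c - '0');
        if (v > INT_MAX) outOfBounds(line);
    }
    return static_cast<int>(v);
}

// Compute in long long (at least 64 bits), then check the result fits in int.
// Assumes a 32-bit int, so even the product of two ints fits in long long.
int checkedResult(long long r, int line) {
    if (r < INT_MIN || r > INT_MAX) outOfBounds(line);
    return static_cast<int>(r);
}

int checkedAdd(int a, int b, int line) { return checkedResult(static_cast<long long>(a) + b, line); }
int checkedSub(int a, int b, int line) { return checkedResult(static_cast<long long>(a) - b, line); }
int checkedMul(int a, int b, int line) { return checkedResult(static_cast<long long>(a) * b, line); }
```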

Scan Error
• Thrown by the scanner
• Should be thrown for any character that is not part of a valid token
• Should also be thrown for incorrect characters inside the print token
• For example, “pri nt” throws a scan error
• Note that the print token is case-insensitive
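One way to get both behaviors is sketched below in C++. It assumes the scanner's switch dispatches on 'p' or 'P' and then calls a hypothetical helper, `scanPrintRest`, to read the remaining letters. `std::runtime_error` stands in for the provided scanError function.

```cpp
#include <cctype>
#include <cstdio>    // EOF
#include <istream>
#include <sstream>   // std::istringstream, for trying it on a string
#include <stdexcept>
#include <string>

// Called after the scanner's switch has read a 'p' or 'P'. Checks the rest
// of the keyword one character at a time, ignoring case, so "Print" and
// "PRINT" are accepted, while "pri nt" fails at the space.
void scanPrintRest(std::istream& in, int line) {
    for (char expected : std::string("rint")) {
        int c = in.get();
        if (c == EOF || std::tolower(c) != expected) {
            std::string bad = (c == EOF) ? "end of input" : std::string(1, static_cast<char>(c));
            throw std::runtime_error("scan error on line " + std::to_string(line) +
                                     ": unexpected " + bad);
        }
    }
}
```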

Mismatch Error
• Probably will be thrown by the scanner
• Most likely inside the eatToken function
• Means that the parser expected a token that was not found
• May be thrown too often if FOLLOW sets are not used correctly
Parse Error
• Thrown by the parser
• Usually when the parser encounters a lookahead token that does not match any rule choice
• Remember that epsilon rules may throw this error (when the lookahead is not in the FOLLOW set)
Testing
• Test cases are provided along with the code
• Run the program on all test cases with `make test`
• All the test cases are in the test/test.rb file
• You need to install Ruby to run the test.rb file

Grading
• Grading will be based mostly on the provided test cases, but you should also write and check your own test cases
• Passing more test cases with the expected output means a higher score
• Running the given test.rb will give you a strong idea of your grade before submission
