CS 160

Translation of Programming Languages

Discussion 2
William Eiers & Seemanta Saha
Project 2
• Build scanner & recursive-descent parser in C++

• Goal is to become familiar with top-down parsing

• Requires grammar modification

• Due in one week on Monday, October 16th


Project 2 Steps
• Step 1: Modify grammar so that it is predictive

• Step 2: Implement scanner

• Step 3: Implement recursive-descent parser

• Step 4: Implement evaluation of expressions


Step 1: Modifying Grammar
Language Grammar

Start → ExprList

ExprList → ExprList ; Expression
         | Expression
Expression Grammar

Expression → Expression + Expression
           | Expression - Expression
           | Expression * Expression
           | Expression / Expression
           | Expression % Expression
           | Expression mod Expression
           | ( Expression )
           | num
Grammar Issues
● The grammar has several issues which will
complicate top-down parsing:

○ Ambiguous

○ Not LL(1) / Predictive

○ No operator precedence
Grammar Techniques
● Techniques for modifying grammar:

○ Precedence introduction

○ Left-recursion elimination

○ Left factoring
Precedence Introduction
• Parent nodes in the parse tree require information from child nodes for processing

• For example: an addition requires the values of its left and right hand sides so that it can add them

• This means that nodes deeper in the parse tree are processed earliest

• Higher precedence operators should be deeper


Precedence Introduction
Consider the input:

1 + 2 * 3

Compare the possible parse trees for the above using both of the following grammars for expressions:

Grammar 1:
Expression → Expression + Expression
           | Expression * Expression
           | num

Grammar 2:
Expression → Expression + Expression
           | Term
Term → Term * Term
     | num
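
To make the difference concrete, here is a minimal sketch that evaluates the two tree shapes a parser could build for 1 + 2 * 3. The names Node, num, bin and eval are hypothetical helpers invented for this illustration; they are not part of the project skeleton.

```cpp
#include <memory>
#include <utility>

// Hypothetical parse-tree node: a number leaf or a binary operator.
struct Node {
    char op;                              // '+', '*', or 'n' for a number leaf
    int value;                            // only meaningful when op == 'n'
    std::unique_ptr<Node> left, right;
};

std::unique_ptr<Node> num(int v) {
    return std::unique_ptr<Node>(new Node{'n', v, nullptr, nullptr});
}

std::unique_ptr<Node> bin(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    return std::unique_ptr<Node>(new Node{op, 0, std::move(l), std::move(r)});
}

// Children are evaluated before their parent, so deeper operators run first.
int eval(const Node& n) {
    if (n.op == 'n') return n.value;
    int l = eval(*n.left);
    int r = eval(*n.right);
    return n.op == '+' ? l + r : l * r;   // this sketch only knows '+' and '*'
}

// '+' deeper than '*': the tree for (1 + 2) * 3.
int evalPlusDeeper() {
    return eval(*bin('*', bin('+', num(1), num(2)), num(3)));
}

// '*' deeper than '+': the tree for 1 + (2 * 3).
int evalTimesDeeper() {
    return eval(*bin('+', num(1), bin('*', num(2), num(3))));
}
```

evalPlusDeeper() yields 9 and evalTimesDeeper() yields 7. A grammar that allows * directly under Expression can produce either tree, while a grammar that pushes * down into Term always places the multiplication below the addition.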
Operator Precedence

• () is the highest

• Then mod, *, / (same precedence)

• +, - are the lowest (and same)


Top-down Recursive Descent Parsing
• Goal is to build the parse tree in a top-down fashion
• Start at the root of the parse tree from the start
symbol and grow towards leaves
• Pick a production and try to match the input
• Caveat: top-down parsers cannot handle
left-recursive grammars and must backtrack on ‘bad’
production picks
Why do we need Left Recursion Elimination for a Recursive Descent Parser?
Need : Left Recursion Elimination

Expression → Expression + Term


function Expression()
{
    Expression(); match('+'); Term();   // calls itself before consuming any input
}

• A direct translation like this would fall into infinite recursion when executed (typically a bad property for parsers to have)
Left Recursion Elimination
Expression → Expression + Expression
           | num

becomes

Expression → num Expression’

Expression’ → + num Expression’
            | ε
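
As a rough sketch, the transformed grammar can be turned into a recognizer that always consumes a token before it recurses, so it cannot loop forever. To keep the example short, each decimal digit stands in for one num token and whitespace is not handled; Recognizer and accepts are hypothetical names, not project code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical recognizer; every decimal digit counts as one `num` token.
struct Recognizer {
    std::string input;
    std::size_t pos;

    bool atNum() const {
        return pos < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[pos])) != 0;
    }

    // Expression -> num Expression'
    bool expression() {
        if (!atNum()) return false;
        ++pos;                          // consume num before any recursive call
        return expressionPrime();
    }

    // Expression' -> + num Expression' | epsilon
    bool expressionPrime() {
        if (pos < input.size() && input[pos] == '+') {
            ++pos;                      // consume '+'
            if (!atNum()) return false;
            ++pos;                      // consume num
            return expressionPrime();
        }
        return true;                    // epsilon: consume nothing
    }
};

// True when the whole text is derivable from Expression.
bool accepts(const std::string& text) {
    Recognizer r{text, 0};
    return r.expression() && r.pos == r.input.size();
}
```

With these assumptions, accepts("1+2+3") holds while accepts("1+") and accepts("+1") do not.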
Why do we need Left Factoring?
Need : Left Factoring

Expression → Expression + Expression
           | Expression - Expression

• This grammar is nondeterministic: both alternatives start with the same prefix
• We cannot predict which grammar rule to choose
• We may choose the wrong one and need to backtrack
• We factor out the common prefix
Left Factoring
Expression → Expression + Expression
| Expression - Expression

becomes

Expression → Expression Expression’

Expression’ → + Expression
| - Expression
Step 2: Scanner
• Reads input from the standard input stream (STDIN)
• Identifies tokens and keeps track of line numbers
• Makes tokens available and useful to the parser
• nextToken function: returns the next token without consuming it
• eatToken function: consumes a given token
Scanner Example
Scanner Tips
● Whitespace is ignored between tokens
● Whitespace includes space, tab, newline, etc.
● Whitespace is not allowed inside a single token
● Easiest to examine one character at a time
○ Use a switch-case statement
● nextToken and eatToken should be consistent. Be
careful when fetching a token and consuming a token.
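
The tips above can be sketched roughly as follows. This is not the project skeleton: the Token enum, the Scanner class layout, and the use of std::runtime_error in place of the provided error functions are all assumptions made for illustration, and the mod and print tokens are left out to keep it short. The scanner takes any std::istream, so std::cin can be passed in for STDIN.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical token set (mod and print omitted for brevity).
enum class Token { Num, Plus, Minus, Times, Divide, LParen, RParen, Semicolon, End };

class Scanner {
public:
    explicit Scanner(std::istream& in) : in_(in) {}

    // Returns the next token without consuming it.
    Token nextToken() {
        if (!hasPeeked_) { peeked_ = scan(); hasPeeked_ = true; }
        return peeked_;
    }

    // Consumes the next token, which must equal `expected`.
    void eatToken(Token expected) {
        if (nextToken() != expected)   // stand-in for the provided mismatch error
            throw std::runtime_error("mismatch on line " + std::to_string(line_));
        hasPeeked_ = false;
    }

    int lineNumber() const { return line_; }

private:
    // Reads one character at a time and classifies it.
    Token scan() {
        int c = in_.get();
        while (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            if (c == '\n') ++line_;    // whitespace is skipped, newlines are counted
            c = in_.get();
        }
        if (c == std::char_traits<char>::eof()) return Token::End;
        if (std::isdigit(c)) {         // a number is a run of digits with no gaps
            while (std::isdigit(in_.peek())) in_.get();
            return Token::Num;
        }
        switch (c) {
            case '+': return Token::Plus;
            case '-': return Token::Minus;
            case '*': return Token::Times;
            case '/': return Token::Divide;
            case '(': return Token::LParen;
            case ')': return Token::RParen;
            case ';': return Token::Semicolon;
            default:                   // stand-in for the provided scan error
                throw std::runtime_error("bad character on line " + std::to_string(line_));
        }
    }

    std::istream& in_;
    Token peeked_ = Token::End;
    bool hasPeeked_ = false;
    int line_ = 1;
};

// Counts the tokens in `text`, eating each one after looking at it.
int countTokens(const std::string& text) {
    std::istringstream in(text);
    Scanner s(in);
    int n = 0;
    while (s.nextToken() != Token::End) { s.eatToken(s.nextToken()); ++n; }
    return n;
}
```

Caching the peeked token is one way to keep nextToken and eatToken consistent: nextToken can be called any number of times without advancing the input, and only eatToken moves past the token.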
Step 3: Recursive Descent Parser
• Each non-terminal has a corresponding function

• Each terminal in a rule translates to a call to eatToken

• Applying a rule turns into a sequence of calls

• Decide which rule to apply using FIRST/FOLLOW


Recursive Descent Example
• Consider the following nonterminal and rule:

E → T + T * F
Recursive Descent Example

• This might translate to something like:


E() {
    T()
    Eat plus token
    T()
    Eat multiplication token
    F()
}
Recursive Descent Example

• Consider the following nonterminal and rules:

E → number
| ( E )
Recursive Descent Example

• This might translate to something like:


E():
    Switch on Next Token:
        case Number:
            Eat number token
        case Open Parenthesis:
            Eat open parenthesis token
            E()
            Eat close parenthesis token
        otherwise:
            Parse error
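
In C++ the same idea might look like the sketch below. The Tok enum, the Parser struct and the parses helper are hypothetical, and std::runtime_error stands in for the provided error functions; the token list is handed over as a vector so that the example does not depend on any particular scanner.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

enum class Tok { Number, OpenParen, CloseParen, End };

struct Parser {
    std::vector<Tok> tokens;
    std::size_t pos;

    // Lookahead: the next token, or End once the input is exhausted.
    Tok nextToken() const { return pos < tokens.size() ? tokens[pos] : Tok::End; }

    void eatToken(Tok expected) {
        if (nextToken() != expected) throw std::runtime_error("mismatch error");
        ++pos;
    }

    // E -> number | ( E )
    void E() {
        switch (nextToken()) {
            case Tok::Number:
                eatToken(Tok::Number);
                break;
            case Tok::OpenParen:
                eatToken(Tok::OpenParen);
                E();
                eatToken(Tok::CloseParen);
                break;
            default:                    // lookahead starts neither rule
                throw std::runtime_error("parse error");
        }
    }
};

// True when the token list is exactly one E followed by end of input.
bool parses(std::vector<Tok> tokens) {
    Parser p{std::move(tokens), 0};
    try {
        p.E();
        p.eatToken(Tok::End);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Each case needs its own break, otherwise the Number case would fall through into the parenthesis case.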
Choosing Rules
● If there are multiple rules, how does the parser
choose?

○ Parser can use a single token of lookahead

○ One lookahead is what makes the parser LL(1)

● The solution is FIRST and FOLLOW sets


Using FIRST Sets
● For a string of grammar symbols α, define FIRST(α) as the set of tokens that can appear as the first symbol of some string derived from α

● Check if the lookahead is in any of the FIRST sets of the rule choices

○ If yes, the parser must choose that rule to apply

○ If no, then that means there is a parse error

● What if the token is in multiple FIRST sets?

○ This means that the grammar is not predictive


Epsilon Rules
• The FIRST set method does not work for epsilon

• The FIRST set of epsilon is just epsilon, but that is not a token that can come in as the lookahead

• This is where FOLLOW sets are necessary

• The first token we see when applying epsilon must be from the FOLLOW set
N.B. If you apply an epsilon rule without checking the FOLLOW set, your parser may report a different error message than expected
Epsilon Rules
• To apply epsilon the lookahead must be in the
FOLLOW set

• Otherwise it is a parse error as before

• Applying an epsilon rule means making no calls

• Can be as simple as an empty case statement


Epsilon Rule Example

• What lookahead tokens can we see when we want to apply the epsilon in the below grammar?

S → E
E → FE’
E’ → +FE’ | -FE’ | ε
F → m | (E) | num
Let’s compute FIRST and FOLLOW sets for the grammar from the previous page
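
For this grammar, FIRST(F) = FIRST(E) = { m, (, num }, FIRST(E’) = { +, -, ε }, FOLLOW(E) = FOLLOW(E’) = { ), $ } and FOLLOW(F) = { +, -, ), $ }, where $ marks the end of input. The sketch below shows one way those sets could drive a parser. It assumes single-character tokens (each digit is one num, m is a literal token) and uses std::runtime_error where the project would call its provided error functions; Parser and outcome are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

struct Parser {
    std::string input;
    std::size_t pos;

    // Lookahead character; '$' stands for end of input.
    char lookahead() const { return pos < input.size() ? input[pos] : '$'; }

    void eat(char expected) {
        if (lookahead() != expected) throw std::runtime_error("mismatch error");
        ++pos;
    }

    // S -> E
    void S() { E(); eat('$'); }

    // E -> F E'
    void E() { F(); EPrime(); }

    // E' -> + F E' | - F E' | epsilon
    void EPrime() {
        switch (lookahead()) {
            case '+': eat('+'); F(); EPrime(); break;
            case '-': eat('-'); F(); EPrime(); break;
            case ')':                 // FOLLOW(E') = { ')', '$' }
            case '$':
                break;                // epsilon: make no calls
            default:
                throw std::runtime_error("parse error");
        }
    }

    // F -> m | ( E ) | num
    void F() {
        char c = lookahead();
        if (c == 'm') {
            eat('m');
        } else if (c == '(') {
            eat('('); E(); eat(')');
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            ++pos;                    // one digit is one num token in this sketch
        } else {
            throw std::runtime_error("parse error");
        }
    }
};

// Returns "ok", "mismatch error" or "parse error" for the given text.
std::string outcome(const std::string& text) {
    Parser p{text, 0};
    try {
        p.S();
        return "ok";
    } catch (const std::runtime_error& e) {
        return e.what();
    }
}
```

With the input "12" (two num tokens in a row), E’ sees a lookahead that is in neither its FIRST set nor its FOLLOW set and reports a parse error. If the default case silently applied epsilon instead, the same input would only fail later in eat('$') as a mismatch error.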
Step 4: Expression Evaluation
● Activated with the -e command line flag

● Should output correct values for each print

● Evaluation must have correct precedence and associativity
● All operators in the given grammar are left-associative
Expression Evaluation
• Correctly handling evaluation may require:

• Passing information into child rules

• Returning information to parent rules

• This can be implemented as function parameters and return values

• You can also use an explicit stack
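
One possible way to get left associativity out of a right-recursive grammar is to pass the value computed so far down into the child rule and return the final result back up. The sketch below does this for a hypothetical grammar E → num E’, E’ → + num E’ | - num E’ | ε. The Evaluator struct and evaluate function are invented for illustration, the input is assumed to contain no whitespace, and overflow and FOLLOW checks are left out.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

struct Evaluator {
    std::string input;   // assumed to contain no whitespace
    std::size_t pos;

    bool atDigit() const {
        return pos < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[pos])) != 0;
    }

    // Reads one multi-digit number and returns its value.
    int number() {
        if (!atDigit()) throw std::runtime_error("parse error");
        int value = 0;
        while (atDigit()) value = value * 10 + (input[pos++] - '0');
        return value;
    }

    // E -> num E'
    int E() { return EPrime(number()); }

    // E' -> + num E' | - num E' | epsilon
    // `left` is the value of everything parsed so far, passed in by the parent.
    int EPrime(int left) {
        if (pos < input.size() && input[pos] == '+') {
            ++pos;
            return EPrime(left + number());
        }
        if (pos < input.size() && input[pos] == '-') {
            ++pos;
            return EPrime(left - number());
        }
        return left;     // epsilon: the accumulated value is the result
    }
};

int evaluate(const std::string& text) {
    Evaluator e{text, 0};
    return e.E();
}
```

evaluate("10-3-2") gives 5, which is (10 - 3) - 2. Evaluating the right-hand side first and combining afterwards would compute 10 - (3 - 2) = 9 instead.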


Error Handling
● Four types of errors:
○ outOfBoundsError: only thrown when evaluating expressions (-e flag)
○ scanError: thrown for bad characters in input
○ mismatchError: thrown when parser expects a token that is not present in the input
○ parseError: thrown when parser encounters an invalid token
Error Handling
● Use only provided error functions to throw
errors

○ This allows us to grade your project accurately

● All error functions require a line number

● Some error functions require more information

○ For example, scanError takes the bad character
Out of Bounds Error
• This error is only thrown when evaluating

• Can be thrown by either scanner or parser

• Any time when an input integer is out of range

• Any time when a calculation overflows for any component of a vector

• The range is INT_MIN to INT_MAX
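
A common way to detect this is to do the arithmetic in a wider type and compare the result against INT_MIN and INT_MAX before narrowing. The sketch below assumes that long long is wider than int (true on mainstream platforms, not guaranteed by the standard). The function names are hypothetical, and a false return marks the place where the provided out-of-bounds error function would be called.

```cpp
#include <climits>
#include <string>

// Stores `wide` in `out` and returns true only if it fits in an int.
bool fitsInInt(long long wide, int& out) {
    if (wide < INT_MIN || wide > INT_MAX) return false;
    out = static_cast<int>(wide);
    return true;
}

// Each operation is computed in long long, then range-checked.
bool checkedAdd(int a, int b, int& out) { return fitsInInt(static_cast<long long>(a) + b, out); }
bool checkedSub(int a, int b, int& out) { return fitsInInt(static_cast<long long>(a) - b, out); }
bool checkedMul(int a, int b, int& out) { return fitsInInt(static_cast<long long>(a) * b, out); }

// Converts a string of decimal digits (assumed to contain only '0'-'9')
// to an int, returning false as soon as the value leaves the int range.
bool parseNumber(const std::string& digits, int& out) {
    long long value = 0;
    for (char c : digits) {
        value = value * 10 + (c - '0');
        if (value > INT_MAX) return false;   // stop before value itself can overflow
    }
    out = static_cast<int>(value);
    return true;
}
```

The parseNumber check covers the scanner side (an input integer that is out of range), and the checked operations cover the parser side (a calculation that overflows during evaluation).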


Scan Error
• Thrown by the scanner

• Should be thrown for any character that is not part of a valid token

• Should also be thrown for incorrect characters inside the print token

• For example, “pri nt” throws a scan error

• Note that the print token is case-insensitive
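
As a small sketch of the case-insensitive check, assuming the scanner has already collected the candidate characters into a string, the helper below (a hypothetical name, not part of the project skeleton) compares them against "print" one character at a time.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True when `lexeme` spells "print" in any mix of upper and lower case.
bool isPrintKeyword(const std::string& lexeme) {
    const std::string keyword = "print";
    if (lexeme.size() != keyword.size()) return false;
    for (std::size_t i = 0; i < keyword.size(); ++i) {
        int lower = std::tolower(static_cast<unsigned char>(lexeme[i]));
        if (lower != keyword[i]) return false;
    }
    return true;
}
```

isPrintKeyword("PrInT") holds, while "pri nt" and "prin" are rejected, which is where a scan error would be raised.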


Mismatch Error
• Will probably be thrown by the scanner

• Most likely inside the eatToken function

• Means that the parser expected a token which was not found

• May be thrown too often if FOLLOW sets are not used correctly
Parse Error

• Thrown by parser

• Usually when the parser encounters a lookahead token that does not match any rule choice

• Remember that epsilon rules may throw this error
Testing
• Test cases are provided along with code

• Run program on all test cases with `make test`

• All the test cases are in test/test.rb file

• You need to install Ruby to run the test.rb file


Grading
• Grading will be mostly based on the provided test cases, but you should also write and check your own test cases

• The more test cases produce the expected output, the higher the score

• Running tests with the given test.rb will give you a strong idea of your grade before submission
