
Context-free grammars

Arles Rodríguez
aerodriguezp@unal.edu.co

Facultad de Ciencias
Departamento de Matemáticas
Universidad Nacional de Colombia
Motivation
• Not all strings of tokens are valid programs.
• The parser must tell the difference.
– Must distinguish between valid and invalid strings
of tokens.
• We need:
– A way to describe valid strings of tokens
(language)
– A method for distinguishing valid from invalid
strings of tokens (algorithm)
Programming languages have recursive structure
• In COOL, for example:
– An expression EXPR can be many things:
if EXPR then EXPR else EXPR fi
while EXPR loop EXPR pool
• Context-free grammars (CFGs) are a natural notation for describing such recursive structures.
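To make the recursion concrete, here is a minimal Python sketch of EXPR as a recursive data type. The class names Id, If, While and the function show are hypothetical, not part of COOL; the base case id is taken from the COOL fragment that appears later in these slides.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Id:
    name: str


@dataclass
class If:
    cond: Expr      # each branch is itself a full EXPR
    then: Expr
    orelse: Expr


@dataclass
class While:
    cond: Expr
    body: Expr


# EXPR refers to itself: If and While nodes contain further EXPR nodes.
Expr = Union[Id, If, While]


def show(e: Expr) -> str:
    """Render an expression back into COOL-like concrete syntax."""
    if isinstance(e, Id):
        return e.name
    if isinstance(e, If):
        return f"if {show(e.cond)} then {show(e.then)} else {show(e.orelse)} fi"
    return f"while {show(e.cond)} loop {show(e.body)} pool"


print(show(If(While(Id("x"), Id("y")), Id("a"), Id("b"))))
# if while x loop y pool then a else b fi
```

Because show calls itself on every sub-expression, it handles arbitrarily deep nesting, which is exactly the kind of structure a CFG describes.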
Context-free grammar structure
• A CFG consists of:
– A set of terminals (T)
– A set of non-terminals (N)
– A start symbol
– A set of productions
• Each production is a symbol followed by an arrow, followed by a list of symbols:
• $X \rightarrow Y_1 \ldots Y_n$, where $X \in N$ and $Y_i \in T \cup N \cup \{\varepsilon\}$
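The four components can be written down directly as data. This is a minimal sketch (the Grammar class and is_well_formed are hypothetical names) that encodes a production X → ε as an empty right-hand side and checks the conditions above, plus the usual requirement that T and N are disjoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    start: str                # start symbol, must be in N
    # each production is (X, (Y1, ..., Yn)); an empty tuple encodes X -> ε
    productions: tuple

    def is_well_formed(self) -> bool:
        """Check S ∈ N, T ∩ N = ∅, X ∈ N and every Yi ∈ T ∪ N."""
        if self.start not in self.nonterminals:
            return False
        if self.terminals & self.nonterminals:
            return False
        symbols = self.terminals | self.nonterminals
        return all(
            lhs in self.nonterminals and all(y in symbols for y in rhs)
            for lhs, rhs in self.productions
        )


# Balanced parentheses: S -> (S) and S -> ε
balanced = Grammar(
    terminals=frozenset({"(", ")"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("(", "S", ")")), ("S", ())),
)
print(balanced.is_well_formed())  # True
```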
CFG example: balanced parentheses
• Given the grammar:
$S \rightarrow (S)$
$S \rightarrow \varepsilon$
• The sets of non-terminals N and terminals T are:
$N = \{S\}$, $T = \{(, )\}$
• Productions can be read as replacement rules:
• $S \rightarrow (S)$ means that wherever you see an S, you can replace it with (S)


CFG Algorithm
• Step 1: begin with a string consisting only of the start symbol S.
• Step 2: replace any non-terminal X in the string by the right-hand side of some production $X \rightarrow Y_1 \ldots Y_n$.
• Step 3: repeat step 2 until there are no non-terminals left.
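A minimal Python sketch of the three steps, assuming the balanced-parentheses grammar S → (S), S → ε encoded as a dictionary (PRODUCTIONS and derive are hypothetical names). Step 2 leaves open which non-terminal and which production to use, so this sketch picks both at random with a fixed seed; max_steps is an added safety bound, not part of the procedure.

```python
import random

# Balanced-parentheses grammar: S -> (S) | ε; () encodes the empty right-hand side.
PRODUCTIONS = {"S": [("(", "S", ")"), ()]}
NONTERMINALS = set(PRODUCTIONS)


def derive(start="S", rng=None, max_steps=50):
    """Run the three-step procedure and return every intermediate string."""
    if rng is None:
        rng = random.Random(0)             # arbitrary fixed seed
    current = [start]                      # Step 1: only the start symbol
    history = ["".join(current)]
    steps = 0
    while True:
        positions = [i for i, sym in enumerate(current) if sym in NONTERMINALS]
        if not positions:                  # Step 3: stop when no non-terminals remain
            return history
        if steps == max_steps:
            raise RuntimeError("derivation did not finish within max_steps")
        i = rng.choice(positions)          # Step 2: pick any non-terminal X ...
        rhs = rng.choice(PRODUCTIONS[current[i]])  # ... and one of its productions
        current = current[:i] + list(rhs) + current[i + 1:]
        history.append("".join(current))
        steps += 1


print(derive())  # e.g. ['S', '(S)', '()'], depending on the random choices
```

Every run ends in a string of terminals only, which is why the procedure defines a set of terminal strings.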
CFG Example: Algorithm
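• A state is a string of terminals and non-terminals:
$X_1 \ldots X_i \ldots X_n$
• We have a production:
$X_i \rightarrow Y_1 \ldots Y_m$
• So we can replace $X_i$ with its right-hand side:
$X_1 \ldots X_i \ldots X_n \rightarrow X_1 \ldots X_{i-1} Y_1 \ldots Y_m X_{i+1} \ldots X_n$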
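The single rewriting step above can be sketched as a list operation. In this hypothetical helper, rewrite_step, the state is a Python list, the index is 0-based (the slide's $X_i$ is 1-based), and the balanced-parentheses grammar S → (S), S → ε is assumed for the example.

```python
# Balanced-parentheses grammar (assumed for illustration): S -> (S) | ε
PRODUCTIONS = {"S": [("(", "S", ")"), ()]}


def rewrite_step(state, i, rhs, productions):
    """One step X1..Xi..Xn -> X1..X(i-1) Y1..Ym X(i+1)..Xn.

    state: list of symbols; i: 0-based index of the symbol Xi to replace;
    rhs: the right-hand side Y1..Ym, which must belong to a production Xi -> rhs.
    """
    if tuple(rhs) not in productions.get(state[i], []):
        raise ValueError(f"{state[i]} -> {' '.join(rhs)} is not a production")
    # Symbols before and after Xi are copied unchanged.
    return state[:i] + list(rhs) + state[i + 1:]


print(rewrite_step(["(", "S", ")"], 1, ("(", "S", ")"), PRODUCTIONS))
# ['(', '(', 'S', ')', ')']
```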
CFGs
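• Multiple steps can be represented as a sequence of single rewriting steps:
$\alpha_0 \rightarrow \alpha_1 \rightarrow \alpha_2 \rightarrow \cdots \rightarrow \alpha_n$
• We then say that $\alpha_0$ rewrites to $\alpha_n$ in zero or more steps:
$\alpha_0 \overset{*}{\rightarrow} \alpha_n$
• $\overset{*}{\rightarrow}$ means a sequence of $n \geq 0$ individual production applications.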
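Deciding whether $\alpha \overset{*}{\rightarrow} \beta$ amounts to searching over sequences of single steps. This sketch (rewrites_to and its max_steps bound are hypothetical) does a breadth-first search over all rewrites of up to max_steps steps, assuming the balanced-parentheses grammar S → (S), S → ε. A False answer only means no derivation was found within the bound.

```python
from collections import deque

# Balanced-parentheses grammar (assumed): S -> (S) | ε
PRODUCTIONS = {"S": [("(", "S", ")"), ()]}


def rewrites_to(alpha, beta, productions, max_steps=10):
    """Bounded check of alpha ->* beta over 0..max_steps single steps."""
    start, goal = tuple(alpha), tuple(beta)
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        form, steps = frontier.popleft()
        if form == goal:              # n = 0 steps is allowed
            return True
        if steps == max_steps:
            continue
        # Try every non-terminal position with every applicable production.
        for i, sym in enumerate(form):
            for rhs in productions.get(sym, []):
                nxt = form[:i] + tuple(rhs) + form[i + 1:]
                if nxt not in seen:   # BFS reaches each form first by a shortest path
                    seen.add(nxt)
                    frontier.append((nxt, steps + 1))
    return False


print(rewrites_to("S", "(())", PRODUCTIONS))  # True: S -> (S) -> ((S)) -> (())
```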
Language of CFGs
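• Let G be a context-free grammar with start symbol S. Then the language L(G) of G is the set of strings of terminals:
$L(G) = \{\, a_1 \ldots a_n \mid S \overset{*}{\rightarrow} a_1 \ldots a_n \text{ and every } a_i \in T \,\}$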
• Terminals (T) are so called because there are no rules for replacing them.
• Once generated, terminals are permanent.
• Terminals ought to be tokens of the language.
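The definition of L(G) suggests a way to list its short members: derive from S and keep the strings that contain only terminals. This sketch (language_up_to is a hypothetical name) assumes the balanced-parentheses grammar S → (S), S → ε, expands only the leftmost non-terminal, and discards any form with more than max_len terminals, which is safe because terminals, once generated, are permanent. The max_steps bound keeps the search finite for grammars whose non-terminals can multiply.

```python
# Balanced-parentheses grammar (assumed): S -> (S) | ε
PRODUCTIONS = {"S": [("(", "S", ")"), ()]}


def language_up_to(productions, start, max_len, max_steps=20):
    """Strings w with start ->* w, |w| <= max_len, found within max_steps rewrites."""
    results = set()
    frontier = {(start,)}
    for _ in range(max_steps + 1):
        next_frontier = set()
        for form in frontier:
            if all(sym not in productions for sym in form):
                results.add("".join(form))   # only terminals left: a member of L(G)
                continue
            # Expand the leftmost non-terminal only; this yields the same language.
            i = next(k for k, sym in enumerate(form) if sym in productions)
            for rhs in productions[form[i]]:
                nxt = form[:i] + tuple(rhs) + form[i + 1:]
                # Terminals never disappear, so too many of them can never shrink back.
                if sum(sym not in productions for sym in nxt) <= max_len:
                    next_frontier.add(nxt)
        frontier = next_frontier
    return sorted(results, key=lambda w: (len(w), w))


print(language_up_to(PRODUCTIONS, "S", 6))  # ['', '()', '(())', '((()))']
```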
CFG example for COOL
• Non-terminals are in CAPS
• Terminal symbols are in lowercase.
• A fragment of COOL
EXPR -> if EXPR then EXPR else EXPR fi
EXPR -> while EXPR loop EXPR pool
EXPR -> id
...
CFG example for COOL
• This can be simplified:
EXPR -> if EXPR then EXPR else EXPR fi
| while EXPR loop EXPR pool
| id
...
CFGs: Elements
• Some elements of the language are:
– id (EXPR -> id)
– if id then id else id fi (EXPR -> if EXPR then…)
– while id loop id pool (EXPR -> while EXPR loop…)
• Nested expressions:
– if while id loop id pool then id else id fi
– if if id then id else id fi then id else id fi
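Because each alternative of the COOL fragment starts with a different token (if, while, id), one token of lookahead is enough to choose a production. A hand-written recursive-descent recognizer is one way to exploit this; it is our choice of technique, not something the slides prescribe, and the names parse_expr, expect and in_language are hypothetical. Tokens are assumed to be separated by spaces.

```python
def expect(tokens, pos, keyword):
    """Consume the keyword at tokens[pos] or fail."""
    if pos < len(tokens) and tokens[pos] == keyword:
        return pos + 1
    raise SyntaxError(f"expected {keyword!r} at position {pos}")


def parse_expr(tokens, pos):
    """Match one EXPR starting at tokens[pos]; return the position after it."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "id":                      # EXPR -> id
        return pos + 1
    if tok == "if":                      # EXPR -> if EXPR then EXPR else EXPR fi
        pos = parse_expr(tokens, pos + 1)
        pos = expect(tokens, pos, "then")
        pos = parse_expr(tokens, pos)
        pos = expect(tokens, pos, "else")
        pos = parse_expr(tokens, pos)
        return expect(tokens, pos, "fi")
    if tok == "while":                   # EXPR -> while EXPR loop EXPR pool
        pos = parse_expr(tokens, pos + 1)
        pos = expect(tokens, pos, "loop")
        pos = parse_expr(tokens, pos)
        return expect(tokens, pos, "pool")
    raise SyntaxError(f"unexpected token {tok!r} at position {pos}")


def in_language(text):
    """Yes/no membership test: the whole input must be exactly one EXPR."""
    tokens = text.split()
    try:
        return parse_expr(tokens, 0) == len(tokens)
    except SyntaxError:
        return False


print(in_language("if while id loop id pool then id else id fi"))  # True
print(in_language("if id then id fi"))                            # False
```

The nested expressions above are handled by the recursive calls: each inner EXPR is matched by a fresh call to parse_expr.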
CFGs: Elements
• Simple arithmetic expressions

E -> E + E
| E * E
| (E)
| id
• The following are valid expressions:
id
id + id
id + id * id
( id + id ) * id
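One way to confirm that strings like these belong to the language is a brute-force search over leftmost derivations of E → E + E | E * E | (E) | id. This is only an illustration, not how practical parsers work, and derives is a hypothetical name. No right-hand side is empty, so a sentential form never gets shorter; forms longer than the input, or whose leading terminals already disagree with it, can be discarded, which keeps the search finite. Tokens are assumed to be space-separated.

```python
from collections import deque

E = "E"
# The arithmetic grammar from the slide, one tuple per alternative.
PRODUCTIONS = [(E, "+", E), (E, "*", E), ("(", E, ")"), ("id",)]


def derives(tokens):
    """Return True if E ->* tokens, searching leftmost derivations breadth-first."""
    target = tuple(tokens)
    frontier = deque([(E,)])
    seen = {(E,)}
    while frontier:
        form = frontier.popleft()
        if form == target:
            return True
        if E not in form:
            continue                        # all terminals, but not the input
        i = form.index(E)                   # leftmost non-terminal
        if form[:i] != target[:i]:          # terminal prefix already disagrees
            continue
        for rhs in PRODUCTIONS:
            nxt = form[:i] + rhs + form[i + 1:]
            if len(nxt) <= len(target) and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


for s in ["id", "id + id", "id + id * id", "( id + id ) * id", "id +"]:
    print(s, derives(s.split()))
```

The grammar is ambiguous (for example, id + id * id has two different leftmost derivations), but a membership test only needs to find one.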
Exercise
CFG
• As defined so far, a CFG only gives a yes or no answer: whether a string is in the language.
• We need a method to build a parse tree of the input.
• When the string is not in the language, we must handle errors gracefully and give some kind of feedback to the programmer.
• We need an implementation of CFGs (e.g., bison).
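To go beyond yes or no, a parser can return a parse tree and, on failure, an error message that says where and what was expected. This is a minimal recursive-descent sketch for the COOL fragment (if, while, id); the Parser and ParseError names and the tuple-based tree format (label, children) are hypothetical choices, and tokens are assumed to be space-separated.

```python
class ParseError(Exception):
    pass


class Parser:
    """Builds a parse tree for the COOL fragment; leaves are token strings."""

    def __init__(self, text):
        self.tokens = text.split()
        self.pos = 0

    def error(self, expected):
        found = repr(self.tokens[self.pos]) if self.pos < len(self.tokens) else "end of input"
        raise ParseError(f"at token {self.pos}: expected {expected}, found {found}")

    def eat(self, keyword):
        if self.pos < len(self.tokens) and self.tokens[self.pos] == keyword:
            self.pos += 1
            return keyword
        self.error(repr(keyword))

    def expr(self):
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else None
        # List elements are evaluated left to right, so tokens are consumed in order.
        if tok == "id":
            return ("EXPR", [self.eat("id")])
        if tok == "if":
            return ("EXPR", [self.eat("if"), self.expr(), self.eat("then"),
                             self.expr(), self.eat("else"), self.expr(), self.eat("fi")])
        if tok == "while":
            return ("EXPR", [self.eat("while"), self.expr(), self.eat("loop"),
                             self.expr(), self.eat("pool")])
        self.error("'id', 'if' or 'while'")

    def parse(self):
        tree = self.expr()
        if self.pos != len(self.tokens):
            self.error("end of input")
        return tree


print(Parser("while id loop id pool").parse())
# ('EXPR', ['while', ('EXPR', ['id']), 'loop', ('EXPR', ['id']), 'pool'])
try:
    Parser("if id then id fi").parse()
except ParseError as err:
    print(err)  # at token 4: expected 'else', found 'fi'
```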
CFG
• The form of the grammar can be important
– Tools are sensitive to the way you write the
grammar.
– While there are many ways to write a grammar for the same language, in some cases the grammar must be modified before the tool will accept it.
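As an example of such a modification, the arithmetic grammar E → E + E | E * E | (E) | id is ambiguous and left-recursive, so a top-down (recursive-descent) parser cannot use it directly. This sketch assumes one common rewrite, E → T + E | T, T → F * T | F, F → (E) | id, which also makes * bind tighter than +. (An LR tool such as bison can instead keep the left recursion and resolve the ambiguity with precedence declarations like %left.) The parse function and its tuple-based tree format are hypothetical.

```python
class ParseError(Exception):
    pass


def parse(text):
    """Recursive descent for the rewritten (assumed) grammar:
         E -> T + E | T
         T -> F * T | F
         F -> ( E ) | id
    Returns ('+', left, right), ('*', left, right) or 'id'."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise ParseError(f"expected {tok!r} at token {pos}, found {peek()!r}")
        pos += 1

    def e():
        left = t()
        if peek() == "+":        # E -> T + E
            eat("+")
            return ("+", left, e())
        return left              # E -> T

    def t():
        left = f()
        if peek() == "*":        # T -> F * T
            eat("*")
            return ("*", left, t())
        return left              # T -> F

    def f():
        if peek() == "(":        # F -> ( E )
            eat("(")
            inner = e()
            eat(")")
            return inner
        eat("id")                # F -> id
        return "id"

    tree = e()
    if pos != len(tokens):
        raise ParseError(f"unexpected {tokens[pos]!r} at token {pos}")
    return tree


print(parse("id + id * id"))      # ('+', 'id', ('*', 'id', 'id'))
print(parse("( id + id ) * id"))  # ('*', ('+', 'id', 'id'), 'id')
```

Note that this rewrite groups id + id + id as id + (id + id); that is harmless for recognition but matters if the tree is later used for evaluation.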
Thank you!
References
• Aho et al. Compilers: principles, techniques, and
tools. Torczon et al. (2014) (Section 4.2)
• Slides are based on the design of Alex Aiken, CS 143.
https://web.stanford.edu/class/cs143/
