Download as pdf or txt
Download as pdf or txt
You are on page 1of 28

Lecture 03

Describing Syntax
Topics

• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
– BNF
– Context-Free Grammars
– Derivation
– Parse Trees
– Syntax Diagrams

Copyright © 2015 Pearson. All rights reserved. 1-2


Introduction
• Syntax: the form or structure of the expressions,
statements, and program units

• Semantics: the meaning of the expressions,


statements, and program units

• Syntax and semantics provide a language’s


definition
– Users of a language definition
• Other language designers
• Implementers
• Programmers (the users of the language)

Copyright © 2015 Pearson. All rights reserved. 1-3


The General Problem of Describing Syntax:
Terminology
• A sentence is a string of
characters over some alphabet index = 2 * count + 17;

• A language is a set of sentences


• A lexeme is the lowest level
syntactic unit of a language
(e.g., *, sum, begin)
• A token is a category of lexemes
(e.g., identifier) sentence?
language?
• A pattern explains the ways to lexeme?
arrange the combinations of
token?
characters in a token (e.g., use
pattern?
regular expression: [A-Z a-z 0-9]+)
Copyright © 2015 Pearson. All rights reserved. 1-4
The General Problem of Describing Syntax:
Terminology

1-5
The General Problem of Describing Syntax:
Terminology

1-6
The General Problem of Describing Syntax:
Terminology

1-7
The General Problem of Describing Syntax:
Terminology

How to categorize tokens?

•Very much dependent on the language.


•Typically:
 Give keywords their own tokens.
 Give different punctuation symbols their
own tokens.
 Group lexemes representing identifiers,
numeric constants, strings, etc. into their
own groups.
 Discard irrelevant information (whitespace,
comments)
1-8
Formal Methods of Describing Syntax

• Recognizers
– A recognition device reads input strings over the
alphabet of the language and decides whether
the input strings belong to the language
– Example: syntax analysis part of a compiler

• Generators
– A device that generates sentences of a language
– One can determine if the syntax of a particular
sentence is syntactically correct by comparing it
to the structure of the generator

Copyright © 2015 Pearson. All rights reserved. 1-9


Formal Methods of Describing Syntax

• Lexical analyzer can identify the pattern of the


tokens with the regular expressions but cannot
check the syntax.

• Context-Free Grammars (CFG) is a helpful tool in


describing the syntax of programming languages
(syntax analyzer)

https://www.tutorialspoint.com/compiler_design/compiler_design_syntax_analysis.htm 1-10
BNF and Context-Free Grammars

• Context-Free Grammars (CFG)


– Developed by Noam Chomsky in the mid-1950s
– Language generators, meant to describe the
syntax of natural languages
– Define a class of languages called context-free
languages

• Backus-Naur Form (BNF)


– Invented by John Backus to describe the syntax
of Algol 58 in 1959 then slightly modified by
Peter Naur.
– BNF is notation technique for CFG
Copyright © 2015 Pearson. All rights reserved. 1-11
Difference of CFG and BNF

– CFG
expression  identifier | number | - expression
| ( expression )
| expression operator expression
operator  + | - | * | /

– BNF
expression  identifier | number | - expression
| ( expression )
| expression operator expression
operator  + | - | * | /

Copyright © 2015 Pearson. All rights reserved. 1-12


BNF Fundamentals

The basic properties are

• NT: a set of non-terminal symbols


• T: a set of terminals (NT ∩ T = Ǿ)
• R: a set of rules (R: NT → (NT U T)*)
• S: a start symbol

Copyright © 2015 Pearson. All rights reserved. 1-13


BNF Fundamentals

• In BNF, abstractions are used to represent classes


of syntactic structures--they act like syntactic
variables (also called nonterminal symbols, or just
terminals)

• Terminals are lexemes or tokens

• A rule has a left-hand side (LHS), which is a


nonterminal, and a right-hand side (RHS), which is
a string of terminals and/or nonterminals

Copyright © 2015 Pearson. All rights reserved. 1-14


BNF Fundamentals (continued)

• Nonterminals are often enclosed in angle brackets

– Examples of BNF rules:


<ident_list> → identifier | identifier, <ident_list>
<if_stmt> → if <logic_expr> then <stmt>

• Grammar: a finite non-empty set of rules

• A start symbol is a special element of the


nonterminals of a grammar

Copyright © 2015 Pearson. All rights reserved. 1-15


BNF Rules

• A nonterminal symbol can have more


than one RHS
<stmt>  <single_stmt>
<stmt>  begin <stmt_list> end

<stmt>  <single_stmt>
| begin <stmt_list> end

Copyright © 2015 Pearson. All rights reserved. 1-16


Describing Lists

• Syntactic lists are described using recursion


( LHS appear in its RHS )
<ident_list>  ident
| ident, <ident_list>

• A derivation
– is a repeated application of rules, starting with
the start symbol and ending with a sentence (all
terminal symbols)
– show how to generate a syntactically valid string

Copyright © 2015 Pearson. All rights reserved. 1-17


An Example Grammar & Derivation
<program>  <stmts>
<stmts>  <stmt> | <stmt> ; <stmts>
<stmt>  <var> = <expr>
<var>  a | b | c | d
<expr>  <term> + <term> | <term> - <term>
<term>  <var> | const
 How to derive a sentence (a = b + const) from grammar???
<program> => <stmts> => <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const
Copyright © 2015 Pearson. All rights reserved. 1-18
Derivations
• Every string of symbols in a derivation is a sentential
form
• A sentence is a sentential form that has only terminal
symbols
• A leftmost derivation is one in which the leftmost
nonterminal in each sentential form is the one that is
expanded
• A derivation may be neither leftmost nor rightmost

1-19
Parse Tree

• A hierarchical representation of a derivation


<program>

<stmts>

<stmt>

<var> = <expr>

a <term> + <term>

<var> const

b
Copyright © 2015 Pearson. All rights reserved. 1-20
Ambiguity in Grammars
• A grammar is ambiguous if and only if it generates a sentential
form that has two or more distinct parse trees

<expr>  <expr> <op> <expr> | const


<op>  / | -
<expr> <expr>

<expr> <op> <expr> <expr> <op> <expr>

<expr> <op> <expr> <expr> <op> <expr>

const - const / const const - const / const


Copyright © 2015 Pearson. All rights reserved. 1-21
An Unambiguous Expression Grammar

• If we use the parse tree to indicate precedence


levels of the operators, we cannot have ambiguity

<expr>  <expr> - <term> | <term>


<term>  <term> / const | const

<expr>

<expr> - <term>

<term> <term> / const

const const

Copyright © 2015 Pearson. All rights reserved. 1-22


Associativity of Operators
• When more than one operators have same
precedence, associativity is used to specify
precedence  left-associative or right-associative
• Operator associativity can also be indicated by a grammar

<expr> -> <expr> + <expr> | const (ambiguous)


<expr> -> <expr> + const | const (unambiguous)
<expr>
<expr>

<expr> + const

<expr> + const

const
Copyright © 2015 Pearson. All rights reserved. 1-23
Unambiguous Grammar for Selector
• C++ if-then-else grammar
<if_stmt> -> if (<logic_expr>) <stmt>
| if (<logic_expr>) <stmt> else <stmt>
AMBIGUOUS!!!

Copyright © 2015 Pearson. All rights reserved. 1-24


Unambiguous Grammar for Selector
• C++ if-then-else grammar
<if_stmt> -> if (<logic_expr>) <stmt>
| if (<logic_expr>) <stmt> else <stmt>
AMBIGUOUS!!!

- An unambiguous grammar for if-then-else

<stmt> -> <matched> | <unmatched>


<matched> -> if (<logic_expr>) <stmt>
| a non-if statement
<unmatched> -> if (<logic_expr>) <stmt>
| if (<logic_expr>) <matched> else
<unmatched>

Copyright © 2015 Pearson. All rights reserved. 1-25


Extended BNF
• Increase readability and writability
• Optional parts are placed in brackets [ ]
<proc_call> -> ident [(<expr_list>)]

• Alternative parts of RHSs are placed inside parentheses and


separated via vertical bars
<term> → <term> (+|-) const

• Repetitions (0 or more) are placed inside braces { }


<ident> → letter {letter|digit}

• Repetitions (1 or more) use a plus (+) superscript


<ident> → letter {letter|digit}+

Copyright © 2015 Pearson. All rights reserved. 1-26


BNF and EBNF
• BNF
<expr>  <expr> + <term>
| <expr> - <term>
| <term>
<term>  <term> * <factor>
| <term> / <factor>
| <factor>
<compound>  begin <stmt> {<stmt>} end
• EBNF
<expr>  <term> {(+ | -) <term>}
<term>  <factor> {(* | /) <factor>}
<compound>  begin {<stmt>}+ end

Copyright © 2015 Pearson. All rights reserved. 1-27


Syntax Diagram
~ Graphical form of BNF
• An expression can broken
down into a sequence of
terms, separated by + or –
<expr>  <term> {(+ | -)
<term>}

• Each term is broken down


into a sequence of factors,
separated by * or /

• Each factor is either a


parenthesized expression
or a number

1-28

You might also like