Compiler Engineering: Lab # 4: Syntax Analysis (Parsing)


LEXICAL ANALYSIS SUMMARY


1. Start a new token.
2. Read the 1st character to start recognizing the token's type, according to the algorithm specified in Slide 3.
3. Pass its token (lexeme type) and value attribute to the parser.
4. Repeat steps 1-3.
5. Repeat until the end of the input.

10-14/3/12

Department of Computer Science Compiler Engineering Lab

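Token recognition algorithm (Slide 3):
- Start a new TOKEN and read its 1st character.
- If it is a digit: TOKEN = NUM.
- If it is a letter: read the following characters. If any of them is a digit or _, TOKEN = ID. If all are letters, check whether the lexeme is a keyword: if so, TOKEN = KEYWORD; otherwise TOKEN = ID.
- If it is a relational operator character (>, <, !, =): TOKEN = RELOP.
- If it is an arithmetic operator character (+, -, /, *, =): TOKEN = AROP.
- Because = appears in both sets, check whether the 2nd character is also =: if it is, TOKEN = RELOP; otherwise TOKEN = AROP.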
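The flowchart can be sketched in Python as follows. This is a minimal illustration, not the lab's reference lexer: whitespace is assumed to separate tokens, the keyword set is a hypothetical sample, and the 2nd-character check is interpreted as turning <=, >=, != and == into RELOPs while a lone = stays an assignment AROP.

```python
# Sketch of the token-recognition flowchart (helper names are hypothetical).
KEYWORDS = {"if", "else", "while", "return", "int"}  # assumed sample keyword set
RELOP_CHARS = set("<>!=")   # relational-operator characters from the flowchart
AROP_CHARS = set("+-/*=")   # arithmetic/assignment characters from the flowchart


def next_token(text, i):
    """Recognize one token starting at text[i]; return (token, value, next index)."""
    ch = text[i]
    if ch.isdigit():                                  # digit -> NUM
        j = i
        while j < len(text) and text[j].isdigit():
            j += 1
        return "NUM", text[i:j], j
    if ch.isalpha():                                  # letter -> read following characters
        j = i
        while j < len(text) and (text[j].isalnum() or text[j] == "_"):
            j += 1
        lexeme = text[i:j]
        if lexeme.isalpha() and lexeme in KEYWORDS:   # all letters: is it a keyword?
            return "KEYWORD", lexeme, j
        return "ID", lexeme, j                        # has a digit or _, or not a keyword
    if ch in RELOP_CHARS or ch in AROP_CHARS:
        second = text[i + 1] if i + 1 < len(text) else ""
        if ch in RELOP_CHARS and second == "=":       # 2nd char is '=': <=, >=, !=, ==
            return "RELOP", ch + second, i + 2
        if ch in "<>!":
            return "RELOP", ch, i + 1
        return "AROP", ch, i + 1                      # + - / * and a lone '='
    raise ValueError(f"illegal character {ch!r} at position {i}")


def tokenize(text):
    """Repeat 'start a new token' until the end, collecting (token, value) pairs."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        token, value, i = next_token(text, i)
        tokens.append((token, value))                 # the pair handed to the parser
    return tokens
```

For example, tokenize("position = initial + rate * 60") yields the types ID, AROP, ID, AROP, ID, AROP, NUM, matching the token types in the symbol-table slides later in this lab.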

SYNTAX ANALYSIS (PARSING)


Syntax analysis (parsing) is the process of analyzing a text, made of a sequence of tokens, to determine its grammatical structure with respect to a given (more or less) formal grammar. The parser:
- builds an Abstract Syntax Tree (AST);
- is part of an interpreter or a compiler;
- creates some form of internal representation (IR).
Programming languages tend to be defined by context-free grammars, for which efficient and fast parsers can be written.

PHASE 2 : SYNTAX ANALYSIS


Also sometimes called syntax checking. It ensures that:
the code is grammatically valid (without worrying about its meaning) and can be sequenced into an executable program.

The syntax analyzer applies rules to the code, for example:


checking that each opening brace has a corresponding closing brace, that each declaration has a type, that the type exists, etc.
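The brace check mentioned above is usually done with a stack: push each opener, and on each closer pop the most recent opener to confirm that the two match. A minimal sketch over a list of token strings (the function name is hypothetical, and treating parentheses and square brackets the same way as braces is an assumption):

```python
# Sketch: every opening bracket must have a matching closing bracket.
OPENERS = {"(", "[", "{"}
CLOSER_TO_OPENER = {")": "(", "]": "[", "}": "{"}


def brackets_balanced(tokens):
    """Return True if the brackets in `tokens` are properly nested and closed."""
    stack = []
    for tok in tokens:
        if tok in OPENERS:
            stack.append(tok)                 # remember the unclosed opener
        elif tok in CLOSER_TO_OPENER:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != CLOSER_TO_OPENER[tok]:
                return False
    return not stack                          # leftover openers were never closed
```

Tokens that are not brackets are simply ignored, so the function can run directly on the lexer's output.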

CONTEXT-FREE GRAMMAR
It defines the components that form an expression and the order in which they must appear. A context-free grammar is a set of rules specifying how syntactic elements in some language can be formed from simpler ones. The grammar specifies the allowable ways to combine tokens (called terminals) into higher-level syntactic elements (called non-terminals).
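A grammar can be stored as plain data, which makes the terminal/non-terminal distinction concrete: every symbol on the left-hand side of a rule is a non-terminal, and every other symbol used in the rules is a terminal (a token). The expression grammar below is a hypothetical example, not one given in the lab.

```python
# Hypothetical grammar: non-terminal -> list of alternative right-hand sides.
GRAMMAR = {
    "Stmt": [["ID", "=", "Expr"]],
    "Expr": [["Expr", "+", "Term"], ["Term"]],
    "Term": [["Term", "*", "Factor"], ["Factor"]],
    "Factor": [["ID"], ["NUM"]],
}


def split_symbols(grammar):
    """Return (terminals, non-terminals).

    Non-terminals are the rule names; terminals are the symbols that are
    used on a right-hand side but never defined by a rule of their own.
    """
    nonterminals = set(grammar)
    used = {sym for alternatives in grammar.values()
            for rhs in alternatives for sym in rhs}
    return used - nonterminals, nonterminals
```

Here the terminals come out as ID, NUM, =, + and *, the kinds of TOKEN a lexer would produce.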


CONTEXT-FREE GRAMMAR
Ex.:
- Any ID is an expression (it is preferable to call it a TOKEN).
- Any number is an expression (it is preferable to call it a TOKEN).
- If Expr1 and Expr2 are expressions, then Expr1 + Expr2 and Expr1 * Expr2 are expressions.
- If id1 is an ID and Expr2 is an expression, then id1 = Expr2 is a statement.
- If Expr1 is an expression and Statement2 is a statement, then while (Expr1) do Statement2 and if (Expr1) then Statement2 are statements.
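A recursive-descent parser is one common way to turn rules like these into code. The sketch below (hypothetical names, not the lab's required design) has one method per rule and builds the AST as nested tuples. The rules above do not say whether + or * binds tighter, so the sketch assumes the usual precedence (* before +) with left associativity, and it expects the tokens to be already split into strings.

```python
KEYWORDS = {"while", "do", "if", "then"}


class ParseError(Exception):
    """Raised when the tokens violate the grammar rules."""


def is_id(tok):
    return tok is not None and tok.isidentifier() and tok not in KEYWORDS


class Parser:
    """Recursive-descent parser: each grammar rule becomes one method."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, expected):
        tok = self.advance()
        if tok != expected:
            raise ParseError(f"expected {expected!r}, got {tok!r}")

    def statement(self):
        # Statement -> while ( Expr ) do Statement | if ( Expr ) then Statement | ID = Expr
        tok = self.peek()
        if tok in ("while", "if"):
            self.advance()
            self.expect("(")
            cond = self.expr()
            self.expect(")")
            self.expect("do" if tok == "while" else "then")
            return (tok, cond, self.statement())
        if not is_id(tok):
            raise ParseError(f"expected a statement, got {tok!r}")
        name = self.advance()
        self.expect("=")
        return ("=", ("ID", name), self.expr())

    def expr(self):
        # Expr -> Term { + Term }        ('+' is left-associative)
        node = self.term()
        while self.peek() == "+":
            self.advance()
            node = ("+", node, self.term())
        return node

    def term(self):
        # Term -> Factor { * Factor }    ('*' binds tighter than '+')
        node = self.factor()
        while self.peek() == "*":
            self.advance()
            node = ("*", node, self.factor())
        return node

    def factor(self):
        # Factor -> ID | NUM             (the TOKENs that are expressions)
        tok = self.advance()
        if tok.isdigit():
            return ("NUM", tok)
        if is_id(tok):
            return ("ID", tok)
        raise ParseError(f"expected an ID or a number, got {tok!r}")


def parse_statement(tokens):
    parser = Parser(tokens)
    tree = parser.statement()
    if parser.peek() is not None:
        raise ParseError(f"unexpected trailing token {parser.peek()!r}")
    return tree
```

For instance, parse_statement(["position", "=", "initial", "+", "rate", "*", "60"]) returns an "=" node whose right child is a "+" node, with the "*" node nested beneath it.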

GRAMMAR & AST


Tokens (terminals) = leaves; expressions and statements (non-terminals) = nodes.

Stream of Characters → Lexical Analysis → Stream of TOKENs
Stream of TOKENs → Syntax Analysis → Abstract Syntax Tree (AST)

PHASE 2 : SYNTAX ANALYSIS

[Figure: Tokens → Syntax Analyzer (Parser)]

SYMBOL TABLE
A symbol table (ST) is a data structure containing a record for each identifier, with fields for the attributes of that ID. The tokens formed during lexical analysis are recorded in the ST.
Purpose:
- Analyzing expressions/statements requires a hierarchical (nesting) structure, and the ST lets the compiler find, retrieve, and store the record for an ID quickly.
- For example, the semantic analysis and code generation phases retrieve an ID's type for type checking and implementation purposes.

SYMBOL TABLE MANAGEMENT


The symbol table may contain any of the following information:
- For an ID: the storage allocated for it, its type, and its scope (where it is valid in the program).
- For a function, additionally: the number of arguments, the types of the arguments, the passing method (by reference or by value), and the return type.

Identifiers are added only if they don't already exist in the table.
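A symbol table is often implemented as a hash table keyed by the identifier's name, which gives the fast find/retrieve/store described earlier. In the sketch below a Python dict plays that role. The Symbol fields mirror the attributes listed above, but their names are assumptions, and keying by name alone ignores scope (a fuller table would keep one table per scope or key on the pair (scope, name)).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Symbol:
    name: str
    type: Optional[str] = None         # may be filled in by a later phase
    scope: str = "global"              # where the ID is valid in the program
    storage: Optional[int] = None      # e.g. assigned during code generation
    # Extra attributes used only when the symbol is a function:
    arg_types: list = field(default_factory=list)
    passing: list = field(default_factory=list)   # "value" or "reference" per argument
    return_type: Optional[str] = None

    @property
    def num_args(self):
        return len(self.arg_types)


class SymbolTable:
    def __init__(self):
        self._records = {}             # name -> Symbol (hash table lookup)

    def insert(self, name, **attrs):
        """Add `name` only if it does not already exist; return its record."""
        if name not in self._records:
            self._records[name] = Symbol(name, **attrs)
        return self._records[name]

    def lookup(self, name):
        """Return the record for `name`, or None if it was never added."""
        return self._records.get(name)
```

Calling insert twice with the same name keeps the first record, which matches the rule that identifiers are added only when they are not already present.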



SYMBOL TABLE MANAGEMENT


Not all attributes can be determined by the lexical analyzer, because it reads the input linearly. E.g. dim a, x as integer
In this example, when the analyzer sees the IDs a and x, it has not yet reached the type keyword (integer).

So, the following phases complete the IDs' attributes and use them as well.
For example, the storage location attribute is assigned by the code generation phase.
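The dim a, x as integer case can be sketched as two steps over a hypothetical token list: a first pass records each ID as soon as it is seen (type still unknown), and a later step fills in the type once "as integer" is reached. The dictionary layout and function names are assumptions; the location field stays empty because, per the slide, the code generator assigns it.

```python
# Hypothetical token list for:  dim a, x as integer
KEYWORDS = {"dim", "as", "integer"}


def lexical_pass(tokens, table):
    """Record each ID when it is first seen; its type is not known yet."""
    for tok in tokens:
        if tok not in KEYWORDS and tok != "," and tok not in table:
            table[tok] = {"type": None, "location": None}


def declaration_pass(tokens, table):
    """Later phase: once 'as TYPE' is reached, fill in the type of each declared ID."""
    as_index = tokens.index("as")
    type_name = tokens[as_index + 1]
    for tok in tokens[1:as_index]:            # the IDs between 'dim' and 'as'
        if tok != ",":
            table[tok]["type"] = type_name


tokens = ["dim", "a", ",", "x", "as", "integer"]
table = {}
lexical_pass(tokens, table)       # a and x exist, but with type None
declaration_pass(tokens, table)   # now both have type "integer"
```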


ERROR DETECTION & REPORTING


For the compilation process to proceed correctly, each phase must:
- detect any errors;
- deal with the detected error(s).

Error detection:
- Most errors are detected during syntax and semantic analysis.
- In lexical analysis: the characters are not legal for forming any token.
- In syntax analysis: the code violates the structure rules of the grammar.
- In semantic analysis: the structure is correct but the meaning is invalid (e.g. ID = Array Name + Function Name).
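The three kinds of errors can be illustrated with a toy checker for one statement shape, ID = ID + ID. Everything here is hypothetical: the allowed character set, the single structure rule, and the `kinds` table marking which names are arrays or functions stand in for a real lexer, parser, and symbol table.

```python
import re


class CompileError(Exception):
    """An error tagged with the phase that detected it."""

    def __init__(self, phase, message):
        super().__init__(f"{phase} error: {message}")
        self.phase = phase


def lexical_check(text):
    # Lexical error: a character that is not legal in any token.
    bad = re.search(r"[^A-Za-z0-9_\s=+]", text)
    if bad:
        raise CompileError("lexical", f"illegal character {bad.group()!r}")
    return text.split()


def syntax_check(tokens):
    # Syntax error: the tokens violate the toy structure rule ID = ID + ID.
    if len(tokens) != 5 or tokens[1] != "=" or tokens[3] != "+":
        raise CompileError("syntax", "expected ID = ID + ID")
    return tokens


def semantic_check(tokens, kinds):
    # Semantic error: valid structure, but array name + function name is meaningless.
    left = kinds.get(tokens[2], "variable")
    right = kinds.get(tokens[4], "variable")
    if {left, right} == {"array", "function"}:
        raise CompileError("semantic", f"cannot add a {left} and a {right}")


def first_error_phase(text, kinds):
    """Return the phase that rejects `text`, or None if it passes all checks."""
    try:
        semantic_check(syntax_check(lexical_check(text)), kinds)
    except CompileError as err:
        return err.phase
    return None
```

Each phase only runs if the previous one succeeded, so an input is reported by the earliest phase that can see its error.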

COMPILER PHASES


LEXICAL ANALYZER & SYMBOL TABLE

Symbol table entries produced by the lexical analyzer for the statement position = initial + rate * 60:

Token ID   Token Type   Token Value   Location
id1        ID           position
expr1      AROP         ASS
id2        ID           initial
expr2      AROP         SUM
id3        ID           rate
expr3      AROP         MUL
N1         NUM          60

SYNTAX ANALYZER & SYMBOL TABLE


SYNTAX ANALYZER & SYMBOL TABLE

A LEAF is a record with two or more fields: one to identify the TOKEN and the others to hold its attribute information.

Token ID   Token Type   Token Value   Location
id1        ID           position
expr1      AROP         ASS
id2        ID           initial
expr2      AROP         SUM
id3        ID           rate
expr3      AROP         MUL
N1         NUM          60

SYNTAX ANALYZER & SYMBOL TABLE

An interior NODE is a record with a field for the operator and two pointer fields to its left and right children.

Operator   Left Child (Pointer)   Right Child (Pointer)
expr1      id1                    expr2
expr2      id2                    expr3
expr3      id3                    N1
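The leaf and node records map naturally onto two small classes. In the sketch below (class and field names are assumptions), the tree for position = initial + rate * 60 is built bottom-up from the entries in the tables above, and show() prints it in prefix form to confirm its shape.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    token: str      # token type, e.g. "ID" or "NUM"
    value: str      # attribute, e.g. the lexeme "rate" or "60"


@dataclass
class Node:
    operator: str   # e.g. "ASS", "SUM", "MUL"
    left: "Tree"    # pointer to the left child
    right: "Tree"   # pointer to the right child


Tree = Union[Leaf, Node]


def show(tree):
    """Render the tree in prefix form to check its shape."""
    if isinstance(tree, Leaf):
        return tree.value
    return f"({tree.operator} {show(tree.left)} {show(tree.right)})"


# Leaves first, then interior nodes, following the tables above.
id1, id2, id3 = Leaf("ID", "position"), Leaf("ID", "initial"), Leaf("ID", "rate")
n1 = Leaf("NUM", "60")
expr3 = Node("MUL", id3, n1)
expr2 = Node("SUM", id2, expr3)
expr1 = Node("ASS", id1, expr2)
```

Printing show(expr1) gives (ASS position (SUM initial (MUL rate 60))), so the multiplication sits deepest in the tree and is applied first.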


TASK 1: THINK AS A COMPILER!


Analyze the following program syntactically:

    int main() {
        std::cout << "hello world" << std::endl;
        return 0;
    }

LEXICAL ANALYZER OUTPUT


1 = string "int"
2 = string "main"
3 = opening parenthesis
4 = closing parenthesis
5 = opening brace
6 = string "std"
7 = scope-resolution (namespace) operator ::
8 = string "cout"
9 = << operator
10 = string literal "hello world"
11 = string "endl"
12 = semicolon
13 = string "return"
14 = number 0
15 = closing brace
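The list above gives each distinct lexeme once, in order of first appearance (the repeated <<, std, :: and ; are not listed again). A regular-expression tokenizer reproduces it. The pattern below is an assumption that covers only the lexeme kinds in this program, and re.findall silently skips any character that no alternative matches, so a real lexer would report such characters as errors instead.

```python
import re

# Alternatives are tried left to right, so "::" and "<<" come before the
# single-character punctuation. Whitespace matches nothing and is skipped.
TOKEN_RE = re.compile(r'"[^"]*"|::|<<|[A-Za-z_]\w*|\d+|[(){};]')

SOURCE = 'int main() { std::cout << "hello world" << std::endl; return 0; }'


def lexemes(src):
    """All lexemes in source order, repeats included."""
    return TOKEN_RE.findall(src)


def distinct_in_order(items):
    """Each lexeme once, in order of first appearance."""
    return list(dict.fromkeys(items))
```

lexemes(SOURCE) returns all 19 lexemes, and distinct_in_order reduces them to the 15 entries listed on the slide.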

TASK 2: A STATEMENT AST


Create an abstract syntax tree for the following code for the Euclidean algorithm:

    while b ≠ 0
        if a > b
            a := a − b
        else
            b := b − a
    return a
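One way to check an AST is to walk it. The sketch below writes the tree as nested Python tuples (the node labels seq, while, if, :=, return, var and const are our own choice, not the lab's) and adds a tiny evaluator that executes it. It assumes a and b are positive integers: with this subtraction form of the algorithm, a = 0 and b > 0 would loop forever.

```python
import operator

# AST for the Euclidean algorithm as nested tuples (hypothetical node labels).
EUCLID = (
    "seq",
    ("while", ("!=", ("var", "b"), ("const", 0)),
        ("if", (">", ("var", "a"), ("var", "b")),
            (":=", "a", ("-", ("var", "a"), ("var", "b"))),
            (":=", "b", ("-", ("var", "b"), ("var", "a"))))),
    ("return", ("var", "a")),
)

BINARY_OPS = {"!=": operator.ne, ">": operator.gt, "-": operator.sub}


def evaluate(expr, env):
    """Evaluate an expression node: a variable, a constant, or a binary operator."""
    kind = expr[0]
    if kind == "var":
        return env[expr[1]]
    if kind == "const":
        return expr[1]
    return BINARY_OPS[kind](evaluate(expr[1], env), evaluate(expr[2], env))


def execute(stmt, env):
    """Run a statement node; only 'return' (possibly inside 'seq') yields a value."""
    kind = stmt[0]
    if kind == "seq":
        for child in stmt[1:]:
            result = execute(child, env)
            if result is not None:
                return result
    elif kind == "while":
        while evaluate(stmt[1], env):
            execute(stmt[2], env)
    elif kind == "if":
        execute(stmt[2] if evaluate(stmt[1], env) else stmt[3], env)
    elif kind == ":=":
        env[stmt[1]] = evaluate(stmt[2], env)
    elif kind == "return":
        return evaluate(stmt[1], env)
    else:
        raise ValueError(f"unknown statement {kind!r}")
    return None
```

For example, execute(EUCLID, {"a": 48, "b": 18}) returns 6, the greatest common divisor. The interior nodes (while, if, :=, the operators) and the leaves (var, const) correspond to the NODE/LEAF split described earlier.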


TASK 2: A STATEMENT AST


LAB ASSIGNMENT
Write the syntax analyzer components and make sure they fulfill the following:
- Create a symbol table (for all kinds of entries, including IDs, functions, etc.).
- Fill the symbol table with the tokens extracted in the lexical analysis phase.
- Differentiate between a node and a leaf.
- Apply the grammar rules (tokens, expressions, statements).

QUESTIONS?
Thank you for listening
