Download as pdf or txt
Download as pdf or txt
You are on page 1of 62

01-03-2017

Compilers:
A compiler is a program that reads a program
written in one language the source
language and translates it into an
equivalent program in another language
the target language

Mr.Binu Vargis AsstProfessor


1
ComputerScienceEngg

Compilers:
As an important part of this translation
process, the compiler reports to its user the
presence of errors in the source program.

Mr.Binu Vargis AsstProfessor


2
ComputerScienceEngg

1
01-03-2017

Compilers:

Mr.Binu Vargis AsstProfessor


3
ComputerScienceEngg

Compilers:
At first glance, the variety of compilers may
appear overwhelming.

There are thousands of source languages,


ranging from traditional programming
languages such as FORTRAN and Pascal to
specialized languages.

Mr.Binu Vargis AsstProfessor


4
ComputerScienceEngg

2
01-03-2017

Target languages are equally as varied;

A target language may be another


programming language, or the machine
language of any computer.

Mr.Binu Vargis AsstProfessor


5
ComputerScienceEngg

Compilers are sometimes classified as:


single-pass
multi-pass
load-and-go
Debugging
optimizing

Mr.Binu Vargis AsstProfessor


6
ComputerScienceEngg

3
01-03-2017

The basic tasks that any compiler must


perform are essentially the same.

By understanding these tasks, we can


construct compilers for a wide variety of
source languages and target machines using
the same basic techniques.

Mr.Binu Vargis AsstProfessor


7
ComputerScienceEngg

Compilers HISTORY:
Throughout the 1950s, compilers were
considered notoriously difficult programs to
write.

The first FORTRAN compiler, for example, took


18 staff-years to implement.

Mr.Binu Vargis AsstProfessor


8
ComputerScienceEngg

4
01-03-2017

The Analysis-Synthesis Model of


Compilation:

There are two parts of compilation:

Analysis
Synthesis

Mr.Binu Vargis AsstProfessor


9
ComputerScienceEngg

The Analysis-Synthesis Model of


Compilation:

The analysis part breaks up the source


program into constituent pieces

creates an intermediate representation of the


source program.

Mr.Binu Vargis AsstProfessor


10
ComputerScienceEngg

5
01-03-2017

The Analysis-Synthesis Model of


Compilation:

The synthesis part constructs the desired


target program from the intermediate
representation.

Mr.Binu Vargis AsstProfessor


11
ComputerScienceEngg

The Analysis-Synthesis Model of


Compilation:

source Front IR Back machine


code End End code

errors

Mr.Binu Vargis AsstProfessor


12
ComputerScienceEngg

6
01-03-2017

The Analysis-Synthesis Model of


Compilation:

During analysis, the operations implied by the


source program are determined and recorded
in a hierarchical structure called a tree.

Often, a special kind of tree called a syntax


tree is used.

Mr.Binu Vargis AsstProfessor


13
ComputerScienceEngg

The Analysis-Synthesis Model of


Compilation:

In syntax tree each node represents an


operation and the children of the node
represent the arguments of the operation.

For example, a syntax tree of an assignment


statement is shown below.

Mr.Binu Vargis AsstProfessor


14
ComputerScienceEngg

7
01-03-2017

The Analysis-Synthesis Model of


Compilation:

Mr.Binu Vargis AsstProfessor


15
ComputerScienceEngg

Analysis of the Source Program:

In compiling, analysis consists of three phases:

Linear Analysis:
Hierarchical Analysis:
Semantic Analysis:

Mr.Binu Vargis AsstProfessor


16
ComputerScienceEngg

8
01-03-2017

Scanning or Lexical Analysis (Linear


Analysis):
The identifier, position.
The assignment symbol :=
The identifier initial.
The plus sign.
The identifier rate.
The multiplication sign.
The number 60.

Mr.Binu Vargis AsstProfessor


17
ComputerScienceEngg

Mr.Binu Vargis AsstProfessor


18
ComputerScienceEngg

9
01-03-2017

Mr.Binu Vargis AsstProfessor


19
ComputerScienceEngg

Syntax Analysis or Hierarchical


Analysis (Parsing):
Assignment
Statement
:=
Identifier Expression
Position +
Expression Expression
Identifier *
Initial
Expression Expression
Identifier Number
Rate 60

Mr.Binu Vargis AsstProfessor


20
ComputerScienceEngg

10
01-03-2017

1.3 The Phases of a Compiler:


A compiler operates in phases.

Each of which transforms the source program


from one representation to another.

A typical decomposition of a compiler is


shown in fig given below

Mr.Binu Vargis AsstProfessor


21
ComputerScienceEngg

The Phases of a Compiler:

Mr.Binu Vargis AsstProfessor


ComputerScienceEngg 22

11
01-03-2017

Mr.Binu Vargis AsstProfessor


23
ComputerScienceEngg

The Phases of a Compiler:

Linear Analysis:

In which the stream of characters making up the


source program is read from left-to-right and
grouped into tokens that are sequences of
characters having a collective meaning.

Mr.Binu Vargis AsstProfessor


24
ComputerScienceEngg

12
01-03-2017

In a compiler, linear analysis is called lexical


analysis or scanning.

For example, in lexical analysis the characters


in the assignment statement

Position: = initial + rate * 60


Would be grouped into the following tokens:

Mr.Binu Vargis AsstProfessor


25
ComputerScienceEngg

The Phases of a Compiler:


The identifier, position.
The assignment symbol :=
The identifier initial.
The plus sign.
The identifier rate.
The multiplication sign.
The number 60.

Mr.Binu Vargis AsstProfessor


26
ComputerScienceEngg

13
01-03-2017

The Phases of a Compiler:

The blanks separating the characters of these


tokens would normally be eliminated during
the lexical analysis.

Mr.Binu Vargis AsstProfessor


27
ComputerScienceEngg

They follow a pattern for its executions:


L (L |d)
letter followed by letters and digits is a pattern that
matches a word like x or y with the
token id (= identifier)

Mr.Binu Vargis AsstProfessor


28
ComputerScienceEngg

14
01-03-2017

Lexemes & Tokens


Lexeme:
A string of characters that is a lowest-level
syntactic unit in the programming language.
These are the words and punctuations of the
programming language.
Token:
Its a syntactic category that forms a class of
lexemes.

Mr.Binu Vargis AsstProfessor


29
ComputerScienceEngg

Token Lexeme Pattern

letter followed by letters and


ID x y n0
digits

NUM -123 1.456e-5 any numeric constant

IF if if

LPAREN ( (

any string of characters


LITERAL ``Hello''
(except ``) between `` and ``

Mr.Binu Vargis AsstProfessor


30
ComputerScienceEngg

15
01-03-2017

Mr.Binu Vargis AsstProfessor


31
ComputerScienceEngg

Hierarchical Analysis:

In which characters or tokens are grouped


hierarchically into nested collections with
collective meaning.

Mr.Binu Vargis AsstProfessor


32
ComputerScienceEngg

16
01-03-2017

Hierarchical analysis is called parsing or syntax


analysis.

It involves grouping the tokens of the source


program into grammatical phases that are
used by the compiler to synthesize output.

Mr.Binu Vargis AsstProfessor


33
ComputerScienceEngg

The grammatical phrases of the source


program are represented by a parse tree.
Statement( Start Symbol)
S->id =E
Expressions
E->E + T/T
Term or Terminal Symbols
T-> T*F/F
Factors
F-> id
Productions ( P) : A finite set of production
rules or simply Productions
Mr.Binu Vargis AsstProfessor
ComputerScienceEngg
34

17
01-03-2017

Assignment
Statement
:=
Identifier Expression
Position +
Expression Expression
Identifier *
Initial
Expression Expression
Identifier Number
Rate 60

Mr.Binu Vargis AsstProfessor


35
ComputerScienceEngg

In the expression initial + rate * 60, the phrase


rate * 60 is a logical unit because the usual
conventions of arithmetic expressions tell us
that the multiplication is performed before
addition.
Because the expression initial + rate is
followed by a *, it is not grouped into a single
phrase by itself

Mr.Binu Vargis AsstProfessor


36
ComputerScienceEngg

18
01-03-2017

The hierarchical structure of a program is


usually expressed by recursive rules.

For example, we might have the following


rules, as part of the definition of expression:

Mr.Binu Vargis AsstProfessor


37
ComputerScienceEngg

Any identifier is an expression.


Any number is an expression
If expression1 and expression2 are expressions,
then so are
Expression1 + expression2
Expression1 * expression2
(Expression1 )

Mr.Binu Vargis AsstProfessor


38
ComputerScienceEngg

19
01-03-2017

Semantic Analysis:

In which certain checks are performed to ensure


that the components of a program fit together
meaningfully.

Mr.Binu Vargis AsstProfessor


39
ComputerScienceEngg

The semantic analysis phase checks the source


program for semantic errors and gathers type
information for the subsequent code-
generation phase.

Mr.Binu Vargis AsstProfessor


40
ComputerScienceEngg

20
01-03-2017

It uses the hierarchical structure determined


by the syntax-analysis phase to identify the
operators and operand of expressions and
statements.

Mr.Binu Vargis AsstProfessor


41
ComputerScienceEngg

An important component of semantic analysis


is type checking.

Here are the compiler checks that each


operator has operands that are permitted by
the source language specification.

Mr.Binu Vargis AsstProfessor


42
ComputerScienceEngg

21
01-03-2017

For example, when a binary arithmetic


operator is applied to an integer and real. In
this case, the compiler may need to be
converting the integer to a real. As shown in
figure given below

Mr.Binu Vargis AsstProfessor


43
ComputerScienceEngg

Mr.Binu Vargis AsstProfessor


44
ComputerScienceEngg

22
01-03-2017

Symbol Table Management:


An essential function of a compiler is to record the
identifiers used in the source program and collect
information about various attributes of each
identifier.

These attributes may provide information about


the storage allocated for an identifier, its type, its
scope.

Mr.Binu Vargis AsstProfessor


45
ComputerScienceEngg

The symbol table is a data structure containing a


record for each identifier with fields for the
attributes of the identifier.

When an identifier in the source program is


detected by the lexical analyzer, the identifier is
entered into the symbol table.

Mr.Binu Vargis AsstProfessor


46
ComputerScienceEngg

23
01-03-2017

However, the attributes of an identifier cannot


normally be determined during lexical analysis.

For example, in a Pascal declaration like


Var position, initial, rate : real;

The type real is not known when position, initial


and rate are seen by the lexical analyzer.

Mr.Binu Vargis AsstProfessor


47
ComputerScienceEngg

The remaining phases gets information about


identifiers into the symbol table and then use this
information in various ways.

For example, when doing semantic analysis and


intermediate code generation, we need to know
what the types of identifiers are, so we can check
that the source program uses them in valid ways,
and so that we can generate the proper
operations on them.

Mr.Binu Vargis AsstProfessor


48
ComputerScienceEngg

24
01-03-2017

The code generator typically enters and uses


detailed information about the storage assigned
to identifiers.

Mr.Binu Vargis AsstProfessor


49
ComputerScienceEngg

Mr.Binu Vargis AsstProfessor


50
ComputerScienceEngg

25
01-03-2017

Each phase can encounter errors.

However, after detecting an error, a phase


must somehow deal with that error, so that
compilation can proceed, allowing further
errors in the source program to be detected.

Mr.Binu Vargis AsstProfessor


51
ComputerScienceEngg

A compiler that stops when it finds the first


error is not as helpful as it could be.

The syntax and semantic analysis phases


usually handle a large fraction of the errors
detectable by the compiler.

Mr.Binu Vargis AsstProfessor


52
ComputerScienceEngg

26
01-03-2017

Error Detection and Reporting:


Errors where the token stream violates the
structure rules (syntax) of the language are
determined by the syntax analysis phase.

The lexical phase can detect errors where the


characters remaining in the input do not form
any token of the language.

Mr.Binu Vargis AsstProfessor


53
ComputerScienceEngg

Mr.Binu Vargis AsstProfessor


54
ComputerScienceEngg

27
01-03-2017

Intermediate Code Generation:


After Syntax and semantic analysis, some
compilers generate an explicit intermediate
representation of the source program.

We can think of this intermediate


representation as a program for an abstract
machine.

Mr.Binu Vargis AsstProfessor


55
ComputerScienceEngg

Intermediate Code Generation:


This intermediate representation should have
two important properties;
it should be easy to produce,
easy to translate into the target program.

Mr.Binu Vargis AsstProfessor


56
ComputerScienceEngg

28
01-03-2017

Intermediate Code Generation:


We consider an intermediate form called
three-address code,

which is like the assembly language for a


machine in which every memory location can
act like a register.

Mr.Binu Vargis AsstProfessor


57
ComputerScienceEngg

Intermediate Code Generation:


Three-address code consists of a sequence of
instructions, each of which has at most three
operands.

Mr.Binu Vargis AsstProfessor


58
ComputerScienceEngg

29
01-03-2017

Intermediate Code Generation:


Temp1 := inttoreal (60)
Temp2 := id3 * temp1
Temp3 := id2 + temp2
id1 := temp3

Mr.Binu Vargis AsstProfessor


59
ComputerScienceEngg

Code Optimization:
The code optimization phase attempts to
improve the intermediate code, so that faster-
running machine code will result.

Mr.Binu Vargis AsstProfessor


60
ComputerScienceEngg

30
01-03-2017

Code Optimization:
Some optimizations are trivial.

For example, a natural algorithm generates


the intermediate code (1.3), using an
instruction for each operator in the tree
representation after semantic analysis, even
though there is a better way to perform the
same calculation, using the two instructions.

Mr.Binu Vargis AsstProfessor


61
ComputerScienceEngg

Code Optimization:
Temp1 := id3 * 60.0
id := id2 + temp1

There is nothing wrong with this simple


algorithm, since the problem can be fixed
during the code-optimization phase.

Mr.Binu Vargis AsstProfessor


62
ComputerScienceEngg

31
01-03-2017

Code Optimization:
That is, the compiler can deduce that the
conversion of 60 from integer to real
representation can be done once and for all at
compile time, so the inttoreal operation can
be eliminated.

Mr.Binu Vargis AsstProfessor


63
ComputerScienceEngg

Code Optimization:
Besides, temp3 is used only once, to transmit
its value to id1. It then becomes safe to
substitute id1 for temp3, whereupon the last
statement of 1.3 is not needed and the code
of 1.4 results.

Mr.Binu Vargis AsstProfessor


64
ComputerScienceEngg

32
01-03-2017

MACHINE INDEPENDENT OPTIMIZATION


Less space or faster execution.
Achieved in following ways:
Elimination of redundancies.
Computations in program are rewritten/rearranged to
make it run faster.
Code Optimization has nothing to do with a
Program Algorithm.
It usually doesnt take into account M/C
properties suchh as instruction set, registers etc.

Mr.Binu Vargis AsstProfessor


65
ComputerScienceEngg

Places for Improvements??

source Front IR Back machine


code End End code

User can improve errors


by changing
algorithms &
transforming Compiler can use
loops registers & select
Compiler can do proper M/C
GLOBAL or LOCAL instructions.
optimization
Mr.Binu Vargis AsstProfessor
66
ComputerScienceEngg

33
01-03-2017

Techniques used in Compilers at Intermediate


code level without changing meanings:
Compile time evaluation
Elimination of common sub expressions
Dead Code elimination
Frequency reduction
Strength reduction

Mr.Binu Vargis AsstProfessor


67
ComputerScienceEngg

COMPILE TIME EVALUATION


When all operands in an operation are constants,
the operation can be performed during
compilation.
Example:
x=5*3+9+y;
can be written as:
x = 24 + y ;

It reduces compile time evaluation.

Mr.Binu Vargis AsstProfessor


68
ComputerScienceEngg

34
01-03-2017

ELIMINATION OF COMMON SUBEXPRESSIONS:


Avoid recomputing a common sub expression by using
its previously computed value.
Its called Common Sub expression if it was previously
computed & the values of variables are not changed
since previous computation.
X = y * z;
.
A = y * z + 6;

The subexpression y*z can be saved in temp variable.

Mr.Binu Vargis AsstProfessor


69
ComputerScienceEngg

Dead Code Elimination


A program may contain code that never get used.
a = 5;
b = 3;
X = a * b + 6;
Print a,b;

a = 5;
b = 3;
Print a,b;

Mr.Binu Vargis AsstProfessor


70
ComputerScienceEngg

35
01-03-2017

Frequency reduction
Some instructions can be moved out of a loop
If keeping it out doesnt make in difference better
reduce the frequency of executing it.

while (i<=100)
z = 5;
{
while (i<=100)
z = 5;
{
x = z * I;
x = z * I;
i++;
i++;
}
} Mr.Binu Vargis AsstProfessor
71
ComputerScienceEngg

Strength Reduction
Replacement of time consuming operation by a
faster operation.
In some cases multiplication can be replaced by an
addition operation.

X = 2*y can be written as z = y + y

x can be written as x*x

Mr.Binu Vargis AsstProfessor


72
ComputerScienceEngg

36
01-03-2017

Machine dependent Optimization


Number of load and store statements can be
reduced by properly utilizing CPU registers for
storing intermediate values.
Use smaller and faster machine instructions.
Use appropriate addressing modes to handle
different high level data structures.
Can be re-arranged to make efficient
utilization of instruction pipelining.
Rearrangement to reduce the code size.
Mr.Binu Vargis AsstProfessor
73
ComputerScienceEngg

Load and Store statements can be reduced. HOW?


x=x+y+z
Three address code derived from Syntax Tree:
T1 = x + y
MOV R1,x
ADD R1,y
MOV T1, R1 MOV R1, x
T2 = T1 + z ADD R1, y
MOV R1, T1 ADD R1, z
ADD R1, z MOV x, R1
MOV T2, R1
x = T2
MOV R1, T2
MOV X, R1 Mr.Binu Vargis AsstProfessor
ComputerScienceEngg
74

37
01-03-2017

Cousins of the Compiler:


As we saw in given figure, the input to a
compiler may be produced by one or more
preprocessors, and further processing of the
compilers output may be needed before
running machine code is obtained.

Mr.Binu Vargis AsstProfessor


75
ComputerScienceEngg

Skeletal Source Program

Preprocessor

Source program

Compiler

Target assembly program

Assembler

Relocatable machine code

LIBRARY, RELOCATABLE
Loader/link-editor OBJECT FILES

Absolute machine code


Mr.Binu Vargis AsstProfessor
1.3. A LANGUAGE-PROCESSING SYSTEM 76
ComputerScienceEngg

38
01-03-2017

Cousins of the Compiler:


Preprocessors:
preprocessors produce input to compilers.
They may perform the following functions:

Macro Processing:
File inclusion:
Rational Preprocessors:
Language extensions:

Mr.Binu Vargis AsstProfessor


77
ComputerScienceEngg

Preprocessors:
Macro Processing:

A preprocessor may allow a user to define macros


that are shorthands for longer constructs.

Mr.Binu Vargis AsstProfessor


78
ComputerScienceEngg

39
01-03-2017

Preprocessors:
File inclusion:
A preprocessor may include header files into the
program text.

For example, the C preprocessor causes the


contents of the file <global.h> to replace the
statement #include <global.h> when it processes a
file containing this statement.

Mr.Binu Vargis AsstProfessor


79
ComputerScienceEngg

Preprocessors:
defs.h main.c

#include defs.h //////


////// //////
---------
////// //////
---------
////// ---------
---------
---------
---------

Mr.Binu Vargis AsstProfessor


80
ComputerScienceEngg

40
01-03-2017

Preprocessors:
Rational Preprocessors:
These processors augment older languages with
more modern flow-of-control and data-structuring
facilities.

Mr.Binu Vargis AsstProfessor


81
ComputerScienceEngg

Preprocessors:
Language extensions:
These processors attempt to add capabilities to
the language by what amounts to built-in macros.
For example, the language Equal is a database
query language embedded in C. Statements
beginning with ## are taken by the preprocessor
to be database-access statements, unrelated to C,
and are translated into procedure calls on routines
that perform the database access.

Mr.Binu Vargis AsstProfessor


82
ComputerScienceEngg

41
01-03-2017

Other forms of Intermediate Code


Three address code
Abstract syntax tree
Postfix notation

Mr.Binu Vargis AsstProfessor


83
ComputerScienceEngg

Abstract Syntax tree

z = x + (y * A)

Mr.Binu Vargis AsstProfessor


84
ComputerScienceEngg

42
01-03-2017

Postfix Notation
Operator appears after operands.
Traversing is done from right to left.
z=x+y*A

Mr.Binu Vargis AsstProfessor


85
ComputerScienceEngg

Quadruple representation:
OPERATOR OPERAND 1 OPERAND 2 RESULT

The quadruple representation of z = x + y * A


Operator Operand 1 Operand 2 Result

* Y A Temp1
+ X Temp1 Temp2
= Temp2 - z

Mr.Binu Vargis AsstProfessor


86
ComputerScienceEngg

43
01-03-2017

Triple representation:
OPERATOR OPERAND 1 OPERAND 2

Triple representation of z = x + y * A

OPERATOR OPERAND 1 OPERAND 2


1 * Y A

2 + X (1)

3 = Z (2)

Mr.Binu Vargis AsstProfessor


87
ComputerScienceEngg

Code Generation
The final phase of the compiler is the
generation of target code

consisting normally of relocatable machine


code or assembly code.

Mr.Binu Vargis AsstProfessor


88
ComputerScienceEngg

44
01-03-2017

Code Generation
Memory locations are selected for each of the
variables used by the program.

Then, intermediate instructions are each


translated into a sequence of machine
instructions that perform the same task.

A crucial aspect is the assignment of variables


to registers.
Mr.Binu Vargis AsstProfessor
89
ComputerScienceEngg

INTERMEDIATE CODE GENERATED ASSEMBLY CODE

T1 = x + y MOV R1,x
ADD R1,y
MOV T1, R1

T2 = T1 + z MOV R1, T1
ADD R1, z
MOV T2, R1

x = T2 MOV R1, T2
MOV X, R1
Mr.Binu Vargis AsstProfessor
90
ComputerScienceEngg

45
01-03-2017

Assignment 3:
1. Consider the statement
4.0
R= 2
Explain the processing of above statement with
respect to all phases of compiler.
2. For the statement given below, generate intermediate
code in the format:
i) Postfix notation
ii) Abstract syntax tree
iii) Quadruple
iv) Triple
v) Three address code
+
=
( )
Mr.Binu Vargis AsstProfessor 91
ComputerScienceEngg

The Grouping of Phases:

Mr.Binu Vargis AsstProfessor


92
ComputerScienceEngg

46
01-03-2017

Front and Back Ends:


The phases are collected into a front end and
a back end.

The front end consists of those phases that


depend primarily on the source language and
are largely independent of the target machine.

Mr.Binu Vargis AsstProfessor


93
ComputerScienceEngg

Front and Back Ends:


These normally include lexical and syntactic
analysis, the creating of the symbol table,
semantic analysis, and the generation of
intermediate code.

A certain among of code optimization can be


done by the front end as well.

Mr.Binu Vargis AsstProfessor


94
ComputerScienceEngg

47
01-03-2017

Front and Back Ends:


The front end also includes the error handling
that goes along with each of these phases.

Mr.Binu Vargis AsstProfessor


95
ComputerScienceEngg

Front and Back Ends:


The back end includes those portions of the
compiler that depend on the target machine.

And generally, these portions do not depend


on the source language, depend on just the
intermediate language.

Mr.Binu Vargis AsstProfessor


96
ComputerScienceEngg

48
01-03-2017

Front and Back Ends:


In the back end, we find aspects of the code
optimization phase, and we find code
generation, along with the necessary error
handling and symbol table operations.

Mr.Binu Vargis AsstProfessor


97
ComputerScienceEngg

LANGUAGE PROCESSOR
DEVELOPMENT TOOLS
There are several software tools that help in
the production of a compiler. Two best known
are :
1. LEX : a lexical analyser generator
2. YAAC: a parser generator

Mr.Binu Vargis AsstProfessor


98
ComputerScienceEngg

49
01-03-2017

LEX
At lexical phase, we use regular expressions to
specify patterns to LEX
It generates code that allows to scan and
match strings in the i/p
LEX is a software tool that takes i/p a
specification of Regular Expression
Also takes action as i/p on recognising each of
these patterns.
The o/p of LEX is a C- Program for lexical
analyses Mr.Binu Vargis AsstProfessor
99
ComputerScienceEngg

Lex source program LEX Lex.yy.c


lex.1

C a.out
COMPILER

Source program for Sequence of tokens


a.out
lexical analysis

Mr.Binu Vargis AsstProfessor


100
ComputerScienceEngg

50
01-03-2017

1. First, a specification of lexical analyzer as


prepared by creating a program lex.1 in the
lex language.
2. Then, lex.1 is run through the lex compiler
to produce a C- program lex.yy.c
3. lex.yy.c is a C- program containing
recogniser for Regular Expression together
with user supplied code.
4. Finally, lex.yy.c is run through C- compiler to
produce an object program a.out which is a
lexical analyzer that transforms an i/p stream
into tokens Mr.Binu Vargis AsstProfessor
101
ComputerScienceEngg

Input is a set of Regular Expression and associate


actions (written in C)
Output is a table driven scanner (lex.yy.c)
INPUT PATTERN:
First part ( definition) OPTIONAL
%%
pattern action (REGULAR EXPRESSION)

%% (OPTIONAL)
Mr.Binu Vargis AsstProfessor
102
ComputerScienceEngg

51
01-03-2017

LEX Specifications

Declarations
%%
Translation rules
%%
User functions
Any of these three sections can be empty but the %%
between the declaration & rules cannot be omitted.

Mr.Binu Vargis AsstProfessor


103
ComputerScienceEngg

Declarations Section
This section contains entries that:
Include standard I/O header file.
Define global variables.
Define the list rule as the place to start
processing.
Define the tokens used by the parser.
Define the operators and their precedence.

Mr.Binu Vargis AsstProfessor


104
ComputerScienceEngg

52
01-03-2017

Rules Section
The rules section defines the rules that parse
the input stream.

Mr.Binu Vargis AsstProfessor


105
ComputerScienceEngg

Programs Section
The programs section contains the following subroutines. Because these
subroutines are included in this file, you do not need to use the yacc library when
processing this file.

main The required main program that calls


the yyparse subroutine to start the
program.

yyerror(s) This error-handling subroutine only prints


a syntax error message.

yywrap The wrap-up subroutine that returns a


value of 1 when the end of input occurs.

Mr.Binu Vargis AsstProfessor


106
ComputerScienceEngg

53
01-03-2017

Mr.Binu Vargis AsstProfessor


107
ComputerScienceEngg

Mr.Binu Vargis AsstProfessor


108
ComputerScienceEngg

54
01-03-2017

YACC
It creates a C-Program for parser.
Available as a command on the Unix system.

Mr.Binu Vargis AsstProfessor


109
ComputerScienceEngg

YACC Specifications YACC Lex.yy.c

C
y.tab.c a.out
COMPILER

Input Tokens from a.out OUTPUT


LEX
Mr.Binu Vargis AsstProfessor
110
ComputerScienceEngg

55
01-03-2017

Steps:
YACC Specification file is prepared.
YACC transforms the file containing
specifications into a C Program called y.tab.c,
which is a parser in C Language.
Y.tab.c is run through C Compiler which
produces object program a.out.
The parser a.out can be run to parse a source
containing stream of tokens.
The input of YACC is divided into three sections
as given below: Mr.Binu Vargis AsstProfessor
111
ComputerScienceEngg

Declarations

%%

Translation rules

%%

C - FUNCTIONS Mr.Binu Vargis AsstProfessor


112
ComputerScienceEngg

56
01-03-2017

FIRST PART
DECLARATIONS section consists of token
declarations & C code bracketed by
% { .%}
YACC definitions.
%start
%token
%union
%type

Mr.Binu Vargis AsstProfessor


113
ComputerScienceEngg

YACC Takes grammar as input, reads it and


generates C code for a syntax analyser or parser.
Actions associated with a rule/grammar are
entered.
Represents a GRAMMAR
Left-hand side is followed by a colon & a right
hand side.
Multiple right-hand sides may be separated by
|
It contains all PRODUCTIONS

Mr.Binu Vargis AsstProfessor


114
ComputerScienceEngg

57
01-03-2017

Mr.Binu Vargis AsstProfessor


115
ComputerScienceEngg

Mr.Binu Vargis AsstProfessor


116
ComputerScienceEngg

58
01-03-2017

Mr.Binu Vargis AsstProfessor


117
ComputerScienceEngg

Third Part
Contains valid C code that supports the
language processing.
Symbol Table implementation.
Functions that might be called by actions
associated with the productions in the second
part.

Mr.Binu Vargis AsstProfessor


118
ComputerScienceEngg

59
01-03-2017

Mr.Binu Vargis AsstProfessor


119
ComputerScienceEngg

Mr.Binu Vargis AsstProfessor


120
ComputerScienceEngg

60
01-03-2017

Mr.Binu Vargis AsstProfessor


121
ComputerScienceEngg

Mr.Binu Vargis AsstProfessor


122
ComputerScienceEngg

61
01-03-2017

Mr.Binu Vargis AsstProfessor


123
ComputerScienceEngg

OVER.....

Mr.Binu Vargis AsstProfessor


124
ComputerScienceEngg

62

You might also like