
LAB MANUAL

Course Name: System Programming & Compiler


Construction (SPCC)

Course Code : CSC 601


Lab Code : CSL 601
Class : TE Computer Engineering
Semester : VI
Div :A&B

Mrs. Prajakta S. Khelkar Dr. Rais Mulla

Subject Incharge HOD, Computer Department

T.E. Comp. SEM VI SPCC (CSL601)


LAB NAME: SPCC TE- A& B DIV SEM: VI Course Code: CSC-601

S.N. Experiment Title LO

1 To implement first pass of a two pass assembler for x86 machine (JAVA/C/C++, Python, R-lang, Lex). LO1

2 To implement second pass of a two pass assembler for x86 machine (JAVA/C/C++, Python, R-lang, Lex). LO1

3 To implement single pass Macro Processor (JAVA/C/C++, Python, R-lang, Lex). LO2

4 To implement Lexical Analyzer programs (JAVA/C/C++, Python, R-lang, Lex). LO3

5 Write a program to remove left recursion by direct or indirect method for a given set of production rules (JAVA/C/C++, Python, R-lang, Lex). LO3

6 To implement any one parser (LL(1), SLR, Operator Precedence parser) (JAVA/C/C++, Python, R-lang, Lex). LO3

7 To implement Intermediate code generation (ex: Three Address Code) (JAVA/C/C++, Python, R-lang, Lex). LO4

8 To study & implement Code Generation Algorithm (JAVA/C/C++, Python, R-lang, Lex). LO4

9 To Study and Implement LEX. LO5

10 To Study and Implement YACC. LO5

Lab Outcomes: At the end of the course, the students will be able to
LO1. Generate machine code by implementing two pass assemblers.
LO2. Implement a two pass macro processor.
LO3. Parse the given input string by constructing a top-down/bottom-up parser.
LO4. Identify and validate tokens for a given high level language and implement the synthesis phase of a compiler.
LO5. Explore the LEX & YACC tools.



Academic Year 2021-2022
Experiment No.01
AIM : To implement first pass of a two pass assembler for X86 Processor.
Objective: Develop a program to implement first pass:
a. To search instruction in MOT and return its length
b. To search instruction in POT and return the routine called
c. To generate Symbol table
d. To generate literal table
e. To generate intermediate code after pass 1
Outcome: Students are able to design and implement pass 1 of a two pass assembler.
Theory:
An assembler performs the following functions
1. Generate instructions
a. Evaluate the mnemonic in the operator field to produce its machine code.
b. Evaluate subfields- find value of each symbol, process literals & assign address.

2. Process pseudo ops.

Pass 1: Purpose - To define symbols & literals

1. Determine length of machine instruction (MOTGET)


2. Keep track of location counter (LC)
3. Remember values of symbols until pass2 (STSTO)
4. Process some pseudo ops. EQU
5. Remember literals (LITSTO)

Pass 1: Database
1. Source program
2. Location counter(LC) which stores location of each instruction
3. Machine Operation Table (MOT). This table indicates the symbolic mnemonic for
each instructions and its length.
4. Pseudo Operation Table (POT). This table indicates the symbolic mnemonic and
action taken for each pseudo-op in pass1.
5. Symbol Table (ST) which stores each label along with its value.
6. Literal Table(LT) which stores each literal and its corresponding address
7. A copy of input which will be used by pass2.



Format of databases
The Machine Operation Table (MOT) and Pseudo Operation Table (POT) are examples of
fixed tables. During the assembly process the contents of this table are not filled in or
altered
1. Machine-op Table (MOT)

2. Pseudo-op Table (POT)


Pseudo-op Address of routine to process pseudo-op (3 bytes = 24-bit address)
“EQUbb” P1EQU
“START” P1START
“ENDbb” P1END

Let us consider the following source code and find the contents of the symbol table
and literal table.

Stmt no Symbol Op-code Operands

1 SAMPLE START 0
2 USING *,15
3 A 1,FOUR
4 A 2,FIVE
5 TEMP EQU 10
6 A 3,=F'3'
7 USING TEMP,15
8 FOUR DC F'4'
9 FIVE DC F'5'
10 END

3. Symbol Table:

Symbol Value Length Relocation

“SAMPLE” 0 1 “R”
“TEMP” 10 1 “A”
“FOUR” 12 4 “R”
“FIVE” 16 4 “R”

4. Literal Table

Literal Value Length Relocation

=F'3' 20 4 “R”



Intermediate code after pass 1:

Stmt no Relative address Statement

1 - SAMPLE START 0
2 - USING *,15
3 0 A 1, _ (0,15)
4 4 A 2, _ (0,15)
5 - TEMP EQU 10
6 8 A 3, _ (0,15)
7 - USING TEMP,15
8 12 FOUR DC F'4'
9 16 FIVE DC F'5'
10 - END

Algorithm:

1. Initially location counter is set to relative address 0 i.e. LC=0


2. Read the statement from source program
3. Examine the op-code field: If match found in MOT then
a. From the MOT entry determine the length field i.e. L=length.
b. Examine the operand field to check whether a literal is present. If a new
literal is found then a corresponding entry is made in the LT.
c. Examine the label field for the presence of symbol. If label is present then it is
entered in ST and current value of location counter is assigned to symbol.
d. The current value of location counter is incremented by length of instruction(L)
4. If a match is found in the POT then
a. If it is a USING or DROP pseudo-op then the first pass does nothing; it just
writes a copy of these cards for pass 2.
b. If it is EQU pseudo-op then evaluate expression in operand field and assign
value to the symbol present in label field.
c. If it is DS or DC pseudo-op then by examining the operand field find out
number of bytes of storage required. Adjust the location counter for proper
alignment.
d. If it is END pseudo-op then pass1 is terminated and control is passed to pass2.
Before transferring the control it assigns location to literals.
5. A copy of source card is saved for pass 2.
6. Go to step 2.
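The algorithm above can be sketched in Python (one of the languages permitted for this lab). The tuple-based source format, the table layouts and the fixed 3-byte instruction length are simplifying assumptions for illustration, not a real x86 encoding:

```python
# Minimal sketch of assembler pass 1: builds the symbol table (ST) and
# literal table (LT) while tracking the location counter (LC).

MOT = {"A": 3, "LDA": 3, "STA": 3}      # mnemonic -> assumed length
POT = {"START", "END", "EQU", "DC"}     # pseudo-ops handled here

def pass1(source):
    lc, st, lt = 0, {}, {}
    for label, opcode, operand in source:
        if opcode in MOT:
            if label:
                st[label] = lc                 # define symbol at current LC
            if operand and operand.startswith("=F'"):
                lt.setdefault(operand, None)   # remember literal, address later
            lc += MOT[opcode]                  # step 3d: advance LC by length
        elif opcode in POT:
            if opcode == "START":
                lc = int(operand)
            elif opcode == "EQU":
                st[label] = int(operand)       # symbol gets operand's value
            elif opcode == "DC":
                st[label] = lc
                lc += 3                        # assume fixed-size constants
            elif opcode == "END":              # assign addresses to literals
                for lit in lt:
                    lt[lit] = lc
                    lc += 3
    return st, lt

src = [
    ("SAMPLE", "START", "0"),
    (None, "LDA", "FIVE"),
    (None, "A", "=F'3'"),
    ("TEMP", "EQU", "10"),
    ("FIVE", "DC", "F'5'"),
    (None, "END", None),
]
st, lt = pass1(src)
print(st)   # symbol table: {'TEMP': 10, 'FIVE': 6}
print(lt)   # literal table: {"=F'3'": 9}
```

A real pass 1 would also save a copy of each card for pass 2 and process DS alignment; those steps are omitted here to keep the table-building logic visible.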



FLOW CHART:

Conclusion: Thus, we have studied and implemented the functionality of first pass of a two
pass assembler. The symbol table, literal table and intermediate code are generated for the
given program.



Experiment No. 2
AIM : To design and implement second pass of a two pass assembler for X86 Processor
Objective: Develop a program to implement second pass:
a. To generate the Base table
b. To generate machine code

Outcome: Students are able to design and implement pass 2 of a two pass assembler.
Theory:
Pass 2: Purpose - To generate the object program
1) Look up values of symbols (STGET)
2) Generate instructions (MOTGET2)
3) Generate data (for DC, DS)
4) Process pseudo-ops (POTGET2)
Data Structures:
1) Copy of source program from Pass1
2) Location counter
3) MOT, which gives the length, format and binary op-code of each mnemonic
4) POT, which gives the mnemonic & the action to be taken
5) Symbol table from Pass1
6) Base table which indicates the register to be used or base register
7) A work space INST to hold the instruction & its parts
8) A work space PRINT LINE, to produce printed listing
9) A work space PUNCH CARD for converting instruction into format needed by
loader
10) An output deck of assembled instructions needed by loader.
Format of database:
Let us consider the same example as experiment no. 1 and the base table after statement 2:
Base register Contents
15 0

After statement 7:
Base register Contents
15 10

Base Table:
Assembler uses this table to generate proper base register reference in machine instructions
and to compute offset. Then the offset is calculated as:
offset= value of symbol from ST - contents of base register



Code after pass2:

Stmt no Relative address Statement

3 0 A 1, 12(0,15)
4 4 A 2, 16(0,15)
6 8 A 3, 10(0,15)
8 12 4
9 16 5
10 -

Algorithm:
1. Initialize the location counter as: LC=0
2. Read the statement from source program
3. Examine the op-code field: If match found in MOT then
a. From the MOT entry determine the length field i.e. L=length, binary op-code and
format of the instruction.
Different instruction format requires different processing as described below:
1. RR Instruction : (Register to Register )
Both of the register specification fields are evaluated and placed into second byte of RR
instruction
2. RX Instruction : (Register and Indexed storage)
Both the register and index fields are evaluated and processed similarly to the RR instruction.
The storage address operand is evaluated to generate effective address (EA). The BT
is examined to find the base register. Then the displacement is determined as:
D=EA- Contents of base register.

The other instruction formats are processed in similar manner to RR and RX.
b. Finally the base register and displacement specification are assembled in third and fourth
bytes of instruction.
c. The current value of location counter is incremented by length of instruction.

4. If match found in POT then


a. If it is EQU pseudo-op then EQU card is printed in the listings.
b. If it is USING pseudo-op then the corresponding BT entry is marked as available.
c. If it is DROP pseudo-op then the corresponding BT entry is marked as
unavailable.
d. If it is DS or DC pseudo-op then various conversions are done depending on the
data type and symbols are evaluated. Location counter is updated by length of
data.
e. END pseudo-op indicates end of source program and then pass2 is terminated.
Before that if any literals are remaining then the code is generated for them.
5. After assembling the instruction it is put in the format required by loader.
6. Finally a listing is printed, which consists of a copy of the source card, its storage
location and the hexadecimal representation of the assembled instruction.
7. Go to step 2.
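The displacement computation used in the RX handling above can be sketched as follows; the symbol values and base table contents are taken from the worked example, and the 12-bit displacement limit mirrors the IBM 360-style format this theory assumes:

```python
# Sketch of the pass-2 base/displacement calculation: given the symbol
# table from pass 1 and the base table built from USING statements,
# compute the base register and offset for a storage operand.
# offset = value of symbol from ST - contents of base register

def displacement(symbol, symtab, base_table):
    ea = symtab[symbol]                  # effective address from the ST
    for reg, contents in base_table.items():
        d = ea - contents
        if 0 <= d < 4096:                # displacement must fit in 12 bits
            return reg, d                # (base register, displacement)
    raise ValueError("no usable base register for " + symbol)

symtab = {"FOUR": 12, "FIVE": 16}        # values from the pass-1 example
base_table = {15: 0}                     # after 'USING *,15'
print(displacement("FOUR", symtab, base_table))   # (15, 12)
print(displacement("FIVE", symtab, base_table))   # (15, 16)
```

These pairs match the listing above, where statements 3 and 4 assemble to A 1, 12(0,15) and A 2, 16(0,15).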



FLOW CHART:

Conclusion: Thus, we have studied and implemented the functionality of second pass of a two
pass assembler. The base table and machine code are generated for the given program.



Experiment No.3
AIM : To study & implement a single pass Macro Processor
Objective: Develop a program to implement a single pass macro processor:
a. To generate the Macro Definition Table (MDT)
b. To generate the Macro Name Table (MNT)
c. To generate the Argument List Array (ALA)
d. To generate the expanded source code

Outcome: Students are able to design and implement a single pass Macro Processor.
Theory: ​ A macro processor is a program that copies a stream of text from one place to
another, making a systematic set of replacements as it does so. Macro processors are often
embedded in other programs, such as assemblers and compilers. Sometimes they are
standalone programs that can be used to process any kind of text.
Macro processors have been used for language expansion (defining new language
constructs that can be expressed in terms of existing language components), for systematic text
replacements that require decision making, and for text reformatting (e.g. conditional
extraction of material from an HTML file).
The following four major tasks are done by macro processor.
1. Recognize macro definition by identifying MACRO and MEND pseudo-ops.
2. Save these definitions which are required for macro expansion process
3. Recognize macro calls by identifying macro name which appears as operation
mnemonic.
4. Finally expand macro call and substitute arguments for the dummy arguments.

Databases Used: Pass 1 databases:

1. The input source program
2. The output source deck to be used by pass 2
3. The Macro Definition Table (MDT): this table stores the macro definitions.
4. The Macro Name Table (MNT): this table stores the names of the defined macros.
5. The Macro Definition Table Counter (MDTC), which is used to indicate the next available
entry in the MDT.
6. The Macro Name Table Counter (MNTC), which is used to indicate the next available entry
in the MNT.
7. The Argument List Array (ALA), which stores the index markers for dummy arguments.



Format of databases:

We will use following example to discuss the format of all databases.

MACRO
&LAB ADDM &ARG1, &ARG2, &ARG3
A 1,&ARG1
A 2,&ARG2
A 3,&ARG3
MEND
………………………………
LOOP ADDM D1, D2, D3
………………………………..
1. Argument List Array (ALA)
This is an array which stores all arguments used in macro definition. Each argument is
assigned an index marker. Consider following macro call,
LOOP ADDM D1, D2, D3
The ALA for this would be (“b” denotes a blank):

Index 8 bytes per entry
0 “LOOPbbbb”
1 “D1bbbbbb”
2 “D2bbbbbb”
3 “D3bbbbbb”

2. Macro Definition Table (MDT): This table stores each line of the macro definition in it
except the line for the MACRO pseudo-op, with each dummy argument replaced by its
index marker (#0, #1, ...). The MDT for the above example is:

Index 80 bytes per entry
…….
#0 ADDM #1, #2, #3
A 1,#1
A 2,#2
A 3,#3
MEND
………
3. Macro Name Table (MNT): Each entry in the MNT has the following fields:
Index: the entry number. Name: the macro name. MDT Index: the index of the MDT line
at which the macro definition begins.



Algorithm:
Pass 1 processing:
1. Initialize MDTC and MNTC as MDTC=MNTC=1
2. Read a line from input source card.
3. If it is MACRO pseudo-op then
3.1 Read the next line which will be the macro name line. Enter the macro name in
MNT with current value of MDTC.
3.2 Increment the value of MNTC by 1
3.3 Then the argument list array is prepared for the arguments found in the macro
name line.
3.4 The macro name card is also inserted in the MDT.
3.5 Increment the value of MDTC by 1
3.6 Read the next line from source card
3.7 Substitute the index markers for arguments from ALA prepared in previous step
and then enter this line into MDT
3.8 Increment the value of MDTC by 1
3.9 Check whether it is the MEND pseudo-op:
3.9.1 If yes, go to step 2.
3.9.2 Else, go to step 3.6
4. Else, if it is not MACRO pseudo-op then simply write the line in the copy of source
deck prepared for pass2.
5. Check if it is the END pseudo-op:
5.1 If yes, go to step 6
5.2 Else, go to step 2.
6. Stop
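A minimal Python sketch of this pass, assuming one macro per MACRO/MEND pair and simple space/comma-separated source lines (a real macro processor also handles nesting, defaults, and concatenation):

```python
# Sketch of macro-processor pass 1: builds the MNT, the MDT, and the
# ALA-based '#n' index-marker substitution from the algorithm above.

def macro_pass1(lines):
    mdt, mnt, out = [], {}, []           # MDT list, MNT name->index, pass-2 copy
    i = 0
    while i < len(lines):
        if lines[i].split() and lines[i].split()[0] == "MACRO":
            i += 1
            header = lines[i].split()             # e.g. '&LAB ADDM &A1,&A2'
            mnt[header[1]] = len(mdt)             # MNT entry -> current MDTC
            ala = [header[0]] + header[2].split(",")  # dummy args, #0 = label
            while True:                            # copy body into the MDT,
                line = lines[i]                    # substituting index markers
                for k, arg in enumerate(ala):
                    line = line.replace(arg, "#%d" % k)
                mdt.append(line)
                i += 1
                if lines[i - 1].strip() == "MEND":
                    break
        else:
            out.append(lines[i])                  # non-macro card: copy through
            i += 1
    return mnt, mdt, out

src = [
    "MACRO",
    "&LAB ADDM &ARG1,&ARG2",
    "A 1,&ARG1",
    "A 2,&ARG2",
    "MEND",
    "LOOP ADDM D1,D2",
    "END",
]
mnt, mdt, copy = macro_pass1(src)
print(mnt)       # {'ADDM': 0}
print(mdt[1])    # 'A 1,#1'
```

The naive string replacement would confuse arguments that are prefixes of one another (e.g. &ARG1 vs. &ARG10); a real implementation tokenizes the line first.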

Conclusion: Thus we have designed and implemented a single pass macro processor.
The MNT, MDT and ALA are generated for the given source program.



Experiment No. 04

AIM : To implement Lexical Analyzer programs

Objective: Develop a program to find tokens from a given grammar.

Outcome: Students are able to implement a program to find tokens from a given grammar.
Problem Statement: Write a C program to scan and count the number of characters, words, and
lines in a file.

Theory:
Lexical analysis is the very first phase in compiler design. A lexer takes the modified
source code, which is written in the form of sentences, and breaks this syntax into a series
of tokens. It removes any extra space or comments written in the source code. Programs
that perform lexical analysis in compiler design are called lexical analyzers or lexers.
A lexer contains a tokenizer or scanner. If the lexical analyzer detects that a token is invalid,
it generates an error. The role of the lexical analyzer in compiler design is to read character
streams from the source code, check for legal tokens, and pass the data to the syntax analyzer
when it demands.
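The problem statement asks for a C program; the same single-scan logic is sketched here in Python for brevity. The counting conventions (whitespace-delimited words, newline-terminated lines) and the sample text are illustrative assumptions:

```python
# Scan a text stream once and count characters, words, and lines,
# as required by the problem statement above.

def scan_counts(text):
    chars = len(text)                # every character, including newlines
    words = len(text.split())        # whitespace-delimited words
    lines = text.count("\n")         # newline-terminated lines
    return chars, words, lines

sample = "int main()\n{\nreturn 0;\n}\n"
print(scan_counts(sample))           # (25, 6, 4)
```

In the C version the same effect is obtained by reading the file character by character with getc() and updating the three counters in one loop.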

Lexical Analyzer Architecture: How tokens are recognized

Roles of the Lexical analyzer


The lexical analyzer performs the tasks given below:

● Helps to identify tokens and enter them into the symbol table
● Removes white spaces and comments from the source program
● Correlates error messages with the source program
● Expands macros if they are found in the source program
● Reads input characters from the source program

Example of Lexical Analysis, Tokens, Non-Tokens


Consider the following code that is fed to Lexical Analyzer

#include <stdio.h>
int maximum(int x, int y) {
// This will compare 2 numbers
if (x > y)
return x;
else {
return y;
}
}
Examples of Tokens created

Lexeme Token
int Keyword
maximum Identifier
( Operator
int Keyword
x Identifier
, Operator
int Keyword
y Identifier
) Operator
{ Operator
if Keyword

Examples of Non tokens


Type Examples
Comment // This will compare 2 numbers
Pre-processor directive #include <stdio.h>
Pre-processor directive #define NUMS 8,9
Macro NUMS
Whitespace \n \b \t

Conclusion: - We studied the lexical phase and implemented a program to recognize tokens
from a given grammar.



Experiment No. 05
AIM : Write a program to remove left recursion (direct or indirect) for a given set of
production rules.

Objective: Develop a program to remove direct & indirect left recursion for a given grammar.
Outcome: Students are able to implement a program to remove direct & indirect left recursion.
Theory:

Introduction
In the formal language theory of computer science, left recursion is a special case of recursion
where a string is recognized as part of a language by the fact that it decomposes into a string
from that same language (on the left) and a suffix (on the right). For instance, a+b+a can be
recognized as a sum because it can be broken into a+b, also a sum, and +a, a suitable suffix.
In terms of context-free grammar, a nonterminal is left-recursive if the leftmost symbol in one
of its productions is itself (in the case of direct left recursion) or can be made itself by some
sequence of substitutions (in the case of indirect left recursion).
A grammar is left-recursive if and only if there exists a nonterminal symbol A that can derive
to a sentential form with itself as the leftmost symbol.[1] Symbolically,
A ⇒+ Aα,
where ⇒+ indicates the operation of making one or more substitutions, and α is any sequence
of terminal and nonterminal symbols.
Direct left recursion
Direct left recursion occurs when the definition can be satisfied with only one substitution. It
requires a rule of the form
A → Aα
where α is a sequence of nonterminals and terminals. For example, the rule
Expression → Expression + Term
is directly left-recursive. A left-to-right recursive descent parser for this rule might look like
void Expression() { Expression(); match('+'); Term(); }
and such code would fall into infinite recursion when executed.
Indirect left recursion
Indirect left recursion occurs when the definition is satisfied via several substitutions. It entails
a set of rules following the pattern
A0 → β0 A1 α0
A1 → β1 A2 α1
…
An → βn A0 αn
where β0, …, βn are sequences that can each yield the empty string, while α0, …, αn may be
any sequences at all. The derivation A0 ⇒ β0 A1 α0 ⇒+ A1 α0 ⇒ … then gives A0 as leftmost
in its final sentential form.

Removing left recursion:



Left recursion often poses problems for parsers, either because it leads them into infinite
recursion (as in the case of most top-down parsers) or because they expect rules in a
normal form that forbids it (as in the case of many bottom-up parsers, including the
CYK algorithm). Therefore a grammar is often preprocessed to eliminate the left
recursion.
Removing direct left recursion
The general algorithm to remove direct left recursion follows. Several improvements to this
method have been made.[2] For a left-recursive nonterminal A, discard any rules of the
form A → A and consider those that remain:
A → Aα1 | … | Aαn | β1 | … | βm
where: each α is a nonempty sequence of nonterminals and terminals, and

each β is a sequence of nonterminals and terminals that does not start with A.
Replace these with two sets of productions, one set for A:
A → β1A' | … | βmA'
and another set for the fresh nonterminal A' (often called the "tail" or the "rest"):
A' → α1A' | … | αnA' | ε
Repeat this process until no direct left recursion remains.

As an example, consider the rule set
E → E + T | T
This could be rewritten to avoid left recursion as
E → T E'
E' → + T E' | ε

Removing all left recursion

By establishing a topological ordering on nonterminals, the above process can be extended
to also eliminate indirect left recursion.
Input: A grammar: a set of nonterminals A1, …, An and their productions
Output: A modified grammar generating the same language but without left
recursion

1. For each nonterminal Ai:
   1. Repeat until an iteration leaves the grammar unchanged:
      1. For each rule Ai → αi, αi being a sequence of terminals
         and nonterminals:
         1. If αi begins with a nonterminal Aj and j < i:
            1. Let βi be αi without its leading Aj.
            2. Remove the rule Ai → αi.
            3. For each rule Aj → αj:
               1. Add the rule Ai → αj βi.
   2. Remove direct left recursion for Ai as described above.
Note that this algorithm is highly sensitive to the nonterminal ordering; optimizations often
focus on choosing this ordering well.
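A Python sketch of the direct-removal step, the building block of the full algorithm above. Productions are represented as lists of symbol strings, "eps" stands for ε, and the primed tail name is an illustrative convention:

```python
# Sketch of direct left-recursion removal: for
#   A -> A a1 | ... | A an | b1 | ... | bm
# produce A -> bi A' and A' -> ai A' | eps.

def remove_direct_lr(nt, productions):
    # alphas: the A-recursive bodies with the leading A stripped
    # (rules A -> A with nothing else are discarded, as in the text)
    alphas = [p[1:] for p in productions if p and p[0] == nt and len(p) > 1]
    # betas: bodies that do not start with A
    betas = [p for p in productions if not p or p[0] != nt]
    if not alphas:
        return {nt: productions}          # no direct left recursion
    tail = nt + "'"                        # fresh "tail" nonterminal
    return {
        nt: [b + [tail] for b in betas],
        tail: [a + [tail] for a in alphas] + [["eps"]],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | eps
g = remove_direct_lr("E", [["E", "+", "T"], ["T"]])
print(g)
```

The full algorithm then wraps this in the double loop over an ordering A1, …, An, expanding rules Ai → Aj βi with j < i before calling the direct removal for Ai.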

Conclusion: - We studied left recursion and implemented a program for removal of left
recursion using the direct and indirect methods.
Experiment No. 6
AIM: To implement any one parser (LL(1), SLR, Operator Precedence parser) (JAVA/C/C++).
Objective: Develop a program to implement
a. Predictive parser
b. Operator precedence parser

Outcome: Students are able to understand various parsing techniques. Also they are able
to implement a program to generate the parsing table for the respective technique.
Theory:

A special case of top-down parsing without backtracking is called predictive parsing. If, while
writing the grammar, we eliminate left recursion from it and left-factor it, we can obtain a
grammar that can be parsed by a recursive-descent parser without backtracking; such a parser
is called a predictive parser.
A non-recursive predictive parser is built using a stack. The main problem in predictive parsing
is how to decide which production is to be applied for a non-terminal. The non-recursive parser
given in figure 8.12 looks for the production to be applied in a parsing table which is
constructed from certain grammars.

FIGURE: Model of a non-recursive predictive parser

Algorithm - Construction of predictive parsing table
Input - Grammar G
Output - Parsing table M
Method -

1. For each production A → α of the grammar, do steps 2, 3 and 4.
2. For each terminal a in FIRST(α), add A → α to M[A,a].
3. If ε is in FIRST(α), add A → α to M[A,b] for each terminal b in FOLLOW(A).
4. If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A,$].
5. Make each undefined entry of M error.
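Steps 1-5 can be sketched in Python for a toy grammar. The FIRST and FOLLOW sets are supplied by hand here (computing them is a prerequisite exercise), and the grammar S → aS | ε is only an illustration:

```python
# Sketch of predictive parsing table construction. FIRST is keyed by
# production (A, alpha) so each body's set is directly available;
# "eps" stands for the empty string.

def build_table(first, follow):
    table = {}
    for (A, alpha), fset in first.items():      # step 1: each production
        for a in fset - {"eps"}:
            table[(A, a)] = alpha               # step 2: M[A,a] = A -> alpha
        if "eps" in fset:
            for b in follow[A]:                 # steps 3 and 4: '$' is simply
                table[(A, b)] = alpha           # treated as a member of FOLLOW
    return table                                # step 5: undefined entry = error

# Grammar: S -> aS | eps, with hand-computed FIRST/FOLLOW sets
first = {("S", ("a", "S")): {"a"}, ("S", ("eps",)): {"eps"}}
follow = {"S": {"$"}}
M = build_table(first, follow)
print(M[("S", "a")])   # ('a', 'S')  -> use S -> aS on input 'a'
print(M[("S", "$")])   # ('eps',)    -> use S -> eps at end of input
```

A conflict (two productions landing in the same cell) would mean the grammar is not LL(1); a fuller implementation would detect and report that.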

Conclusion: Thus, we have studied how to generate the parsing table for an LL(1) parser.



Experiment No. 07
AIM : To study and implement intermediate code generation (ex: Three Address Code)
(C, C++, JAVA, PYTHON, LEX).
Objective: Develop a program to implement intermediate code generation using three
address code.
Outcome: Students are able to appreciate the role of intermediate code and various
generation techniques.
Theory:
Intermediate representation is the form of the program after the first three phases of the
compiler, i.e. lexical, syntax and semantic analysis.
An Intermediate Representation (IR) is
● a clean and abstract machine language
● able to express target machine operations
● independent of the machine
● not dependent on any source language

If we generate machine code directly from source code then for n target machines we will have
n optimisers and n code generators, but if we have a machine-independent intermediate
code, we will have only one optimiser. Intermediate code can be either language-specific (e.g.,
bytecode for Java) or language-independent (three-address code).

The following are commonly used intermediate code representations:


1) Postfix Notation –
The ordinary (infix) way of writing the sum of a and b is with the operator in the middle: a + b.
The postfix notation for the same expression places the operator at the right end, as ab+. In
general, if e1 and e2 are any postfix expressions and + is any binary operator, the result of
applying + to e1 and e2 is written in postfix as e1 e2 +.
Example – The postfix representation of the expression (a – b) * (c + d) + (a – b) is:
ab- cd+ * ab- +

2)Three-Address Code –
A statement involving no more than three references(two for operands and one for result) is
known as three address statement. A sequence of three address statements is known as three
address code. Three address statement is of the form x = y op z , here x, y, z will have address



(memory location). Sometimes a statement might contain less than three references but it is
still called three address statement.
Example – The three address code for the expression a + b * c + d is:
T1 = b * c
T2 = a + T1
T3 = T2 + d
T1, T2, T3 are temporary variables.
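A Python sketch that produces exactly the sequence above, assuming the expression has already been parsed into a tree of (op, left, right) tuples (parsing itself is the concern of the previous experiments):

```python
from itertools import count

# Sketch of three-address code generation by post-order traversal:
# emit code for both children, then a statement computing a fresh
# temporary from their results.

def gen_tac(node, code, temps):
    if isinstance(node, str):            # leaf: a variable name
        return node
    op, left, right = node
    l = gen_tac(left, code, temps)
    r = gen_tac(right, code, temps)
    t = "T%d" % next(temps)              # fresh temporary T1, T2, ...
    code.append("%s = %s %s %s" % (t, l, op, r))
    return t

tree = ("+", ("+", "a", ("*", "b", "c")), "d")   # a + b * c + d
code = []
gen_tac(tree, code, count(1))
for line in code:
    print(line)
# T1 = b * c
# T2 = a + T1
# T3 = T2 + d
```

Because the traversal visits the deepest operator first, b * c is computed before either addition, matching the hand-written sequence in the example.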
3) Syntax Tree – A syntax tree is nothing more than a condensed form of a parse tree. The
operator and keyword nodes of the parse tree are moved to their parents, and a chain of single
productions is replaced by a single link. In a syntax tree the internal nodes are operators and
the leaf nodes are operands. To form a syntax tree, put parentheses in the expression; this way
it is easy to recognize which operand should come first.
Example – x = (a + b * c) / (a – b * c)

Conclusion: ​ Thus we have studied various intermediate code generation techniques and
implemented the Three address code successfully.



Experiment No. 8
AIM: To study & implement Code Generation Algorithm.
Objective: Understand intermediate code generation and the code generation phase of the
compiler. Develop a program to generate target code.

Outcome: Students are able to implement a program to generate target code.


Theory:
The last phase of compiler is the code generator. The intermediate representation of the source
program is given as input to code generator and it produces target program as an output as
shown in figure.

FIGURE Position of code generator

A Code-Generation algorithm -

The code generation algorithm takes a sequence of three-address statements from a basic
block as input. For each three-address statement of the form x := y op z we perform the
following actions:
1. To store the result of y op z, determine a location L by invoking the getreg
function. L may be a register or a memory location.
2. To determine y', the current location of y, consult the address descriptor of y. If the
value of y is currently both in memory and a register, prefer the register for y'. If the
value of y is not already in L, then generate the instruction MOV y', L to place a copy
of y' in L.
3. Generate the instruction op z', L, where z' is the current location of z. If z is in
both a register and memory, prefer the value from the register. Update the address
descriptor of x to indicate that the value of x is in location L. If L is a register, update
its register descriptor to indicate that it contains the value of x, and remove x from all
other register descriptors.
4. If the current values of y and z are not used in the future, or they are not live after exit
from the block and are in registers, alter the register descriptors to indicate that those
registers no longer contain the values of y and z.
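A much-simplified Python sketch of this algorithm for statements of the form x := y op z: only the register descriptor is tracked, getreg just picks a free register, liveness (step 4) is ignored, and the MOV/ADD/MUL target syntax is illustrative:

```python
# Sketch of code generation from three-address statements (x, y, op, z).
# Assumes a free register always exists (no spilling) and that variable
# names never look like register names.

def codegen(stmts, nregs=2):
    regs = {}                        # register descriptor: reg -> variable held
    out = []

    def loc(v):                      # current location: prefer a register copy
        for r, held in regs.items():
            if held == v:
                return "R%d" % r
        return v                     # else use its memory name

    for x, y, op, z in stmts:
        if loc(y).startswith("R"):
            r = int(loc(y)[1:])      # y already in a register: use it as L
        else:
            r = min(set(range(nregs)) - set(regs))   # getreg: a free register
            out.append("MOV R%d, %s" % (r, y))       # place a copy of y in L
        out.append("%s R%d, %s" % (op, r, loc(z)))   # op z', L
        regs[r] = x                  # register descriptor: L now holds x
    return out

code = codegen([("t1", "b", "MUL", "c"), ("t2", "a", "ADD", "t1")])
for line in code:
    print(line)
# MOV R0, b
# MUL R0, c
# MOV R1, a
# ADD R1, R0
```

Note how the second statement picks up t1 from R0 via the register descriptor instead of reloading it from memory, which is the point of keeping the descriptors.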

Conclusion: ​ Thus we have implemented code generation algorithm.



Experiment No. 9
AIM : To Study and implement a Lexical Analyzer using LEX.
Objective: Develop a skill to implement a lexical analyzer; develop different programs
using the Flex/Lex tool.

Outcome: Students are able to design and implement a lexical analyzer for a given language.

Theory:

The very first phase of the compiler is lexical analysis. The lexical analyzer reads the input
characters and generates a sequence of tokens that are used by the parser for syntax analysis.
Figure 7.1 summarizes the interaction between the lexical analyzer and the parser.

The lexical analyzer is usually implemented as a subroutine or co-routine of the parser. When
the "get next token" command is received from the parser, the lexical analyzer reads input
characters until it identifies the next token. The lexical analyzer also performs some secondary
tasks at the user interface, such as stripping out comments and white space in the form of blank,
tab and newline characters. It also correlates error messages from the compiler to the source
program. For example, the lexical analyzer may keep track of the number of newline characters
and correlate the line number with an error message. In some compilers, the lexical analyzer
may create a copy of the source program with error messages marked in it.
An important notation used to specify patterns is the regular expression. Each pattern matches
a set of strings, so a regular expression serves as a name for a set of strings.
A recognizer for a language is a program that takes as input a string x and answers
"yes" if x is a sentence of the language and "no" otherwise.
We compile a regular expression into a recognizer by constructing a generalized transition
diagram called a finite automaton.
LEX :
Lex is a program generator designed for lexical processing of character input streams.
It accepts a high-level, problem oriented specification for character string matching, and
produces a program in a general purpose language which recognizes regular expressions. The
regular expressions are specified by the user in the source specifications given to Lex. The Lex
written code recognizes these expressions in an input stream and partitions the input stream
into strings matching the expressions. At the boundaries between strings program sections
provided by the user are executed. The Lex source file associates the regular expressions and
the program fragments. As each expression appears in the input to the program written by Lex,
the corresponding fragment is executed.
Lex variables
yyin Of the type FILE*. This points to the current file being parsed by the lexer.
yyout Of the type FILE*. This points to the location where the output of the lexer
will be written. By default, both yyin and yyout point to standard input and output.
yytext The text of the matched pattern is stored in this variable (char*).
yyleng Gives the length of the matched pattern.
yylineno Provides current line number information. (May or may not be supported by the lexer.)

Lex functions
yylex() The function that starts the analysis. It is automatically generated by Lex.
yywrap() This function is called when end of file (or input) is encountered. If this
function returns 1, the parsing stops. So, this can be used to parse multiple
files. Code can be written in the third section, which will allow multiple
files to be parsed. The strategy is to make yyin file pointer (see the
preceding table) point to a different file until all the files are parsed. At the
end, yywrap() can return 1 to indicate end of parsing.
yyless(int n) This function can be used to push back all but the first n characters of the
read token.
yymore() This function tells the lexer to append the next token to the current token.
Regular Expressions
Character Meaning
A-Z, 0-9, a-z Characters and numbers that form part of the pattern.
. Matches any character except \n.
- Used to denote range. Example: A-Z implies all characters from A to Z.
[ ] A character class. Matches any character in the brackets. If the first character is ^ then it
indicates a negation pattern. Example: [abC] matches either of a, b, and C.
* Matches zero or more occurrences of the preceding pattern.
+ Matches one or more occurrences of the preceding pattern (no empty string).
Ex: [0-9]+ matches "1", "111" or "123456" but not an empty string.
? Matches zero or one occurrence of the preceding pattern. Ex: -?[0-9]+ matches a signed
number including an optional leading minus.
$ Matches end of line as the last character of the pattern.
{ } 1) Indicates how many times a pattern can be present. Example: A{1,3} implies one to
three occurrences of A may be present.
2) If they contain a name, they refer to a substitution by that name. Ex: {digit}
\ Used to escape metacharacters. Also used to remove the special meaning of characters
as defined in this table.

Examples of token declarations


Token Associated expression Meaning
Number ([0-9])+ 1 or more occurrences of a digit
Chars [A-Za-z] Any character
Blank " " A blank space
Word (chars)+ 1 or more occurrences of chars
Variable (chars)+(number)*(chars)*( number)*

Install lex : $sudo apt-get install flex/lex


$sudo apt-get install bison



Steps in writing a LEX program:
1. Step: Using gedit, create a file with extension .l. For example: prg1.l
2. Step: lex prg1.l
3. Step: cc lex.yy.c -ll
4. Step: ./a.out

Structure of LEX source program:


lex is a scanner generator; a lex source program has three parts:

declaration part
%%
pattern {action}
%%
auxiliary part

example:
%{
#include<stdio.h>
%}
%%
[a-z] { printf("hello"); }
[0-9] { printf("TE students"); }
%%

Write a LEX program to recognize a valid arithmetic expression. Identifiers in the
expression could be only integers, and operators could be + and *. Count the identifiers
and operators present and print them separately.
%{
#include <stdio.h>
#include <stdlib.h>
int a[]={0,0,0,0}, i, valid=1, opnd=0;
void ext();
%}
%x OPER
%%
[0-9]+        { BEGIN OPER; opnd++; }
<OPER>"+"     { if(valid) { valid=0; i=0; } else ext(); }
<OPER>"-"     { if(valid) { valid=0; i=1; } else ext(); }
<OPER>"*"     { if(valid) { valid=0; i=2; } else ext(); }
<OPER>"/"     { if(valid) { valid=0; i=3; } else ext(); }
<OPER>[a-zA-Z0-9]+ { opnd++;
                     if(valid==0)
                     {
                         valid=1; a[i]++;
                     }
                     else
                         ext();
                   }
<OPER>"\n"    { if(valid==0)
                    ext();
                else
                    return 0;
              }
.|\n          ext();
%%
void ext()
{
    printf("Invalid Expression\n");
    exit(0);
}
int main()
{
    printf("Type the arithmetic Expression\n");
    yylex();
    printf("Valid Arithmetic Expression\n");
    printf("No. of Operands/Identifiers : %d\n", opnd);
    printf("No. of Additions : %d\nNo. of Subtractions : %d\n", a[0], a[1]);
    printf("No. of Multiplications : %d\nNo. of Divisions : %d\n", a[2], a[3]);
}

Output:

$ gedit exp.l
$ lex exp.l
$ cc lex.yy.c -ll
$ ./a.out
Type the arithmetic Expression
6*3
Valid Arithmetic Expression
No. of Operands/Identifiers : 2
No. of Additions : 0
No. of Subtractions : 0
No. of Multiplications : 1
No. of Divisions : 0

Conclusion: Thus, we understand the role of the lexical analyzer and implemented a
lexical program using the flex tool.



Experiment No. 10
AIM : To Study and implement YACC.
Objective: Develop a skill to implement a syntax analyzer using YACC.

Outcome: Students are able to design and implement a syntax analyzer using YACC for a
given language.
THEORY:
INTRODUCTION TO YACC

YACC: Lex programs recognize only regular expressions; Yacc writes parsers that accept a
large class of context-free grammars, but requires a lower-level analyzer to recognize input
tokens. Thus, a combination of Lex and Yacc is often appropriate. When used as a
preprocessor for a later parser generator, Lex is used to partition the input stream, and the
parser generator assigns structure to the resulting pieces.

YACC provides a general tool for imposing structure on the input to a computer program. The
input specification is a collection of grammar rules. Each rule describes an allowable structure
and gives it a name. YACC prepares a specification of the input process. YACC generates a
function to control the input process. This function is called a parser.

The name is an acronym for "Yet Another Compiler Compiler". YACC generates the code for
the parser in the C programming language. YACC was developed at AT&T for the Unix
operating system. YACC has also been rewritten for other languages, including Java and Ada.
The parser function calls the lexical analyzer to pick up the tokens from the input stream.
These tokens are organized according to the input structure rules; the input structure rules
are called the grammar. When one of the rules is recognized, the user code supplied for that
rule (the user code is called an action) is invoked. Actions have the ability to return values
and to make use of the values of other actions.



Structure of YACC source program:

Basic Specification:

Every YACC specification file consists of three sections: the declarations, the rules (of the
grammar), and the programs. The sections are separated by double percent "%%" marks. The %
is generally used in YACC specifications as an escape character.
The general format of the YACC file is very similar to that of the lex file:
{definitions}
%%
{rules}
%%
{user subroutines}
%% is a delimiter to mark the beginning of the rules section.

Definition Section

%union     Defines the stack type for the parser. It is a union of various data types,
           structures, or objects.
%token     These are the terminals returned by the yylex function to YACC. A token can also
           have a type associated with it for good type checking and syntax-directed
           translation. The type of a token can be specified as
           %token <stack member> tokenName. Example: %token NAME NUMBER
%type      The type of a non-terminal symbol in a grammar rule can be specified with this.
           The format is %type <stack member> non-terminal.
%nonassoc  Specifies that a terminal symbol has no associativity.
%left      Specifies the left associativity of a terminal symbol.
%right     Specifies the right associativity of a terminal symbol.
%start     Specifies the L.H.S. non-terminal symbol of a production rule which should be
           taken as the starting point of the grammar rules.
%prec      Changes the precedence level associated with a particular rule to that of the
           following token name or literal.

Rules Section: consists of a list of grammar rules. A grammar rule has the form:
A : BODY ;
A represents a non-terminal name; the colon and the semicolon are YACC punctuation, and
BODY consists of names and literals. The names used in the body of a grammar rule may
represent tokens or non-terminal symbols. A literal consists of a character enclosed in
single quotes.

Program:
Write a YACC program to evaluate arithmetic expressions involving the operators +, -, *, and /.

Lex Part
%{
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+     { yylval = atoi(yytext); return num; } /* convert the string to a number and
                                                     send the value */
[\+\-\*\/] { return yytext[0]; }
[)]        { return yytext[0]; }
[(]        { return yytext[0]; }
.          { ; }
\n         { return 0; }
%%
%%

YACC Part
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token num
%left '+' '-'
%left '*' '/'
%%
input: exp { printf("%d\n", $1); exit(0); }
     ;
exp: exp '+' exp { $$ = $1 + $3; }
   | exp '-' exp { $$ = $1 - $3; }
   | exp '*' exp { $$ = $1 * $3; }
   | exp '/' exp { if ($3 == 0) { printf("Divide by Zero error\n"); exit(0); }
                   else $$ = $1 / $3; }
   | '(' exp ')' { $$ = $2; }
   | num         { $$ = $1; }
   ;
%%
int yyerror()
{
    printf("error");
    exit(0);
}
int main()
{
    printf("Enter an expression:\n");
    yyparse();
}

Output:
$ gedit exp2.l
$ gedit exp2.y
$ lex exp2.l
$ yacc -d exp2.y
$ cc lex.yy.c y.tab.c -ll
$ ./a.out
Enter an expression:
(3+9)*(5*2)
120

Conclusion: Thus, we understand the role of the syntax analyzer and implemented the
YACC program successfully.

