
Netaji Subhas University of Technology

A STATE UNIVERSITY UNDER DELHI ACT 06 OF 2018, GOVT. OF NCT OF DELHI

Azad Hind Fauj Marg, Sector-3, Dwarka, New Delhi-110078

LABORATORY FILE
Principle of Compiler Construction
(COCSC14)

Submitted By:
Name: Shobhit
Roll Number: 2021UCS1618
Branch: CSE-3

Index
S.No. Practical Dates Remarks
1. To setup Lex(Flex) and Yacc(Bison) then print “Hello world!” using them. 29.08.2023
2. Write a program to separate tokens in Lex. 05.09.2023
3. Write a program to implement the lexical analysis phase of compiler. 12.09.2023
4. Develop a lexical and syntax analyser for the same using the LEX and YACC tools. Also, implement the bookkeeper module. 03.10.2023
5. To build a simple calculator in Lex and Yacc. 10.10.2023
6. Represent ‘C’ language using Context Free Grammar. 17.10.2023
7. Implement a two-pass assembler 8085/8086. 31.10.2023
8. Add assignment statement, If then else statement and while loop to the calculator and generate the three address code for the same. 07.11.2023

Intro Class
Aim: To setup Lex(Flex) and Yacc(Bison) then print “Hello world!” using
them.

Theory:
Lex (lexical analyzer) and Yacc (yet another compiler compiler) are powerful tools used in
compiler construction. They aid in the process of translating source code into structured data
that can be processed by compilers or interpreters.

- Lex generates lexical analyzers, which break down input text into smaller units called
tokens using regular expressions.
- Yacc generates parsers that analyze tokens and determine the structure of input using
context-free grammars.
To Install in Ubuntu/Debian:
- sudo apt update && sudo apt install bison flex -y
To Compile and Run:
- lex hello.l
- yacc -d hello.y
- gcc lex.yy.c y.tab.c -o hello -ll
- ./hello

Code:
Hello.l (Lex):
%{
#include "y.tab.h"
%}

%%
"hello" { return HELLO; }
[ \t\n] ; /* Skip whitespace and newlines */
. ; /* Ignore any other characters */

%%

int yywrap() {
return 1;
}
Hello.y(Yacc)
%{
#include <stdio.h>
int yylex();
void yyerror(const char* msg);
%}

%token HELLO

%%
start: HELLO { printf("Hello, World!\n"); }
%%

void yyerror(const char* msg) {


fprintf(stderr, "Error: %s\n", msg);
// You can add error recovery code here if needed
}

int main() {
yyparse();
return 0;
}

Output:
Experiment-1
Aim: Write a program to separate tokens in Lex

Theory:
Tokenization is the process of breaking a sequence of characters into meaningful
chunks, called tokens. Tokens are fundamental building blocks in programming
languages, representing keywords, identifiers, constants, operators, and more. In this
experiment, you will use Lex to tokenize a sample C code file and categorize different
parts of the code into various types of tokens.
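
To make the idea of token categories concrete, the following is a minimal hand-written sketch in C, not the Lex program of this experiment. The names TokenKind, all_digits and classify are hypothetical helpers introduced only for illustration, and the sketch assumes the lexeme has already been separated from its neighbours.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Token categories, loosely mirroring the kinds of rules a Lex file lists. */
typedef enum { TOK_DATATYPE, TOK_IDENTIFIER, TOK_INTEGER, TOK_FLOAT,
               TOK_OPERATOR, TOK_UNKNOWN } TokenKind;

/* Returns 1 if s[0..n) is a non-empty run of decimal digits. */
static int all_digits(const char *s, size_t n) {
    if (n == 0) return 0;
    for (size_t i = 0; i < n; i++)
        if (!isdigit((unsigned char)s[i])) return 0;
    return 1;
}

/* Classify one already-separated lexeme (hypothetical helper). */
TokenKind classify(const char *lexeme) {
    static const char *datatypes[] = { "int", "float", "char", "double", "void" };
    assert(lexeme != NULL);
    size_t len = strlen(lexeme);
    if (len == 0) return TOK_UNKNOWN;

    /* Keywords are checked first so that "int" is not reported as an identifier. */
    for (size_t i = 0; i < sizeof datatypes / sizeof datatypes[0]; i++)
        if (strcmp(lexeme, datatypes[i]) == 0) return TOK_DATATYPE;

    if (all_digits(lexeme, len)) return TOK_INTEGER;

    /* digits '.' digits */
    const char *dot = strchr(lexeme, '.');
    if (dot && all_digits(lexeme, (size_t)(dot - lexeme))
            && all_digits(dot + 1, strlen(dot + 1)))
        return TOK_FLOAT;

    if (len == 1 && strchr("+-*/=", lexeme[0])) return TOK_OPERATOR;

    /* letter or underscore, followed by letters, digits or underscores */
    if (isalpha((unsigned char)lexeme[0]) || lexeme[0] == '_') {
        for (size_t i = 1; i < len; i++)
            if (!isalnum((unsigned char)lexeme[i]) && lexeme[i] != '_')
                return TOK_UNKNOWN;
        return TOK_IDENTIFIER;
    }
    return TOK_UNKNOWN;
}
```

Calling classify("3.14") would be expected to give TOK_FLOAT, and classify("int") gives TOK_DATATYPE because the keyword check runs before the identifier check, which is the same reason rule order matters in a Lex file.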

Code:
Tokens.l
%{
%}

%%

[0-9]+[.][0-9]+ printf("%s is a floating point number\n",yytext);

int|float|char|double|void printf("%s is a datatype\n",yytext);

[0-9]+ printf("%s is an integer number\n",yytext);

[a-z]+[()] printf("%s is a function\n",yytext);

[a-z]+ printf("%s is an identifier\n",yytext);

[+=*/-] printf("%s is an operator\n",yytext);

; printf("%s is a delimiter\n",yytext);

, printf("%s is a separator\n",yytext);

[#][a-z\.h]+ printf("%s is a preprocessor\n",yytext);


%%

int yywrap(void)
{
return 1;
}
int main()
{
// reads input from a file named test.c rather than terminal
freopen("test.c", "r", stdin);
yylex();
return 0;
}

Test.c
int main()
{
int a = 10, b = 20;
int c = 0;
// find the greater integer
if (a < b)
c = b;
else
c = a;
// c is now the greater integer
return 0;
}
Output:

Experiment-2
Aim: Write a program to implement the lexical analysis phase of compiler.

Theory:
The lexical analysis phase, often referred to as the lexer or scanner, is the initial stage of a
compiler responsible for transforming the source code into a structured stream of tokens, where
tokens are the fundamental units like keywords, identifiers, numbers, and symbols, while
eliminating extraneous elements like whitespace and comments. This crucial process lays the
foundation for subsequent compiler phases, enabling syntax analysis and further translation or
execution of the program.
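
The part about eliminating whitespace and comments can be sketched by hand in C as below. This is only an illustration under stated assumptions: next_token and count_tokens are hypothetical names, identifiers and numbers are treated as one word-like token class, and every other character is returned as a single-character token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Skips whitespace and comments, copies the next token into out, and returns
   the position just after it. Returns NULL when the input is exhausted.
   A word longer than out_size - 1 is split across calls (sketch limitation). */
const char *next_token(const char *src, char *out, size_t out_size) {
    assert(src != NULL && out != NULL && out_size > 1);
    for (;;) {
        while (isspace((unsigned char)*src)) src++;
        if (src[0] == '/' && src[1] == '/') {            /* line comment */
            while (*src && *src != '\n') src++;
        } else if (src[0] == '/' && src[1] == '*') {     /* block comment */
            src += 2;
            while (*src && !(src[0] == '*' && src[1] == '/')) src++;
            if (*src) src += 2;
        } else {
            break;
        }
    }
    if (*src == '\0') return NULL;

    size_t n = 0;
    if (isalnum((unsigned char)*src) || *src == '_') {
        /* word-like token: letters, digits, underscores */
        while ((isalnum((unsigned char)*src) || *src == '_') && n + 1 < out_size)
            out[n++] = *src++;
    } else {
        out[n++] = *src++;                               /* single symbol */
    }
    out[n] = '\0';
    return src;
}

/* Counts the tokens in src by calling next_token until it returns NULL. */
int count_tokens(const char *src) {
    char tok[64];
    int n = 0;
    while ((src = next_token(src, tok, sizeof tok)) != NULL) n++;
    return n;
}
```

For the input "int a = 10; // note" the scanner yields the five tokens int, a, =, 10 and ; and never hands the comment text to later phases.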

Code:
Tokens.l
%{
%}

%%

[0-9]+[.][0-9]+ printf("%s is a floating point number\n",yytext);


int|float|char|double|void printf("%s is a datatype\n",yytext);
[0-9]+ printf("%s is an integer number\n",yytext);
[a-z]+[()] printf("%s is a function\n",yytext);
[a-z]+ printf("%s is an identifier\n",yytext);
[+=*/-] printf("%s is an operator\n",yytext);
; printf("%s is a delimiter\n",yytext);
, printf("%s is a separator\n",yytext);
[#][a-z\.h]+ printf("%s is a preprocessor\n",yytext);

%%

int yywrap(void)
{
return 1;
}

int main()
{
// reads input from a file named test.c rather than terminal
freopen("test.c", "r", stdin);
yylex();
return 0;
}
Input.c
int main()
{
int a = 10, b = 20;
int c = 0;
// find the greater integer
if (a < b)
c = b;
else
c = a;
// c is now the greater integer
return 0;
}
Output:

Experiment-5
Aim: Develop a lexical and syntax analyser for the same using the LEX and YACC tools.
Also, implement the bookkeeper module.

Theory:
In the process of compiler construction, the Lexical Analysis phase, implemented using tools
like LEX, transforms the source code into a sequence of tokens based on defined regular
expressions. The Syntax Analysis phase, facilitated by tools like YACC or Bison, parses these
tokens according to a specified context-free grammar, creating a parse tree. A bookkeeper
module manages symbol tables, type checking, and scope handling. These phases ensure the
conversion of human-readable source code into a structured representation for further
compilation stages.
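
A bookkeeper can be as small as a table with an insert and a lookup operation. The sketch below is a simplified illustration in C and is separate from the parser listing of this experiment; SymbolTable, symtab_lookup and symtab_insert are hypothetical names, and the fixed capacities are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 64
#define MAX_NAME    32

typedef struct {
    char name[MAX_NAME];        /* identifier as written in the source */
    char data_type[MAX_NAME];   /* e.g. "int", "float" */
    int  line_no;               /* line of first appearance */
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    int    count;
} SymbolTable;

/* Returns the index of name in the table, or -1 if it is not present. */
int symtab_lookup(const SymbolTable *t, const char *name) {
    assert(t != NULL && name != NULL);
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, name) == 0) return i;
    return -1;
}

/* Inserts name if it is absent and returns its index. An existing entry is
   left unchanged and its index is returned. Returns -1 when the table is full. */
int symtab_insert(SymbolTable *t, const char *name,
                  const char *data_type, int line_no) {
    int found = symtab_lookup(t, name);
    if (found >= 0) return found;
    if (t->count >= MAX_SYMBOLS) return -1;

    Symbol *s = &t->entries[t->count];
    /* snprintf truncates over-long names instead of overflowing the buffer */
    snprintf(s->name, sizeof s->name, "%s", name);
    snprintf(s->data_type, sizeof s->data_type, "%s", data_type);
    s->line_no = line_no;
    return t->count++;
}
```

Keeping the first line number on a repeated insert is a design choice made here so that the table reports where a symbol was first seen; a real compiler would also track scope.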

Code:
Lexer.l
%{
#include "y.tab.h"
int countn=0;
%}

%option yylineno

alpha [a-zA-Z]
digit [0-9]
unary "++"|"--"

%%

"printf" { return PRINTFF; }


"scanf" { return SCANFF; }
"int" { return INT; }
"float" { return FLOAT; }
"char" { return CHAR; }
"void" { return VOID; }
"return" { return RETURN; }
"for" { return FOR; }
"if" { return IF; }
"else" { return ELSE; }
^"#include"[ ]*<.+\.h> { return INCLUDE; }
"true" { return TRUE; }
"false" { return FALSE; }
[-]?{digit}+ { return NUMBER; }
[-]?{digit}+\.{digit}{1,6} { return FLOAT_NUM; }
{alpha}({alpha}|{digit})* { return ID; }
{unary} { return UNARY; }
"<=" { return LE; }
">=" { return GE; }
"==" { return EQ; }
"!=" { return NE; }
">" { return GT; }
"<" { return LT; }
"&&" { return AND; }
"||" { return OR; }
"+" { return ADD; }
"-" { return SUBTRACT; }
"/" { return DIVIDE; }
"*" { return MULTIPLY; }
\/\/.* {;}
\/\*(.*\n)*.*\*\/ {;}
[ \t]* {;}
[\n] { countn++; }
. { return *yytext; }
["].*["] { return STR; }
['].['] { return CHARACTER; }

%%

int yywrap() {
return 1;
}

Parser.y
%{
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<ctype.h>
#include"lex.yy.c"

void yyerror(const char *s);


int yylex();
int yywrap();
void add(char);
void insert_type();
int search(char *);
struct dataType {
char * id_name;
char * data_type;
char * type;
int line_no;
} symbol_table[40];

int count=0;
int q;
char type[10];
extern int countn;
%}

%token VOID CHARACTER PRINTFF SCANFF INT FLOAT CHAR FOR IF ELSE TRUE FALSE
NUMBER FLOAT_NUM ID LE GE EQ NE GT LT AND OR STR ADD MULTIPLY DIVIDE
SUBTRACT UNARY INCLUDE RETURN

%%

program: headers main '(' ')' '{' body return '}'


;

headers: headers headers


| INCLUDE { add('H'); }
;

main: datatype ID { add('F'); }


;

datatype: INT { insert_type(); }


| FLOAT { insert_type(); }
| CHAR { insert_type(); }
| VOID { insert_type(); }
;

body: FOR { add('K'); } '(' statement ';' condition ';' statement ')' '{' body '}'
| IF { add('K'); } '(' condition ')' '{' body '}' else
| statement ';'
| body body
| PRINTFF { add('K'); } '(' STR ')' ';'
| SCANFF { add('K'); } '(' STR ',' '&' ID ')' ';'
;
else: ELSE { add('K'); } '{' body '}'
|
;

condition: value relop value


| TRUE { add('K'); }
| FALSE { add('K'); }
|
;

statement: datatype ID { add('V'); } init


| ID '=' expression
| ID relop expression
| ID UNARY
| UNARY ID
;

init: '=' value


|
;

expression: expression arithmetic expression


| value
;

arithmetic: ADD
| SUBTRACT
| MULTIPLY
| DIVIDE
;

relop: LT
| GT
| LE
| GE
| EQ
| NE
;

value: NUMBER { add('C'); }


| FLOAT_NUM { add('C'); }
| CHARACTER { add('C'); }
| ID
;
return: RETURN { add('K'); } value ';'
|
;

%%

int main() {
yyparse();
printf("\n\n");
printf("\t\t\t\t\t\t\t\t PHASE 1: LEXICAL ANALYSIS \n\n");
printf("\nSYMBOL DATATYPE TYPE LINE NUMBER \n");
printf("_______________________________________\n\n");
int i=0;
for(i=0; i<count; i++) {
printf("%s\t%s\t%s\t%d\t\n", symbol_table[i].id_name, symbol_table[i].data_type,
symbol_table[i].type, symbol_table[i].line_no);
}
for(i=0;i<count;i++) {
free(symbol_table[i].id_name);
free(symbol_table[i].data_type);
free(symbol_table[i].type);
}
printf("\n\n");
return 0;
}

/* Returns -1 if name is already in the symbol table, 0 otherwise. */
int search(char *name) {
int i;
for(i=count-1; i>=0; i--) {
if(strcmp(symbol_table[i].id_name, name)==0) {
return -1;
}
}
return 0;
}

void add(char c) {
q=search(yytext);
if(!q) {
if(c == 'H') {
symbol_table[count].id_name=strdup(yytext);
symbol_table[count].data_type=strdup(type);
symbol_table[count].line_no=countn;
symbol_table[count].type=strdup("Header");
count++;
}
else if(c == 'K') {
symbol_table[count].id_name=strdup(yytext);
symbol_table[count].data_type=strdup("N/A");
symbol_table[count].line_no=countn;
symbol_table[count].type=strdup("Keyword\t");
count++;
}
else if(c == 'V') {
symbol_table[count].id_name=strdup(yytext);
symbol_table[count].data_type=strdup(type);
symbol_table[count].line_no=countn;
symbol_table[count].type=strdup("Variable");
count++;
}
else if(c == 'C') {
symbol_table[count].id_name=strdup(yytext);
symbol_table[count].data_type=strdup("CONST");
symbol_table[count].line_no=countn;
symbol_table[count].type=strdup("Constant");
count++;
}
else if(c == 'F') {
symbol_table[count].id_name=strdup(yytext);
symbol_table[count].data_type=strdup(type);
symbol_table[count].line_no=countn;
symbol_table[count].type=strdup("Function");
count++;
}
}
}

void insert_type() {
strcpy(type, yytext);
}

void yyerror(const char* msg) {


fprintf(stderr, "%s\n", msg);
}

Input1.c
#include<stdio.h>
#include<string.h>
int main() {
int a;
int x=1;
int y=2;
int z=3;
x=3;
y=10;
z=5;
if(x>5) {
for(int k=0; k<10; k++) {
y = x+3;
printf("Hello!");
}
} else {
int idx = 1;
}
for(int i=0; i<10; i++) {
printf("Hello World!");
scanf("%d", &x);
if (x>5) {
printf("Hi");
}
for(int j=0; j<z; j++) {
a=1;
}
}
return 1;
}

Commands:
Output:
Experiment-7
Aim: To build a simple calculator in Lex and Yacc

Theory:
In this calculator program, we are using Lex and Yacc to tokenize and parse arithmetic
expressions. Lex generates a lexical analyzer, and Yacc generates a parser to evaluate
the expressions.

Code:
Calc.l
%{
#include<stdio.h>
#include<stdlib.h> /* for atoi */
#include "y.tab.h"
extern int yylval;
%}

%%
[0-9]+ {
yylval=atoi(yytext);
return NUMBER;
}
[ \t] ; /* skip spaces and tabs */
[\n] return 0;
. return yytext[0];
%%
int yywrap()
{
return 1;
}

Calc.y
%{
#include<stdio.h>
int flag=0;
int yylex(void);
void yyerror(const char *msg);
%}
%token NUMBER
%left '+' '-'
%left '*' '/' '%'
%left '(' ')'
%%

ArithmeticExpression: E{
printf("\nResult=%d\n",$1);
return 0;
};
E:E'+'E {$$=$1+$3;}
|E'-'E {$$=$1-$3;}
|E'*'E {$$=$1*$3;}
|E'/'E {$$=$1/$3;}
|E'%'E {$$=$1%$3;}
|'('E')' {$$=$2;}
| NUMBER {$$=$1;}
;
%%

int main(void)
{
printf("\nEnter Any Arithmetic Expression which can have operations Addition, Subtraction, Multiplication, Division, Modulus and Round brackets:\n");
yyparse();
if(flag==0)
printf("\nEntered arithmetic expression is Valid\n\n");
return 0;
}

void yyerror(const char *msg)
{
(void)msg; /* the parser's own message is not shown; a fixed one is printed */
printf("\nEntered arithmetic expression is Invalid\n\n");
flag=1;
}
Output:
Experiment-11
Aim: Represent ‘C’ language using Context Free Grammar.

Theory:
The provided Context-Free Grammar (CFG) defines a simplified version of the C programming
language. It uses nonterminals like `program` to represent language structures, terminals like `ID`
for specific elements, and production rules to outline how various parts of the language fit
together. While it doesn't cover the entire C language, this CFG serves as a basis for creating a
parser capable of understanding and analyzing the structure of C code.

Context Free Grammar:


program -> declaration_list

declaration_list -> declaration | declaration_list declaration

declaration -> var_declaration | func_declaration

var_declaration -> type_specifier ID ;

type_specifier -> int | float | char

func_declaration -> type_specifier ID ( params ) compound_stmt

params -> param_list | empty

param_list -> param | param_list , param

param -> type_specifier ID

compound_stmt -> { local_declarations statement_list }

local_declarations -> local_declarations var_declaration | empty

statement_list -> statement | statement_list statement

statement -> expression_stmt | compound_stmt | selection_stmt | iteration_stmt | return_stmt


expression_stmt -> expression ;
expression_stmt -> ;

selection_stmt -> if ( expression ) statement | if ( expression ) statement else statement

iteration_stmt -> while ( expression ) statement | for ( expression_stmt expression_stmt expression ) statement

return_stmt -> return ; | return expression ;

expression -> var = expression | simple_expression

var -> ID

simple_expression -> additive_expression relop additive_expression | additive_expression

relop -> < | <= | > | >= | == | !=

additive_expression -> additive_expression addop term | term

addop -> + | -

term -> term mulop factor | factor

mulop -> * | /

factor -> ( expression ) | var | call | INT_LITERAL | FLOAT_LITERAL | CHAR_LITERAL

call -> ID ( args )

args -> arg_list | empty

arg_list -> expression | arg_list , expression

empty ->
ID -> [a-zA-Z_][a-zA-Z0-9_]*
INT_LITERAL -> [0-9]+
FLOAT_LITERAL -> [0-9]+\.[0-9]+
CHAR_LITERAL -> '[a-zA-Z0-9]'

Result:
The given Context Free Grammar defines the rules for a simplified subset of the ‘C’ language.
Practical 9
Aim: Implement a two-pass assembler 8085/8086

Theory:
A two-pass assembler for 8085/8086 in C involves a two-step process to convert assembly
language code into machine code. In the first pass, the assembler scans the source code,
creates a symbol table with addresses for labels and variables, and generates intermediate
code. Syntax errors are checked at this stage. In the second pass, the assembler uses the
symbol table to replace symbolic addresses with actual values, resolves forward references, and
generates the final machine code. The two-pass approach ensures that all symbols are correctly
addressed and facilitates the creation of error-free machine code for execution by the target
processor. Implementation in C requires data structures for the symbol table and intermediate
code, along with algorithms for address resolution and code generation.
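
The role of the two passes in resolving forward references can be sketched in C as follows. This is an illustration under simplifying assumptions and not the program of this practical: the source lines are held in memory, a label is any text before a ':' on a line, and every instruction is assumed to occupy 3 bytes, which is not true of the real 8085 instruction set. Label, pass_one and resolve_operand are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LABELS 16

typedef struct { char name[16]; int address; } Label;

/* Pass one: record the address of every "NAME:" prefix.
   Returns the number of labels found. */
int pass_one(const char *lines[], int n, Label labels[]) {
    int count = 0, location = 0;
    assert(lines != NULL && labels != NULL);
    for (int i = 0; i < n; i++) {
        const char *colon = strchr(lines[i], ':');
        if (colon && count < MAX_LABELS) {
            size_t len = (size_t)(colon - lines[i]);
            if (len >= sizeof labels[count].name)
                len = sizeof labels[count].name - 1;
            memcpy(labels[count].name, lines[i], len);
            labels[count].name[len] = '\0';
            labels[count].address = location;
            count++;
        }
        location += 3;   /* assumption: every instruction is 3 bytes long */
    }
    return count;
}

/* Pass two, for one line: returns the address its operand refers to,
   or -1 if the line has no operand or the operand is not a known label. */
int resolve_operand(const char *line, const Label labels[], int label_count) {
    char opcode[16], operand[16];
    assert(line != NULL && labels != NULL);
    const char *colon = strchr(line, ':');
    const char *body = colon ? colon + 1 : line;   /* skip a leading label */
    if (sscanf(body, "%15s %15s", opcode, operand) != 2) return -1;
    for (int i = 0; i < label_count; i++)
        if (strcmp(operand, labels[i].name) == 0) return labels[i].address;
    return -1;
}
```

With the four lines "START: MOV A", "JMP END", "ADD B" and "END: HLT", the first pass places END at address 9, so the second pass can resolve the operand of "JMP END" even though END is defined after it is used.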

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SIZE 100

typedef struct {
char label[10];
int address;
} SymbolTableEntry;

typedef struct {
char opcode[10];
char operand[10];
char label[10];
int location;
} IntermediateCode;

void passOne(FILE *inputFile, SymbolTableEntry symbolTable[], int *symbolTableSize);


void passTwo(FILE *inputFile, SymbolTableEntry symbolTable[], int symbolTableSize,
IntermediateCode intermediateCode[], int *intermediateCodeSize);

int main() {
FILE *inputFile;
SymbolTableEntry symbolTable[MAX_SIZE];
int symbolTableSize = 0;
IntermediateCode intermediateCode[MAX_SIZE];
int intermediateCodeSize = 0;
inputFile = fopen("input.asm", "r");
if (inputFile == NULL) {
perror("Error opening file");
return 1;
}
passOne(inputFile, symbolTable, &symbolTableSize);
fseek(inputFile, 0, SEEK_SET);
passTwo(inputFile, symbolTable, symbolTableSize, intermediateCode,
&intermediateCodeSize);
fclose(inputFile);
printf("\nIntermediate Code:\n");
printf("Location\tOperand\tLabel\tOpcode\n");
for (int i = 0; i < intermediateCodeSize; i++) {
printf("%d\t\t%s\t%s\t%s\n", intermediateCode[i].location, intermediateCode[i].operand,
intermediateCode[i].label, intermediateCode[i].opcode);
}
return 0;
}
void passOne(FILE *inputFile, SymbolTableEntry symbolTable[], int *symbolTableSize) {
char line[100];
int locationCounter = 0;

while (fgets(line, sizeof(line), inputFile) != NULL) {


char *token = strtok(line, " \t\n");

while (token != NULL) {


if (token[strlen(token) - 1] == ':') {
token[strlen(token) - 1] = '\0';
strcpy(symbolTable[*symbolTableSize].label, token);
symbolTable[*symbolTableSize].address = locationCounter;
(*symbolTableSize)++;
}

if (strcmp(token, "ADD") == 0 || strcmp(token, "SUB") == 0 ||
strcmp(token, "MOV") == 0 || strcmp(token, "HLT") == 0) {
locationCounter += 3;
}
token = strtok(NULL, " \t\n");
}
}
}
void passTwo(FILE *inputFile, SymbolTableEntry symbolTable[], int symbolTableSize,
IntermediateCode intermediateCode[], int *intermediateCodeSize) {
char line[100];
int locationCounter = 0;
while (fgets(line, sizeof(line), inputFile) != NULL) {
char opcode[10] = "";
char operand[10] = "";
char label[10] = "";
/* %9s keeps each field within its 10-byte buffer; missing fields stay empty */
sscanf(line, "%9s %9s %9s", opcode, operand, label);
for (int i = 0; i < symbolTableSize; i++) {
if (strcmp(operand, symbolTable[i].label) == 0) {
sprintf(operand, "%d", symbolTable[i].address);
break;
}
}
strcpy(intermediateCode[*intermediateCodeSize].opcode, opcode);
strcpy(intermediateCode[*intermediateCodeSize].operand, operand);
strcpy(intermediateCode[*intermediateCodeSize].label, label);
intermediateCode[*intermediateCodeSize].location = locationCounter;
(*intermediateCodeSize)++;
locationCounter += 3;
}
}

Input.asm file :
Output :

Conclusion:
A two-pass assembler for 8085/8086 was successfully implemented in C.
Practical 12
Aim : Add assignment statement, If then else statement and while loop to the
calculator and generate the three address code for the same.

Theory :
In the context of enhancing a calculator program, the addition of control flow constructs like
assignment statements, if-then-else statements, and while loops contributes to its functionality.
An assignment statement stores a value in a variable so that later expressions can reuse it. The
if-then-else statement introduces conditional logic, enabling the calculator to make decisions
based on specified conditions. Meanwhile, the inclusion of a while loop facilitates repetitive
calculations until a certain condition is met. To represent these constructs in a format suitable
for machine processing, three-address code is generated. Each instruction uses at most three
addresses: two source operands and one destination, as in t1 = a + b. Intermediate results are
held in compiler-generated temporaries. It offers a structured way to represent complex
computations and control flow in a form that can be easily translated into machine code or
intermediate code for further processing.
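
As an illustration of how temporaries are introduced, the C sketch below translates a single assignment into three-address code. It is a simplified example and not the program of this practical: operands are assumed to be single characters, the input has the form x=expression without spaces, and if-then-else and while are not covered. The name generate_tac and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char *src;     /* cursor into the input text */
static char *out;           /* where the next instruction is written */
static size_t out_left;     /* bytes remaining in the output buffer */
static int temp_count;      /* number of temporaries created so far */

static void gen_expr(char *place);

/* Appends one instruction of the form "dst = a op b". */
static void emit(const char *dst, const char *a, char op, const char *b) {
    int n = snprintf(out, out_left, "%s = %s %c %s\n", dst, a, op, b);
    if (n > 0 && (size_t)n < out_left) { out += n; out_left -= (size_t)n; }
}

/* factor -> operand | '(' expr ')' ; stores the operand's name in place */
static void gen_factor(char *place) {
    if (*src == '(') {
        src++;
        gen_expr(place);
        if (*src == ')') src++;
    } else {
        place[0] = *src ? *src++ : '?';
        place[1] = '\0';
    }
}

/* Handles one precedence level: operand (op operand)*, left to right.
   Each operator application gets a fresh temporary t1, t2, ... */
static void gen_binary(char *place, void (*operand)(char *), const char *ops) {
    char rhs[8], tmp[8];
    operand(place);
    while (*src && strchr(ops, *src)) {
        char op = *src++;
        operand(rhs);
        snprintf(tmp, sizeof tmp, "t%d", ++temp_count);
        emit(tmp, place, op, rhs);
        strcpy(place, tmp);
    }
}

static void gen_term(char *place) { gen_binary(place, gen_factor, "*/"); }
static void gen_expr(char *place) { gen_binary(place, gen_term, "+-"); }

/* Translates "x=expr" into three-address code written to buf.
   Returns the number of temporaries used, or -1 if the input is not an assignment. */
int generate_tac(const char *assignment, char *buf, size_t buf_size) {
    char place[8];
    assert(assignment != NULL && buf != NULL && buf_size > 0);
    src = assignment; out = buf; out_left = buf_size; temp_count = 0;
    buf[0] = '\0';
    char target = *src;
    if (target == '\0' || src[1] != '=') return -1;
    src += 2;
    gen_expr(place);
    snprintf(out, out_left, "%c = %s\n", target, place);
    return temp_count;
}
```

For the input x=a+b*c the sketch emits t1 = b * c, then t2 = a + t1, then x = t2, so the multiplication is computed first and each instruction names at most three addresses.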

Code :
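#include <stdio.h>
#include <string.h>
int i, choice, j, l, address = 100;
char userInput[10], expr[10], expr1[10], expr2[10], id1[5], op[5], id2[5];

/* strrev() is not part of standard C, so a portable in-place reversal is defined here. */
static void reverse_in_place(char *s)
{
size_t a = 0, b = strlen(s);
while (b > a + 1)
{
char t = s[a];
s[a++] = s[--b];
s[b] = t;
}
}

int main()
{
printf("Enter the Expression : ");
/* %9s keeps the input within the 10-byte buffer */
if (scanf("%9s", userInput) != 1)
return 1;
strcpy(expr, userInput);
l = strlen(expr);
expr1[0] = '\0';
for (i = 0; i < 2; i++)
{
if (expr[i] == '+' || expr[i] == '-')
{
if (expr[i + 2] == '/' || expr[i + 2] == '*')
{
reverse_in_place(expr);
j = l - i - 1;
strncat(expr1, expr, j);
reverse_in_place(expr1);
printf("Three Address Code\nT = %s\nT1 = %c%cT\n", expr1,
expr[j + 1], expr[j]);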
}
else
{
strncat(expr1, expr, i + 2);
printf("Three Address Code\nT = %s\nT1 = T%c%c\n", expr1, expr[i + 2], expr[i + 3]);
}
}
else if (expr[i] == '/' || expr[i] == '*')
{
strncat(expr1, expr, i + 2);
printf("Three Address Code\nT = %s\nT1 = T%c%c\n", expr1, expr[i + 2],
expr[i + 3]);
}
}
return 0;
}
Output :

Conclusion-
Conditional statements with while loop for calculator and 3 address code generator
successfully implemented.
