Compiler Design (CD) : Lab Assignment 1


Gauri Choudhari

Roll no. 62
Batch 3
TY-CSA
PRN: 12111387

Compiler Design (CD)


Lab Assignment 1

Title: Implement LEX/FLEX code to count the number of characters, words and lines in an input file.

Theory:
Lexical Analysis
Lexical analysis is the first phase of compiling a program. It involves breaking the
input text into a sequence of tokens, which are meaningful units such as keywords,
identifiers, literals, and punctuation symbols. The lexical analyser reads the input
character stream and groups characters into tokens based on predefined patterns;
this process is known as tokenization. Tokens can be classified into identifiers,
keywords, operators, constants, and special characters. For example, in the
statement int x = 10; the lexer produces the tokens int (keyword), x (identifier),
= (operator), 10 (constant), and ; (special character).

Working of Lexical Analyser

• Input preprocessing: This stage involves cleaning up the input text and
preparing it for lexical analysis. This may include removing comments,
whitespace, and other non-essential characters from the input text.
• Tokenization: This is the process of breaking the input text into a sequence
of tokens. This is usually done by matching the characters in the input text
against a set of patterns or regular expressions that define the different types
of tokens.
• Token classification: In this stage, the lexer determines the type of each
token. For example, in a programming language, the lexer might classify
keywords, identifiers, operators, and punctuation symbols as separate token
types.
• Token validation: In this stage, the lexer checks that each token is valid
according to the rules of the programming language. For example, it might
check that a variable name is a valid identifier, or that an operator has the
correct syntax.
• Output generation: In this final stage, the lexer generates the output of the
lexical analysis process, which is typically a list of tokens. This list of tokens
can then be passed to the next stage of compilation or interpretation.

Lex / Flex– Tool for Lexical Analysis


Lex is a tool that generates lexical analysers (programs that convert a stream of
characters into tokens). The Lex tool itself is a compiler: it takes a specification of
patterns and associated actions as input and produces the C source of a lexical
analyser as output. It is commonly used with YACC (Yet Another Compiler Compiler).
Lex was written by Mike Lesk and Eric Schmidt.

Working of Lex:

• The standard input stream is processed to match the regular expressions.
• When a regular expression is matched, the corresponding body of code is executed.
• An action may return a token after a match; this is the typical use in a compiler project.
• If no return is performed, scanning simply continues with the next match.

Lex File Format


A Lex program consists of three parts, separated by %% delimiters:

Declarations
%%
Translation rules
%%
Auxiliary procedures

A. Declarations:
The declarations include declarations of variables. They are made up of

• declarations of C variables used in the auxiliary procedures


• manifest constants (by convention written in all caps)
• regular definitions for use in the translation rules
B. Translation rules

Translation rules are constructed as follows

r.e.1 {action1}

r.e.2 {action2}

....

r.e.n {actionn}

The actions are C code to be carried out when the corresponding regular expression
matches the input (think event-driven programming). For example:

[Ii][Ff]   {return(IF);}
{id}       {yylval = storeId(yytext,yyleng); return(ID);}
{snum}     {yylval = storeNum(yytext,yyleng,atoi(yytext),INTEGER); return(CON);}

Processing of tokens will continue through the list of rules until a return() is
performed. When more than one rule can match, the longest match wins; ties are
broken in favour of the rule listed first.
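As a small illustration (a hypothetical rule set, not part of the assignment code below), the following Flex specification shows how rule order and longest match interact: on the input ifdef the identifier rule wins because it matches five characters, while on the input if both rules match two characters and the earlier keyword rule takes priority.

%{
#include <stdio.h>
%}

%%
"if"                    { printf("%s -> KEYWORD\n", yytext); }
[a-zA-Z_][a-zA-Z0-9_]*  { printf("%s -> IDENTIFIER\n", yytext); }
[ \t\n]+                { /* skip whitespace */ }
.                       { /* ignore anything else */ }
%%

int main() {
    yylex();
    return 0;
}

int yywrap() {
    return 1;
}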

C. Auxiliary Functions

Auxiliary functions can be added to the lex.l file. These functions contain additional C
code that can be called from the actions of the translation rules, keeping the code in
the translation rules simple. They can also be compiled separately; they need not be
in the Lex file, since the generated scanner is usually linked with other object files anyway.

Code:
%{
#include <stdio.h>

int cc = 0, wc = 0, sc = 0, lc = 0;
%}

%%
[^ \t\n]+   { cc += yyleng; wc++; }
[ ]         { sc++; }
[\t]        { sc += 4; }
[\n]        { lc++; }
%%

int main(){
    yyin = fopen("jan18_input.txt", "r");
    yyout = fopen("jan18_output.txt", "w");

    while(!feof(yyin)){
        yylex();
    }

    fprintf(yyout, "Character Count is : %d\n", cc);
    fprintf(yyout, "Word Count is : %d\n", wc);
    fprintf(yyout, "Space Count is : %d\n", sc);
    fprintf(yyout, "Line Count is : %d\n\n", lc);

    return 0;
}

int yywrap(){
    return 1;
}
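As an illustration of the expected behaviour (an assumed two-line sample, not the actual input file used for this assignment), if jan18_input.txt contained:

Hello World
Lex is fun

the program would write the following to jan18_output.txt, counting only non-whitespace characters:

Character Count is : 18
Word Count is : 5
Space Count is : 3
Line Count is : 2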

Input File:

Output:
Conclusion:
Lex and Flex are powerful tools for generating lexical analysers, making them essential in the
development of compilers, interpreters, and other language processing tools. By defining
lexical rules using regular expressions, Lex/Flex allows developers to tokenize input
text efficiently, enabling various text-processing tasks such as counting characters,
words, and lines, as demonstrated in this assignment, which showcases the ease and
flexibility of Lex/Flex in handling such tasks.
Gauri Choudhari
Roll no. 62
Batch 3
TY-CSA
PRN: 12111387

Compiler Design (CD)


Lab Assignment 2

Title: Tokenization of C program using Lex/Flex

Theory:
Tokenization in compiler design is the process of breaking down a sequence of
characters (source code) into meaningful units called tokens. Tokens are the
smallest units of a program that have meaning, such as keywords, identifiers, literals,
and operators.
Purpose of Tokenization:
Tokenization serves as the first step in the compilation process of a programming
language. It involves dividing the input source code into tokens, which are the
smallest meaningful units such as keywords, identifiers, literals, operators, and
punctuation symbols. These tokens serve as building blocks for subsequent phases
of compilation, such as parsing, semantic analysis, and code generation.
Tokenization enables the compiler or interpreter to understand the structure and
semantics of the program and perform various analyses and optimizations.

Tokenization of C Programs:
Tokenizing C programs using Lex/Flex involves defining rules to recognize various
tokens in the C language. These tokens include keywords (e.g., int, if, while),
identifiers (e.g., variable names, function names), literals (e.g., integers, strings),
operators (e.g., arithmetic, logical, relational), and punctuation symbols (e.g., braces,
semicolons).

Lex/Flex Rules for C Tokenization:


• Keywords: Keywords are reserved words with special meanings in the C
language. Lex/Flex rules can be defined to recognize keywords based on
predefined lists.
• Identifiers: Identifiers are names given to various entities such as variables,
functions, and types. Lex/Flex rules can use regular expressions to match
valid identifier names.
• Numbers: Numbers represent constant values such as integers. Lex/Flex
rules can recognize numeric literals based on patterns like digits and optional
decimal points.
• Operators: Operators are symbols used to perform operations on operands.
Lex/Flex rules can define patterns for arithmetic, logical, bitwise, and
relational operators.
• Special Symbols: Special symbols include parentheses, braces, brackets,
commas, and semicolons. Lex/Flex rules can match these symbols
individually or collectively.

Code:
%{
#include <stdio.h>
#include <string.h>

struct tab {
    char lexeme[50];
    char type[50];
};

struct tab keyword[10], identifier[10], specialsymbol[10], operators[10], constants[10];

int counter_keyword = 0, counter_id = 0, counter_ss = 0, counter_opr = 0, counter_cons = 0;
%}

%%

"#include"[ ]*"<"[a-zA-Z0-9]+".h>"    { printf("%s is a header file\n", yytext); }

"#include"[ ]*["][a-zA-Z0-9]+".h"["]  { printf("%s is a header file\n", yytext); }

"int"|"char"|"string"|"float"|"double"|"long"|"long long"|"struct"|"union" {
                 strcpy(keyword[counter_keyword].lexeme, yytext);
                 strcpy(keyword[counter_keyword].type, "datatype");
                 counter_keyword++; }

("int"|"char"|"string"|"float"|"double"|"long"|"long long"|"struct"|"union")[ ]*"*" {
                 strcpy(keyword[counter_keyword].lexeme, yytext);
                 strcpy(keyword[counter_keyword].type, "pointer");
                 counter_keyword++; }

"return"|"void"|"null"|"if"|"else"|"if else"|"break"|"continue"|"switch"|"while"|"for" {
                 strcpy(keyword[counter_keyword].lexeme, yytext);
                 strcpy(keyword[counter_keyword].type, "keyword");
                 counter_keyword++; }

"main"[ ]*"("[^)]*")"          { printf("%s is the main function\n", yytext); }

("printf"|"scanf")"(".*")"     { printf("%s is a predefined function\n", yytext); }

(==)|<(=)?|>(=)?|"!="          { strcpy(operators[counter_opr].lexeme, yytext);
                                 strcpy(operators[counter_opr].type, "Comparison Operator");
                                 counter_opr++; }

(&&)|"||"|"!"                  { strcpy(operators[counter_opr].lexeme, yytext);
                                 strcpy(operators[counter_opr].type, "Boolean Operator");
                                 counter_opr++; }

"="|"+"|"-"|"*"|"/"|"%"|"**"   { strcpy(operators[counter_opr].lexeme, yytext);
                                 strcpy(operators[counter_opr].type, "Mathematical Operator");
                                 counter_opr++; }

[;]                            { strcpy(specialsymbol[counter_ss].lexeme, yytext);
                                 strcpy(specialsymbol[counter_ss].type, "Terminating Symbol");
                                 counter_ss++; }

[&|^|\|"|'|.|,]                { strcpy(specialsymbol[counter_ss].lexeme, yytext);
                                 strcpy(specialsymbol[counter_ss].type, "Special Symbol");
                                 counter_ss++; }

[(|)|{|}]|"["|"]"              { strcpy(specialsymbol[counter_ss].lexeme, yytext);
                                 strcpy(specialsymbol[counter_ss].type, "Parenthesis");
                                 counter_ss++; }

["].*["]                       { strcpy(constants[counter_cons].lexeme, yytext);
                                 strcpy(constants[counter_cons].type, "String");
                                 counter_cons++; }

[a-zA-Z_]([a-zA-Z]|[0-9]|_)*   { strcpy(identifier[counter_id].lexeme, yytext);
                                 strcpy(identifier[counter_id].type, "identifier");
                                 counter_id++; }

[+-]?[0-9]*"."[0-9]+           { strcpy(constants[counter_cons].lexeme, yytext);
                                 strcpy(constants[counter_cons].type, "Float Number");
                                 counter_cons++; }

[+-]?[0-9]+                    { strcpy(constants[counter_cons].lexeme, yytext);
                                 strcpy(constants[counter_cons].type, "Integer");
                                 counter_cons++; }

[ \t\n]+                       { /* skip whitespace */ }

%%

int main()
{
    yyin = fopen("jan25_input.c", "r");

    while(!feof(yyin))
    {
        yylex();
    }

    printf("\nKeyword Table\n Lexeme\t| Type\t\n");
    for(int i = 0; i < counter_keyword; i++)
        printf(" %s\t\t| %s\t\n", keyword[i].lexeme, keyword[i].type);
    fflush(stdout);

    printf("\nIdentifier Table\n Lexeme\t| Type\t\n");
    for(int i = 0; i < counter_id; i++)
        printf(" %s\t\t| %s\t\n", identifier[i].lexeme, identifier[i].type);
    fflush(stdout);

    printf("\nConstants Table\n Lexeme\t| Type\t\n");
    for(int i = 0; i < counter_cons; i++)
        printf(" %s\t\t| %s\t\n", constants[i].lexeme, constants[i].type);

    printf("\nSpecial Symbol Table\n Lexeme\t| Type\t\n");
    for(int i = 0; i < counter_ss; i++)
        printf(" %s\t\t| %s\t\n", specialsymbol[i].lexeme, specialsymbol[i].type);
    fflush(stdout);

    printf("\nOperator Table\n Lexeme\t| Type\t\n");
    for(int i = 0; i < counter_opr; i++)
        printf(" %s\t\t| %s\t\n", operators[i].lexeme, operators[i].type);

    return 0;
}

int yywrap()
{
    return 1;
}
Input File:

Output:
Conclusion:
The tokenization of C programs using Lex/Flex demonstrates the lexical analysis
phase in compiler design. By breaking down the code into tokens, Lex/Flex facilitates
various compilation tasks such as parsing, semantic analysis, and code generation.
This assignment provides practical experience in using Lex/Flex for tokenizing C
programs and reinforces compiler construction concepts and techniques.
Gauri Choudhari
Roll no. 62
Batch 3
TY-CSA
PRN: 12111387

Compiler Design (CD)


Lab Assignment 3

Title: Write a Lex program to convert all uppercase letters to lowercase and compute
the summation of digits whenever a number is found in the file.

Theory:
Text processing and numerical operations are fundamental aspects of computer
programming, essential for various applications such as data parsing, analysis, and
transformation. In this theoretical exploration, we delve into the design and
implementation of a Lex program to convert uppercase letters to lowercase and
calculate the summation of digits in a given text file.

Lexical Analysis:
The first step is to define lexical rules using regular expressions to identify uppercase
letters and numeric digits within the input text. Regular expressions provide a
concise and powerful mechanism for pattern matching, allowing us to specify
complex patterns with ease.

Tokenization:
Once the lexical rules are defined, Lex breaks down the input text into tokens based
on the specified patterns. Tokens represent meaningful units of text, such as words,
numbers, or punctuation symbols. For this problem, tokens correspond to uppercase
letters and numeric digits.

Action Execution:
Upon identifying tokens corresponding to uppercase letters, the Lex program
executes actions to convert them to lowercase. Similarly, when numeric digits are
encountered, the program computes the summation of digits and accumulates the
result.
Regular Expressions:
Regular expressions consist of symbols and operators that represent sequences of
characters or character classes. By constructing appropriate regular expressions, we
can efficiently identify uppercase letters and numeric digits within the input text.

Lexical Tokens:
In Lex programming, each token corresponds to a specific pattern defined by regular
expressions and triggers corresponding actions when matched. By tokenizing the
input text, Lex breaks down complex input sequences into manageable units,
facilitating further processing and analysis.
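For example (an assumed sample line, not the actual input file for this assignment), the program below converts the uppercase letters and annotates the number with its digit sum:

Input : Hello World 2024
Output: hello world 2024 (sum = 8)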

Code:
%{
#include <stdio.h>
#include <stdlib.h>

int sum, num;
%}

%%
[0-9]+  {
            num = atoi(yytext);
            sum = 0;
            while(num != 0)
            {
                sum += num % 10;
                num /= 10;
            }
            fprintf(yyout, "%s (sum = %d) ", yytext, sum);
            printf("%s -> sum = %d\n", yytext, sum);
        }

[A-Z]   { fprintf(yyout, "%c", yytext[0] + 32); }

.       { fprintf(yyout, "%s", yytext); }
%%

int main()
{
    yyin = fopen("feb08_input.txt", "r");
    yyout = fopen("feb08_output.txt", "w");

    while(!feof(yyin))
    {
        yylex();
    }

    return 0;
}

int yywrap()
{
    return 1;
}

Input:
Output: Modified File
Conclusion:
This Lex program effectively achieves its goals of converting uppercase letters to
lowercase and computing the summation of digits in a given text file. By utilizing
simple rules and actions defined in the Lex syntax, the program seamlessly
processes the input text, providing a lowercase version while also calculating the
sum of digits whenever numbers are encountered. This program demonstrates the
versatility and efficiency of Lex in performing text processing tasks, making it a
valuable tool for various applications requiring lexical analysis and manipulation of
textual data.
Gauri Choudhari
Roll no. 62
Batch 3
TY-CSA
PRN: 12111387

Compiler Design (CD)


Lab Assignment 4

Title: Write a program in Lex and YACC to recognize a sentence as a simple statement or a compound statement.

Theory:
Parsing:
Parsing is the process of analysing a linear sequence of tokens and organising it into
a structure that follows a set of defined rules known as a grammar. This work is done
by the parser, a component of the translator.
YACC
YACC (Yet Another Compiler-Compiler) is an LALR(1) parser generator (Look-Ahead,
Left-to-right scan, Rightmost derivation in reverse, with 1 lookahead token). YACC
was originally designed to be complemented by Lex.

Working of YACC

• Yacc reads a grammar file that specifies the language's syntax rules.
• It generates a parser based on this grammar.
• The generated parser reads tokens produced by Lex.
• It constructs a parse tree or performs actions based on the grammar rules.
Input File: YACC input file is divided into three parts.

/* definitions */
....
%%

/* rules */
....
%%

/* auxiliary routines */
....

Definition Part:
The definition part includes information about the tokens used in the syntax
definition:
%token NUMBER
%token ID

Yacc automatically assigns numbers to tokens, but this can be overridden by
%token NUMBER 621

Yacc also recognizes single characters as tokens. Therefore, explicitly assigned
token numbers should not overlap ASCII codes.
• The definition part can include C code external to the definition of the
parser and variable declarations, within %{ and %} in the first column.
• It can also include the specification of the starting symbol in the
grammar:
%start nonterminal

Rule Part:
• The rules part contains grammar definitions in a modified BNF form.
• Actions are C code in { } and can be embedded inside the rules (translation
schemes).

Auxiliary Routines Part:


• The auxiliary routines part is only C code.
• It includes function definitions for every function needed in the rules part.
• It can also contain the main() function definition if the parser is going to
be run as a program.
• The main() function must call the function yyparse().
Code:
A. Lex File

%{
#include "Feb22_yacc.tab.h"
%}
%%
He|She|They|I|You|he|she|they|you|we|We|It|it { return PRON; }
is|was|are|writes|write|reads|read|watch|watches|study|learn|studies|eat|li
kes { return VERB; }
girl|boy|teacher|books|TV|email|college|engineering|food { return NOUN; }
and|but|or|because|so|since|though { return CONJ; }

%%

int yywrap() {

return 1;
}

B. Yacc File

%{
#include <stdio.h>
FILE *yyin;
int yylex();
void yyerror();
%}

%token NOUN PRON VERB CONJ


%start stmt
%%

stmt: stmt simple {printf("This is a simple statement\n");}


| stmt compound {printf("This is a compound statement\n");}
|
;

compound: simple CONJ simple


;

simple: subject VERB object


;

subject: NOUN | PRON


;

object: NOUN
;
%%

int main()
{
yyin = fopen("feb22_input.txt", "r");
while(!feof(yyin))
{
yyparse();
}
fclose(yyin);
}
void yyerror(char *s)
{
printf("%s", s);
}
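For example (an assumed input file, not the actual one used here), and ignoring the whitespace that the scanner echoes by default, the parser above produces:

Input file:
She reads books
She reads books and he watches TV

Output:
This is a simple statement
This is a compound statement

The two files are compiled together in the usual way: assuming the YACC file is named Feb22_yacc.y, it is processed with bison -d (producing Feb22_yacc.tab.c and Feb22_yacc.tab.h, matching the header included in the Lex file), the Lex file is processed with flex, and the generated C files are then compiled and linked with gcc.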

Input:

Output:

Conclusion:
This assignment successfully distinguishes between simple and compound statements
using language processing and parsing. Through the integration of lexical analysis
(Lex) and syntax analysis (Yacc), we have created a tool capable of identifying the
structural form of input sentences. By defining lexical rules in Lex to tokenize the input
text and grammar rules in Yacc to parse syntactic structures, the program distinguishes
between simple statements, consisting of a single clause, and compound statements,
made up of two clauses joined by a conjunction.
Gauri Choudhari
Roll no. 62
Batch 3
TY-CSA
PRN: 12111387

Compiler Design (CD)


Lab Assignment 5

Title: Implement a code optimizer for C/C++ subset.

Theory:
Code Optimization
The code optimization in the synthesis phase is a program transformation
technique, which tries to improve the intermediate code by making it consume
fewer resources (i.e. CPU, Memory) so that faster-running machine code will result.
The compiler optimization process should meet the following objectives:
• The optimization must be correct, it must not, in any way, change the
meaning of the program.
• Optimization should increase the speed and performance of the program.
• The compilation time must be kept reasonable.
• The optimization process should not delay the overall compiling process.

Optimization of the code is often performed at the end of the development stage
since it reduces readability and adds code that is used to increase the
performance.

Need of Optimization:

Optimizing the underlying algorithm is beyond the scope of the code optimization
phase; instead, the generated code itself is improved, which may also involve
reducing its size. Optimization helps to:
• Reduce the space consumed by the code and increase the speed of the resulting
program.
• Avoid tedious manual work: just as manually analyzing large datasets takes a lot of
time and we therefore use software such as Tableau, performing optimization by
hand is tedious and is better done by a code optimizer.
• Promote re-usability of the optimized code.

Types of Code Optimization:


The optimization process can be broadly classified into two types :
1. Machine Independent Optimization: This code optimization phase
attempts to improve the intermediate code to get a better target code as
the output. The part of the intermediate code which is transformed here
does not involve any CPU registers or absolute memory locations.
2. Machine Dependent Optimization: Machine-dependent optimization is
done after the target code has been generated and when the code is
transformed according to the target machine architecture. It involves
CPU registers and may have absolute memory references rather than
relative references. Machine-dependent optimizers put effort into taking
maximum advantage of the memory hierarchy.
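As an illustration of machine-independent optimization (a hypothetical trace, assuming an input expression such as a=b*c+b*c; is given to the program below), the generated three-address code contains the subexpression b*c twice; common-subexpression elimination removes the duplicate entry and redirects later references to the surviving result:

Before optimization        After optimization
A:= b c *                  A:= b c *
B:= b c *                  C:= A A +
C:= A B +                  D:= a C =
D:= a C =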

Code:
A. Lex Code

%{
#include"Mar28_Yacc.tab.h"
extern char yyval;
%}

%%
[0-9]+ {yylval.symbol = (char)yytext[0];return NUMBER;}
[a-zA-Z]+ {yylval.symbol =(char) yytext[0];return ID;}
[-+*/=();] { return yytext[0]; }
[ \t\n] { }
. { }
%%

B. Yacc Code
%{

#include<stdio.h>
#include<stdlib.h>
int yylex();
void yyerror();
char temp ='A'-1;
int index1=0;
char addtotable(char, char, char);
struct expr{
char operand1;
char operand2;
char operator;
char result;
};
%}

%union
{
char symbol;
}

%left '+''-'
%left '*''/'
%token <symbol> NUMBER ID
%type <symbol> exp

%%
st: ID '=' exp ';' {addtotable((char)$1,(char)$3,'='); YYACCEPT;};
exp: exp '+' exp {$$ = addtotable((char)$1,(char)$3,'+');}
|exp '-' exp {$$ = addtotable((char)$1,(char)$3,'-');}
|exp '/' exp {$$ = addtotable((char)$1,(char)$3,'/');}
|exp '*' exp {$$ = addtotable((char)$1,(char)$3,'*');}
|'(' exp ')' {$$ = (char)$2;}
|NUMBER {$$ = (char)$1;}
|ID {$$=(char)$1;};
%%
struct expr table[20];
void printTable()
{
int i;
for(i=0;i<index1;i++)
{
if(table[i].operator=='!') continue;
printf("%c:= %c %c %c \n",table[i].result, table[i].operand1,
table[i].operand2, table[i].operator);
}
}

char addtotable(char a, char b, char c)


{
temp++;
table[index1].operand1=a;
table[index1].operand2=b;
table[index1].operator=c;
table[index1].result=temp;
index1++;
return temp;
}

void optim()
{
int i,j;
for(i=0;i<index1;i++)
for(int j=i+1;j<index1;j++)
{
if(table[i].operator==table[j].operator && table[i].operand1
==table[j].operand1
&& table[i].operand2 == table[j].operand2){
int z;
for(int z=j+1;z<index1;z++){
if(table[z].operand1==table[j].result)
table[z].operand1=table[i].result;
if(table[z].operand2==table[j].result)
table[z].operand2=table[i].result;
}
table[j].operator='!';
}}}

int main()
{
temp='A'-1;
printf("Enter the expression\n");
yyparse();
printTable();
optim();
printf("After Optimization\n");
printTable();
}

int yywrap()
{
return 1;
}

void yyerror(char *s)


{
printf("Error %s",s);
}
Input:

Output:

Conclusion:
The code optimizer for a subset of the C programming language has been developed
in this assignment. By implementing these optimizations within the constraints of the
C subset, we have demonstrated the ability to improve code performance without
sacrificing correctness or portability. The optimizer analyses code structures, identifies
optimization opportunities, and applies transformation rules to generate optimized
code that exhibits improved runtime behaviour and resource utilization. Optimized
code not only enhances application performance but also reduces energy
consumption and improves scalability, making it crucial for a wide range of computing
platforms, including embedded systems, mobile devices, and high-performance
computing clusters.
Gauri Choudhari
Roll no. 62
Batch 3
TY-CSA
PRN: 12111387

Compiler Design (CD)


Lab Assignment 6

Title: Implement a code generator for C/C++ subset.

Theory:
Code Generation
Code generation is the final phase in the compilation process, where the compiler
translates the intermediate representation (IR) of the source code into target machine
code or another intermediate representation suitable for execution on the target
platform. This phase bridges the gap between the platform-independent
representation of the source code and the platform-specific machine instructions.

Purpose of Code Generation:


• Translation to Machine Code: The primary goal of code generation is to
translate the high-level source code into machine code that can be executed
by the target hardware.
• Efficiency Optimization: Code generation aims to produce efficient code that
minimizes execution time, memory usage, and power consumption.
• Target Platform Adaptation: Generated code must be tailored to the
characteristics and constraints of the target platform, including instruction set
architecture (ISA), memory hierarchy, and hardware capabilities.
• Abstraction Removal: Code generation eliminates high-level abstractions
present in the intermediate representation, such as expressions, control flow
structures, and data types, and replaces them with low-level machine
instructions.

Phases of Code Generation:


• Intermediate Representation (IR) Generation: The compiler generates an
intermediate representation of the source code, which serves as an abstract
and platform-independent representation. Common IR forms include abstract
syntax trees (ASTs), three-address code (TAC), and static single assignment
(SSA) form.
• IR Optimization: Before code generation, the compiler may perform
optimization passes on the intermediate representation to improve code
quality, such as constant folding, dead code elimination, loop optimization,
and register allocation.
• Code Generation: This phase translates the optimized intermediate
representation into target machine code. It involves mapping high-level
constructs to corresponding machine instructions, managing memory access,
and handling control flow.

Techniques for Code Generation:


• Static Code Generation: Generate machine code offline during compilation,
suitable for ahead-of-time (AOT) compilation models.
• Just-In-Time (JIT) Compilation: Dynamically translate and optimize code at
runtime, typically used in interpreters and virtual machines to improve
execution performance.
• Intermediate Representation Translation: Translate the intermediate
representation into target machine code directly or through multiple
intermediate stages, such as assembly language or LLVM Intermediate
Representation (IR).
• Profile-Guided Optimization (PGO): Use runtime profiling data to guide code
generation decisions and optimize frequently executed code paths.

In the program below, main() first prints the prologue of the generated program (the #include line and int main(){), then parses the input statements and emits code for each of them, and finally prints the function's closing brace and return 0;.
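As an illustration (assuming the input file contains a single statement such as x = y + z * 2 ;, which is an assumed example rather than the actual input file), the generator below would emit roughly:

#include<stdio.h>

int main(){
x = (y + (z * 2));
return 0;
}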

Code:
A. Lex Code
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "Apr04_yacc.tab.h"
%}

%%
"int"|"char"|"string"|"float"|"if"|"else"|"else if" { yylval.expr =
strdup(yytext); return KEYWORD; }
"printf""(".*");" { yylval.expr = strdup(yytext); return INBUILT; }
[0-9]+ { yylval.num = atoi(yytext); return NUMBER; }
[a-zA-Z]+ { yylval.var = strdup(yytext); return VARIABLE; }
"=="|"!="|"<"|">"|"<="|">=" { yylval.var = strdup(yytext); return
COMPARISON; }
[-+*/=();{}%] { return yytext[0]; }
[ \t\n] { /* ignore whitespace */ }
. { }
%%

B. Yacc Code

%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern FILE *yyin;
char *buffer;
int i = 0;
int yylex();
void yyerror();
%}

%union
{
int num;
char* var;
char* expr;
}
%left '+''-'
%left '*''/'
%token <num> NUMBER
%token <var> VARIABLE COMPARISON BOOLOPR
%token <expr> KEYWORD INBUILT
%type <expr> exp st finalst condition condblock
%%

finalst: finalst st {
printf("%s", $2);
}
| {}
;
st: VARIABLE '=' exp ';' {
    buffer = (char *)malloc(strlen($1) + strlen($3) + 7); /* allocate memory for the result string */
    sprintf(buffer, "%s = %s;\n", $1, $3);
    $$ = buffer;
}
| condblock {
    buffer = (char *)malloc(strlen($1) + 2); /* allocate memory for the result string */
    sprintf(buffer, "%s\n", $1);
    $$ = buffer;
}

| KEYWORD {
    buffer = (char *)malloc(strlen($1) + 2); /* allocate memory for the result string */
    sprintf(buffer, "%s ", $1);
    $$ = buffer;
}
| INBUILT {
    buffer = (char *)malloc(strlen($1) + 2); /* allocate memory for the result string */
    sprintf(buffer, "%s\n", $1);
    $$ = buffer;
}
| '{' {
    buffer = (char *)malloc(4); /* "{ \n" plus terminating null */
    sprintf(buffer, "{ \n");
    $$ = buffer;
}
| '}' {
    buffer = (char *)malloc(4); /* "} \n" plus terminating null */
    sprintf(buffer, "} \n");
    $$ = buffer;
}
;

condblock: KEYWORD '(' condition ')'
{
    buffer = (char *)malloc(strlen($1) + strlen($3) + 5); /* allocate memory for the result string */
    sprintf(buffer, "%s (%s)", $1, $3);
    $$ = buffer;
}
;

condition : exp COMPARISON exp
{
    buffer = (char *)malloc(strlen($1) + strlen($2) + strlen($3) + 5); /* allocate memory for the result string */
    sprintf(buffer, "%s %s %s ", $1, $2, $3);
    $$ = buffer;
}
;
exp: exp '+' exp {
buffer = (char *)malloc(strlen($1) + strlen($3) + 5);
sprintf(buffer, "(%s + %s)", $1, $3);
$$ = buffer;
}
| exp '-' exp {
buffer = (char *)malloc(strlen($1) + strlen($3) + 5);
sprintf(buffer, "(%s - %s)", $1, $3);
$$ = buffer;
}
| exp '*' exp {
buffer = (char *)malloc(strlen($1) + strlen($3) + 5);
sprintf(buffer, "(%s * %s)", $1, $3);
$$ = buffer;
}
| exp '/' exp {
buffer = (char *)malloc(strlen($1) + strlen($3) + 5);
sprintf(buffer, "(%s / %s)", $1, $3);
$$ = buffer;
}
| exp '%' exp {
buffer = (char *)malloc(strlen($1) + strlen($3) + 5);
sprintf(buffer, "(%s %% %s)", $1, $3);
$$ = buffer;
}
| '(' exp ')' {
buffer = (char *)malloc(strlen($2) + 3);
sprintf(buffer, "(%s)", $2);
$$ = buffer;
}
| NUMBER {
    buffer = (char *)malloc(20); /* room for the integer-to-string conversion */
    sprintf(buffer, "%d", $1);
    $$ = buffer;
}
| VARIABLE {
buffer = (char *)malloc(strlen($1) + 1);
strcpy(buffer, $1);
$$ = buffer;
}
;

%%

int main()
{
printf("#include<stdio.h>\n\n");
printf("int main(){\n");
yyin = fopen("apr04_input.txt", "r");
while(!feof(yyin))
{
yyparse();
}
fclose(yyin);
printf("return 0;\n}");
return 0;
}

void yyerror(char *s)


{
printf("%s\n", s);
}
int yywrap() {
return 1;
}

Input:

Output:

Conclusion:
This assignment successfully demonstrates code generation for a C language subset.
Throughout the assignment, we have explored various aspects of the process,
including tokenization, parsing, and code emission. By implementing these stages,
the code generator can produce executable code that faithfully reflects the original
program's behaviour within the constraints of the C/C++ subset. This assignment
highlights the essential role of code generation in compiler design and software
development.
