
Lab No. 1
TITLE:
IDENTIFYING WHETHER THE GIVEN INPUT STRING IS VALID OR NOT

OBJECTIVE:
Using flex to analyze whether the given input string is an identifier or not, using a regular
expression.

RELATED THEORY:
1. Flex:
Flex is a tool for generating scanners.
A scanner is a program which recognizes lexical patterns in text.
The flex program reads the given input file, or its standard input if no file names are given,
for a description of the scanner to generate.
By default, flex writes its output to a source file named “lex.yy.c”, which defines the scanning
routine yylex() in C. This file can be compiled and linked to produce an executable; when the
executable is run, it analyzes its input for occurrences of the regular expressions. Whenever it
finds one, it executes the corresponding C code.
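A minimal flex specification for this lab could look like the sketch below; the exact patterns,
messages and the file name identifier.l are assumptions, not the lab's prescribed code.

%{
/* identifier.l -- sketch: classify each token read from standard input.
   Build (assumed commands): flex identifier.l && cc lex.yy.c -o identifier */
#include <stdio.h>
%}
%option noyywrap
%%
[a-zA-Z_][a-zA-Z0-9_]*   { printf("%s : valid identifier\n", yytext); }
[0-9][a-zA-Z0-9_]*       { printf("%s : not a valid identifier\n", yytext); }
[ \t\n]+                 ;
.                        { printf("%s : not a valid identifier\n", yytext); }
%%
int main(void) { yylex(); return 0; }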
Lab No. 2
TITLE:
LEXICAL ANALYZER FOR WORD LENGTH AND CONVERT DECIMAL TO BINARY

OBJECTIVE:
To write a lex/flex program to find the length of the longest word and to convert decimal
numbers to binary.

RELATED THEORY:
Lex/Flex:
It is a tool for generating lexical analyzers for tokenization of input streams. They work by
specifying patterns and corresponding actions to be taken when those patterns are matched
in the input stream.

1. To find longest word


A word consists of letters a-z or A-Z separated by spaces. So, we use the regex [a-zA-Z]+ to
get all words and compare their lengths.

2. Convert decimal to binary


A decimal number consists of digits from 0-9. So we use the regex [0-9]+ to get all decimal
numbers and perform the computation to convert them to binary (a combined sketch for
both tasks follows below).
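A single flex file can handle both tasks; the sketch below is illustrative, and the variable
names and output format are assumptions rather than the lab's prescribed code.

%{
/* lab2.l -- sketch: track the longest word and print each decimal number in binary. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int longest = 0;    /* length of the longest word seen so far */
%}
%option noyywrap
%%
[a-zA-Z]+   { if ((int)strlen(yytext) > longest) longest = (int)strlen(yytext); }
[0-9]+      { int n = atoi(yytext), bits[32], k = 0;
              printf("%s in binary: ", yytext);
              if (n == 0) printf("0");
              while (n > 0) { bits[k++] = n % 2; n /= 2; }
              while (k--) printf("%d", bits[k]);
              printf("\n"); }
.|\n        ;   /* ignore everything else */
%%
int main(void) { yylex(); printf("Length of longest word: %d\n", longest); return 0; }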
Lab No. 3.1
TITLE:
DETERMINE THE CHARACTER WITH HIGHEST FREQUENCY IN THE GIVEN STRING

OBJECTIVE:
Write a program for lex/flex scanner generator to determine the character with highest
frequency in the given string.

RELATED THEORY:
Lex/Flex:
It is a tool for generating lexical analyzers for tokenization of input streams. They work by
specifying patterns and corresponding actions to be taken when those patterns are matched
in the input stream.

Steps Involved:
I. First initialize an array ‘z’ of size 129, since there are 128 standard ASCII characters, to
find the character with the highest frequency.
II. The next step uses a predefined pattern, i.e., a regular expression, to match each input
character.
III. Calculate and store the occurrences of each character across the 128 ASCII values; if a
character is not present, its count remains 0.
IV. The maximum occurrence and the corresponding character (its ASCII value/index) are
stored in separate variables.

The function yylex() is the main flex function that runs the rules section. The program is saved
with the .l extension and is then compiled as a C program.
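A sketch of this procedure as a flex program is given below; the array and variable names are
assumptions, not the lab's exact code.

%{
/* freq.l -- sketch: count occurrences of each ASCII character; the array of size 129
   covers the 128 standard ASCII codes mentioned in step I. */
#include <stdio.h>
int z[129] = {0};
%}
%option noyywrap
%%
.    { z[yytext[0] & 0x7f]++; }   /* mask to the 7-bit ASCII range before indexing */
\n   ;
%%
int main(void) {
    int i, maxi = 0, idx = 0;
    yylex();
    for (i = 0; i < 128; i++)
        if (z[i] > maxi) { maxi = z[i]; idx = i; }
    printf("Character with highest frequency: '%c' (%d occurrences)\n", idx, maxi);
    return 0;
}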
Lab No. 3.2
TITLE:
COUNT THE NUMBER OF WHITE-SPACES AND CHARACTERS IN THE GIVEN STRING

OBJECTIVE:
Write a program for lex/flex scanner generator to count the white-spaces and characters in
the given string.

RELATED THEORY:
Lex/Flex:
It is a tool for generating lexical analyzers for tokenization of input streams. They work by
specifying patterns and corresponding actions to be taken when those patterns are matched
in the input stream.

Steps Involved:
1. First we initialize two variables to 0 to store the counts of the white-spaces and characters
respectively.
2. Take the user input string
3. Count the white-spaces present in string using a regex and store the count to a variable
4. Count the characters present in string using regex and store the count to a variable
5. Print the count of white-spaces and characters to the console.
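These steps translate into a short flex program such as the sketch below; the variable names
and output format are assumptions.

%{
/* count.l -- sketch: count white-spaces and other characters in one input line. */
#include <stdio.h>
int spaces = 0, chars = 0;
%}
%option noyywrap
%%
[ \t]   { spaces++; }
\n      { return 0; }   /* stop after one input line */
.       { chars++; }
%%
int main(void) {
    yylex();
    printf("White-spaces: %d\nCharacters: %d\n", spaces, chars);
    return 0;
}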
Lab No. 4
TITLE:
WRITE A PROGRAM FOR LEX/FLEX SCANNER TO RECOGNIZE STRINGS aaab, abbb USING aⁿbⁿ
WHERE n>0

RELATED THEORY:
In a lex program, the lexical analyzer partitions the input stream: it recognizes the regular
expressions and forms the matched pieces into tokens.
The parser (from the yacc command) assigns structure to the resulting pieces.
The yacc command generates a parser program that analyzes the input using these
tokens.

Steps Involved:
1. Here we declare 3 tokens: one for ‘a’, one for ‘b’, and ‘NL’ for newline.
2. These are declared in the yacc code, and the tokens used by the parser are made
available through ‘y.tab.h’.
3. The matching check is done by the parser in the yacc code.
4. Declare a non-terminal S and define the production S -> A S B, which means S can only be
replaced by A S B; when the newline is reached the input is a valid string according to the
given grammar, since it contains the same number of a’s and b’s.
5. If an error occurs, it is passed to yyerror and ‘Invalid string’ is reported.
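A minimal yacc sketch of the grammar described in these steps is shown below; the token
names, file name and messages are assumptions, not the lab's prescribed code.

%{
/* anbn.y -- sketch: accept a^n b^n (n > 0) terminated by a newline. */
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char *s);
%}
%token A B NL
%%
line : s NL   { printf("Valid string\n"); exit(0); } ;
s    : A s B
     | A B
     ;
%%
void yyerror(const char *s) { printf("Invalid string\n"); exit(1); }
int main(void) { return yyparse(); }

The matching lex file simply includes the generated ‘y.tab.h’ and returns A for ‘a’, B for ‘b’
and NL for the newline character.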
Lab No. 5
TITLE:
IMPLEMENTATION OF FIRST AND FOLLOW

RELATED THEORY:

1. Compute FIRST:
FIRST(α) is the set of terminal symbols which can occur as the first symbol of strings derived
from α, where α is any string of grammar symbols.

Rules:
• If a is a terminal, then FIRST(a) is {a}.
• If A → Ɛ is a production, then Ɛ is in FIRST(A).
• For any non-terminal A with production rules
A → α1 | α2 | α3 | ……. | αn
Then, FIRST(A) = FIRST(α1) U FIRST(α2) U FIRST(α3) U ………….. U FIRST(αn)
• If there is a production rule of the form A → β1 β2 β3 ……. βn, then FIRST(A) contains
FIRST(β1); if FIRST(β1) contains Ɛ, it also contains FIRST(β2) except Ɛ, and so on.

2. Compute FOLLOW:
FOLLOW(A) is the set of terminals that can appear immediately to the right of the
non-terminal A in some sentential form; Ɛ is never a member of a FOLLOW set.

Rules:
• If A is the starting symbol of the given grammar, then FOLLOW(A) contains {$}.
• For every production B → αAβ, where α and β are any strings of grammar symbols and A is
a non-terminal, everything in FIRST(β) except Ɛ is in FOLLOW(A).
• For every production B → αA, or B → αAβ where FIRST(β) contains Ɛ, everything in
FOLLOW(B) is in FOLLOW(A).
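As a short worked example (an illustrative grammar, not part of the lab input), consider:
S->AB
A->a|Ɛ
B->b
Here FIRST(A) = {a, Ɛ}, FIRST(B) = {b} and FIRST(S) = {a, b} (since A may derive Ɛ, FIRST(B) is
included in FIRST(S)). For FOLLOW: FOLLOW(S) = {$}, FOLLOW(A) = FIRST(B) = {b} and
FOLLOW(B) = FOLLOW(S) = {$}.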
Lab No. 6
TITLE:
IMPLEMENTATION OF LL(1) PARSER

RELATED THEORY:

LL(1) Parser
The parsing table construction requires two functions, FIRST and FOLLOW. A grammar G is
suitable for an LL(1) parsing table if the grammar is free from left recursion and has been left
factored. To compute the LL(1) parsing table, FIRST and FOLLOW are needed.

1. Compute FIRST:
FIRST(α) is the set of terminal symbols which can occur as the first symbol of strings derived
from α, where α is any string of grammar symbols.

Rules:
• If a is a terminal, then FIRST(a) is {a}.
• If A → Ɛ is a production, then Ɛ is in FIRST(A).
• For any non-terminal A with production rules
A → α1 | α2 | α3 | ……. | αn
Then, FIRST(A) = FIRST(α1) U FIRST(α2) U FIRST(α3) U ………….. U FIRST(αn)
• If there is a production rule of the form A → β1 β2 β3 ……. βn, then FIRST(A) contains
FIRST(β1); if FIRST(β1) contains Ɛ, it also contains FIRST(β2) except Ɛ, and so on.

2. Compute FOLLOW:
FOLLOW(A) is the set of terminals that can appear immediately to the right of the
non-terminal A in some sentential form; Ɛ is never a member of a FOLLOW set.

Rules:
• If A is the starting symbol of the given grammar, then FOLLOW(A) contains {$}.
• For every production B → αAβ, where α and β are any strings of grammar symbols and A is
a non-terminal, everything in FIRST(β) except Ɛ is in FOLLOW(A).
• For every production B → αA, or B → αAβ where FIRST(β) contains Ɛ, everything in
FOLLOW(B) is in FOLLOW(A).
SOURCE CODE:

Input Productions:
S->A
A->aB|Ad
B->b
C->g

Code:
#include<bits/stdc++.h>
using namespace std;

set<char> ss;
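// dfs collects the FIRST set of non-terminal i into the global set ss; 'e' stands
// for epsilon and is added only when it can actually be derived at this position.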
bool dfs(char i, char org, char last, map<char,vector<vector<char>>> &mp){
bool rtake = false;
for(auto r : mp[i]){
bool take = true;
for(auto s : r){
if(s == i) break;
if(!take) break;
if(!(s>='A'&&s<='Z')&&s!='e'){
ss.insert(s);
break;
}
else if(s == 'e'){
if(org == i||i == last)
ss.insert(s);
rtake = true;
break;
}
else{
take = dfs(s,org,r[r.size()-1],mp);
rtake |= take;
}
}
}
return rtake;
}

int main(){
int i,j;
ifstream fin("inputllcheck.txt");
string num;
vector<int> fs;
vector<vector<int>> a;
map<char,vector<vector<char>>> mp;
char start;
bool flag = 0;
cout<<"Grammar: "<<'\n';
while(getline(fin,num)){
if(flag == 0) start = num[0],flag = 1;
cout<<num<<'\n';
vector<char> temp;
char s = num[0];
for(i=3;i<num.size();i++){
if(num[i] == '|'){
mp[s].push_back(temp);
temp.clear();
}
else temp.push_back(num[i]);
}
mp[s].push_back(temp);
}
int flag2 = 0;
vector<char> del;
map<char,vector<vector<char>>> add;
vector<int> viz(100,0);
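// Detect immediate left recursion (e.g. A->Ad) and rewrite it using a fresh
// non-terminal so the grammar becomes suitable for LL(1).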
for(auto q : mp){
int one = 0;
char c;
for(auto r : q.second){
if(q.first == r[0]){
one = 1;
flag2 = 1;
del.push_back(q.first);
c = 'A';
while(mp.count(c)||viz[c-'A']) c++;
vector<char> temp;
for(i=1;i<r.size();i++)
temp.push_back(r[i]);
temp.push_back(c);
add[c].push_back(temp);
add[c].push_back({'e'});
}
}
if(one){
for(auto r : q.second){
if(q.first != r[0]){
vector<char> temp;
for(i=0;i<r.size();i++)
temp.push_back(r[i]);
temp.push_back(c);
add[q.first].push_back(temp);
}
}
viz[c-'A'] = 1;
}
}

for(auto q : del) mp.erase(q);


for(auto q : add) mp[q.first] = q.second;

if(flag2){
cout<<"\nGiven CFG is not suitable for LL1\nConverting...\n"<<'\n';
cout<<"New Grammar:"<<'\n';
for(auto q : mp){
string ans = "";
ans+=q.first;
ans += "->";
for(auto r : q.second){
for(auto s : r) ans+=s;
ans+='|';
}
ans.pop_back();
cout<<ans<<'\n';
}
}
else cout<<"\nGiven CFG is suitable for LL1"<<'\n';

map<char,set<char>> fmp;
for(auto q : mp){
ss.clear();
dfs(q.first,q.first,q.first,mp);
for(auto g : ss) fmp[q.first].insert(g);
}

cout<<'\n';
cout<<"FIRST: "<<'\n';
for(auto q : fmp){
string ans = "";
ans += q.first;
ans += " = {";
for(char r : q.second){
ans += r;
ans += ',';
}
ans.pop_back();
ans+="}";
cout<<ans<<'\n';
}

map<char,set<char>> gmp;
gmp[start].insert('$');
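// Propagate FOLLOW constraints for a fixed number of passes; 10 iterations are
// assumed to be enough for the small grammars used here.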
int count = 10;
while(count--){
for(auto q : mp){
for(auto r : q.second){
for(i=0;i<r.size()-1;i++){
if(r[i]>='A'&&r[i]<='Z'){
if(!(r[i+1]>='A'&&r[i+1]<='Z')) gmp[r[i]].insert(r[i+1]);
else {
char temp = r[i+1];
int j = i+1;
while(temp>='A'&&temp<='Z'){
if(*fmp[temp].begin()=='e'){
for(auto g : fmp[temp]){
if(g=='e') continue;
gmp[r[i]].insert(g);
}
j++;
if(j<r.size()){
temp = r[j];
if(!(temp>='A'&&temp<='Z')){
gmp[r[i]].insert(temp);
break;
}
}
else{
for(auto g : gmp[q.first]) gmp[r[i]].insert(g);
break;
}
}
else{
for(auto g : fmp[temp]){
gmp[r[i]].insert(g);
}
break;
}
}
}
}
}
if(r[r.size()-1]>='A'&&r[r.size()-1]<='Z'){
for(auto g : gmp[q.first]) gmp[r[i]].insert(g);
}
}
}
}

cout<<'\n';
cout<<"FOLLOW: "<<'\n';
for(auto q : gmp){
string ans = "";
ans += q.first;
ans += " = {";
for(char r : q.second){
ans += r;
ans += ',';
}
ans.pop_back();
ans+="}";
cout<<ans<<'\n';
}

return 0;
}
Lab No. 7
TITLE:
IMPLEMENTATION OF LR PARSER

RELATED THEORY:

LR Parser
It is a bottom-up parsing method.
It is used to parse a large class of grammars.
In LR parsing, L stands for left-to-right scanning of the input and R stands for constructing a
rightmost derivation in reverse.
It is of 4 types:
LR(0), SLR, CLR and LALR

SOURCE CODE:

Input Productions:
E->BB
B->cB|d

Code:
#include <bits/stdc++.h>
#define error(x) cerr << #x << " = " << x << '\n'
using namespace std;

set<char> ss;
map<char, vector<vector<char>>> mp;
bool dfs(char i, char org, char last, map<char, vector<vector<char>>> &mp)
{
bool rtake = false;
for (auto r : mp[i])
{
bool take = true;
for (auto s : r)
{
if (s == i)
break;
if (!take)
break;
if (!(s >= 'A' && s <= 'Z') && s != 'e')
{
ss.insert(s);
break;
}
else if (s == 'e')
{
if (org == i || i == last)
ss.insert(s);
rtake = true;
break;
}
else
{
take = dfs(s, org, r[r.size() - 1], mp);
rtake |= take;
}
}
}
return rtake;
}

map<int, map<char, set<pair<deque<char>, deque<char>>>>> f;


map<int, vector<pair<int, char>>> g;

int num = -1;
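// dfs2 builds the canonical collection of LR(0) item sets: f maps each state
// number to its item set and g records the GOTO/shift edges between states.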


void dfs2(char c, char way, int last, pair<deque<char>, deque<char>> curr)
{
map<char, set<pair<deque<char>, deque<char>>>> mp2;
int rep = -2;
if (last != -1)
{
for (auto q : g[last])
{
if (q.second == way)
{
rep = q.first;
mp2 = f[q.first];
}
}
}
mp2[c].insert(curr);
int count = 10;
while (count--)
{
for (auto q : mp2)
{
for (auto r : q.second)
{
if (!r.second.empty())
{
if (r.second.front() >= 'A' && r.second.front() <= 'Z')
{
for (auto s : mp[r.second.front()])
{
deque<char> st, emp;
for (auto t : s)
st.push_back(t);
mp2[r.second.front()].insert({emp, st});
}
}
}
}
}
}
for (auto q : f)
{
if (q.second == mp2)
{
g[last].push_back({q.first, way});
return;
}
}
if (rep == -2)
{
f[++num] = mp2;
if (last != -1)
g[last].push_back({num, way});
}
else
{
f[rep] = mp2;
}
int cc = num;
for (auto q : mp2)
{
for (auto r : q.second)
{
if (!r.second.empty())
{
r.first.push_back(r.second.front());
r.second.pop_front();
dfs2(q.first, r.first.back(), cc, r);
}
}
}
}

int main()
{
int i, j;
ifstream fin("inputslr.txt");
string num;
vector<int> fs;
vector<vector<int>> a;
char start;
bool flag = 0;
cout << "Grammar: " << '\n';
while (getline(fin, num))
{
if (flag == 0)
start = num[0], flag = 1;
cout << num << '\n';
vector<char> temp;
char s = num[0];
for (i = 3; i < num.size(); i++)
{
if (num[i] == '|')
{
mp[s].push_back(temp);
temp.clear();
}
else
temp.push_back(num[i]);
}
mp[s].push_back(temp);
}
map<char, set<char>> fmp;
for (auto q : mp)
{
ss.clear();
dfs(q.first, q.first, q.first, mp);
for (auto g : ss)
fmp[q.first].insert(g);
}

cout << '\n';


cout << "FIRST: " << '\n';
for (auto q : fmp)
{
string ans = "";
ans += q.first;
ans += " = {";
for (char r : q.second)
{
ans += r;
ans += ',';
}
ans.pop_back();
ans += "}";
cout << ans << '\n';
}

map<char, set<char>> gmp;


gmp[start].insert('$');
int count = 10;
while (count--)
{
for (auto q : mp)
{
for (auto r : q.second)
{
for (i = 0; i < r.size() - 1; i++)
{
if (r[i] >= 'A' && r[i] <= 'Z')
{
if (!(r[i + 1] >= 'A' && r[i + 1] <= 'Z'))
gmp[r[i]].insert(r[i + 1]);
else
{
char temp = r[i + 1];
int j = i + 1;
while (temp >= 'A' && temp <= 'Z')
{
if (*fmp[temp].begin() == 'e')
{
for (auto g : fmp[temp])
{
if (g == 'e')
continue;
gmp[r[i]].insert(g);
}
j++;
if (j < r.size())
{
temp = r[j];
if (!(temp >= 'A' && temp <= 'Z'))
{
gmp[r[i]].insert(temp);
break;
}
}
else
{
for (auto g : gmp[q.first])
gmp[r[i]].insert(g);
break;
}
}
else
{
for (auto g : fmp[temp])
{
gmp[r[i]].insert(g);
}
break;
}
}
}
}
}
if (r[r.size() - 1] >= 'A' && r[r.size() - 1] <= 'Z')
{
for (auto g : gmp[q.first])
gmp[r[i]].insert(g);
}
}
}
}

cout << '\n';


cout << "FOLLOW: " << '\n';
for (auto q : gmp)
{
string ans = "";
ans += q.first;
ans += " = {";
for (char r : q.second)
{
ans += r;
ans += ',';
}
ans.pop_back();
ans += "}";
cout << ans << '\n';
}
string temp = "";
temp += '.';
temp += start;
deque<char> emp;
deque<char> st;
st.push_back(start);
dfs2('!', 'k', -1, {emp, st});

cout << "\nProductions: " << '\n';


int cc = 1;
set<char> action, go;
map<pair<char, deque<char>>, int> pos;
for (auto q : mp)
{
go.insert(q.first);
for (auto r : q.second)
{
cout << "r" << cc << ": ";
string ans = "";
ans += q.first;
ans += "->";
deque<char> temp;
for (auto s : r)
ans += s, temp.push_back(s);
pos[{q.first, temp}] = cc;
for (auto s : r)
{
if (s >= 'A' && s <= 'Z')
go.insert(s);
else
action.insert(s);
}
cout << ans << '\n';
cc++;
}
}

cout << "\nGraph: " << '\n';


for (auto mp2 : f)
{
cout << '\n';
cout << "I";
cout << mp2.first << ": \n";
for (auto q : mp2.second)
{
string ans = "";
ans += q.first;
ans += "->";
for (auto r : q.second)
{
for (auto t : r.first)
ans += t;
ans += '.';
for (auto t : r.second)
ans += t;
ans += '|';
}
ans.pop_back();
for (auto tt : ans)
{
if (tt == '!')
cout << start << '\'';
else
cout << tt;
}
cout << '\n';
}
}
cout << '\n';
cout << "Edges: " << '\n';
for (auto q : g)
{
for (auto r : q.second)
{
cout << "I" << q.first << " -> " << r.second << " -> "
<< "I" << r.first << "\n";
}
}
action.insert('$');
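// Build the SLR(1) table: shift entries come from the GOTO edges, the state holding
// the augmented item accepts on '$', and reduce entries are placed under FOLLOW of
// the production's left-hand side.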
cout << "\nParsing Table:" << '\n';
cout << "St.\t\tAction & Goto" << '\n';
int tot = f.size();
cout << " \t";
for (auto q : action)
cout << q << '\t';
for (auto q : go)
cout << q << '\t';
cout << '\n';
for (i = 0; i < tot; i++)
{
cout << "I" << i << '\t';
for (auto q : action)
{
if (g.count(i))
{
int flag = 0;
for (auto r : g[i])
{
if (r.second == q)
{
flag = 1;
cout << "S" << r.first << "\t";
break;
}
}
if (!flag)
cout << "-" << '\t';
}
else
{
int flag = 0;
for (auto r : f[i])
{
if (r.first == '!')
{
if (q == '$')
{
cout << "AC\t";
flag = 1;
}
else
cout << "-\t";
}
}
if (!flag)
{
for (auto r : f[i])
{
char ccc = r.first;
deque<char> chk = (*r.second.begin()).first;
int cou = 1;
for (auto r : gmp[ccc])
{
if (q == r)
{
cout << "r" << pos[{ccc, chk}] << "\t";
}
cou++;
}
}
}
}
}
for (auto q : go)
{
if (g.count(i))
{
int flag = 0;
for (auto r : g[i])
{
if (r.second == q)
{
flag = 1;
cout << r.first << "\t";
break;
}
}
if (!flag)
cout << "-" << '\t';
}
else
{
cout << "-" << '\t';
}
}
cout << '\n';
}

return 0;
}
Lab No. 8
TITLE:
IMPLEMENTATION OF SYMBOL TABLE

RELATED THEORY:

Symbol Table:
It is an important data structure created and maintained by the compiler in order to keep
track of the semantics of variables.
It is built in the lexical and syntax analysis phases and is used by the various phases of the
compiler as follows:
• Lexical Analysis: creates entries for identifiers and literals in the table
• Syntax Analysis: checks whether an identifier is declared and has valid scope within the table
• Semantic Analysis: updates entries with information such as data type, dimension and scope
• Intermediate Code Generation: uses information (data type, memory locations, etc.) from
the table for code generation
• Code Optimization: the table may be used to identify optimization opportunities
• Target Code Generation: the table's contents help in the generation of machine instructions

ALGORITHM:
Step 1: Start
Step 2: Get the input from the user until the terminating symbol ($) is entered.
Step 3: Allocate memory for each variable using a dynamic allocation function.
Step 4: Memory is allocated only if the character following the symbol is an operator.
Step 5: While reading, each variable is inserted into the symbol table along with its
memory address.
Step 6: The steps are repeated till ‘$’ is reached.
Step 7: To look up a variable, enter the variable to be searched; the symbol table is
checked for the corresponding entry, and the variable along with its address is
displayed as the result.
Step 8: Stop
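A minimal sketch of this algorithm in C is given below; the structure, array size and function
names are assumptions for illustration, not the lab's code.

/* symtab.c -- sketch: insert variables with dynamically allocated addresses and look them up. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct symbol { char name[32]; void *addr; };
static struct symbol table[100];
static int count = 0;

/* Insert a variable: memory is obtained by a dynamic allocation function and its
   address is recorded next to the name. */
void insert_symbol(const char *name) {
    strncpy(table[count].name, name, sizeof table[count].name - 1);
    table[count].addr = malloc(sizeof(int));
    count++;
}

/* Search the table for a variable and display it along with its address. */
void lookup(const char *name) {
    for (int i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0) {
            printf("%s -> %p\n", table[i].name, table[i].addr);
            return;
        }
    printf("%s not found\n", name);
}

int main(void) {
    insert_symbol("x");
    insert_symbol("sum");
    lookup("sum");   /* prints the recorded address of 'sum' */
    return 0;
}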
Lab No. 9
TITLE:
IMPLEMENTATION OF TYPE CHECKING

RELATED THEORY:

Type Checking:
The compiler must check that the source program follows both the syntactic and semantic
conventions of the source language.
Type checking is the process of checking the data types of different variables and expressions.
The design of a type checker for a language is based on information about the syntactic
constructs of the language, the notion of types, and the rules for assigning types to language
constructs.

Fig. Position of Type Checker

ALGORITHM:
Step 1: Start
Step 2: Track the global scope type information.
Step 3: Determine the type of expressions recursively, i.e., bottom-up, passing the
resulting type upwards.
Step 4: If the types are correct, perform the operation.
Step 5: If the types mismatch, a semantic error is reported.
Step 6: Stop
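The bottom-up idea in Step 3 can be sketched in C as below; the type names and the widening
rule are assumptions for illustration.

/* typecheck.c -- sketch: compute the type of an expression from its operand types. */
#include <stdio.h>

typedef enum { T_INT, T_FLOAT, T_ERROR } Type;

/* Type of a binary arithmetic operation, built from the types determined lower
   in the expression tree and passed upwards. */
Type check_binary(Type left, Type right) {
    if (left == T_ERROR || right == T_ERROR) return T_ERROR;  /* propagate errors */
    if (left == T_INT && right == T_INT)     return T_INT;
    return T_FLOAT;                                           /* int/float mix widens to float */
}

int main(void) {
    /* (int + float) * int  evaluates to float */
    Type t = check_binary(check_binary(T_INT, T_FLOAT), T_INT);
    printf("%s\n", t == T_INT ? "int" : t == T_FLOAT ? "float" : "semantic error");
    return 0;
}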
