Bottom-Up LR(0) Parsing in C


BOTTOM-UP PARSING

A bottom-up parse corresponds to the construction of a parse tree for an input string
beginning at the leaves (the bottom) and working up towards the root (the top). It is
convenient to describe parsing as the process of building parse trees, although a front end
may in fact carry out a translation directly without building an explicit tree.
We can think of bottom-up parsing as the process of "reducing" a string w to the start symbol
of the grammar. At each reduction step, a specific substring matching the body of a
production is replaced by the nonterminal at the head of that production. The key decisions
during bottom-up parsing are about when to reduce and about what production to apply, as
the parse proceeds.

LR PARSERS:
The most prevalent type of bottom-up parser today is based on a concept called LR(k)
parsing; the "L" is for left-to-right scanning of the input, the "R" for constructing a rightmost
derivation in reverse, and the k for the number of input symbols of lookahead that are used in
making parsing decisions.
LR parsing is attractive for a variety of reasons:
1. LR parsers can be constructed to recognize virtually all programming-language
constructs for which context-free grammars can be written. Non-LR context-free
grammars exist, but these can generally be avoided for typical programming-language constructs.
2. The LR-parsing method is the most general non-backtracking shift-reduce parsing
method known, yet it can be implemented as efficiently as other, more primitive shift-reduce methods.
3. An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
4. The class of grammars that can be parsed with LR methods is a proper superset of the class that can be parsed with LL (predictive) methods.

ITEMS AND THE LR(0) AUTOMATON:


An LR parser makes shift-reduce decisions by maintaining states to keep track of where we
are in a parse. States represent sets of "items." An LR(0) item (item for short) of a grammar
G is a production of G with a dot at some position of the body. Thus, production A -> XYZ
yields the four items
A -> .XYZ
A -> X.YZ
A -> XY.Z
A -> XYZ.
The production A -> ε generates only one item, A -> . (a dot with an otherwise empty body).
Intuitively, an item indicates how much of a production we have seen at a given point in the
parsing process. For example, the item A -> .XYZ indicates that we hope to see a string
derivable from XYZ next on the input. Item A -> X.YZ indicates that we have just seen on
the input a string derivable from X and that we hope next to see a string derivable from YZ.
Item A -> XYZ. indicates that we have seen the body XYZ and that it may be time to reduce
XYZ to A.

CLOSURE OF ITEM SETS:


If I is a set of items for a grammar G, then CLOSURE(I) is the set of items constructed from I
by the two rules:
1. Initially, add every item in I to CLOSURE(I).
2. If A -> α.Bβ is in CLOSURE(I) and B -> γ is a production, then add the item B -> .γ to
CLOSURE(I), if it is not already there. Apply this rule until no more new items can
be added to CLOSURE(I).
THE FUNCTION GOTO:
The second useful function is GOTO(I, X), where I is a set of items and X is a grammar
symbol. GOTO(I, X) is defined to be the closure of the set of all items [A -> αX.β] such that
[A -> α.Xβ] is in I. Intuitively, the GOTO function is used to define the transitions in the
LR(0) automaton for a grammar. The states of the automaton correspond to sets of items, and
GOTO(I, X) specifies the transition from the state for I under input X.

THE LR PARSING ALGORITHM:


A schematic of an LR parser is shown in the figure below. It consists of an input, an output, a stack, a
driver program, and a parsing table that has two parts (ACTION and GOTO). The driver
program is the same for all LR parsers; only the parsing table changes from one parser to
another. The parsing program reads characters from an input buffer one at a time. Where a
shift-reduce parser would shift a symbol, an LR parser shifts a state. Each state summarizes
the information contained in the stack below it.

FIG: LR PARSING ALGORITHM.

STRUCTURE OF LR PARSING TABLE:


The parsing table consists of two parts: a parsing-action function ACTION and a goto
function GOTO.
1. The ACTION function takes as arguments a state i and a terminal a (or $, the input
end marker). The value of ACTION[i, a] can have one of four forms:
(a) Shift j, where j is a state. The action taken by the parser effectively shifts input a
onto the stack, but uses state j to represent a.
(b) Reduce A -> β. The action of the parser effectively reduces β on the top of the
stack to the head A.
(c) Accept. The parser accepts the input and finishes parsing.
(d) Error. The parser discovers an error in its input and takes some
corrective action.
2. We extend the GOTO function, defined on sets of items, to states: if GOTO(Ii, A) = Ij,
then GOTO also maps a state i and a nonterminal A to state j.

ALGORITHMS USED:
Algorithm to compute closure of an item:

Algorithm to compute set of canonical LR(0) items:

Algorithm to construct LR(0) Parsing Table:

LR PARSING ALGORITHM:

IMPLEMENTATION CODE IN C LANGUAGE:


#include<stdio.h>
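/* <conio.h> is a non-standard DOS/Windows header; getch() is mapped to the
   standard getchar() so that the program also builds with other compilers. */
#define getch() getchar()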
#include<stdlib.h>
#include<string.h>
#define size 20
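struct state
{
    char productions[size][10], on_symbol;    /* items of the state; symbol on which it is entered */
    short int scanned_productions[size];      /* 1 once an item has been used to build a GOTO */
    short int no_of_productions, state_number;
    short int shift_info[10], number_of_shift; /* predecessor states (stored from index 1) and their count */
    struct state *link;                       /* next state in the list of all states */
};
typedef struct state * NODE;
struct action
{
    int state;   /* target state for a shift, production number for a reduce */
    char act;    /* 's' shift, 'r' reduce, 'a' accept, '\0' error */
};
typedef struct action ACTION;
ACTION field[30][15];                         /* ACTION table: [state][terminal index] */
NODE first = NULL, last = NULL;               /* linked list of LR(0) states */
int number_of_states, no_of_variables, no_of_terminals, count = 1, jp = 0, p_goto[30][10], tp;
/* input[1..count-1] holds the productions as typed (e.g. "E->E+T", '?' for epsilon);
   input[0] receives the augmented production */
char closure_productions[size][10], input[10][10], p_first[10][10], p_follow[10][10],
     variables[size], terminals[size], p[10];
void closure(NODE, int *);
void items();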
NODE getnode()
{
    NODE temp;
    int i;
    temp = (NODE) malloc(sizeof(struct state));
    if(temp == NULL)
    {
        printf("out of memory\n");
        exit(1);
    }
    for(i = 0; i < size; i++)
    {
        temp->productions[i][0] = '\0';
        temp->scanned_productions[i] = 0;
    }
    temp->no_of_productions = 0;
    temp->number_of_shift = 0;
    temp->link = NULL;
    return temp;
}
/* Appends a state to the end of the list of states. */
void insert(NODE temp)
{
    if(first == NULL)
    {
        first = temp;
        last = temp;
    }
    else
    {
        last->link = temp;
        last = temp;
    }
}
/* Copies every production into closure_productions with a dot at the start of
   the body, e.g. "E->E+T" becomes "E->.E+T". */
void dot_productions(char input[][10], int count)
{
    int i = 1, j, k;
    char buffer[10] = {'\0'};
    while(i < count)
    {
        k = 0;
        j = 0;
        while(input[i][k] != '\0')
        {
            if(input[i][k] == '>')
            {
                buffer[j++] = '>';
                buffer[j++] = '.';
            }
            else
                buffer[j++] = input[i][k];
            k++;
        }
        buffer[j] = '\0';
        strcpy(closure_productions[i], buffer);
        i++;
    }
}
/* Returns 1 if the item in buffer is already part of state temp. */
int check_for_presence_in_productions(NODE temp, char *buffer)
{
    int i = 0;
    while(i < size && temp->productions[i][0])
    {
        if(!strcmp(temp->productions[i], buffer))
            return 1;
        i++;
    }
    return 0;
}
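/* Stores the augmented production S'->S in input[0]. S' is written as the two
   characters "S1", so for start symbol E the result is "E1->E". */
void augment_grammar(char input[][10])
{
    input[0][0] = input[1][0];
    input[0][1] = '1';
    input[0][2] = '-';
    input[0][3] = '>';
    input[0][4] = input[1][0];
    input[0][5] = '\0';
}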
/* Builds state I0 = CLOSURE({S'->.S}). */
void initial_state(char input[][10])
{
    int i = 0, j = 0, place;
    char buffer[30] = {'\0'};
    NODE temp = NULL;
    while(input[0][i] != '\0')
    {
        if(input[0][i] == '>')
        {
            buffer[j++] = '>';
            buffer[j++] = '.';
        }
        else
            buffer[j++] = input[0][i];
        i++;
    }
    buffer[j] = '\0';
    temp = getnode();
    insert(temp);
    temp->state_number = 0;
    temp->on_symbol = '\0';
    strcpy(temp->productions[0], buffer);
    temp->no_of_productions += 1;
    place = 1;
    closure(temp, &place);
}
/* Despite its name, returns 1 when a state with the same items (in the same
   order) as temp1 already exists; sno is then recorded as one more predecessor
   of that state. Returns 0 if temp1 is a new state. */
int state_not_added(NODE temp1, int sno)
{
    NODE temp;
    int count;
    temp = first;
    while(temp != NULL)
    {
        if(temp->no_of_productions == temp1->no_of_productions)
        {
            count = 0;
            while(count < temp->no_of_productions)
            {
                if(!strcmp(temp->productions[count], temp1->productions[count]))
                    count++;
                else
                    break;
            }
            if(count == temp->no_of_productions)
            {
                temp->shift_info[++temp->number_of_shift] = sno;
                return 1;
            }
        }
        temp = temp->link;
    }
    return 0;
}
/* Index of variable (nonterminal) c in variables[], or -1 if c is not a variable. */
int findv(char c)
{
    int i;
    for(i = 0; i < no_of_variables; i++)
    {
        if(c == variables[i])
            return i;
    }
    return -1;
}
/* 1 if c is a terminal (including the end marker $), else 0. */
int findt(char c)
{
    int i;
    for(i = 0; i < no_of_terminals; i++)
    {
        if(c == terminals[i])
            return 1;
    }
    return 0;
}
/* 1 if variable c has an epsilon production "c->?", else 0. */
int eptrans(char c)
{
    int i;
    for(i = 1; i < count; i++)
    {
        if(input[i][0] == c && input[i][3] == '?')
            return 1;
    }
    return 0;
}
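/* Collects FIRST(c) into p_first[index] ('?' stands for epsilon).
   The global jp is the next free position in p_first[index]. */
void compute_first(char c, int index)
{
    int j, k, had_eps;
    char *eps;
    for(j = 1; j < count; j++)
    {
        if(input[j][0] != c)
            continue;
        if(input[j][3] == '?')                         /* c -> epsilon */
        {
            if(strchr(p_first[index], '?') == NULL)
                p_first[index][jp++] = '?';
        }
        else if(input[j][3] != c)                      /* skip direct left recursion */
        {
            had_eps = strchr(p_first[index], '?') != NULL;
            /* add FIRST of each body symbol until one cannot derive epsilon */
            for(k = 3; input[j][k] != '\0'; k++)
            {
                if(findt(input[j][k]))                 /* a terminal ends the scan */
                {
                    if(strchr(p_first[index], input[j][k]) == NULL)
                        p_first[index][jp++] = input[j][k];
                    break;
                }
                compute_first(input[j][k], index);
                if(!eptrans(input[j][k]))              /* variable without an epsilon production */
                    break;
            }
            /* the body is not entirely nullable: drop an epsilon picked up from its prefix */
            if(input[j][k] != '\0' && !had_eps && (eps = strchr(p_first[index], '?')) != NULL)
            {
                memmove(eps, eps + 1, strlen(eps));    /* shift the tail, including '\0', left */
                jp--;
            }
        }
    }
}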
/* 1 if FIRST(c) contains epsilon, i.e. c can derive the empty string. */
int detect_epslon(char c)
{
    int var = findv(c);
    if(var == -1)
        return 0;
    return strchr(p_first[var], '?') != NULL;
}
/* Despite its name, returns 1 when c is NOT yet in the FOLLOW set p_follow[index]. */
int presence(char c, int index)
{
    return strchr(p_follow[index], c) == NULL;
}
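/* Collects FOLLOW(c) into p_follow[index]; the global jp is the next free slot.
   For each occurrence of c in a body A->xcy, FIRST(y) without epsilon is added,
   and FOLLOW(A) is added when everything after c can derive epsilon. */
void compute_follow(char c, int index)
{
    int i, j, k, m, var;
    char next;
    for(j = 1; j < count; j++)
    {
        for(k = 3; input[j][k] != '\0'; k++)
        {
            if(input[j][k] != c)
                continue;
            for(m = k; input[j][m] != '\0'; m++)
            {
                next = input[j][m + 1];
                if(next == '\0')                  /* end of body: add FOLLOW(head) */
                {
                    var = findv(input[j][0]);
                    for(i = 0; var != -1 && p_follow[var][i] != '\0'; i++)
                        if(presence(p_follow[var][i], index))
                            p_follow[index][jp++] = p_follow[var][i];
                    break;
                }
                if(findt(next))                   /* a terminal follows: add it and stop */
                {
                    if(presence(next, index))
                        p_follow[index][jp++] = next;
                    break;
                }
                var = findv(next);                /* a variable follows: add FIRST(next) minus epsilon */
                for(i = 0; var != -1 && p_first[var][i] != '\0'; i++)
                    if(p_first[var][i] != '?' && presence(p_first[var][i], index))
                        p_follow[index][jp++] = p_first[var][i];
                if(!detect_epslon(next))          /* look further only past nullable symbols */
                    break;
            }
        }
    }
}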
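/* Returns the number of the production an item was built from, found by
   removing the '.' ("E->E+T." gives "E->E+T", "A->.?" gives "A->?"), or 0. */
int findp(char *str)
{
    int i, j = 0;
    char buff[20];
    for(i = 0; str[i] != '\0'; i++)
        if(str[i] != '.')
            buff[j++] = str[i];
    buff[j] = '\0';
    for(i = 1; i < count; i++)
    {
        if(!strcmp(buff, input[i]))
            return i;
    }
    return 0;
}
/* Column of terminal c in the ACTION table (0 if c is not a terminal). */
int posterm(char c)
{
    int i;
    for(i = 0; i < no_of_terminals; i++)
    {
        if(c == terminals[i])
            return i;
    }
    return 0;
}
/* Fills the ACTION table. Shift entries come from the transitions on terminals.
   A reduction by A->w is entered only under the terminals in FOLLOW(A); this is
   the SLR(1) refinement of LR(0), which would reduce under every terminal. */
void compute_action()
{
    NODE temp;
    int i, j, k, l, m, t;
    char ch;
    for(i = 0; i < no_of_terminals; i++)
    {
        ch = terminals[i];
        for(temp = first; temp != NULL; temp = temp->link)
        {
            if(temp->on_symbol == ch)
            {
                for(j = 1; j <= temp->number_of_shift; j++)
                {
                    field[temp->shift_info[j]][i].state = temp->state_number;
                    field[temp->shift_info[j]][i].act = 's';
                }
            }
        }
    }
    for(temp = first; temp != NULL; temp = temp->link)
    {
        if(temp->state_number == 1)
        {
            /* state 1 is GOTO(I0, S), which holds S'->S. : accept on $ */
            field[1][posterm('$')].state = 1;
            field[1][posterm('$')].act = 'a';
            continue;
        }
        for(i = 0; i < temp->no_of_productions; i++)
        {
            k = strlen(temp->productions[i]) - 1;
            /* complete items look like "A->w." or, for an epsilon production, "A->.?" */
            if(temp->productions[i][k] != '.' &&
               !(temp->productions[i][k] == '?' && temp->productions[i][k - 1] == '.'))
                continue;
            l = findp(temp->productions[i]);
            m = findv(input[l][0]);
            for(j = 0; p_follow[m][j] != '\0'; j++)
            {
                t = posterm(p_follow[m][j]);
                if(field[temp->state_number][t].act == '\0')
                {
                    field[temp->state_number][t].state = l;
                    field[temp->state_number][t].act = 'r';
                }
                else
                {
                    /* prints "sr conflict" or "rr conflict" */
                    printf("%cr conflict\n", field[temp->state_number][t].act);
                    getch();
                    exit(0);
                }
            }
        }
    }
}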
/* Fills the GOTO table: p_goto[i][v] is the state reached from state i on variable v. */
void compute_goto()
{
    int i, j;
    char ch;
    NODE temp;
    for(i = 0; i < no_of_variables; i++)
    {
        temp = first;
        ch = variables[i];
        while(temp != NULL)
        {
            if(temp->on_symbol == ch)
            {
                for(j = 1; j <= temp->number_of_shift; j++)
                    p_goto[temp->shift_info[j]][i] = temp->state_number;
            }
            temp = temp->link;
        }
    }
}
/* Table-driven LR parse of str, printing the action, the state stack, the
   symbols and the remaining input at each step. The end marker '$' is
   supplied automatically once the input is exhausted. */
void parse(char *str)
{
    int i = 0, stack[50], j, top = 0, k, m, l, pos = -1, prod;
    char ch, action[8] = {'\0'}, symbol[50] = {'\0'}, *p, temp;
    stack[0] = 0;
    while(1)
    {
        ch = (str[i] != '\0') ? str[i] : '$';
        j = posterm(ch);
        p = &str[i];
        if(field[stack[top]][j].act == 'a')
        {
            strcpy(action, "Accept");
            printf("\n%15s  ", action);
            for(l = 0; l <= top; l++)
                printf("%d ", stack[l]);
            printf("%15s", symbol);
            printf("%15s", p);
            printf("\nString parsed");
            break;
        }
        else if(field[stack[top]][j].act == 's')
        {
            stack[top + 1] = field[stack[top]][j].state;  /* push the target state */
            symbol[++pos] = ch;
            strcpy(action, "Shift");
            top++;
            i++;
        }
        else if(field[stack[top]][j].act == 'r')
        {
            strcpy(action, "Reduce");
            prod = field[stack[top]][j].state;             /* production number */
            k = strlen(input[prod]) - 3;                   /* length of the body */
            if(input[prod][3] == '?')                      /* epsilon body: pop nothing */
                k = 0;
            m = findv(input[prod][0]);
            temp = input[prod][0];
            while(k > 0)                                   /* pop one state and one symbol per body symbol */
            {
                stack[top--] = 0;
                symbol[pos--] = '\0';
                k--;
            }
            symbol[++pos] = temp;
            stack[top + 1] = p_goto[stack[top]][m];        /* push GOTO[state, head] */
            top++;
        }
        else
        {
            printf("\nERROR");
            return;
        }
        printf("\n%15s  ", action);
        for(l = 0; l <= top; l++)
            printf("%d ", stack[l]);
        printf("%15s", symbol);
        printf("%15s", p);
    }
}
/* 1 if every character of str is a terminal of the grammar, else 0. */
int validi(char *str)
{
    int i;
    for(i = 0; str[i] != '\0'; i++)
    {
        if(!findt(str[i]))
            return 0;
    }
    return 1;
}

int main()
{
    int i = 0, j, s;
    char buffer[10], str[20] = {'\0'}, cell[16];
    NODE temp;
    printf("\nenter the variables\n");
    scanf("%10s", variables);                 /* p_first, p_follow and p_goto hold 10 variables */
    no_of_variables = strlen(variables);
    printf("\nenter the terminals\n");
    scanf("%14s", terminals);                 /* 14 terminals + '$' fill the 15 ACTION columns */
    terminals[strlen(terminals)] = '$';
    // terminals[strlen(terminals)]='?';
    no_of_terminals = strlen(terminals);
    printf("\nEnter the productions, one per line as A->xyz (? for epsilon); $ ends the input\n");
    while(scanf("%9s", buffer) == 1 && strcmp(buffer, "$") != 0)
    {
        if(count < 10)                        /* input[] has room for 9 productions */
            strcpy(input[count++], buffer);
        else
            printf("too many productions, %s ignored\n", buffer);
    }
    for(i = 0; i < no_of_variables; i++)
    {
        jp = 0;
        tp = 0;
        compute_first(variables[i], i);
        //printf("%c--->%s\n",variables[i],p_first[i]);
    }
    /* FOLLOW(start symbol) contains the end marker $ */
    s = findv(input[1][0]);
    if(s != -1)
        p_follow[s][0] = '$';
    /* repeat the FOLLOW pass so that symbols copied from other FOLLOW sets can propagate */
    for(j = 0; j < no_of_variables; j++)
    {
        for(i = 0; i < no_of_variables; i++)
        {
            jp = strlen(p_follow[i]);         /* append after the symbols already found */
            compute_follow(variables[i], i);
        }
    }
    /*for(i=0;i<no_of_variables;i++)
    {printf("f%c--->%s\n",variables[i],p_follow[i]);
    }*/
    dot_productions(input, count);
    augment_grammar(input);
    initial_state(input);
    i = 0;
    /*while(first->productions[i][0] != '\0')
        printf("\n%s",first->productions[i++]);
    printf(" %d\n",first->no_of_productions);
    */
    items();
    /* print every state with its items and the transitions that enter it */
    for(temp = first; temp != NULL; temp = temp->link)
    {
        printf("\nI%d ", temp->state_number);
        for(i = 0; i < temp->no_of_productions; i++)
            printf("\n%s", temp->productions[i]);
        for(i = 1; i <= temp->number_of_shift; i++)
            printf("\nI%d on %c = %d", temp->shift_info[i], temp->on_symbol, temp->state_number);
    }
    getch();
    compute_action();
    compute_goto();
    printf("\n--------------------------LR(0) PARSING TABLE---------------------------------");
    printf("\n%7s", "");
    for(i = 0; i < no_of_terminals; i++)
        printf("%7c", terminals[i]);
    for(i = 0; i < no_of_variables; i++)
        printf("%7c", variables[i]);
    for(i = 0; i <= number_of_states; i++)
    {
        printf("\n%7d", i);
        for(j = 0; j < no_of_terminals; j++)
        {
            /* empty cells are errors; others print as s<state>, r<production> or a1 */
            if(field[i][j].act == '\0')
                printf("%7s", "");
            else
            {
                sprintf(cell, "%c%d", field[i][j].act, field[i][j].state);
                printf("%7s", cell);
            }
        }
        for(j = 0; j < no_of_variables; j++)
            printf("%7d", p_goto[i][j]);
    }
    printf("\nEnter the string to be parsed\n");
    scanf("%19s", str);
    if(!validi(str))
    {
        printf("invalid input\n");
        getch();
        exit(0);
    }
    printf("\n------------------------------------PARSING------------------------------------");
    parse(str);
    getch();
    return 0;
}
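/* Closes state temp: scanning from item (*place)-1 onward, every item whose dot
   stands before a variable B adds all items B->.w that are not yet present.
   *place is the index of the next free item slot and advances as items are added. */
void closure(NODE temp, int *place)
{
    int i, j, m;
    i = (*place) - 1;
    while(i < size && temp->productions[i][0] != '\0')
    {
        m = 0;
        while(temp->productions[i][m] != '.')
            m++;
        m++;                                      /* symbol just after the dot */
        if(temp->productions[i][m] >= 'A' && temp->productions[i][m] <= 'Z')  /* variables are upper-case */
        {
            for(j = 1; j < size && closure_productions[j][0] != '\0'; j++)
            {
                if(closure_productions[j][0] == temp->productions[i][m] &&
                   !check_for_presence_in_productions(temp, closure_productions[j]) &&
                   *place < size)
                {
                    strcpy(temp->productions[(*place)++], closure_productions[j]);
                    temp->no_of_productions += 1;
                }
            }
        }
        i++;
    }
}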
/* Builds the canonical collection of LR(0) item sets. For each state and each
   symbol X found after a dot, all items with X after the dot are advanced past
   X and closed; the result becomes a new state unless an identical one exists. */
void items()
{
    NODE temp, temp1 = NULL;
    int i = 0, j = 0, k = 0, place = 0;
    char ch, buffer[10];
    temp = first;
    while(temp != NULL)
    {
        while(i < temp->no_of_productions)
        {
            if(temp->scanned_productions[i] == 0)
            {
                j = 0;
                temp->scanned_productions[i] = 1;
                while(temp->productions[i][j++] != '.')
                    ;
                ch = temp->productions[i][j];            /* symbol after the dot */
                if(ch != '\0' && ch != '?')
                {
                    temp1 = getnode();
                    temp1->on_symbol = ch;
                    temp1->shift_info[++temp1->number_of_shift] = temp->state_number;
                    strcpy(buffer, temp->productions[i]);
                    buffer[j - 1] = ch;                  /* move the dot past ch */
                    buffer[j] = '.';
                    strcpy(temp1->productions[0], buffer);
                    temp1->no_of_productions = 1;
                    place = 1;
                    closure(temp1, &place);
                    /* advance every other unscanned item that also has ch after its dot */
                    for(k = 0; k < size && temp->productions[k][0] != '\0'; k++)
                    {
                        if(temp->scanned_productions[k] != 0)
                            continue;
                        j = 0;
                        while(temp->productions[k][j++] != '.')
                            ;
                        if(ch == temp->productions[k][j])
                        {
                            temp->scanned_productions[k] = 1;
                            strcpy(buffer, temp->productions[k]);
                            buffer[j - 1] = ch;
                            buffer[j] = '.';
                            strcpy(temp1->productions[place++], buffer);
                            temp1->no_of_productions += 1;
                            closure(temp1, &place);
                        }
                    }
                    /*for(i = 0;i < temp1->no_of_productions;i++)
                    printf("\n%s",temp1->productions[i]);*/
                    if(!state_not_added(temp1, temp->state_number))
                    {
                        insert(temp1);
                        temp1->state_number = ++number_of_states;
                    }
                    else
                        free(temp1);
                }
                i = 0;                                   /* rescan the state from its first item */
            }
            else
                i++;
        }
        temp = temp->link;
        i = 0;
    }
}

SAMPLE OUTPUT:
