CD Unit4
CHAPTER 9
SYMBOL TABLES
Symbol table
A symbol table is a table that has an entry for every symbol in the program.
A compiler has to collect all the names appearing in the source program and enter them in the
symbol table.
Each entry in the symbol table is of the form (name, information).
Each time a name is encountered, the symbol table is searched to see whether the name
is already there.
If it is a new name, it is inserted into the symbol table.
Information about the name is entered during lexical and syntactic analysis.
9.1 Contents of Symbol table
The symbol table has 2 fields (name and information)
Name: name of the symbol (label, identifier, constant, procedure name, array name)
Information: attributes, parameters, offset
Features:
We must be able to do the following things:
1. Determine whether a given name is present in the table
2. Add a new name to the table
3. Access the information associated with a name, if the name is present
4. Add new information for a given name
5. Delete a name or group of names from the table
There may be separate tables for variable names, labels, procedure names, constants, field
names etc.
Data structures used for the symbol table
Linear list
- It is very slow
- But, Simple and Easy to implement
Hash table
- It is fast
- But more complex
Tree structure
- Intermediate in performance
Identifier    Information
DIMPLE        Integer, variable
AMAX          Integer, const
.             .
.             .
.             .
Indirect method
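If there is no limit on the length of the identifier, the indirect scheme can be used: the names
are stored in a separate string array, and each symbol table entry points to its name in that array.
ALGOL uses the indirect scheme.
Fig: Separate string array
. . . R F D I M P L E Q U . . .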
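The indirect scheme can be sketched as below. This is only an illustration, not ALGOL's actual implementation: the class StringTable, its methods, and the choice to store a (start, length) pair in each entry are assumptions made for the example.

```python
class StringTable:
    """Names of any length packed into one separate character array."""

    def __init__(self):
        self.chars = []     # the separate string array
        self.entries = []   # symbol table entries: (start, length, information)

    def add(self, name, information):
        start = len(self.chars)
        self.chars.extend(name)                     # append the characters of the name
        self.entries.append((start, len(name), information))
        return len(self.entries) - 1                # index of the new entry

    def name_of(self, index):
        start, length, _ = self.entries[index]
        return "".join(self.chars[start:start + length])


table = StringTable()
i = table.add("DIMPLE", "Integer, variable")
j = table.add("AMAX", "Integer, const")
```

Because each entry stores only a position and a length, the entry size stays fixed no matter how long the identifier is.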
A compiler can be designed to run in less space if the space occupied by the identifiers
can be reused.
If there are no further references to an identifier, its space can be reused in subsequent
passes.
Eg. the space occupied by the string table (indirect scheme) can be reused in subsequent
passes.
In the direct scheme also, if the identifiers and the information are stored in 2 separate arrays,
the space occupied by the identifiers can be reused in subsequent passes.
Array names:
If the identifier is an array name, then all the subscript information should be entered in the
symbol table.
FORTRAN limits arrays to 3 dimensions.
In all cases, the lower limit is 1.
The information regarding array subscripts will look like:
UL1 | UL2 | UL3 | B
Eg. for A(3)(2)(10):
UL1 = 3
UL2 = 2
UL3 = 10
9.2 Data structures used for symbol tables :
3 data structures can be used to represent symbol table
Linear list
Search trees
Hash tables
Linear lists:
Name 1
Information 1
Name 2
Information 2
.
.
.
Name n
Information n
Available
This is the simplest scheme.
A single array or several arrays can be used to store names and their associated information.
New names are added to the list in the order in which they occur.
To get the information for a name, the array is searched from the beginning up to the
position marked by the pointer AVAILABLE.
AVAILABLE points to the beginning of the empty portion of the array.
When the name is found, the information is read from the next location.
If AVAILABLE is reached without finding the name, a fault occurs (undefined name).
To insert a new name, scan down the list from the beginning to ensure it is not already
there.
If it is already there, a fault occurs (multiply defined name).
If it is not there, the new name is stored at the position marked by AVAILABLE,
and the pointer is then incremented.
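The lookup and insertion steps above can be sketched as follows. This is a minimal illustration, assuming two parallel arrays rather than one interleaved array; the class and exception names are made up for the example.

```python
class UndefinedName(Exception):
    pass


class MultiplyDefinedName(Exception):
    pass


class LinearSymbolTable:
    def __init__(self, capacity=100):
        self.names = [None] * capacity
        self.info = [None] * capacity
        self.available = 0          # index of the first empty slot

    def _find(self, name):
        # Scan from the beginning up to AVAILABLE.
        for i in range(self.available):
            if self.names[i] == name:
                return i
        return -1

    def lookup(self, name):
        i = self._find(name)
        if i < 0:
            raise UndefinedName(name)
        return self.info[i]

    def insert(self, name, information):
        if self._find(name) >= 0:
            raise MultiplyDefinedName(name)
        if self.available == len(self.names):
            raise OverflowError("symbol table full")
        self.names[self.available] = name
        self.info[self.available] = information
        self.available += 1         # advance to the next empty slot


linear = LinearSymbolTable()
linear.insert("DIMPLE", "Integer, variable")
linear.insert("AMAX", "Integer, const")
```

Both operations scan the occupied part of the array, which is why each one costs time proportional to the number of names stored.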
Time:
If a symbol table contains n names, the time needed to insert a new name is proportional
to n.
To find the data about a name, n/2 names have to be searched on average.
So the cost of an inquiry is also proportional to n.
To insert n names and make m inquiries, the total cost is cn(n+m), where c is a constant.
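The bound can be checked with a rough count. This is a sketch under two assumptions that the notes do not state: examining one entry costs a constant c_0, and every inquiry is made when the table already holds all n names.

```latex
\begin{align*}
\text{insertions:} \quad & \sum_{i=1}^{n} c_0\,(i-1) = c_0\,\frac{n(n-1)}{2} \le \frac{c_0}{2}\,n^2 \\
\text{inquiries:}  \quad & m \cdot c_0\,\frac{n}{2} = \frac{c_0}{2}\,mn \\
\text{total:}      \quad & \le \frac{c_0}{2}\,n^2 + \frac{c_0}{2}\,mn = \frac{c_0}{2}\,n(n+m)
\end{align*}
```

So the total is bounded by cn(n+m) for a suitable constant c.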
Merits:
1. Very easy to implement
2. Minimum space is taken
Demerits:
1. Very slow in retrieving data
2. Useful only for small jobs
3. Performance is poor when n and m become large.
Self organizing lists:
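Search trees:
Each node of a binary search tree holds a name and pointers to a left and a right child. The search starts with P at the root:
while P ≠ null do
    if NAME = NAME(P) then
        ... /* name found, take action on access */
    else if NAME < NAME(P) then
        P = LEFT(P) /* visit the left child */
    else /* NAME(P) < NAME */
        P = RIGHT(P) /* visit the right child */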
Time:
If names are encountered in random order, average length of the path is proportional to log n
To insert n names and m inquiries, time is proportional to (n+m) log n
Merit:
Binary search gives better performance than linear search
Search is narrowed quickly
Demerit:
Implementation is more difficult than for a linear list
Hash Tables:
Cost :
To insert n names and make m inquiries, time is proportional to n(n+m)/k, where k is a constant of our choosing (the size of the hash table grows with k)
Advantage:
This method is superior to linear search and binary search.
Provides best Performance
Disadvantage:
more programming effort is needed
Also it needs more space
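The notes give only the cost of hashing, so the following is a sketch under assumptions: collisions are resolved by chaining, the table has k buckets, and the hash function (sum of character codes modulo k) is chosen only for simplicity.

```python
class HashSymbolTable:
    def __init__(self, k=8):
        self.buckets = [[] for _ in range(k)]   # k separate short lists

    def _bucket(self, name):
        # Simple deterministic hash: sum of character codes modulo k.
        return self.buckets[sum(map(ord, name)) % len(self.buckets)]

    def insert(self, name, information):
        bucket = self._bucket(name)
        for entry in bucket:
            if entry[0] == name:
                entry[1] = information          # name already present: update
                return
        bucket.append([name, information])

    def lookup(self, name):
        for entry_name, information in self._bucket(name):
            if entry_name == name:
                return information
        return None                             # name not in the table


hashed = HashSymbolTable(k=4)
hashed.insert("DIMPLE", "Integer, variable")
hashed.insert("AMAX", "Integer, const")
```

Each operation searches only one bucket, which on average holds about n/k names; that is where the division by k in the cost comes from, at the price of the extra space for the bucket array.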
UNIT 5
CHAPTER 10
RUNTIME STORAGE ADMINISTRATION
10.1 Implementation of stack allocation scheme
Local variables can be accessed only by the procedure in which they are declared.
Global variables are available to all procedures.
Fig: Runtime stack
TOP →  Extra storage for R
       Activation record for R   ← SP
       Extra storage for Q
       Activation record for Q
       Extra storage for P
       Activation record for P
Direction of growth: towards TOP
       Local data
SP →   Old SP
       Return value
       Return address
       Arg count
       Actual parameters
Fig: Activation record for a C procedure
- SP points to the position just below the local data.
- Hence local data can be accessed by a negative offset from SP, and the parameters can be
accessed by a positive offset from SP.
- In C, all local variables, including arrays, are of fixed size, so the size of the
activation record can be determined by the compiler.
- A local name X can be accessed as X[SP], where X stands for the offset of X from SP.
Procedure calls in C
A call is translated into three-address statements of the form:
Param T1
Param T2
.
.
Call P, n
Each Param T instruction is converted to PUSH(T), which stands for:
top = top - 1
*top = T
These instructions push T onto the runtime stack.
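The push operation and the offsets from SP can be tried out on a small simulated memory. This is a sketch, not the notes' calling sequence: the class RuntimeStack, the memory size, and the exact order in which the arg count, return address, return value slot and old SP are pushed are assumptions chosen to match the activation record figure.

```python
class RuntimeStack:
    def __init__(self, size=32):
        self.memory = [0] * size
        self.top = size     # the stack grows towards lower addresses
        self.sp = 0

    def push(self, value):
        self.top = self.top - 1
        self.memory[self.top] = value

    def call(self, args, return_address, local_count):
        for t in args:                  # Param T1, Param T2, ...
            self.push(t)
        self.push(len(args))            # arg count
        self.push(return_address)
        self.push(0)                    # slot for the return value
        self.push(self.sp)              # old SP
        self.sp = self.top              # SP now points at the saved old SP
        self.top -= local_count         # reserve local data at negative offsets

    def param(self, offset):
        return self.memory[self.sp + offset]   # positive offset from SP


stack = RuntimeStack()
stack.call([10, 20], return_address=99, local_count=2)
stack.memory[stack.sp - 1] = 7          # store into a local at offset -1
```

With this layout the arg count sits at offset +3 and the last parameter pushed sits at offset +4.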
In block-structured languages, procedures and blocks may define their own data. Hence
activation records, or portions of activation records, must be reserved for blocks.
Block-structured languages permit arrays of variable length.
The data-referencing environment of a procedure or block includes all the procedures and blocks
surrounding it in the program.
Displays:
A display is a mechanism that gives direct access to nonlocal data.
It consists of an array of pointers to the currently accessible activation records.
Eg:
Q calls R(X,Y) and R has the following declarations:
integer i;
real array A[0:n-1, 1:m];
real array B[2:10];
The fig shows the activation record for R.
Fig: Activation record for R
        Data descriptor and elements of B
        Data descriptor and elements of A
        Display pointer for MAIN
        Pointer to A          (local data)
        Pointer to B          (local data)
        Value of i            (local data)
SP →    Old SP
        Return address
        Arg count = 2
        Actual parameter y
        Actual parameter x
Procedure calls
Suppose that we are currently executing a procedure Q, and Q calls R(x,y).
The level of Q (the number of procedures and blocks in its static environment) is 3: main, P, Q.
So while Q is executing, the display has 3 pointers, to the activation records of main, P and Q.
Procedure Main();
    Procedure P(a);
        Procedure Q(b);
            R(x,y);
        endQ;
        Q(Z);
    endP;
    Procedure R(c,d);
    endR;
    ….
    …..
End Main;
Main();
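The notes do not say how the display is updated on a call, so the following sketch assumes one common rule: on entering a procedure at level i, the old display entry for level i is saved in the new activation record and restored on return. All names here (ActivationRecord, enter, leave, fetch) are hypothetical.

```python
class ActivationRecord:
    def __init__(self, name, level, variables):
        self.name = name
        self.level = level          # 1 = Main, 2 = P or R, 3 = Q
        self.variables = variables
        self.saved = None           # display entry replaced by this record


MAX_LEVEL = 3
display = [None] * MAX_LEVEL        # display[i - 1] is the record at level i


def enter(record):
    record.saved = display[record.level - 1]
    display[record.level - 1] = record


def leave(record):
    display[record.level - 1] = record.saved


def fetch(level, variable):
    # Direct access to nonlocal data: one index into the display, no chain walk.
    return display[level - 1].variables[variable]


main = ActivationRecord("Main", 1, {"Z": 1})
p = ActivationRecord("P", 2, {"a": 2})
q = ActivationRecord("Q", 3, {"b": 3})
for record in (main, p, q):
    enter(record)
names_in_q = [record.name for record in display]   # pointers while Q executes

r = ActivationRecord("R", 2, {"c": 4, "d": 5})
enter(r)                            # Q calls R, which is declared at level 2
```

While R runs, only the entries for levels 1 and 2 are meaningful to it; the entry for level 3 is simply not used until R returns.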
Parameter passing
The operand of a Param three-address statement is treated either as a pointer to the value of the actual
parameter or as the value itself.
Eg:
Suppose we call the procedure R(A+B*C, D), where A, B, C and D are integers.
The translation to three-address code is:
T1:= B*C
T2:=A+T1
Param T2
Param D
Call R, 2
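The translation above can be reproduced with a small generator. This is a sketch only: it borrows Python's own `ast` parser as a stand-in for the compiler's front end (an assumption made for brevity), and the function name translate_call is hypothetical.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def translate_call(source):
    """Translate a call such as R(A+B*C, D) into three-address statements."""
    call = ast.parse(source, mode="eval").body
    code = []
    counter = 0

    def gen(node):
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id                      # a plain name needs no code
        if isinstance(node, ast.BinOp):
            left = gen(node.left)
            right = gen(node.right)
            counter += 1
            temp = f"T{counter}"                # new temporary for the result
            code.append(f"{temp} := {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    operands = [gen(arg) for arg in call.args]  # evaluate all actual parameters
    code.extend(f"Param {t}" for t in operands)
    code.append(f"Call {call.func.id}, {len(operands)}")
    return code


three_address = translate_call("R(A+B*C, D)")
```

All the parameter expressions are evaluated into temporaries first, and only then are the Param statements emitted, followed by the Call with the argument count.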
Blocks
Blocks are “Parameter less procedures”
Blocks may be given display pointers similar to procedures.
Procedure P(X,Y);
Integer I;
Real array A[40];
Begin /* block B1 */
    Integer I, J;
    Integer array B[50];
    .
    .
    Begin /* block B2 */
        Integer K;
        .
        .
        Go to L;
    End;
End;
.
.
L:
Begin /* block B3 */
    Real array C[60];
    .
    .
End;
.
.
End P
The fig shows the activation record for P in two cases: when B2 is active and when B3 is active.
CHAPTER 11
ERROR DETECTION AND RECOVERY
11.1 Errors
Syntactic errors:
Error                        Kind                  Example
Missing right parenthesis    deletion error        MIN(A,2*(3+B)
Extra comma                  insertion error       DO 10, I=1,100
Colon in place of ;          replacement error     I=1: J=2;
Misspelled keyword           transposition error   PORCEDURE SUM
Extra blank                  insertion error       /* COMMENT * /
Dynamic errors:
In some languages, some errors can be detected only at runtime. Some languages (APL,
SNOBOL) have several data types, and the type of a name can change at runtime. Hence type
checking cannot be done at compile time and has to be postponed until runtime.
Range checking can also be done only at runtime: checking array subscripts and case
statement selectors requires their runtime values. [subscript out of range, arbitrary value in case
statement]
Errors can be classified based on the phase in which the compiler detects them
Lexical Phase errors
Syntactic phase errors
Semantic phase errors
11.2 Lexical phase errors
[Function of the lexical analyzer: break the input into a stream of tokens]
If, after some processing, the lexical analyzer discovers that no prefix of the remaining input fits
into any token class, it calls an error recovery routine.
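A minimal sketch of this behaviour is shown below. The three token classes and the recovery action (record the error, delete one character, and try again) are assumptions chosen for the example; a real lexical analyzer has many more classes and may use other recovery strategies.

```python
import re

# Hypothetical token classes, tried in order at the current position.
TOKEN_CLASSES = [
    ("NUM", re.compile(r"\d+")),
    ("ID", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("OP", re.compile(r"[-+*/=()]")),
]


def tokenize(text):
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for kind, pattern in TOKEN_CLASSES:
            match = pattern.match(text, pos)
            if match:
                tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            # No prefix of the remaining input fits any token class:
            # error recovery deletes one character and continues.
            errors.append((pos, text[pos]))
            pos += 1
    return tokens, errors


tokens, errors = tokenize("X = 1 # Y")
```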
Recovering from lexical phase errors may itself cause further problems.
Eg. IF ( long condition .OR. X .LT. Y ) goto 30
Assume that after X, the statement is split onto the next line with a continuation mark.
If the continuation mark is wrongly entered in the 8th column, then after reading X the lexical analyzer finds the end
of the statement.
The statement does not match any syntax, so a syntax error occurs and the parser must do recovery:
it reduces the part beginning with IF to the nonterminal statement.
Then the lexical analyzer must process the continuation line:
1.LT. Y ) goto 30
1. is read as a real constant
LT is read as a variable
Recovery inserts an = sign in between (id = id)
.Y (the . is deleted and the lexical analyzer tries to find the next token)
Y is an identifier (id), so recovery tries to add + (id + id)
Time of detection:
LR parsers have the valid prefix property,
ie. they announce an error as soon as they see a prefix of the input that has no valid continuation.
So LR parsers announce errors at the earliest possible point.
Advantages of a parser with the valid prefix property:
It reports the error at the earliest.
It limits the amount of erroneous output passed to subsequent passes.
Panic mode:
In the panic mode of recovery, the parser discards input symbols until a synchronizing token
is encountered (a delimiter such as ; or end).
The parser then deletes stack entries until it finds an entry such that it can continue parsing.
Advantage:
Simple to implement
It never gets into an infinite loop
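The input-skipping half of panic mode can be sketched as follows. This is a deliberately simplified illustration: the toy grammar (statements of the form id = id ;) is an assumption, and there is no parser stack, so the step of deleting stack entries is not modelled.

```python
SYNCHRONIZING = {";", "end"}


def parse_statements(tokens):
    """Parse statements of the form  id = id ;  and return (count, error positions)."""
    pos, parsed, errors = 0, 0, []
    while pos < len(tokens):
        stmt = tokens[pos:pos + 4]
        if (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
                and stmt[2].isidentifier() and stmt[3] == ";"):
            parsed += 1
            pos += 4
            continue
        errors.append(pos)
        # Panic mode: discard input symbols until a synchronizing token is seen.
        while pos < len(tokens) and tokens[pos] not in SYNCHRONIZING:
            pos += 1
        pos += 1    # consume the synchronizing token and resume parsing
    return parsed, errors


result = parse_statements("a = b ; c = = d ; e = f ;".split())
```

Every pass through the loop consumes at least one token, which is why this kind of recovery cannot loop forever.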
Error recovery in operator precedence parsing:
There are 2 cases in which operator precedence parsing discovers syntactic errors:
1. No precedence relation holds between the terminal on top of the stack and the current
input symbol.
2. A handle is found, but there is no production with this handle as its right side.
These 2 kinds of error are reported and recovered from.
Handling errors during reduction
Normally the checker checks for and displays the following errors:
1. If + or * is reduced, it checks whether nonterminals appear on both sides. If not, the error
message "MISSING EXPRESSION" is displayed.
2. If id is reduced, it checks that there is no nonterminal to its right or left.
If there is one, the message "2 expressions not connected by operator" is displayed.
3. If ( ) is reduced, it checks whether there is a nonterminal between the parentheses.
If not, the message "null expression between parentheses" is displayed.
4. A nonterminal should not appear on either side of the ( and ) parentheses.
If one does, the message "2 expressions not connected by operator" is displayed.
In the precedence matrix, the blank entries are filled with the names of error handling routines.
e1: ( $ - e1 is called when the expression ends with a left parenthesis
    action: pop ( from the stack
    msg: illegal left parenthesis
e2: id or ) is followed by id or (
    [ id id, id (, ) id, ) ( ]
    action: insert + on the input
    msg: missing operator
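The effect of routine e2 can be illustrated in isolation. This sketch is not an operator precedence parser: it scans a token list, which is a simplifying assumption, and applies only the e2 action wherever one of the four listed pairs occurs. The function name apply_e2 is hypothetical.

```python
def apply_e2(tokens):
    """Insert + wherever id or ) is directly followed by id or (."""
    output, messages = [], []
    for token in tokens:
        if output and output[-1] in ("id", ")") and token in ("id", "("):
            output.append("+")                  # action: insert + on the input
            messages.append("missing operator")
        output.append(token)
    return output, messages


repaired, messages = apply_e2(["id", "id", "(", "id", ")", "id"])
```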
Semantic errors:
The main sources of semantic errors are
Undefined names
Type incompatibility
We have to recover from such errors, and we should suppress duplicate error messages.
The first time we encounter an undeclared name, we make an entry in the symbol table for
that name with the necessary attributes.
In addition, a flag is set to indicate that the entry was made due to a semantic error and not a
declaration.
When the name appears the next time, a check is made to determine whether a previous instance of
the error message is noted in the symbol table. If so, no new error message is printed.
If not, the error message is printed, and the new erroneous usage is added to the symbol table.
Hence duplicate error messages are avoided.
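The suppression scheme can be sketched as below. The dictionary layout (a from_error flag plus a set of already reported usages) and the function name check_use are assumptions for illustration; the notes only require that the entry record the error and the usages already reported.

```python
def check_use(symbol_table, name, usage, messages):
    entry = symbol_table.get(name)
    if entry is None:
        # First use of an undeclared name: make an entry and flag it as
        # created by a semantic error, not by a declaration.
        entry = {"from_error": True, "reported": set()}
        symbol_table[name] = entry
    if entry["from_error"] and usage not in entry["reported"]:
        entry["reported"].add(usage)            # note this erroneous usage
        messages.append(f"{name}: undeclared name used as {usage}")


symbols = {"X": {"from_error": False, "reported": set()}}   # X was declared
messages = []
for name, usage in [("Y", "variable"), ("Y", "variable"),
                    ("X", "variable"), ("Y", "procedure")]:
    check_use(symbols, name, usage, messages)
```

The second use of Y as a variable produces no message, while its first use as a procedure is a new kind of erroneous usage and is reported once.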