Compiler Notes Unit 3


MODULE-1: TYPE CHECKING

TYPE CHECKING
A compiler must check that the source program follows both the syntactic and semantic conventions
of the source language.
This checking, called static checking, detects and reports programming errors.

Some examples of static checks:

1. Type checks - A compiler should report an error if an operator is applied to an incompatible
operand. Example: If an array variable and function variable are added together.
2. Flow-of-control checks - Statements that cause flow of control to leave a construct must have
some place to which to transfer the flow of control. Example: A break statement in C transfers
control out of the smallest enclosing while, for, or switch statement; an error occurs if no such
enclosing statement exists.
Position of type checker

token stream -> parser -> syntax tree -> type checker -> syntax tree -> intermediate code generator -> intermediate representation

• A type checker verifies that the type of a construct matches that expected by its context.
For example : arithmetic operator mod in Pascal requires integer operands, so a type
checker verifies that the operands of mod have type integer.
• Type information gathered by a type checker may be needed when code is generated.

TYPE SYSTEMS
The design of a type checker for a language is based on information about the syntactic
constructs in the language, the notion of types, and the rules for assigning types to language
constructs.
For example: "if both operands of the arithmetic operators +, - and * are of type integer, then
the result is of type integer."
Type Expressions
• The type of a language construct will be denoted by a "type expression."
• A type expression is either a basic type or is formed by applying an operator called a type
constructor to other type expressions.
• The sets of basic types and constructors depend on the language to be checked.

The following are the definitions of type expressions:

1. Basic types such as boolean, char, integer, real are type expressions.
A special basic type, type_error, will signal an error during type checking; void, denoting
"the absence of a value," allows statements to be checked.
2. Since type expressions may be named, a type name is a type expression.
3. A type constructor applied to type expressions is a type expression.
Constructors include:
Arrays : If T is a type expression, then array(I, T) is a type expression denoting the type
of an array with elements of type T and index set I.

Products : If T1 and T2 are type expressions, then their Cartesian product T1 x T2 is a
type expression.

Records : The difference between a record and a product is that the fields of a record have
names. The record type constructor will be applied to a tuple formed from field names and
field types.
For example:
type row = record
    address: integer;
    lexeme: array[1..15] of char
end;
var table: array[1..101] of row;
declares the type name row representing the type expression record((address x integer) x
(lexeme x array(1..15, char))) and the variable table to be an array of records of this type.

Pointers : If T is a type expression, then pointer(T) is a type expression denoting the type
"pointer to an object of type T".
For example, var p : ↑ row declares variable p to have type pointer(row).

Functions : A function in programming languages maps a domain type D to a range type R.
The type of such a function is denoted by the type expression D -> R.

4. Type expressions may contain variables whose values are type expressions.

Tree representation for char x char -> pointer(integer):

          ->
         /  \
        x    pointer
       / \      |
   char   char  integer
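
The definitions above can be sketched as a small set of Python classes. This is only an illustration: the class names, the field names and the show helper are assumptions made for the example and do not come from the notes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Basic:
    name: str                # e.g. "char", "integer", "void", "type_error"

@dataclass(frozen=True)
class Array:
    index: tuple             # index set given as (low, high)
    elem: object             # element type expression

@dataclass(frozen=True)
class Product:
    left: object
    right: object

@dataclass(frozen=True)
class Pointer:
    to: object

@dataclass(frozen=True)
class Function:
    domain: object
    result: object

CHAR = Basic("char")
INTEGER = Basic("integer")

def show(t):
    """Render a type expression in the notation used in the text."""
    if isinstance(t, Basic):
        return t.name
    if isinstance(t, Array):
        return f"array({t.index[0]}..{t.index[1]}, {show(t.elem)})"
    if isinstance(t, Product):
        return f"{show(t.left)} x {show(t.right)}"
    if isinstance(t, Pointer):
        return f"pointer({show(t.to)})"
    if isinstance(t, Function):
        return f"{show(t.domain)} -> {show(t.result)}"
    raise TypeError("not a type expression")

# The tree shown above: char x char -> pointer(integer)
f = Function(Product(CHAR, CHAR), Pointer(INTEGER))
print(show(f))
```

Because the classes are frozen dataclasses, two type expressions built the same way compare equal, which is the structural comparison a simple type checker needs.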

Type systems

► A type system is a collection of rules for assigning type expressions to the various parts of
a program.

► A type checker implements a type system. It is specified in a syntax-directed manner.

► Different type systems may be used by different compilers or processors of the same
language.

Static and Dynamic Checking of Types

► Checking done by a compiler is said to be static, while checking done when the target
program runs is termed dynamic.

► Any check can be done dynamically, if the target code carries the type of an element
along with the value of that element.
Sound type system
A sound type system eliminates the need for dynamic checking for type errors because it
allows us to determine statically that these errors cannot occur when the target program runs.
That is, if a sound type system assigns a type other than type_error to a program part, then type
errors cannot occur when the target code for the program part is run.
Strongly typed language
A language is strongly typed if its compiler can guarantee that the programs it accepts
will execute without type errors.

Error Recovery

► Since type checking has the potential for catching errors in a program, it is desirable for
a type checker to recover from errors, so it can check the rest of the input.

► Error handling has to be designed into the type system right from the start; the type
checking rules must be prepared to cope with errors.

SPECIFICATION OF A SIMPLE TYPE CHECKER

Here, we specify a type checker for a simple language in which the type of each identifier
must be declared before the identifier is used. The type checker is a translation scheme that
synthesizes the type of each expression from the types of its subexpressions. The type checker
can handle arrays, pointers, statements and functions.

A Simple Language

Consider the following grammar:

P -> D ; E
D -> D ; D | id : T
T -> char | integer | array [ num ] of T | ↑ T
E -> literal | num | id | E mod E | E [ E ] | E ↑

Translation scheme:

P -> D ; E
D -> D ; D
D -> id : T { addtype (id.entry, T.type) }
T -> char { T.type := char }
T -> integer { T.type := integer }
T -> ↑ T1 { T.type := pointer(T1.type) }
T -> array [ num ] of T1 { T.type := array(1..num.val, T1.type) }

In the above language,

- There are two basic types: char and integer;
- type_error is used to signal errors;
- the prefix operator ↑ builds a pointer type. For example, ↑ integer leads to the type expression
pointer(integer).
Type checking of expressions

In the following rules, the attribute type for E gives the type expression assigned to the
expression generated by E.

1. E -> literal { E.type := char }
   E -> num { E.type := integer }

Here, constants represented by the tokens literal and num have type char and integer.

2. E -> id { E.type := lookup (id.entry) }

lookup(e) is used to fetch the type saved in the symbol table entry pointed to by e.

3. E -> E1 mod E2 { E.type := if E1.type = integer and
                               E2.type = integer then integer
                               else type_error }
The expression formed by applying the mod operator to two subexpressions of type integer has
type integer; otherwise, its type is type_error.

4. E -> E1 [ E2 ] { E.type := if E2.type = integer and
                               E1.type = array(s, t) then t
                               else type_error }
In an array reference E1 [ E2 ], the index expression E2 must have type integer. The result is
the element type t obtained from the type array(s, t) of E1.

5. E -> E1 ↑ { E.type := if E1.type = pointer(t) then t
                          else type_error }

The postfix operator ↑ yields the object pointed to by its operand. The type of E ↑ is the type t
of the object pointed to by the pointer E.
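
The expression rules can be sketched as one recursive function. This is a minimal illustration, not the notes' own implementation: the tuple encoding of expressions and types, the name check_expr, and the choice to return type_error for an undeclared identifier are all assumptions made for the example.

```python
TYPE_ERROR = "type_error"

def check_expr(e, symtab):
    """Synthesize the type of expression e from the types of its parts.

    Expressions are tuples such as ("num", 3) or ("mod", e1, e2).
    Types are "char", "integer", ("array", size, t) or ("pointer", t).
    symtab plays the role of lookup: it maps identifier names to types.
    """
    kind = e[0]
    if kind == "literal":
        return "char"
    if kind == "num":
        return "integer"
    if kind == "id":
        # Assumption: an undeclared identifier is treated as type_error.
        return symtab.get(e[1], TYPE_ERROR)
    if kind == "mod":
        t1 = check_expr(e[1], symtab)
        t2 = check_expr(e[2], symtab)
        return "integer" if t1 == "integer" and t2 == "integer" else TYPE_ERROR
    if kind == "index":                      # E1 [ E2 ]
        t1 = check_expr(e[1], symtab)
        t2 = check_expr(e[2], symtab)
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]                     # element type t of array(s, t)
        return TYPE_ERROR
    if kind == "deref":                      # E1 followed by the postfix arrow
        t1 = check_expr(e[1], symtab)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]
        return TYPE_ERROR
    return TYPE_ERROR

symtab = {"a": ("array", 10, "integer"), "p": ("pointer", "char"), "i": "integer"}
print(check_expr(("index", ("id", "a"), ("mod", ("id", "i"), ("num", 3))), symtab))
```

Returning type_error instead of raising an exception lets the checker keep going after the first mistake, which matches the error-recovery goal described earlier.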

Type checking of statements

Statements do not have values; hence the basic type void can be assigned to them. If an error is
detected within a statement, then type_error is assigned.

Translation scheme for checking the type of statements:

1. Assignment statement:
S -> id := E { S.type := if id.type = E.type then void
                         else type_error }

2. Conditional statement:
S -> if E then S1 { S.type := if E.type = boolean then S1.type
                              else type_error }

3. While statement:
S -> while E do S1 { S.type := if E.type = boolean then S1.type
                               else type_error }
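
The three statement rules can be sketched in the same style. To keep the example short, each statement carries the already computed types of its identifier and expression; that encoding and the name check_stmt are assumptions made for illustration.

```python
VOID = "void"
TYPE_ERROR = "type_error"

def check_stmt(s):
    """Return void for a well-typed statement, type_error otherwise.

    Statement encodings (hypothetical):
      ("assign", id_type, expr_type)
      ("if", cond_type, body_stmt)
      ("while", cond_type, body_stmt)
    """
    kind = s[0]
    if kind == "assign":
        _, id_type, expr_type = s
        return VOID if id_type == expr_type else TYPE_ERROR
    if kind in ("if", "while"):
        _, cond_type, body = s
        # The statement takes the type of its body when the condition is boolean.
        return check_stmt(body) if cond_type == "boolean" else TYPE_ERROR
    return TYPE_ERROR

print(check_stmt(("while", "boolean", ("assign", "integer", "integer"))))
```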
MODULE-2: RUN-TIME ENVIRONMENTS

SOURCE LANGUAGE ISSUES

Procedures:

A procedure definition is a declaration that associates an identifier with a statement. The
identifier is the procedure name, and the statement is the procedure body.

For example, the following is the definition of a procedure named readarray:

procedure readarray;
var i : integer;
begin
    for i := 1 to 9 do read(a[i])
end;

When a procedure name appears within an executable statement, the procedure is said to be
called at that point.

Activation trees:
An activation tree is used to depict the way control enters and leaves activations. In an
activation tree,

1. Each node represents an activation of a procedure.
2. The root represents the activation of the main program.
3. The node for a is the parent of the node for b if and only if control flows from activation a to
b.
4. The node for a is to the left of the node for b if and only if the lifetime of a occurs before the
lifetime of b.

Control stack:

• A control stack is used to keep track of live procedure activations. The idea is to push the
node for an activation onto the control stack as the activation begins and to pop the node
when the activation ends.

• The contents of the control stack are related to paths to the root of the activation tree.
When node n is at the top of control stack, the stack contains the nodes along the path
from n to the root.
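
The relation between the activation tree and the control stack can be sketched as follows. The tree shape and the procedure names are hypothetical; the point is only that each snapshot of the stack equals the path from the root to the node that has just been pushed.

```python
def trace_activations(tree):
    """Walk an activation tree and record the control stack at each entry.

    A tree node is (name, [children]); children are listed left to right,
    that is, in the order in which their lifetimes occur.
    """
    stack = []
    snapshots = []

    def activate(node):
        name, children = node
        stack.append(name)              # push when the activation begins
        snapshots.append(list(stack))
        for child in children:
            activate(child)
        stack.pop()                     # pop when the activation ends

    activate(tree)
    return snapshots

# Hypothetical activation tree: main calls r, then q; q calls p, then s.
tree = ("main", [("r", []), ("q", [("p", []), ("s", [])])])
for snapshot in trace_activations(tree):
    print(snapshot)
```
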
The Scope of a Declaration:
A declaration is a syntactic construct that associates information with a name.
Declarations may be explicit, such as:
var i : integer;
or they may be implicit. For example, in Fortran any variable name starting with I is assumed to
denote an integer.
The portion of the program to which a declaration applies is called the scope of that declaration.

Binding of names:
Even if each name is declared once in a program, the same name may denote different
data objects at run time. "Data object" corresponds to a storage location that holds values.

The term environment refers to a function that maps a name to a storage location .
The term state refers to a function that maps a storage location to the value held there.

name --(environment)--> storage --(state)--> value

When an environment associates storage location s with a name x, we say that x is bound
to s. This association is referred to as a binding of x.
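
The two mappings can be sketched with two dictionaries. The addresses 100 and 104 are made up for the example; what matters is that an assignment changes the state, while a new binding changes the environment.

```python
# Hypothetical storage addresses; only the two-stage mapping matters.
environment = {"x": 100}      # name -> storage location
state = {100: 0}              # storage location -> value

def assign(name, value):
    """An assignment changes the state, not the environment."""
    state[environment[name]] = value

def value_of(name):
    """Look a name up through both mappings."""
    return state[environment[name]]

assign("x", 5)
first = value_of("x")

# A new binding of x (for example, in another activation) changes the
# environment: x now denotes a different storage location.
environment["x"] = 104
state[104] = 7
second = value_of("x")
print(first, second)
```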

STORAGE ORGANISATION

• The executing target program runs in its own logical address space in which each
program value has a location.
• The management and organization of this logical address space is shared between the
compiler, operating system and target machine. The operating system maps the logical
addresses into physical addresses, which are usually spread throughout memory.

Typical subdivision of run-time memory:

Code
Static Data
Stack (grows toward the free memory)
free memory
Heap (grows toward the free memory)

• Run-time storage comes in blocks, where a byte is the smallest unit of addressable
memory. Four bytes form a machine word. Multibyte objects are stored in consecutive
bytes and given the address of the first byte.
• The storage layout for data objects is strongly influenced by the addressing constraints of
the target machine.
• Although a character array of length 10 needs only enough bytes to hold 10 characters, a compiler
may allocate 12 bytes to get alignment, leaving 2 bytes unused.
• This unused space due to alignment considerations is referred to as padding.
• The size of some program objects may be known at compile time, and these objects may be placed in an
area called static.
• The dynamic areas used to maximize the utilization of space at run time are stack and
heap.
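
The padding arithmetic can be checked with a short sketch. It assumes 4-byte word alignment, in line with the four-byte machine word mentioned above; the function names are illustrative.

```python
def aligned_size(size, alignment=4):
    """Round size up to the next multiple of alignment (both in bytes)."""
    return (size + alignment - 1) // alignment * alignment

def padding(size, alignment=4):
    """Number of unused bytes added to reach the aligned size."""
    return aligned_size(size, alignment) - size

# The 10-character array from the text occupies 12 bytes, 2 of them padding.
print(aligned_size(10), padding(10))
```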

Activation records :
• Procedure calls and returns are usually managed by a run-time stack called the control
stack.
• Each live activation has an activation record on the control stack, with the root of the
activation tree at the bottom; the latest activation has its record at the top of the stack.
• The contents of the activation record vary with the language being implemented. The
fields of a typical activation record are listed below.

• Temporary values, such as those arising from the evaluation of expressions.
• Local data belonging to the procedure whose activation record this is.
• A saved machine status, with information about the state of the machine just before the
call to the procedure.
• An access link may be needed to locate data needed by the called procedure but found
elsewhere.
• A control link pointing to the activation record of the caller.
• Space for the return value of the called function, if any. Not all called procedures
return a value, and if one does, we may prefer to place that value in a register for
efficiency.
• The actual parameters used by the calling procedure. Commonly, these are not placed in the
activation record but rather in registers, when possible, for greater efficiency.
STORAGE ALLOCATION STRATEGIES

The different storage allocation strategies are:
1. Static allocation - lays out storage for all data objects at compile time.
2. Stack allocation - manages the run-time storage as a stack.
3. Heap allocation - allocates and deallocates storage as needed at run time from a data area
known as the heap.

STATIC ALLOCATION
• In static allocation, names are bound to storage as the program is compiled, so there is no
need for a run-time support package.
• Since the bindings do not change at run time, every time a procedure is activated, its
names are bound to the same storage locations.
• Therefore values of local names are retained across activations of a procedure. That is,
when control returns to a procedure the values of the locals are the same as they were
when control left the last time.
• From the type of a name, the compiler decides the amount of storage for the name and
decides where the activation records go. At compile time, we can fill in the addresses at
which the target code can find the data it operates on.

STACK ALLOCATION OF SPACE

• All compilers for languages that use procedures, functions or methods as units of user-
defined actions manage at least part of their run-time memory as a stack.
• Each time a procedure is called , space for its local variables is pushed onto a stack, and
when the procedure terminates, that space is popped off the stack.

Calling sequences:
• Procedure calls are implemented by what are known as calling sequences, which consist
of code that allocates an activation record on the stack and enters information into its
fields.
• A return sequence is similar code to restore the state of the machine so the calling
procedure can continue its execution after the call.
• The code in a calling sequence is often divided between the calling procedure (caller) and
the procedure it calls (callee).
• When designing calling sequences and the layout of activation records, the following
principles are helpful:
► Values communicated between caller and callee are generally placed at the
beginning of the callee's activation record, so they are as close as possible to the
caller's activation record.
► Fixed-length items are generally placed in the middle. Such items typically include
the control link, the access link, and the machine status fields.
► Items whose size may not be known early enough are placed at the end of the
activation record. The most common example is a dynamically sized array, where the
value of one of the callee's parameters determines the length of the array.
► We must locate the top-of-stack pointer judiciously. A common approach is to have
it point to the end of the fixed-length fields in the activation record. Fixed-length data
can then be accessed by fixed offsets, known to the intermediate-code generator,
relative to the top-of-stack pointer.

caller's activation record
    Parameters and returned values
    control link
    links and saved status
    temporaries and local data
callee's activation record
    Parameters and returned values      (caller's responsibility)
    control link
    links and saved status              <- top_sp
    temporaries and local data          (callee's responsibility)

Division of tasks between caller and callee

• The calling sequence and its division between caller and callee are as follows.

► The caller evaluates the actual parameters.
► The caller stores a return address and the old value of top_sp into the callee's
activation record. The caller then increments top_sp to its new position.
► The callee saves the register values and other status information.
► The callee initializes its local data and begins execution.
• A suitable, corresponding return sequence is:

► The callee places the return value next to the parameters.
► Using the information in the machine-status field, the callee restores top_sp and
other registers, and then branches to the return address that the caller placed in
the status field.
► Although top_sp has been decremented, the caller knows where the return value
is, relative to the current value of top_sp; the caller therefore may use that value.
Variable length data on stack:
• The run-time memory management system must deal frequently with the allocation of
space for objects, the sizes of which are not known at compile time, but which are
local to a procedure and thus may be allocated on the stack.
• The reason to prefer placing objects on the stack is that we avoid the expense of garbage
collecting their space.
• The same scheme works for objects of any type if they are local to the procedure called
and have a size that depends on the parameters of the call.

activation record for p
    control link
    pointer to A
    pointer to B
    pointer to C
arrays of p
    array A
    array B
    array C
activation record for procedure q called by p
arrays of q                                        <- top

Access to dynamically allocated arrays

• Procedure p has three local arrays, whose sizes cannot be determined at compile time.
The storage for these arrays is not part of the activation record for p.
• Access to the data is through two pointers, top and top_sp. Here top marks the actual
top of stack; it points to the position at which the next activation record will begin.
• The second, top_sp, is used to find local, fixed-length fields of the top activation record.
• The code to reposition top and top_sp can be generated at compile time, in terms of sizes
that will become known at run time.
HEAP ALLOCATION
The stack allocation strategy cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.

• Heap allocation parcels out pieces of contiguous storage, as needed for activation records
or other objects.
• Pieces may be deallocated in any order, so over time the heap will consist of alternate
areas that are free and in use.

[Figure with three columns. Position in the activation tree: s with children r and q(1, 9). Activation records in the heap: records for s, r and q(1, 9), each with a control link. Remarks: retained activation record for r.]

• The record for an activation of procedure r is retained when the activation ends.

• Therefore, the record for the new activation q(1, 9) cannot follow that for s physically.

• If the retained activation record for r is deallocated, there will be free space in the heap
between the activation records for s and q.
7 SYMBOL TABLE MANAGEMENT

7.1 THE SYMBOL TABLE

A symbol table is a data structure used by a compiler to keep track of scope/
binding information about names. This information is used in the source
program to identify the various program elements, like variables, constants,
procedures, and the labels of statements. The symbol table is searched every
time a name is encountered in the source text. When a new name or new
information about an existing name is discovered, the content of the symbol
table changes. Therefore, a symbol table must have an efficient mechanism
for accessing the information held in the table as well as for adding new entries
to the symbol table.
For efficiency, our choice of the implementation data structure for the
symbol table and the organization of its contents should stress a minimal
cost when adding new entries or accessing the information on existing entries.
Also, if the symbol table can grow dynamically as necessary, then it is more
useful for a compiler.

7.2 IMPLEMENTATION

Each entry in a symbol table can be implemented as a record that consists of
several fields. These fields are dependent on the information to be saved about
the name. But since the information about a name depends on the usage of
the name (i.e., on the program element identified by the name), the entries in
the symbol table records will not be uniform. Hence, to keep the symbol
table records uniform, some of the information about the name is kept outside
of the symbol table record, and a pointer to this information is stored in the
symbol table record, as shown in Figure 7.1. Here, the information about the
lower and upper bounds of the dimension of the array named a is kept outside
of the symbol table record, and the pointer to this information is stored within
the symbol table record.

[Figure 7.1: the symbol table record for the array a (type int) holds a pointer to a separately stored list of lower and upper bounds (LB1, UB1, LB2, UB2).]

FIGURE 7.1 A pointer steers the symbol table to remotely stored information
for the array a.

7.3 ENTERING INFORMATION INTO THE SYMBOL TABLE

Information is entered into the symbol table in various ways. In some cases,
the symbol table record is created by the lexical analyzer as soon as the name
is encountered in the input, and the attributes of the name are entered when
the declarations are processed. But very often, the same name is used to
denote different objects, perhaps even in the same block. For example, in C
programming, the same name can be used as a variable name and as a member
name of a structure, both in the same block. In such cases, the lexical analyzer
only returns the name to the parser, rather than a pointer to the symbol table
record. That is, a symbol table record is not created by the lexical analyzer;
the string itself is returned to the parser, and the symbol table record is created
when the name's syntactic role is discovered.

7.6 VARIOUS APPROACHES TO SYMBOL TABLE ORGANIZATION

There are several methods of organizing the symbol table. These methods are
discussed below.

7.6.1 Linear List

A linear list of records is the easiest way to implement a symbol table. The
new names are added to the table in the order that they arrive. Whenever a
new name is to be added to the table, the table is first searched linearly or
sequentially to check whether or not the name is already present in the table.
If the name is not present, then the record for the new name is created and added
to the list at a position specified by the available pointer, as shown in
Figure 7.3.

[Figure 7.3: a list of records (name1, info1), (name2, info2), and so on, with the available pointer marking the next free position.]

FIGURE 7.3 A new record is added to the linear list of records.

To retrieve the information about the name, the table is searched
sequentially, starting from the first record in the table. The average number of
comparisons, p, required for a search is p = (n + 1)/2 for a successful search and
p = n for an unsuccessful search, where n is the number of records in the symbol
table. The advantage of this organization is that it takes less space, and additions
to the table are simple. This method's disadvantage is that it has a higher
accessing time.
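
The comparison counts quoted above can be checked with a minimal sketch of a linear-list table. The class and method names are assumptions made for the example, and the average is taken over one successful search for each stored name.

```python
class LinearSymbolTable:
    def __init__(self):
        self.records = []               # [name, info] pairs in arrival order

    def find(self, name):
        """Return (index, comparisons); index is -1 if the name is absent."""
        for i, record in enumerate(self.records):
            if record[0] == name:
                return i, i + 1
        return -1, len(self.records)

    def add(self, name, info):
        """Add the name at the next available position unless already present."""
        index, _ = self.find(name)
        if index == -1:
            self.records.append([name, info])
            index = len(self.records) - 1
        return index

table = LinearSymbolTable()
names = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]   # n = 9
for name in names:
    table.add(name, "integer")

# Average comparisons over one successful search per stored name.
average = sum(table.find(name)[1] for name in names) / len(names)
print(average, table.find("zz"))
```

With n = 9 the average is (9 + 1)/2 = 5 comparisons, and the unsuccessful search for "zz" inspects all 9 records.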

7.6.2 Search Trees

A search tree is a more efficient approach to symbol table organization. We
add two links, left and right, in each record, and these links point to the record
in the search tree. Whenever a name is to be added, first the name is searched
in the tree. If it does not exist, then a record for the new name is created and
added at the proper position in the search tree. This organization has the
property of alphabetical accessibility; that is, all the names accessible from
name_i will, by following a left link, precede name_i in alphabetical order.
Similarly, all the names accessible from name_i will follow name_i in alphabetical
order by following the right link (see Figure 7.4). The expected time needed
to enter n names and to make m queries is proportional to (m + n) log2 n; so for
greater numbers of records (higher n) this method has advantages over linear
list organization.

[Figure 7.4: each record holds the fields left, name, info and right; the left and right links point to other records of the tree.]

FIGURE 7.4 The search tree organization approach to a symbol table.
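
A minimal sketch of the search tree organization follows. It is an unbalanced binary search tree; the names Node, insert, lookup and in_order are chosen for the example. Note that without balancing, the logarithmic cost holds only on average, for instance when names arrive in random order.

```python
class Node:
    def __init__(self, name, info):
        self.name = name
        self.info = info
        self.left = None        # names that precede this one alphabetically
        self.right = None       # names that follow this one alphabetically

def insert(root, name, info):
    """Insert name if absent and return the (possibly new) subtree root."""
    if root is None:
        return Node(name, info)
    if name < root.name:
        root.left = insert(root.left, name, info)
    elif name > root.name:
        root.right = insert(root.right, name, info)
    return root                 # an existing name is left unchanged

def lookup(root, name):
    """Return the record for name, or None if it is not in the tree."""
    while root is not None and root.name != name:
        root = root.left if name < root.name else root.right
    return root

def in_order(root):
    """List all names in alphabetical order."""
    if root is None:
        return []
    return in_order(root.left) + [root.name] + in_order(root.right)

root = None
for name, info in [("m", "integer"), ("c", "char"), ("x", "integer"), ("a", "char")]:
    root = insert(root, name, info)
print(in_order(root))
```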

7.6.3 Hash Tables


A hash table is a table of k pointers numbered from zero to k-1 that point to
the symbol table and a record within the symbol table. To enter a name into the
symbol table, we find out the hash value of the name by applying a suitable
hash function. The hash function maps the name into an integer between
zero and k-1, and using this value as an index in the hash table, we search the
list of the symbol table records that is built on that hash index. If the name is
not present in that list, we create a record for the name and insert it at the head of
the list. When retrieving the information associated with the name, the hash
value of the name is first obtained, and then the list that was built on this hash
value is searched for information about the name (Figure 7.5).

[Figure 7.5: a hash table with entries 0 to k-1; each entry points to a linked list of symbol table records (name, info).]

FIGURE 7.5 Hash table method of symbol table organization.
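
The hash table organization can be sketched as follows. The hash function used here (sum of character codes modulo k) is only a stand-in chosen for the example, as are the class and method names; the text does not prescribe a particular function.

```python
class HashSymbolTable:
    def __init__(self, k=8):
        self.k = k
        self.buckets = [[] for _ in range(k)]   # one record list per hash index

    def _hash(self, name):
        # Hypothetical hash function: maps a name to an integer in 0..k-1.
        return sum(ord(ch) for ch in name) % self.k

    def enter(self, name, info):
        """Return the record for name, creating it at the head of its list."""
        bucket = self.buckets[self._hash(name)]
        for record in bucket:
            if record[0] == name:
                return record
        record = [name, info]
        bucket.insert(0, record)                # insert at the head of the list
        return record

    def lookup(self, name):
        """Search only the list built on the hash value of name."""
        for record in self.buckets[self._hash(name)]:
            if record[0] == name:
                return record[1]
        return None

table = HashSymbolTable(k=4)
table.enter("ab", "integer")
table.enter("ba", "char")       # same character sum, so the same list as "ab"
print(table.lookup("ab"), table.lookup("ba"), table.lookup("zz"))
```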

9 ERROR HANDLING

9.1 ERROR RECOVERY

One of the important tasks that a compiler must perform is the detection of
and recovery from errors. Recovery from errors is important, because the
compiler will be scanning and compiling the entire program, perhaps in the
presence of errors; so as many errors as possible need to be detected.
Every phase of a compilation expects the input to be in a particular format,
and whenever that input is not in the required format, an error is returned.
When detecting an error, a compiler scans some of the tokens that are ahead
of the error's point of occurrence. The fewer the number of tokens that must
be scanned ahead of the point of error occurrence, the better the compiler's
error-detection capability. For example, consider the following statement:
if a = b then x := y + z;
The error in the above statement will be detected in the syntactic analysis
phase, but not before the syntax analyzer sees the token "then"; but the first
token, itself, is in error.
After detecting an error, the first thing that a compiler is supposed to do is
to report the error by producing a suitable diagnostic. A good error diagnostic
should possess the following properties.
1. The message should be produced in terms of the original source program
rather than in terms of some internal representation of the source program.
For example, the message should be produced along with the line
numbers of the source program.
2. The error message should be easy to understand by the user.
3. The error message should be specific and should localize the problem.
For example, an error message should read, "x is not declared in function
fun," and not just, "missing declaration."
4. The message should not be redundant; that is, the same message should
not be produced again and again.
Therefore, a compiler should report errors by generating messages with
the above properties. The errors captured by the compiler can be classified as
either syntactic errors or semantic errors. Syntactic errors are those errors
that are detected in the lexical or syntactic analysis phase by the compiler.
Semantic errors are those errors detected by the compiler.

9.2 RECOVERY FROM LEXICAL PHASE ERRORS

The lexical analyzer detects an error when it discovers that an input's prefix
does not fit the specification of any token class. After detecting an error, the
lexical analyzer can invoke an error recovery routine. This can entail a variety
of remedial actions.
The simplest possible error recovery is to skip the erroneous characters
until the lexical analyzer finds another token. But this is likely to cause the
parser to read a deletion error, which can cause severe difficulties in the syntax-
analysis and remaining phases. One way the parser can help the lexical analyzer
improve its ability to recover from errors is to make its list of legitimate
tokens (in the current context) available to the error recovery routine. The
error-recovery routine can then decide whether a remaining input's prefix
matches one of these tokens closely enough to be treated as that token.

9.3 RECOVERY FROM SYNTACTIC PHASE ERRORS

A parser detects an error when it has no legal move from its current
configuration. The LL(1) and LR(1) parsers use the valid prefix property;
therefore, they are capable of announcing an error as soon as they read an
input that is not a valid continuation of the previous input's prefix. This is the
earliest time that a left-to-right parser can announce an error. But there are a
variety of other types of parsers that do not necessarily have this property.
The advantage of using a parser with a valid-prefix-property capability is
that it reports an error as soon as possible, and it minimizes the amount of
erroneous output passed to subsequent phases of the compiler.
Panic Mode Recovery

Panic mode recovery is an error recovery method that can be used in any
kind of parsing, because error recovery depends somewhat on the type of
parsing technique used. In panic mode recovery, a parser discards input
symbols until a statement delimiter, such as a semicolon or an end, is
encountered. The parser then deletes stack entries until it finds an entry that
will allow it to continue parsing, given the synchronizing token on the input.
This method is simple to implement, and it never gets into an infinite loop.
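
The input-discarding part of panic mode recovery can be sketched for a toy statement form. Everything here is an assumption made for illustration: the statement shape id = num ;, the function name, and the use of the semicolon as the only synchronizing token. Because the sketch handles one flat statement at a time, there is no parser stack to unwind.

```python
def parse_statements(tokens):
    """Parse statements of the hypothetical form  id = num ;

    Returns (number of statements parsed, positions where errors were found).
    """
    parsed = 0
    errors = []
    pos = 0
    while pos < len(tokens):
        stmt = tokens[pos:pos + 4]
        if (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
                and stmt[2].isdigit() and stmt[3] == ";"):
            parsed += 1
            pos += 4
            continue
        errors.append(pos)
        # Panic mode: discard input symbols up to the synchronizing token.
        while pos < len(tokens) and tokens[pos] != ";":
            pos += 1
        pos += 1            # consume the delimiter; always makes progress
    return parsed, errors

tokens = "x = 1 ; y = = 2 ; z = 3 ;".split()
print(parse_statements(tokens))
```

Every pass through the error branch consumes at least one token, which is why this style of recovery cannot loop forever.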
9.4 ERROR RECOVERY IN LR PARSING

A systematic method for error recovery in LR parsing is to scan down the
stack until a state S with a goto on a particular nonterminal A is found, and
then discard zero or more input symbols until a symbol a is found that can
legitimately follow A. The parser then pushes the state goto[S, A] on the stack
and resumes normal parsing.
There might be more than one choice for the nonterminal A. Normally,
these would be nonterminals representing major program pieces, such as
statements.
Another method of error recovery that can be implemented is called
"phrase level recovery." Each error entry in the LR parsing table is examined,
and, based on language usage, an appropriate error-recovery procedure is
constructed. For example, to recover from a construct error that starts with
an operator, the error-recovery routine will push an imaginary id onto the
stack and cover it with the appropriate state. While doing this, the error entries
in a particular state that call for a particular reduction on some input symbols
are replaced by that reduction. This has the effect of postponing the error
detection until one or more reductions are made; but the error will still be
caught before a shift.
A phrase level error-recovery implementation for an LR parser is shown
below. The parsing table's grammar is:
E -> E + E | E * E | id
The SLR parsing table for the above grammar is shown in Table 9.1.
TABLE 9.1 Parsing Table for E -> E + E | E * E | id

State   id    +        *        $        E
I0      S2                               1
I1            S3       S4       Accept
I2            R3       R3       R3
I3      S2                               5
I4      S2                               6
I5            S3/R1    S4/R1    R1
I6            S3/R2    S4/R2    R2
The conflict is resolved by giving higher precedence to * and using
left-associativity, as shown in Table 9.2.

TABLE 9.2 Higher Precedence of * and Left-Associativity

State   id    +     *     $        E
I0      S2                         1
I1            S3    S4    Accept
I2            R3    R3    R3
I3      S2                         5
I4      S2                         6
I5            R1    S4    R1
I6            R2    R2    R2
The parsing table with error routines is shown in Table 9.3.

TABLE 9.3 Parsing Table with Error Routines

State   id    +     *     $        E
I0      S2    e1    e1    e1       1
I1      e2    S3    S4    Accept
I2      R3    R3    R3    R3
I3      S2    e1    e1    e1       5
I4      S2    e1    e1    e1       6
I5      R1    R1    S4    R1
I6      R2    R2    R2    R2
where routine e1 is called from states I0, I3, and I4, which pushes an imaginary
id onto the stack and covers it with state I2. The routine e2 is called from state
I1, which pushes + onto the stack and covers it with state I3.
For example, if we trace the behavior of the parser described above for the
input id + * id $:

Stack Contents                     Unspent Input   Moves
$I0                                id+*id$         shift and enter into state 2
$I0 id I2                          +*id$           reduce by production number 3
$I0 E I1                           +*id$           shift and enter into state 3
$I0 E I1 + I3                      *id$            call error routine e1 (id I2 pushed by e1)
$I0 E I1 + I3 id I2                *id$            reduce by production number 3
$I0 E I1 + I3 E I5                 *id$            shift and enter into state 4
$I0 E I1 + I3 E I5 * I4            id$             shift and enter into state 2
$I0 E I1 + I3 E I5 * I4 id I2      $               reduce by production number 3
$I0 E I1 + I3 E I5 * I4 E I6       $               reduce by production number 2
$I0 E I1 + I3 E I5                 $               reduce by production number 1
$I0 E I1                           $               accept

Similarly, if we trace the behavior of the parser for the input id id * id $:

Stack Contents                     Unspent Input   Moves
$I0                                id id*id$       shift and enter into state 2
$I0 id I2                          id*id$          reduce by production number 3
$I0 E I1                           id*id$          call error routine e2 (+ I3 pushed by e2)
$I0 E I1 + I3                      id*id$          shift and enter into state 2
$I0 E I1 + I3 id I2                *id$            reduce by production number 3
$I0 E I1 + I3 E I5                 *id$            shift and enter into state 4
$I0 E I1 + I3 E I5 * I4            id$             shift and enter into state 2
$I0 E I1 + I3 E I5 * I4 id I2      $               reduce by production number 3
$I0 E I1 + I3 E I5 * I4 E I6       $               reduce by production number 2
$I0 E I1 + I3 E I5                 $               reduce by production number 1
$I0 E I1                           $               accept
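
The two traces can be reproduced with a small table-driven sketch. The ACTION entries below are one reading of Table 9.3 (e1 in states I0, I3, I4; e2 in state I1; reductions filling the remaining entries of I2, I5, I6) and should be treated as an assumption, as should all Python names. The stack holds states only, since each state already determines the grammar symbol it covers.

```python
# Productions as numbered in the text: 1: E -> E + E, 2: E -> E * E, 3: E -> id.
RHS_LENGTH = {1: 3, 2: 3, 3: 1}

ACTION = {
    0: {"id": "s2", "+": "e1", "*": "e1", "$": "e1"},
    1: {"id": "e2", "+": "s3", "*": "s4", "$": "acc"},
    2: {"id": "r3", "+": "r3", "*": "r3", "$": "r3"},
    3: {"id": "s2", "+": "e1", "*": "e1", "$": "e1"},
    4: {"id": "s2", "+": "e1", "*": "e1", "$": "e1"},
    5: {"id": "r1", "+": "r1", "*": "s4", "$": "r1"},
    6: {"id": "r2", "+": "r2", "*": "r2", "$": "r2"},
}
GOTO_E = {0: 1, 3: 5, 4: 6}     # goto on the nonterminal E

def parse(tokens):
    """Run the LR driver and return the list of moves it makes."""
    stack = [0]
    pos = 0
    moves = []
    while True:
        action = ACTION[stack[-1]][tokens[pos]]
        if action == "acc":
            moves.append("accept")
            return moves
        if action[0] == "s":                    # shift
            stack.append(int(action[1:]))
            pos += 1
            moves.append("shift " + action[1:])
        elif action[0] == "r":                  # reduce
            del stack[-RHS_LENGTH[int(action[1:])]:]
            stack.append(GOTO_E[stack[-1]])
            moves.append("reduce " + action[1:])
        elif action == "e1":                    # push imaginary id, state 2
            stack.append(2)
            moves.append("e1")
        else:                                   # e2: push +, state 3
            stack.append(3)
            moves.append("e2")

print(parse(["id", "+", "*", "id", "$"]))
print(parse(["id", "id", "*", "id", "$"]))
```

Neither error routine consumes input, but each is followed by a shift or reduce on the repaired stack, so the driver still terminates on these inputs.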
