Download as pdf or txt
Download as pdf or txt
You are on page 1of 17

CS143 Summer2012

TopDownParsing

Handout09 July2th,2012

HandoutwrittenbyMaggieJohnsonandrevisedbyJulieZelenski.

PossibleApproaches Thesyntaxanalysisphaseofacompilerverifiesthatthesequenceoftokensextractedby thescannerrepresentsavalidsentenceinthegrammaroftheprogramminglanguage. Therearetwomajorparsingapproaches:topdownandbottomup.Intopdownparsing, youstartwiththestartsymbolandapplytheproductionsuntilyouarriveatthedesired string.Inbottomupparsing,youstartwiththestringandreduceittothestartsymbol byapplyingtheproductionsbackwards.Asanexample,letstracethroughthetwo approachesonthissimplegrammarthatrecognizesstringsconsistingofanynumberof asfollowedbyatleastone(andpossiblymore)bs:


S A B > > > AB aA | b | bB

Hereisatopdownparseofaaab. Webeginwiththestartsymbolandateachstep, expandoneoftheremainingnonterminalsbyreplacingitwiththerightsideofoneofits productions.Werepeatuntilonlyterminalsremain.Thetopdownparseproducesa leftmostderivationofthesentence.


S AB aAB aaAB aaaAB aaaB aaab S > AB A > aA A > aA A > aA A > B > b

Abottomupparseworksinreverse.Webeginwiththesentenceofterminalsandeach stepappliesaproductioninreverse,replacingasubstringthatmatchestherightside withthenonterminalontheleft.Wecontinueuntilwehavesubstitutedourwaybackto thestartsymbol.Ifyoureadfromthebottomtotop,thebottomupparseprintsouta rightmostderivationofthesentence.


aaab aaab aaaAb aaAb aAb (insert ) A > A > aA A > aA

Ab AB S

A > aA B > b S > AB

Increatingaparserforacompiler,wenormallyhavetoplacesomerestrictionsonhow weprocesstheinput.Intheaboveexample,itwaseasyforustoseewhichproductions wereappropriatebecausewecouldseetheentirestringaaab.Inacompilersparser, however,wedonthavelongdistancevision.Weareusuallylimitedtojustonesymbol oflookahead.Thelookaheadsymbolisthenextsymbolcomingupintheinput.This restrictioncertainlymakestheparsingmorechallenging.Usingthesamegrammarfrom above,iftheparserseesonlyasinglebintheinputanditcannotlookaheadanyfurther thanthesymbolweareon,itcantknowwhethertousetheproductionB >borB > bB. Backtracking Onesolutiontoparsingwouldbetoimplementbacktracking.Basedontheinformation theparsercurrentlyhasabouttheinput,adecisionismadetogowithoneparticular production.Ifthischoiceleadstoadeadend,theparserwouldhavetobacktracktothat decisionpoint,movingbackwardsthroughtheinput,andstartagainmakingadifferent choiceandsoonuntiliteitherfoundtheproductionthatwastheappropriateoneorran outofchoices.Forexample,considerthissimplegrammar:
S A > > bab | bA d | cA

Letsfollowparsingtheinputbcd. Inthetracebelow,thecolumnontheleftwillbethe expansionthusfar,themiddleistheremaininginput,andtherightistheaction attemptedateachstep:


S bab ab S bA A d A cA A d bcd bcd cd bcd bcd cd cd cd cd d d Try S > bab match b dead-end, backtrack Try S > bA match b Try A > d dead-end, backtrack Try A > cA match c Try A > d match d Success!

Asyoucansee,eachtimewehitadeadend,webackuptothelastdecisionpoint, unmakethatdecisionandtryanotheralternative.Ifallalternativeshavebeen exhausted,webackuptotheprecedingdecisionpointandsoon.Thiscontinuesuntil

weeitherfindaworkingparseorhaveexhaustivelytriedallcombinationswithout success. Anumberofauthorshavedescribedbacktrackingparsers;theappealisthattheycanbe usedforavarietyofgrammarswithoutrequiringthemtofitanyspecificform.Fora smallgrammarsuchasabove,abacktrackingapproachmaybetractable,butmost programminglanguagegrammarshavedozensofnonterminalseachwithseveral optionsandtheresultingcombinatorialexplosionmakesathisapproachveryslowand impractical.Wewillinsteadlookatwaystoparseviaefficientmethodsthathave restrictionsabouttheformofthegrammar,butusuallythoserequirementsarenotso onerousthatwecannotrearrangeaprogramminglanguagegrammartomeetthem. TopDownPredictiveParsing First,wewillfocusinontopdownparsing.Wewilllookattwodifferentwaysto implementanonbacktrackingtopdownparsercalledapredictiveparser.Apredictive parserischaracterizedbyitsabilitytochoosetheproductiontoapplysolelyonthebasis ofthenextinputsymbolandthecurrentnonterminalbeingprocessed.Toenablethis, thegrammarmusttakeaparticularform.WecallsuchagrammarLL(1).Thefirst"L" meanswescantheinputfromlefttoright;thesecond"L"meanswecreatealeftmost derivation;andthe1meansoneinputsymboloflookahead.Informally,anLL(1)hasno leftrecursiveproductionsandhasbeenleftfactored.Notethatthesearenecessary conditionsforLL(1)butnotsufficient,i.e.,thereexistgrammarswithnoleftrecursionor commonprefixesthatarenotLL(1).Notealsothatthereexistmanygrammarsthat cannotbemodifiedtobecomeLL(1).Insuchcases,anotherparsingtechniquemustbe employed,orspecialrulesmustbeembeddedintothepredictiveparser. RecursiveDescent Thefirsttechniqueforimplementingapredictiveparseriscalledrecursivedescent.A recursivedescentparserconsistsofseveralsmallfunctions,oneforeachnonterminalin thegrammar.Asweparseasentence,wecallthefunctionsthatcorrespondtotheleft sidenonterminaloftheproductionsweareapplying.Iftheseproductionsarerecursive, weendupcallingthefunctionsrecursively. LetsstartbyexaminingsomeproductionsfromagrammarforasimplePascallike programminglanguage.Inthisprogramminglanguage,allfunctionsareprecededby thereservedwordFUNC:
program function_list function > > > function_list function_list function | function FUNC identifier ( parameter_list ) statements

WhatmighttheCfunctionthatisresponsibleforparsingafunctiondefinitionlooklike? ItexpectstofirstfindthetokenFUNC,thenitexpectsanidentifier(thenameofthe function),followedbyanopeningparenthesis,andsoon.Asitpullseachtokenfrom theparser,itmustensurethatitmatchestheexpected,andifnot,willhaltwithanerror. Foreachnonterminal,thisfunctioncallstheassociatedfunctiontohandleitspartofthe parsing.Checkthisout:


void ParseFunction() { if (lookahead != T_FUNC) { // anything not FUNC here is wrong printf("syntax error \n"); exit(0); } else lookahead = yylex(); // global 'lookahead' holds next token ParseIdentifier(); if (lookahead != T_LPAREN) { printf("syntax error \n"); exit(0); } else lookahead = yylex(); ParseParameterList(); if (lookahead!= T_RPAREN) { printf("syntax error \n"); exit(0); } else lookahead = yylex(); ParseStatements(); }

Tomakethingsalittlecleaner,letsintroduceautilityfunctionthatcanbeusedtoverify thatthenexttokeniswhatisexpectedandwillerrorandexitotherwise.Wewillneed thisagainandagaininwritingtheparsingroutines.


void MatchToken(int expected) { if (lookahead != expected) { printf("syntax error, expected %d, got %d\n", expected,lookahead); exit(0); } else // if match, consume token and move on lookahead = yylex(); }

NowwecantidyuptheParseFunctionroutineandmakeitclearerwhatitdoes:
void ParseFunction() { MatchToken(T_FUNC); ParseIdentifier(); MatchToken(T_LPAREN); ParseParameterList(); MatchToken(T_RPAREN); ParseStatements(); }

Thefollowingdiagramillustrateshowtheparsetreeisbuilt:
program function_list function identifier parameter_list by the call to ParseParameterList statements by the call to ParseStatements

FUNC call to

ParseIdentifier

Hereistheproductionforanifstatementinthislanguage:
if_statement > IF expression THEN statement ENDIF | IF expression THEN statement ELSE statement ENDIF

Topreparethisgrammarforrecursivedescent,wemustleftfactortosharethecommon parts:
if_statement > IF expression THEN statement close_if close_if > ENDIF | ELSE statement ENDIF

Now,letslookattherecursivedescentfunctionstoparseanifstatement:
void ParseIfStatement() { MatchToken(T_IF); ParseExpression(); MatchToken(T_THEN); ParseStatement(); ParseCloseIf(); } void ParseCloseIf() { if (lookahead == T_ENDIF) lookahead = yylex(); else { MatchToken(T_ELSE); ParseStatement(); MatchToken(T_ENDIF); } }

// if we immediately find ENDIF // predict close_if -> ENDIF // otherwise we look for ELSE // predict close_if -> ELSE stmt ENDIF

Whenparsingtheclosingportionoftheif,wehavetodecidewhichofthetworight handsideoptionstoexpand.Inthiscase,itisnttoodifficult.Wetrytomatchthefirst tokenagainENDIFandonnonmatch,wetrytomatchtheELSEclauseandifthatdoesnt match,itwillreportanerror.

Navigatingthroughtwochoicesseemedsimpleenough,however,whathappenswhere wehavemanyalternativesontherightside?
statement > assg_statement | return_statement | print_statement | null_statement | if_statement | while_statement | block_of_statements

WhenimplementingtheParseStatementfunction,howarewegoingtobeableto determinewhichofthesevenoptionstomatchforanygiveninput?Remember,weare tryingtodothiswithoutbacktracking,andjustonetokenoflookahead,sowehavetobe abletomakeimmediatedecisionwithminimalinformationthiscanbeachallenge! Tounderstandhowtorecognizeandsolveproblem,weneedadefinition: Thefirstsetofasequenceofsymbolsu,writtenasFirst(u)isthesetofterminals whichstartthesequencesofsymbolsderivablefromu.Abitmoreformally, considerallstringsderivablefromu.Ifu=>*v,wherevbeginswithsome terminal,thatterminalisinFirst(u). Ifu=>*,then isinFirst(u). Informally,thefirstsetofasequenceisalistofallthepossibleterminalsthatcouldstart astringderivedfromthatsequence.Wewillworkanexampleofcalculatingthefirst setsabitlater.Fornow,justkeepinmindtheintuitivemeaning.Findingourlookahead tokeninoneofthefirstsetsofthepossibleexpansionstellsusthatisthepathtofollow. Givenaproductionwithanumberofalternatives:A > u1 | u2 | ...,wecanwritea recursivedescentroutineonlyifallthesetsFirst(ui)aredisjoint.Thegeneralformof sucharoutinewouldbe:
void ParseA() { // case below not quite legal C, need to list symbols individually switch (lookahead) { case First(u1): // predict production A -> u1 /* code to recognize u1 */ return; case First(u2): // predict production A -> u2 /* code to recognize u2 */ return; .... default: printf("syntax error \n"); exit(0); } }

Ifthefirstsetsofthevariousproductionsforanonterminalarenotdisjoint,apredictive parserdoesn'tknowwhichchoicetomake.Wewouldeitherneedtorewritethe grammaroruseadifferentparsingtechniqueforthisnonterminal.Forprogramming languages,itisusuallypossibletorestructuretheproductionsorembedcertainrules intotheparsertoresolveconflicts,butthisconstraintisoneoftheweaknessesofthetop downnonbacktrackingapproach. Itisabittrickierifthenonterminalwearetryingtorecognizeisnullable.Anonterminal AisnullableifthereisaderivationofAthatresultsin (i.e.,thatnonterminalwould completelydisappearintheparsestring)i.e., First(A).InthiscaseAcouldbereplaced bynothingandthenexttokenwouldbethefirsttokenofthesymbolfollowing Ainthe sentencebeingparsed.ThusifAisnullable,ourpredictiveparseralsoneedstoconsider thepossibilitythatthepathtochooseistheonecorrespondingto A=>*.Todealwith thiswedefinethefollowing: ThefollowsetofanonterminalAisthesetofterminalsymbolsthatcanappear immediatelytotherightofAinavalidsentence.Abitmoreformally,forevery validsentenceS =>* uAv,wherevbeginswithsometerminal,andthatterminalis inFollow(A). Informally,youcanthinkaboutthefollowsetlikethis:Acanappearinvariousplaces withinavalidsentence.Thefollowsetdescribeswhatterminalscouldhavefollowed thesententialformthatwasexpandedfromA.Wewilldetailhowtocalculatethefollow setabitlater.Fornow,realizefollowsetsareusefulbecausetheydefinetheright contextconsistentwithagivennonterminalandprovidethelookaheadthatmightsignal anullablenonterminalshouldbeexpandedto. Withthesetwodefinitions,wecannowgeneralizehowtohandle A > u1 | u2 | ...,ina recursivedescentparser.Inallsituations,weneedacasetohandleeachmemberin First(ui).Inadditionifthereisaderivationfromanyuithatcouldyield(i.e.ifitis nullable)thenwealsoneedtohandlethemembersinFollow(A).

void ParseA() { switch (lookahead) { case First(u1): /* code to recognize u1 */ return; case First(u2): /* code to recognize u2 */ return; ... case Follow(A): // predict production A->epsilon if A is nullable /* usually do nothing here */ default: printf("syntax error \n"); exit(0); } }

Whataboutleftrecursiveproductions?Nowweseewhythesearesuchaproblemina predictiveparser.Considerthisleftrecursiveproductionthatmatchesalistofoneor morefunctions.


function_list function > > function_list function | function FUNC identifier ( parameter_list ) statement

void ParseFunctionList() { ParseFunctionList(); ParseFunction(); }

Suchaproductionwillsendarecursivedescentparserintoaninfiniteloop!Weneedto removetheleftrecursioninordertobeabletowritetheparsingfunctionfora function_list.


function_list > function_list function | function

becomes
function_list > function function_list | function

thenwemustleftfactorthecommonparts
function_list > more_functions > function more_functions function more_functions |

Andnowtheparsingfunctionlookslikethis:

void ParseFunctionList() { ParseFunction(); ParseMoreFunctions(); // may be empty (i.e. expand to epsilon) }

Computingfirstandfollow Thesearethealgorithmsusedtocomputethefirstandfollowsets: Calculatingfirstsets.TocalculateFirst(u)whereuhastheformX1X2...Xn,dothe following: a) IfX1isaterminal,addX1toFirst(u)andyou'refinished. b) ElseX1isanonterminal,soaddFirst(X1) - toFirst(u). a. IfX1isanullablenonterminal,i.e.,X1=>*,addFirst(X2) - toFirst(u). Furthermore,ifX2canalsogoto,thenaddFirst(X3)- andsoon,throughallXn untilthefirstnonnullablesymbolisencountered. b. IfX1X2...Xn =>*,addtothefirstset. Calculatingfollowsets.Foreachnonterminalinthegrammar,dothefollowing: 1. PlaceEOFinFollow(S)whereSisthestartsymbolandEOFistheinput'sright endmarker.Theendmarkermightbeendoffile,newline,oraspecialsymbol, whateveristheexpectedendofinputindicationforthisgrammar.Wewill typicallyuse$astheendmarker. 2. ForeveryproductionA > uBvwhereuandvareanystringofgrammarsymbols andBisanonterminal,everythinginFirst(v)exceptisplacedinFollow(B). 3. ForeveryproductionA > uB,oraproductionA > uBvwhereFirst(v)contains (i.e.visnullable),theneverythinginFollow(A) isaddedtoFollow(B). Hereisacompleteexampleoffirstandfollowsetcomputation,startingwiththis grammar:
S A B C > > > > AB Ca | BaAC | c b|

NoticewehavealeftrecursiveproductionthatmustbefixedifwearetouseLL(1) parsing:
B > BaAC | c

becomes

B B'

> cB' > aACB' |

Thenewgrammaris:
S A B B' C > > > > > AB Ca | cB' aACB' | b|

Ithelpstofirstcomputethenullableset(i.e.,thosenonterminalsX thatX=>* ),since youneedtorefertothenullablestatusofvariousnonterminalswhencomputingthefirst andfollowsets:


Nullable(G) = {A B' C}

Thefirstsetsforeachnonterminalare:
First(C) = {b } First(B') = {a } First(B) = {c} First(A) = {b a } StartwithFirst(C) - ,add a

(sinceCisnullable)and (sinceAitselfisnullable)

First(S) = {b a c} StartwithFirst(A) - ,add First(B) (sinceAisnullable).Wedontadd(sinceS itselfisnotnullableAcangoaway,butBcannot)

Itisusuallyconvenienttocomputethefirstsetsforthenonterminalsthatappeartoward thebottomoftheparsetreeandworkyourwayupwardsincethenonterminalstoward thetopmayneedtoincorporatethefirstsetsoftheterminalsthatappearbeneaththem inthetree. Tocomputethefollowsets,takeeachnonterminalandgothroughalltherightside productionsthatthenonterminalisin, matchingtothestepsgivenearlier:

Follow(S) = {$} S doesntappearintherighthandsideofanyproductions.Weput $inthe followsetbecauseSisthestartsymbol. Follow(B) = {$} B appearsontherighthandsideoftheS > AB production.Itsfollowsetisthe sameasS. Follow(B') = {$} B' appearsontherighthandsideoftwoproductions.TheB' > aACB' productiontellsusitsfollowsetincludesthefollowsetofB',whichis tautological.FromB > cB', welearnitsfollowsetisthesameasB. Follow(C) = {a $} C appearsintherighthandsideoftwoproductions.TheproductionA > Ca tellsusaisinthefollowset.FromB' > aACB' , weaddthe First(B')whichis justaagain.Because B' isnullable,wemustalsoadd Follow(B') whichis$. Follow(A) = {c b a $} A appearsintherighthandsideoftwoproductions.From S > AB weadd First(B) whichisjust c. B isnotnullable. From B' > aACB' , weadd First(C) whichis b. SinceCisnullable,sowealsoinclude First(B') whichis a. B'isalso nullable,soweinclude Follow(B') whichadds $.

Itcanbeconvenienttocomputethefollowssetsforthenonterminalsthatappeartoward thetopoftheparsetreeandworkyourwaydown,butsometimesyouhavetocircle aroundcomputingthefollowsetsofothernonterminalsinordertocompletetheone youreon. Thecalculationofthefirstandfollowsetsfollowmechanicalalgorithms,butitisvery easytogettrippedupinthedetailsandmakemistakesevenwhenyouknowtherules. Becareful! TabledrivenLL(1)Parsing Inarecursivedescentparser,theproductioninformationisembeddedintheindividual parsefunctionsforeachnonterminalandtheruntimeexecutionstackiskeepingtrackof ourprogressthroughtheparse.Thereisanothermethodforimplementingapredictive parserthatusesatabletostorethatproductionalongwithanexplicitstacktokeeptrack ofwhereweareintheparse. Thisgrammarforadd/multiplyexpressionsisalreadysetuptohandleprecedenceand associativity:
E T F > > > E+T|T T*F|F (E) | int

Afterremovalofleftrecursion,weget:
E E' T T' F > > > > > TE' + TE' | FT' * FT' | (E) | int

Onewaytoillustratetheprocessistostudysometransitiongraphsthatrepresentthe grammar:

Apredictiveparserbehavesasfollows.Letsassumetheinputstringis3 + 4 * 5. ParsingbeginsinthestartstateofthesymbolEandmovestothenextstate.This transitionismarkedwithaT,whichsendsustothestartstateforT.Thisinturn,sends ustothestartstateforF.Fhasonlyterminals,sowereadatokenfromtheinputstring. Itmusteitherbeanopenparenthesisoranintegerinorderforthisparsetobevalid.We consumetheintegertoken,andthuswehavehitafinalstateinthe Ftransitiondiagram, sowereturntowherewecamefromwhichistheTdiagram;wehavejustfinished processingtheFnonterminal.WecontinuewithT',andgotothatstartstate.The currentlookaheadis+whichdoesntmatchthe*requiredbythefirstproduction,but+ isinthefollowsetforT'sowematchthesecondproductionwhichallowsT'todisappear entirely.WefinishT'andreturntoT,wherewearealsoinafinalstate.WereturntotheE diagramwherewehavejustfinishedprocessingtheT.WemoveontoE',andsoon. Atabledrivenpredictiveparserusesastacktostoretheproductionstowhichitmust return.Aparsingtablestorestheactionstheparsershouldtakebasedontheinput tokenandwhatvalueisontopofthestack.$istheendofinputsymbol.

Input/ Top of parse stack E E' T T' F

int E> TE'

( E> TE'

E'> +TE' T> FT' T'> F> int T'> *FT' F> (E) T> FT'

E'>

E'>

T'>

T'>

Tracing Hereishowapredictiveparserworks.Wepushthestartsymbolonthestackandread thefirstinputtoken.Astheparserworksthroughtheinput,therearethefollowing possibilitiesforthetopstacksymbolXandtheinputtokenausingtableM: 1. If X = aanda=endofinput($),parserhaltsandparsecompletedsuccessfully. 2. IfX = a anda!=$,successfulmatch,popXandadvancetonextinputtoken. Thisiscalledamatchaction. 3. IfX != a andXisanonterminal,popXandconsulttableatM[X,a]toseewhich productionapplies,pushrightsideofproductiononstack.Thisiscalleda predictaction. 4. Ifnoneoftheprecedingcasesappliesorthetableentryfromstep3isblank, therehasbeenaparseerror. Hereisanexampleparseofthestringint + int * int:
PARSESTACK REMAININGINPUT PARSERACTION

E$

int + int * int$ Predict E > TE', pop E from stack, push TE', no change in input int + int * int$ Predict T> FT' int + int * int$ Predict F> int int + int * int$ Match int, pop from stack, move ahead in input + int * int$ Predict T'> + int * int$ Predict E'> +TE' + int * int$ Match +, pop int * int$ Predict T>FT' int * int$ Predict F> int

TE'$ FT'E$ intT'E'$

T'E'$ E'$ +TE'$ TE'$ FT'E'$

intT'E'$ T'E'$ *FT'E'$ FT'E'$ intT'E'$ T'E'$ E'$ $

int * int$ Match int, pop * int$ Predict T'> *FT' * int$ Match *, pop int$ Predict F> int int$ Match int, pop $ Predict T'> $ Predict E'> $ Match $, pop, success!

Suppose,instead,thatweweretryingtoparsetheinput+$.Thefirststepoftheparse wouldgiveanerrorbecausethereisnoentryatM[E, +]. ConstructingTheParseTable Thenexttaskistofigureouthowwebuiltthetable.Theconstructionofthetableis somewhatinvolvedandtedious(theperfecttaskforacomputer,buterrorpronefor humans).Thefirstthingweneedtodoiscomputethefirstandfollowsetsforthe grammar:


E E' T T' F > > > > > TE' + TE' | FT' * FT' | (E) | int

First(E) = First(T) = First(F) = { ( int } First(T') = { * } First(E') = { + } Follow(E) = Follow(E') { $ ) } Follow(T) = Follow(T') = { + $ ) } Follow(F) = { * + $ ) }

Oncewehavethefirstandfollowsets,webuildatableMwiththeleftmostcolumn labeledwithallthenonterminalsinthegrammar,andthetoprowlabeledwithallthe terminalsinthegrammar,alongwith$.Thefollowingalgorithmfillsinthetablecells: 1. ForeachproductionA > uofthegrammar,dosteps2and3 2. ForeachterminalainFirst(u),addA > utoM[A,a] 3. If inFirst(u),(i.e.Aisnullable)addA > utoM[A,b]foreachterminalbin Follow(A),If inFirst(u),and$isinFollow(A),addA > utoM[A,$] 4. Allundefinedentriesareerrors TheconceptusedhereistoconsideraproductionA > uwithainFirst(u).Theparser shouldexpandA to uwhenthecurrentinputsymbolisa.Itsalittletrickierwhenu=

oru=>*.Inthiscase,weshouldexpandA to uifthecurrentinputsymbolisin Follow(A),orifthe$attheendoftheinputhasbeenreached,and$isinFollow(A). Iftheprocedureevertriestofillinanentryofthetablethatalreadyhasanonerror entry,theprocedurefailsthegrammarisnotLL(1). LL(1)Grammars:Properties Thesepredictivetopdowntechniques(eitherrecursivedescentortabledriven)require agrammarthatisLL(1).OnefullygeneralwaytodetermineifagrammarisLL(1)isto buildthetableandseeifyouhaveconflicts.Insomecases,youwillbeabletodetermine thatagrammarisorisn'tLL(1)viaashortcut(suchasidentifyingobviousleftfactors). TogiveaformalstatementofwhatisrequiredforagrammartobeLL(1): o Noambiguity o Noleftrecursion o AgrammarGisLL(1)iffwheneverA > u | varetwodistinctproductionsofG,the followingconditionshold: o fornoterminaladobothu andvderivestringsbeginningwitha(i.e.,first setsaredisjoint) o atmostoneofu andvcanderivetheemptystring o ifv=>*thenudoesnotderiveanystringbeginningwithaterminalin Follow(A) (i.e.,firstandfollowmustbedisjointifnullable) AllofthistranslatesintuitivelythatwhentryingtorecognizeA,theparsermustbeable toexaminejustoneinputsymboloflookaheadanduniquelydeterminewhich productiontouse. ErrorDetectionandRecovery Afewgeneralprinciplesapplytoerrorsfoundregardlessofparsingtechniquebeing used: o Aparsershouldtrytodeterminethatanerrorhasoccurredassoonaspossible. Waitingtoolongbeforedeclaringanerrorcancausetheparsertolosetheactual locationoftheerror. o Asuitableandcomprehensivemessageshouldbereported.Missingsemicolon online36ishelpful,unabletoshiftinstate425isnot. o Afteranerrorhasoccurred,theparsermustpickareasonableplacetoresumethe parse.Ratherthangivingupatthefirstproblem,aparsershouldalwaystryto parseasmuchofthecodeaspossibleinordertofindasmanyrealerrorsas possibleduringasinglerun.

o Aparsershouldavoidcascadingerrors,whichiswhenoneerrorgeneratesa lengthysequenceofspuriouserrormessages. Recognizingthattheinputisnotsyntacticallyvalidcanberelativelystraightforward.An errorisdetectedinpredictiveparsingwhentheterminalontopofthestackdoesnot matchthenextinputsymbolorwhennonterminalAisontopofthestack,aisthenext inputsymbolandtheparsingtableentryM[A,a]isempty. Decidinghowtohandletheerrorisbitmorecomplicated.Byinsertingspecificerror actionsintotheemptyslotsofthetable,youcandeterminehowapredictiveparserwill handleagivenerrorcondition.Attheleast,youcanprovideapreciseerrormessagethat describesthemismatchbetweenexpectedandfound. Recoveringfromerrorsandbeingabletoresumeandsuccessfullyparseismoredifficult. Theentirecompilationcouldbeabortedonthefirsterror,butmostuserswouldliketo findoutmorethanoneerrorpercompilation.Theproblemishowtofixtheerrorin somewaytoallowparsingtocontinue. Manyerrorsarerelativelyminorandinvolvesyntacticviolationsforwhichtheparser hasacorrectionthatitbelievesislikelytobewhattheprogrammerintended.For example,amissingsemicolonattheendofthelineoramisspelledkeywordcanusually berecognized.Formanyminorerrors,theparsercan"fix"theprogrambyguessingat whatwasintendedandreportingawarning,butallowingcompilationtoproceed unhindered.Theparsermightskipwhatappearstobeanerroneoustokenintheinput orinsertanecessary,butmissing,tokenorchangeatokenintotheoneexpected (substitutingBEGINforBGEIN).Formoremajororcomplexerrors,theparsermayhave noreliablecorrection.Theparserwillattempttocontinuebutwillprobablyhavetoskip overpartoftheinputortakesomeotherexceptionalactiontodoso. Panicmodeerrorrecoveryisasimpletechniquethatjustbailsoutofthecurrent construct,lookingforasafesymbolatwhichtorestartparsing.Theparserjustdiscards inputtokensuntilitfindswhatiscalledasynchronizingtoken.Thesetofsynchronizing tokensarethosethatwebelieveconfirmtheendoftheinvalidstatementandallowusto pickupatthenextpieceofcode.ForanonterminalA,wecouldplaceallthesymbolsin Follow(A)intoitssynchronizingset.IfAisthenonterminalforavariabledeclarationand thegarbledinputissomethinglikeduoble d;theparsermightskipaheadtothesemi colonandactasthoughthedeclarationdidntexist.Thiswillsurelycausesomemore cascadingerrorswhenthevariableislaterused,butitmightgetthroughthetrouble spot.Wecouldalsousethesymbolsin First(A) asasynchronizingsetforrestartingthe parseofA.Thiswouldallowinputjunk double d;toparseasavalidvariable declaration.

Bibliography A. Aho, R. Sethi, J. Ullman, Compilers: Principles, Techniques, and Tools. Reading, MA: Addison-Wesley, 1986. J.P. Bennett, Introduction to Compiling Techniques. Berkshire, England: McGraw-Hill, 1990. D. Cohen, Introduction to Computer Theory. New York: Wiley, 1986. C. Fischer, R. LeBlanc, Crafting a Compiler. Menlo Park, CA: Benjamin/Cummings, 1988. K. Loudon, Compiler Construction. Boston, MA: PWS, 1997.

You might also like