CD Unit - 1

UNIT 1: INTRODUCTION TO COMPILER DESIGN AND LEXICAL ANALYSIS

COMPILER DESIGN [JNTU-ANANTAPUR]
SPECTRUM ALL-IN-ONE JOURNAL FOR ENGINEERING STUDENTS, SIA GROUP

PART-A: SHORT QUESTIONS WITH SOLUTIONS

Q1. What is a language processing system?

Answer (Model Paper):

A language processing system is a system that translates a source language, taken as input, into machine language. The translation is divided among the following modules,

1. Preprocessor
2. Compiler
3. Assembler
4. Linker/loader.

Q2. Define compiler and interpreter.

Answer (Model Paper):

Compiler

A compiler is a program that converts a source program into a target program.

[Figure: Source program -> Compiler -> Target program]

Interpreter

An interpreter is also a program. It reads the source program and executes it line by line.

[Figure: Source program -> Interpreter -> Output]

Q3. List the algebraic identities of regular expressions.

Answer:

Two regular expressions A and B are said to be equivalent if both of them represent the same set of strings. The identity rules of regular expressions are given below, where R, A and B are regular expressions, ε is the empty string and ∅ is the empty set.

1. ∅ + R = R
2. ∅R = R∅ = ∅
3. εR = Rε = R
4. ε* = ε and ∅* = ε
5. R + R = R
6. R*R* = R*
7. RR* = R*R
8. (R*)* = R*
9. ε + RR* = R* = ε + R*R
10. (AB)*A = A(BA)*
11. (A + B)R = AR + BR and R(A + B) = RA + RB
12. (A + B)* = (A*B*)* = (A* + B*)*

Q4. Describe the following sets by regular expressions,
(a) {101}
(b) {abba}
(c) {01, 10}
(d) {Λ, ab}.

Answer:

(a) {101}
Clearly, the language denoted by the required regular expression accepts only 101. So, the regular expression is 101.

(b) {abba}
Clearly, the language denoted by the required regular expression accepts only abba. So, the regular expression is abba.

(c) {01, 10}
It is clear that only the strings 01 and 10 are accepted.
∴ The regular expression is 01 + 10.

(d) {Λ, ab}
It is clear that only the strings Λ and ab are accepted.
∴ The regular expression is Λ + ab.

Q5. Prove (a + b)* = a*(ba*)*.
Answer (Model Paper):

Consider R.H.S = a*(ba*)*.

Now,
L(R.H.S) = L[a*(ba*)*]
         = L(a*) L((ba*)*)              [since L(R1R2) = L(R1) L(R2)]
         = L(a*) [L(ba*)]*              [since L(R*) = (L(R))*]
         = L(a*) [L(b) L(a*)]*
         = {ε, a, aa, aaa, ...} [{b} {ε, a, aa, aaa, ...}]*
         = {ε, a, aa, aaa, ...} {b, ba, baa, baaa, ...}*
         = {ε, a, aa, aaa, ...} {ε, b, ba, bb, baba, baa, ...}
         = {ε, a, b, ab, ba, aa, bb, ...}
         = L((a + b)*)
         = L(L.H.S)

Hence, L.H.S = R.H.S is proved.

Look for the SIA GROUP logo on the title cover before you buy.

Q6. List the phases of a compiler.

Answer (Model Paper):

The six phases of a compiler are,

1. Lexical analyzer
2. Syntax analyzer
3. Semantic analyzer
4. Intermediate code generator
5. Code optimizer
6. Code generator.

Q7. Differentiate between single pass and multipass compiler.

Answer (Model Paper):

Single Pass Compiler | Multipass Compiler
A single pass compiler scans the entire source code only once. | A multipass compiler scans the source code several times.
Execution time of the code for a single pass translator is more. | Execution time of the code for a multipass translator is less.
Debugging of the translated code is difficult. | Debugging of the translated code is easy.
Memory requirement is more to design a single pass compiler. | Memory requirement is relatively less.
The generated code is less efficient. | The generated code is more efficient.
Going back to the previously read source program is not allowed. | Backtracking to previously scanned source code is allowed.
It is also called a narrow compiler. | It is also called a wide compiler.
Pascal and C use a single pass compiler. | Java requires a multipass compiler.

Q8. Write the transition diagram for an unsigned integer.
Answer:

The transition diagram for an unsigned integer is as follows,

[Figure: Transition diagram for an unsigned integer]

The above transition diagram is described by a five-tuple M = (Q, Σ, δ, q0, F), where,

Q = Set of states
Σ = {digit, other}
δ = Transition function given by the diagram
q0 = Start state
F = Set of final states.

Q9. Discuss the regular definitions for the language construct.

Answer (Model Paper):

Regular Definition

Let Σ denote an alphabet of basic symbols. Then a regular definition is a sequence of definitions of the form,

a1 -> r1
a2 -> r2
...
an -> rn

Where,

1. Each ai is a new symbol that is not in Σ and is different from the other a's.
2. Each ri is a regular expression over Σ ∪ {a1, a2, ..., a(i-1)}.

An example of a regular definition for C identifiers (C identifiers are strings of letters, digits and underscores) is shown below.

Alphabet_  -> A | B | ... | Z | a | b | ... | z | _
Digit      -> 0 | 1 | ... | 9
identifier -> Alphabet_ (Alphabet_ | Digit)*

Q10. Define the following terms,
(a) Strings
(b) Comments
(c) Sequences.

Answer:

(a) Strings

An alphabet is any non-empty finite set of symbols. Letters and characters are examples of symbols. The set {0, 1} is an example of a binary alphabet. A finite sequence of symbols drawn from an alphabet is a string. For example, if the alphabet is Σ = {0, 1}, then 01100 and 1110 are strings drawn from this alphabet. The length of a string s, written |s|, is the number of symbols in s. The string 01100 is a string of length five. The string of length zero is called the empty string, denoted by ε.

(b) Comments

Comments in the C programming language are written between the delimiters /* and */. A comment can include any number of characters, i.e., alphabets, digits and special symbols. The regular definition for comments in C is,

Digit     -> [0-9]
Letter    -> [A-Za-z]
Splsymbol -> * | / | @ | - | ...
Stmt      -> [Letter + Digit + Splsymbol]*
Comment   -> /* Stmt */

The NFA for this regular expression is N = (Q, Σ, δ, q0, F), where,

Q = {1, 2, 3, 4, 5}
Σ = English alphabets, digits and special symbols
δ is given by the following transition diagram.

[Figure: Transition diagram of the NFA for comments, with five states and a loop on letter, digit, special symbol]

(c) Sequences

A sequence is a list of characters that follows a specific order.

PART-B: ESSAY QUESTIONS WITH SOLUTIONS

1.1 INTRODUCTION

1.1.1 Language Processors

Q11. What are the basic functions of a language translator?

Answer (Model Paper):

Translator

A translator is a system which converts a language from one form to another form.

Need for Translator

Language conversion is needed whenever two different systems want to communicate. Two systems that each understand only their own language cannot communicate directly. Communication is possible only through a mediator who knows both languages and their equivalent translations. For example, if a person A knows only English and a person B knows only Marathi, and A and B want to communicate, they can use actions and expressions to a certain extent. Beyond that, they need a translator from English to Marathi and vice versa.

[Figure: Person A <-> Translator <-> Person B]

The translator in the above figure is a two-way translator. However, in the case of computer and human interaction a one-way translator is used. Human beings can learn and understand programming languages like C, C++, etc., whereas computer systems can understand only 0s and 1s, that is, binary language. Hence, a translator can be defined as,

A program that takes a source program written in a source language and produces an equivalent program in a machine understandable language known as the object language.

Types of Translator

A translator can be of two types,

1. Compilers
2. Assemblers.

[Figure: Translator, divided into Assembler and Compiler]

Basic Functions of a Translator

1. A translator should convert a program from one form to another form.
2. It should not change the meaning of the source program while converting.
3. It should convert the source program into a form that is easy for the computer to understand.
4. The speed with which a translator converts the source program should be the same as, or at least match, the computer's speed.
5. It should not only translate but also locate, repair to some extent and report errors to the programmer. This error handling should be user friendly, so that the user can make modifications easily.

Q12. Write the chief difference between compiler and interpreter. (Dec.-13/Jan.-14, Set-3)
OR
Differentiate between compiler and interpreter. (Nov., Set-2)

Answer:

Compiler

A compiler is a program that converts a source program into a target program.

Interpreter

An interpreter is also a program. It reads the source program and executes it line by line.

[Figure: Compiler and an interpreter]

Difference between Compiler and Interpreter

- A compiler converts or translates the program from one form of language to another form. An interpreter reads a line, checks it for errors and reports them to the user. If no errors are detected, it gives the output.
- A compiler does not produce output, whereas an interpreter does so line by line.
- A program which is compiled needs to be run by another program to produce output. An interpreter takes the source program along with its data and produces output line by line.
- A compiler may need to scan the program twice, whereas an interpreter scans the program only once.
- A compiler is user friendly as it does not disturb the user with errors line after line. It gives a complete set of errors in an understandable way to the user. An interpreter will disturb the user after checking each line if there is any error.
  Moreover, its error reporting is not done in simple English. It can be understood only by a system level programmer.
- Compilers are not portable as they produce target code which is specific to a machine. Interpreters are portable as they produce code which is machine independent.
- A compiled program needs additional programs like an assembler and a linker/loader, whereas an interpreter does not need them.
- Compilers can be implemented easily in any programming language. However, implementing an interpreter is not an easy task.
- For a program executed using an interpreter, the time taken is more, as the output is produced line by line.
- With a compiler, the programmer can use trial and error methods to correct the errors. With an interpreter, the programmer needs to correct the incorrect line, and only then will the interpreter check the next line.

Q13. Give and explain the diagrammatic representation of a language processing system.

Answer (Model Paper):

Language Processing System

A language processing system is a system that translates a source language, taken as input, into machine language. The translation is divided among the following modules,

1. Preprocessor
2. Compiler
3. Assembler
4. Linker/Loader.

The typical language processing system is shown in the figure below.

[Figure: Language Processing System]

1. Preprocessor

It is a special program that processes the code prior to the actual translation. It performs necessary functions like deleting comments, including necessary files and doing macro substitutions.

2. Compiler

A compiler is a program that converts a source program into a target program.

3. Assembler

An assembler is a translator which translates an assembly language program into object code.
An assembly language program is a symbolic form of the machine language of the computer.

4. Linker and Loader

Linkers

A linker is a program that links the object files containing the compiled or assembled code to form a single file which can be executed directly. It is also responsible for the following functions,

1. Linking the object program with the code for standard library functions.
2. Linking it with the resources provided by the operating system, like memory allocators and input and output devices.

Loaders

A loader is a program which loads relocatable code, i.e., code whose memory references do not yet have fixed locations in memory. It resolves these references with respect to a given base or initial address.

Q14. Describe the functionality of compilers in language processing.

Answer:

Functionality of Compiler in Language Processing

A compiler is a software that takes preprocessed source files written in a high level language like C or C++ as input and converts them into target language files, typically assembly language. The translation of the input from source code to target assembly code can be done by dividing the task of the compiler into two distinct parts,

1. Front end
2. Back end.

1. Front End

The front end includes those phases of the compiler which are dependent on the source language and independent of the target machine. It is mainly responsible for,

- Analyzing the source input
- Verifying the syntax of the input
- Checking the semantics of the input
- Generating the intermediate code of the input.

The front end of the compiler is subdivided into five phases. These phases are as follows,

(a) Lexical analysis/scanner
(b) Syntax analysis/parser
(c) Semantic analysis
(d) Intermediate code generation
(e) Intermediate code optimization.
(a) Lexical Analysis/Scanner
For answer refer Unit-1, Q17, Topic: Lexical Analyzer (Scanner).

(b) Syntax Analysis/Parser
For answer refer Unit-1, Q17, Topic: Syntax Analyzer (Parser).

(c) Semantic Analysis
For answer refer Unit-1, Q17, Topic: Semantic Analyzer.

(d) Intermediate Code Generation
For answer refer Unit-1, Q17, Topic: Intermediate Code Generator.

(e) Code Optimizer
For answer refer Unit-1, Q17, Topic: Code Optimizer.

2. Back End

The back end includes those phases which are independent of the source language and dependent on the target machine. The back end of the compiler is responsible for generating target assembly language or machine code from the intermediate code. The back end consists of two phases. These are as follows,

(a) Target code generation
(b) Target code optimization.

(a) Target Code Generation
In this phase of compilation, machine or assembly language code is generated from the intermediate code.

(b) Target Code Optimization
In this phase of compilation, the target code is optimized to produce more efficient target code.

Q15. What is compiler and what is cross compiler?

Answer:

Compiler

A compiler is a program that converts the source program into a target program.

[Figure (1): Compiler (Source program -> Compiler -> Target program)]

Cross Compiler

A compiler which runs on one machine and produces target code for another machine is called a cross compiler. Basically, three types of languages are involved,

1. Source language
2. Target language
3. Implementation language.

Cross compilation is a technique which helps in attaining portability. A cross compiler is represented with the help of a T-diagram, which is as follows.

[Figure (1): T-diagram]

Let T, u and v be languages, where T is the source language, u is the language in which the compiler is written and v is the target language.
The T-diagram for these languages is shown below,

[Figure (2): Cross Compiler]

In figure (2), the T-diagram shows the compiler written in u being compiled to obtain a compiler that translates T to v when it is run on the machine w.

1.1.2 The Structure of a Compiler

Q16. Write about machine independent phases of a compiler structure.

Answer:

The phases of a compiler can be broadly divided into two groups based on whether the phase is,

1. Language dependent
2. Machine dependent.

The figure clearly shows that the first three phases of the compiler, i.e., scanner, parser and semantic analyzer, are language dependent and machine independent. Similarly, the last three phases, i.e., intermediate code generator, code optimizer and code generator, are machine dependent and language independent.

The phases which are language dependent are grouped as the front end and the phases which are machine dependent are grouped as the back end.

[Figure: Machine Independent Phases (Language Dependent), showing input, symbol table, error handler and output]

1. Scanner
For answer refer Unit-1, Q17, Topic: Lexical Analyzer (Scanner).

2. Parser
For answer refer Unit-1, Q17, Topic: Syntax Analyzer (Parser).

3. Semantic Analyzer
For answer refer Unit-1, Q17, Topic: Semantic Analyzer.

Q17. With a neat diagram, explain in detail the different phases of a compiler. (Dec.-14/Jan.-15)
OR
Explain with neat diagram, the various phases of a compiler. Mention the input and output for each phase. (Dec.-14/Jan.-15, Set-2)
OR
Explain the phases of compilation with example. (Model Papers, Q2 | Dec.-14/Jan.-15, Set-3)

Answer:

Compiler

The phases of a compiler are as follows,

1. Lexical analyzer (scanner)
2. Syntax analyzer (parser)
3. Semantic analyzer
4. Intermediate code generator
5. Code optimizer
6. Code generator.

These six phases perform their specified tasks and two additional activities,

1. Symbol table contact or interaction
2. Error handling.

The block diagram comprising all six phases and the two additional activities is shown in the figure below.
The compiler takes an input program, performs the necessary actions and generates an output program.

[Figure: A General Compiler Format (source program to target program)]

To describe how a scanner works, the following terms must be known,

- Lexeme
  A lexeme is the smallest logical part of a program.
- Token
  A token is a class defined for similar lexemes of a program. For example, keyword, identifier.
- Pattern
  To check whether a token is valid in a language, a pattern is used. Hence, a pattern is a rule for describing a token. For example, "an integer can be eight characters long and can have only digits" is a pattern for the token "number".

1. Lexical Analyzer (Scanner)

The scanner reads the input string and generates a series of tokens which will be used by the parser. Thus, the scanner can only convert an input string into tokens. It cannot check whether the tokens have any meaning or whether they are syntactically correct according to the grammar of the language.

Example

wible(x < 10)

In the above piece of code, the scanner cannot identify or report that wible is a misspelled form of while in the C language. It treats wible as a valid identifier and generates a token for it with a pointer to the symbol table.

However, the scanner can identify any extra space or newline character and delete it. It also identifies any invalid special character and reports it to the user. Generation of tokens is the main function of a scanner. However, it also interacts with the symbol table whenever necessary and reports errors when they are encountered.

When it comes to error reporting, the scanner detects, locates, tries to repair and reports errors.
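The scanner behaviour described above can be sketched in a few lines of Python. This is only a minimal illustration, not the scanner of any real compiler: the token class names, the small keyword list and the function name tokenize are assumptions made for this example. It shows that wible is accepted as an ordinary identifier with a symbol table entry, that white space is discarded, and that an invalid special character is reported with its position.

```python
import re

# Hypothetical token classes and patterns, chosen only for this illustration.
# Order matters: keywords are tried before identifiers.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:while|if|else|int|float)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"\d+"),
    ("OPERATOR", r"[<>=+\-*/]"),
    ("PUNCT", r"[(){};,]"),
    ("SKIP", r"\s+"),      # blanks, tabs and newlines are deleted
    ("INVALID", r"."),     # any other character is reported as an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table, errors) for the given source string."""
    symbol_table = []   # identifier lexemes, in order of first appearance
    tokens = []
    errors = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        lexeme = match.group()
        if kind == "SKIP":
            continue
        if kind == "INVALID":
            errors.append((match.start(), lexeme))
            continue
        if kind == "IDENTIFIER":
            # Enter the identifier only if it is not already in the table,
            # then emit a token that points to its table entry.
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append((kind, symbol_table.index(lexeme)))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbol_table, errors


if __name__ == "__main__":
    print(tokenize("wible(x < 10)"))
```

The scanner has no way of knowing that wible was meant to be while. Detecting that is left to the later phases.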
2. Syntax Analyzer (Parser)

The parser takes the tokens generated by the scanner, checks the syntax of the input stream line by line according to the grammar of the given language and then generates a syntax tree. For example, consider the previous code,

wible(x < 10)
{
}

The parser detects an error in line number 1 because the tokens do not match any construct of the C grammar. wible(x < 10) has the form of a function call, and a function call cannot be followed by a block. Hence, it reports an error at wible.

3. Semantic Analyzer

The semantic analyzer checks whether the input has a valid meaning, i.e., it looks for semantic errors. In addition to this, it also performs type checking, which means it checks that an operation is performed on operands of the types specified by the language. The semantic analysis phase also performs type coercion (that is, automatic type conversion).

Example

int a = 12, b = 10;
float c = 0.0;
c = a + b;    // type coercion is performed here

The semantic analyzer converts the integer values to real type, as the destination c is of real type.

4. Intermediate Code Generator

The intermediate code generator converts the syntax tree into a code called intermediate code. The intermediate code should be easy to translate into target code, and intermediate code generation should not take much time.

The intermediate code can be of three forms,

(a) Postfix
(b) Triples
(c) Quadruples.

The most general and commonly used form is "three address code". In this type of code, each instruction has at most three operands.

The intermediate code should have the following basic properties,

- Each instruction can have only one operator in addition to the assignment operator.
- The compiler should generate temporary locations to store the value evaluated in each instruction.
- An instruction in three address code may have fewer than three operands.

5. Code Optimizer

The code generated by the fourth phase is optimized to the extent possible by the code optimizer. While doing so, it does not change the meaning of the program. The level of optimization performed on the code differs from compiler to compiler. A compiler that spends a significant amount of time on this phase is called an optimizing compiler.

Commonly used code optimizing techniques are,

(a) Common subexpression elimination
(b) Copy propagation
(c) Dead code elimination
(d) Constant folding.

6. Code Generator

Code generation is the last but most important phase of a compiler. It converts the intermediate code into machine or assembly code. Allocation of memory to the variables in the source program is also done by this phase. The code generator assigns registers to the variables in the program.

Example

Let us take a statement in the C language,

[Figure: Translation of a C statement through the phases of a compiler]

The inttoreal function is used to convert an integer to a real. In the statement, the variables are all real variables, but the constant 15 is not. Hence, the semantic analyzer coerces 15 to 15.0. MOVF means that the operation is on float or real variables. The # preceding 15.0 means that it is a constant. Reg1 and Reg2 are the registers.

Q18. Show the translation made by each of the phases of the compiler for the statement position := initial + rate * 60, where position, initial and rate are real numbers. (Dec.-14/Jan.-15, Set-4)
OR
Describe the output for the various phases of compiler with respect to the following statement, position := initial + rate * 60.
(May-13, Set-2, Q1)

Answer:

The given statement is,

position := initial + rate * 60

The translated statement is shown in the figure. The input is scanned by the scanner or lexical analyzer. It recognizes position, initial and rate as valid tokens. Not only this, it also makes an entry in the symbol table for each token, provided that it does not already exist in the symbol table. id1, id2 and id3 are the internal representations of the tokens position, initial and rate respectively.

The parser or syntax analyzer takes the output from the scanner, checks the syntax of the statement, and generates a parse or syntax tree.

The syntax tree is taken by the semantic analyzer. It does semantic checks on the tree and makes the necessary corrections in the tree. Thus, the semantic analyzer also generates a tree as output. For the given statement, the semantic analyzer adds another node, "inttoreal", to convert the integer constant to a real.

[Figure: Processing Done by the Phases of a Compiler]

The intermediate code generator converts the tree from the semantic analyzer into an intermediate representation of the source program. For this, it uses temporary variables. For the given statement, the intermediate code generator uses three temporary variables t1, t2 and t3.

An optional phase in the compiler is the code optimizer. Its job is to optimize the code. Optimization means minimizing the number of instructions, operations, variables, temporaries, etc., without changing the meaning of the statements generated by the intermediate code generator. In the figure, the number of temporary variables is reduced from three to one. Moreover, the inttoreal function is avoided and the value 60 is used directly as 60.0.
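The intermediate code generation and optimization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any particular compiler. The nested-tuple tree format, the (destination, operator, operand, operand) instruction format and the function names generate, optimize and show are hypothetical. id1, id2 and id3 stand for position, initial and rate as in the text.

```python
def generate(tree):
    """Translate an assignment tree into three address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        if not isinstance(node, tuple):
            return str(node)            # leaf: identifier or constant
        if node[0] == "inttoreal":
            arg = walk(node[1])
            temp = new_temp()
            code.append((temp, "inttoreal", arg, None))
            return temp
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = new_temp()
        code.append((temp, op, left_name, right_name))
        return temp

    _, target, expr = tree
    code.append((target, "copy", walk(expr), None))
    return code


def optimize(code):
    """Fold inttoreal of integer constants and merge the final copy.

    Assumes each temporary is used exactly once, which holds for the
    code produced by generate().
    """
    constants = {}
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttoreal" and a.isdigit():
            constants[dest] = f"{a}.0"  # do the conversion at compile time
            continue
        folded.append((dest, op, a, b))
    result = []
    for dest, op, a, b in folded:
        if op == "copy" and result and result[-1][0] == a:
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))
        else:
            result.append((dest, op, a, b))
    return result


def show(code):
    """Format instructions as readable three address statements."""
    lines = []
    for dest, op, a, b in code:
        if op == "copy":
            lines.append(f"{dest} = {a}")
        elif b is None:
            lines.append(f"{dest} = {op}({a})")
        else:
            lines.append(f"{dest} = {a} {op} {b}")
    return lines


if __name__ == "__main__":
    # position := initial + rate * 60, after semantic analysis
    tree = ("assign", "id1", ("+", "id2", ("*", "id3", ("inttoreal", 60))))
    unoptimized = generate(tree)
    print("\n".join(show(unoptimized)))
    print("\n".join(show(optimize(unoptimized))))
```

For this statement the sketch produces four instructions with three temporaries before optimization and two instructions with a single temporary afterwards. The surviving temporary keeps its original name rather than being renumbered.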
The number of instructions is also reduced from four to two.

The final phase in the compiler is the code generator. It generates assembly code or relocatable machine code. Since the given instruction is of floating-point type, the code generator uses the float versions of instructions such as MOV, MUL and ADD. A constant is represented with a preceding '#'. The most important task done by the code generator is assigning variables to registers.

Q19. Explain the differences between pass and phase.

Answer (Model Paper):

Differences between Pass and Phase

Pass

A pass is one complete scan of a program, which includes reading an input source program and converting it into a machine understandable form.

[Figure (1): Logical Structure of a Pass]

Not every compiler can be designed as a single pass compiler, because of forward references or jumps.

Example

int a = add(x, y);
int add(int x, int y)
{
}

Here, the compiler cannot fill in a value for the variable 'a', as it depends on the return value of the function add( ), which has not been seen yet. The compiler needs to keep the value empty and, when it encounters the definition later in the source program, return to the line where 'a' is declared and fill in its value. Hence, a compiler should be designed properly in order to save compilation time.

Advantages

1. A two pass compiler, where one pass takes the input and generates an intermediate code and passes it to the second pass which then generates the target code, has an advantage in the time spent in generating intermediate code.
2. In a single pass compiler, the program is read only once.
3. The execution time for a two pass compiler is very short.
4. Pascal and C use a single pass compiler.

Disadvantages

1. For a single pass compiler, the memory required is more.
2. Code generated by a single pass compiler is not efficient.
3. Complexity in designing a pass-based compiler is more.

Phase

A compiler is divided into a number of parts, segments or sections, and each section is known as a phase. A general compiler has six phases.

[Figure (2): Phases of a Compiler (source program to target program)]

In figure (2), the six portions or phases make up a compiler. The symbol table supervisor and the error handler are not phases of the compiler. However, they are responsible for secondary functions performed by a compiler. All six phases perform the job assigned to them and keep the symbol table and the error handler in touch during the process.

Now, the compiler can be broadly divided into two parts. They are,

(i) Front end
(ii) Back end.

The front end includes those phases of a compiler which are dependent on the source language and independent of the target machine. The back end includes those phases which are independent of the source language and dependent on the target machine.

By dividing the compiler into two parts we can do two things.

(i) To a single front end we can attach different back ends to compile the same language on different machines.

[Figure (3): Compiling Same Source Language on Different Machines]

In the above figure, M1, M2, ..., Mn are machines.

(ii) To compile different languages on the same machine, take a back end and attach different front ends as necessary. Note that a compiler can have only one front end and one back end.

[Figure (4): Compiling Different Languages on the Same Machine]

Advantages

1. The compilation process is divided into a number of steps.
2. The efficiency of the code generated is more.
Disadvantages

1. The memory required for a phase-based compiler is relatively higher than for a pass-based compiler, as each phase generates an output that acts as input for another phase.
2. If any phase of a compiler enters an infinite loop, then the compiler gets stuck, which may result in a system crash.

Q20. Explain the phases in detail. Write down the output of each phase for the expression a := b + c * 50.

Answer (Model Paper | Nov./Dec.-12, Set-4):

Phases of Compiler

For answer refer Unit-1, Q17 [Exclude Example].

The given expression is,

a := b + c * 50

The output of each phase for the expression a := b + c * 50 is shown in the figure below,

[Figure: Output of each phase of the compiler for a := b + c * 50]

Q21. Explain the data structures used in a compiler.

Answer:

A data structure is a type of structure used to store data in a machine.

Let us consider the data structures of the C language, as a C compiler can itself be implemented using the C language.

The data structures that are used in a compiler are as follows,

1. Arrays
2. Linked list
3. Stack.

The compilation process looks very simple when we look at the figure of the phases of a compiler. However, the compiler is also a software which needs to be written in a programming language. That language needs data structures to store huge amounts of data, to perform operations on the data and to represent the output of some of the phases.

1. Arrays

An array is a data structure that is used in a compiler. It is defined as a collection of similar data items in a linear or sequential form. Arrays can be single, two or multidimensional. Now that we have looked at the definition of an array, how does an array relate to a data structure used in a compiler?
The six phases of the compiler interact with the symbol table, which stores not only tokens and references but also the values of tokens.

[Table: Symbol Table of a Compiler]

If we look closely at the symbol table, it uses a large number of arrays within itself.

2. Linked List

A linked list is another data structure that is used in a compiler. It can be considered as an extended or sophisticated form of an array. That means a linked list is a linear data structure like an array, but there exist relations or links between the elements stored. Hence the name linked list.

[Figure: A Linked List for 3 Elements]

The above figure shows the extended version of a linked list. A normal linked list has two portions or parts in each packet. The first portion consists of data and the second portion consists of a pointer to the next element in the list. However, in the above figure each packet consists of two pointers, where one points to the previous packet and the other to the next packet.

In a compiler, the extended form of linked lists is used to represent the syntax tree, which is the output of the syntax and semantic analyzers. Let us take an example using linked lists.

Example

For the string 19 - 5 + 20, the syntax tree could be,

[Figure: Syntax Tree]

Now, the equivalent representation of the above syntax tree in the form of a linked list is shown in the figure below.

[Figure: Linked List Representation]

The leaf nodes do not make use of their packet completely, because they have no children. Thus, instead of wasting space, make use of the extended linked list at all the other levels of the syntax tree and an array at the leaf level.

3. Stack

A stack is an array or collection of unrelated data, which means we can store any type of data in a stack.
The type of data in one cell of a stack may differ from that of another cell. The operations performed on a stack are push and pop.
Figure: Stack Representation
The operations push and pop are mutually exclusive, i.e., at any moment either a push or a pop can be done, but not both. Stacks are used in the syntax analysis phase of a compiler. The defining property of a stack is that the last element pushed is the first to be popped, which is commonly referred to as Last In First Out (LIFO).

1.4.3 The Science of Building a Compiler
Q22. Define the term code optimization. Discuss briefly the design objectives of compiler optimization.
Answer :
Code Optimization
The term 'code optimization' refers to a program transformation method in the synthesis phase that tries to generate machine code which runs faster and uses fewer resources such as CPU and memory.
Objectives of Compiler Optimization
The design objectives or goals of the compiler optimization process are as follows,
1. Correctness
The optimization must preserve the meaning of the compiled program, i.e., it must be correct.
2. Performance
The optimization needs to improve the performance of the program. Here, performance means the execution speed of the program.
3. Compilation Time
The time required for compiling the program needs to be acceptable so as to support the development and debugging cycle.
4. Maintenance Cost
In general, compiler systems involve complexity. Hence, the system needs to be kept simple so that the engineering effort and maintenance costs of the compiler remain manageable.

1.2 LEXICAL ANALYSIS
1.2.1 The Role of the Lexical Analyzer
Q23. What is the role of lexical analyzer? (Nov.-12, Q1(a))
OR
With necessary error recovery actions, explain in detail the role of lexical analyzer.
(Nov.-11, Set-4, Q1)
OR
Explain the functions of lexical analyzer with its implementations.
Answer : (Model Paper, Q2(a))
Lexical Analyzer
Lexical analysis is the process of reading the input string, identifying tokens, deleting white spaces, and locating, repairing and reporting errors, if there are any. Lexical analysis is implemented by the first phase of a compiler.
Role
The scanner of a lexical analyzer is responsible for,
1. Removal of white spaces and comments
2. Identifying
   (i) Constants
   (ii) Identifiers
   (iii) Keywords
3. Generating tokens
4. Reporting about errors.
1. Removing White Space and Comments
Whenever the scanner encounters a blank, tab or newline character, it simply deletes it. The reason is that the parser, which is itself a complex phase, need not worry about white space; the grammar used by the parser has no rules for white space. Comments can likewise be ignored by the parser, as they are deleted by the scanner while generating tokens.
2. Identifying
(i) Constants
Whenever the scanner encounters a constant, it makes an entry in the symbol table and returns the pointer to that entry.
(ii) Identifiers
An identifier can be the name of a function, variable, etc. The grammar of any language treats an identifier as a token.
Example
Consider a statement in C such as,
a = a + 1;
The lexical analyzer would consider the above statement as,
iden = iden + number;
Here iden is one token; =, + and number are also treated as single tokens.
(iii) Keywords
A keyword in any language follows all the rules that are imposed on an identifier. The point to be noted is that keywords should be made reserved identifiers so that the scanner does not confuse an identifier with a keyword.
3. Generating Tokens
Whenever the scanner identifies a valid token, it generates a tuple of the form,
<α, β>
where α is a valid lexeme (identifier, number, keyword, etc.) and
β is the pointer to the symbol table entry for α.
Token generation in the lexical analysis phase is initiated by the parser through a function, namely nexttoken( ). After receiving a signal from the parser, the scanner reads the input string till it finds a valid token.
4. Reporting about Errors
In some compilers the lexical analyzer is responsible for making a copy of the source program and marking errors in it. The scanner should take care not to report a single error more than once. When the scanner encounters something suspicious, it invokes a procedure in the error handler (a part of the compiler).
Lexical analysis depends completely on the source language and is independent of the machine on which the program is being compiled.

Q24. Explain why the analysis portion of a compiler is separated into lexical analysis and parsing phases.
Answer :
The lexical analyzer, being the first phase of a compiler, reads or scans the input to the compiler and divides it into a number of small parts of a string (tokens).
Figure: General Representation of Lexical Analyzer
The above figure shows that the lexical analyzer acts only when it is invoked by the parser. There are a number of reasons for separating the lexical analyzer from the syntax analyzer.
Reasons
1. Separating the lexical analyzer from the syntax analyzer improves the efficiency of the compiler. Each phase is given its own responsibilities. "The fewer the tasks, the greater the efficiency" is the motive behind the separation. If one phase reads the input, forms tokens and also checks the syntax of a string, more time is consumed and the task may not be performed efficiently.
2. By dividing the lexical analyzer and syntax analyzer into two phases, the design of a compiler is made very easy.
The reason is that if these phases were put together, all the functions performed by the lexical analyzer would have to be performed by the syntax analyzer. Of all the functions performed by the lexical analyzer, the most difficult is removing comments, white spaces and newline characters from the input string. The syntax analyzer is in itself a complicated phase (in terms of design), which performs syntax checks according to the grammar of the given language. Making such a complicated phase also read the input string and remove spaces would be absurd. Hence, the division into two phases makes the design of each phase, and in turn the design of the compiler, easy.
3. Generating tokens and then storing each input token with its attributes in the symbol table is not an easy task. The lexical analyzer not only generates tokens, but also makes a copy of the source program and links to it any errors found.
For example, consider the following program in C language.
    #include <stdio.h>
    void main()
    !
    printf("Hello");
    }
The lexical analyzer reads the input character by character, forms tokens with attributes and stores them in the symbol table. When it encounters '!', it shows an error message, since in some compilers the lexical analyzer is responsible for this check.
Let us take an example of how a lexical analyzer divides a string into tokens, assigns attributes and stores them in the symbol table.
A = B + C
The lexical analyzer forms tokens as follows: any name such as A, B or student is an identifier, and operator tokens like add_op, mult_op and div_op are used whenever such an operator is encountered.
The format of the token given to the parser is,
<id, pointer to symbol table> for identifiers
<op, type> for operators, where type can be any mathematical operator.
Let us generate tokens for the string A = B + C. The identifier tokens are,
<id, pointer to symbol table for A>
<id, pointer to symbol table for B>
Hence, the division,
Makes compiler design easy
Increases efficiency
Reduces compilation time.
Moreover, the division also improves the portability of the compiler.
For example, if special characters like ↑ appear in a Pascal program and no representation for them is available on the device, the handling of such characters can be isolated in the lexical analyzer. By making the lexical analyzer deal with non-standard characters, the rest of the compiler becomes free from language or device specific restrictions. The parser is not even aware of such restrictions; it works normally irrespective of any device specific constraints.

Q25. Define lexeme, token and pattern. Identify the lexemes that make up the tokens in the following program segment. Indicate the corresponding token and pattern:
    void swap(int i, int j)
    {
        int t;
        t = i;
        i = j;
        j = t;
    }
Answer :
(a) Tokens
A token is a sequence of characters that can be treated as a lexical unit in the programming language. It consists of two parts, namely a token name and an attribute value. Examples of tokens in a programming language include identifiers, keywords, punctuation and operators.
(b) Patterns
A pattern is a rule describing the form of a token. In a programming language, the pattern is used to determine whether a token is valid or not. Regular expressions are an important notation for specifying patterns. For example, the pattern for a keyword token is the sequence of characters that form the keyword, and the pattern for the token identifier (id) is,
id → letter (letter | digit)*
(c) Lexemes
A lexeme is the smallest logical part of a source program that is matched by the pattern for a token. The lexical analyzer recognizes a lexeme as an instance of the token. For instance, the token RELOP has lexemes such as <, <=, > and >=.
Consider the following C statement to better understand the difference between tokens, patterns and lexemes.
    printf("sum = %d", average);
In the above statement, printf and average are lexemes that match the pattern for the token identifier, and "sum = %d", which is enclosed in double quotes, is a lexeme matching the pattern for the token literal.
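The relation between a lexeme, its pattern and its token can be sketched in C as follows. This is only an illustration under stated assumptions, not the text's own code: the function name token_for is hypothetical, the RELOP set is assumed to be C's relational operators, and a number is assumed to be digits with an optional fractional part.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: return the token name whose pattern matches `lexeme`.
   Patterns assumed here:
     while      -> exactly the characters w, h, i, l, e
     RELOP      -> one of C's relational operators (an assumption)
     literal    -> anything enclosed in double quotes
     identifier -> letter (letter | digit)*
     number     -> digit+ with an optional fractional part */
const char *token_for(const char *lexeme)
{
    static const char *const relops[] = {"<", "<=", ">", ">=", "==", "!="};
    size_t n, i;

    assert(lexeme != NULL);
    n = strlen(lexeme);
    if (n == 0)
        return "unknown";

    /* Keywords are checked before identifiers because they share a pattern. */
    if (strcmp(lexeme, "while") == 0)
        return "while";

    for (i = 0; i < sizeof relops / sizeof relops[0]; i++)
        if (strcmp(lexeme, relops[i]) == 0)
            return "RELOP";

    if (n >= 2 && lexeme[0] == '"' && lexeme[n - 1] == '"')
        return "literal";

    if (isalpha((unsigned char)lexeme[0])) {
        for (i = 1; i < n; i++)
            if (!isalnum((unsigned char)lexeme[i]))
                return "unknown";
        return "identifier";
    }

    if (isdigit((unsigned char)lexeme[0])) {
        int seen_dot = 0;
        for (i = 1; i < n; i++) {
            if (lexeme[i] == '.' && !seen_dot && i + 1 < n)
                seen_dot = 1;               /* one dot, followed by a digit */
            else if (!isdigit((unsigned char)lexeme[i]))
                return "unknown";
        }
        return "number";
    }
    return "unknown";
}
```

For instance, token_for("average") yields identifier while token_for("\"sum = %d\"") yields literal, mirroring the printf statement discussed above.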
The table below shows some tokens, their corresponding patterns and sample lexemes.
    Token        Pattern                                  Sample lexemes
    identifier   letter followed by letters or digits     sum, average
    number       any number                               0, 2.14, 3.26
    while        characters w, h, i, l, e                 while
    RELOP

Lexical Errors
A lexical analyzer reads the input string and generates tokens. In doing so, it may encounter the following types of errors.
1. A misspelled identifier
2. An identifier longer than the specified or defined length.
These errors can be caused by the following,
(a) An extra character beyond the specified or defined length
(b) An invalid character
(c) A missing character
(d) Swapped or misplaced characters.
The majority of errors that can be identified during the lexical analysis phase of a compiler are misspelled variables or identifiers. Without the grammar of the language, the lexical analyzer or scanner cannot do more than this. However, while making a copy of the source program, it can not only mark errors but also,
Give the line number in which the error has occurred.
Set a flag in the symbol table for the variable or identifier. This flag can be set ON if an error has occurred due to the identifier. By doing so, duplication of error messages can be avoided.
For example, if an identifier is declared in line number 10 and used as an array in line number 13, and an error has occurred in line 10, then the scanner should not report an error in line 13 that is merely a consequence of line 10.
Example
Let us consider a statement in C language,
    printfg(Hello"):
The above statement contains a number of errors. printfg is not a valid function in C; however, it is a valid identifier. Hence the scanner cannot detect this error and considers printfg to be a valid identifier. This error will be detected in a later phase. In addition, a quotation mark is missing after the "(" and the statement ends with a colon and not a semicolon.
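The identifier-related lexical errors listed above (an over-long identifier, an invalid character) can be checked with a few lines of C. The sketch below is illustrative only: the names IdStatus and check_identifier, and the idea of passing the maximum length as a parameter, are assumptions and are not taken from any particular compiler.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result codes for a single identifier lexeme. */
typedef enum { ID_OK, ID_TOO_LONG, ID_INVALID_CHAR } IdStatus;

/* Check a lexeme against C's identifier rule
   (letter or underscore, then letters, digits or underscores)
   and against an assumed maximum length `max_len`. */
IdStatus check_identifier(const char *lexeme, size_t max_len)
{
    size_t i, n;

    assert(lexeme != NULL && max_len > 0);
    n = strlen(lexeme);

    if (n == 0 ||
        !(isalpha((unsigned char)lexeme[0]) || lexeme[0] == '_'))
        return ID_INVALID_CHAR;

    for (i = 1; i < n; i++)
        if (!(isalnum((unsigned char)lexeme[i]) || lexeme[i] == '_'))
            return ID_INVALID_CHAR;

    /* A misspelling such as "printfg" still passes: the scanner
       has no way to know that "printf" was intended. */
    return n > max_len ? ID_TOO_LONG : ID_OK;
}
```

Note that check_identifier("printfg", 31) returns ID_OK, which is exactly why the misspelled function name in the example cannot be caught by the scanner.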
Lexical Analyzer Error Handler
A compiler should not only detect and report errors but also try to handle them to some extent. However, this is very expensive in terms of efficiency and speed, as the compiler cannot repair an error by guessing what the programmer actually meant. Rather, it can try to repair to the extent possible and give a message to the programmer about the repair as well.
For example, if an identifier $name is declared, which is an invalid identifier in C language, the lexical analyzer can delete the first character and retain the rest. In this scenario the repair is quick and easy; in practice, however, this is often not the case. Errors can be very ambiguous and hard to repair. In such situations the scanner can repair to the extent possible and leave the rest to the programmer.
Repairing an Error
1. Numeric errors can be handled easily. The scanner reads a numeric constant and, if it exceeds the permitted length, the extra digits can be truncated and the rest retained. The truncation is done from the place where the constant exceeds the defined length. For example, in C language an integer can be 2 bytes long. If a constant exceeds this size, the scanner truncates the excess portion and informs the programmer.
2. In the case of comments, if an extra character appears outside the comment boundary, the character is deleted.
3. If an illegal or invalid character appears, it can be deleted.
4. If the scanner is not able to repair an error and cannot proceed further, it should not get stuck at the error. Rather, it can delete characters till it gets a valid token that can be passed to the parser. This type of error recovery is called panic mode recovery.

1.2.2 Input Buffering
Q27. Explain the input buffer scheme for scanning the source program.
How can the use of sentinels improve its performance? Describe in detail. (Nov., Q2(a))
OR
What is meant by input buffering? Explain the use of sentinels in recognizing tokens.
Answer : (Model Paper, Q4(a))
Input Buffering
The only phase of a compiler that reads or scans the source program is the scanner or lexical analyzer. The scanning is carried out character by character, so buffering is crucial in allowing the lexical analyzer to keep up with the speed of the other phases.
Input Buffer Scheme
Buffering is used in deciding whether a pattern forms a valid token or not. The lexical analyzer reads the input characters from the buffer. In the input buffer scheme, the entire buffer is partitioned into two halves, each capable of holding C characters, as shown in the figure.
Figure: Two-halves Input Buffer
A single read command is used to read C characters into a half. If fewer than C characters remain in the input, a special character eof is placed in the buffer after them. That is, eof indicates the end of the source file.
This technique uses two pointers, lex_beg and f (the forward pointer). The sequence of characters between these pointers gives the current lexeme. In the beginning, lex_beg and f point to the same first character. Then only the pointer f is incremented till a lexeme is found. Once a lexeme is found, f is set to point to the character at its right end. When the current lexeme has been processed, both pointers are set to point to the first character of the next lexeme.
If the pointer f reaches the end of the first half of the input buffer and is about to move to the first character of the second half, the second half is immediately filled with C new characters.
Similarly, when f is about to move out of the second half, the first (left) half is filled with C new characters and f is made to point to the first character in this half.
The input buffer scheme involves moving the forward pointer f. An algorithm for advancing the forward pointer f is as follows,
Algorithm : Advance_Forward_Pointer
Step 1: Check whether f is at the end of the first half. If yes, then goto step (2), else goto step (3).
Step 2: Refill the second half and increment f by 1, i.e., make f = f + 1.
Step 3: Check whether f is at the end of the second half. If yes, goto step (4), else goto step (5).
Step 4: Refill the first half and make f point to the first character of the first half.
Step 5: Increment f by 1, i.e., make f = f + 1.
The two-halves buffering scheme works well for limited lookahead, i.e., it can recognize a token only if the characters making up the token, together with the lookahead needed, fit into the buffer. Programs written in PL/I are more challenging for this technique, as the lookahead may be very large. For example, in the statement,
DECLARE(A1, A2, ..., An)
DECLARE can either mean an array name or a keyword. To decide which one it is, f has to be moved till the character next to the right parenthesis is reached. Moreover, for every increment of pointer f, we are required to perform two tests.
Buffering with Sentinels
The inefficiency of performing two tests for each increment of pointer f in the input buffer scheme can be eliminated by using sentinels. A sentinel is a special character, usually eof. A sentinel is neither part of the source program nor counted among the characters in the buffer. At the end of each half, a sentinel, i.e., eof, is used to indicate the end of that half. Moreover, the end of the source program is also indicated by another eof.
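The sentinel idea just described can be sketched in C as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: the Scanner type and the names HALF, SENTINEL, refill, scanner_init, next_char and scan_all are hypothetical, the "source file" is an in-memory string, and the character '\0' stands in for eof (so the source is assumed to contain no NUL characters). The lex_beg pointer is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HALF 4          /* C: characters per half (tiny, for illustration) */
#define SENTINEL '\0'   /* stands in for eof; an assumption of this sketch */

typedef struct {
    const char *src;            /* unread part of the in-memory "file" */
    char buf[2 * HALF + 2];     /* half 1, sentinel, half 2, sentinel  */
    int f;                      /* forward pointer (index into buf)    */
} Scanner;

/* Copy up to HALF characters into the half beginning at `start` and
   place a sentinel after them. After a full read the sentinel lands in
   the slot reserved at the end of the half; after a short read it lands
   inside the half and marks the end of the source. */
void refill(Scanner *s, int start)
{
    size_t n = strlen(s->src);
    if (n > HALF)
        n = HALF;
    memcpy(s->buf + start, s->src, n);
    s->buf[start + n] = SENTINEL;
    s->src += n;
}

void scanner_init(Scanner *s, const char *text)
{
    s->src = text;
    refill(s, 0);
    s->f = 0;
}

/* Return the next source character, or -1 at the end of the source.
   In the common case only ONE test (is it the sentinel?) is made. */
int next_char(Scanner *s)
{
    for (;;) {
        char c = s->buf[s->f];
        if (c != SENTINEL) {
            s->f++;
            return (unsigned char)c;
        }
        if (s->f == HALF) {                 /* end of first half  */
            refill(s, HALF + 1);
            s->f = HALF + 1;
        } else if (s->f == 2 * HALF + 1) {  /* end of second half */
            refill(s, 0);
            s->f = 0;
        } else {
            return -1;                      /* end of source      */
        }
    }
}

/* Drive the scanner over `text`, copying every character to `out`
   (which must be large enough); returns the number of characters. */
size_t scan_all(const char *text, char *out)
{
    Scanner s;
    size_t n = 0;
    int c;

    scanner_init(&s, text);
    while ((c = next_char(&s)) != -1)
        out[n++] = (char)c;
    out[n] = '\0';
    return n;
}
```

The extra position tests run only when a sentinel is actually seen, which happens once per half rather than once per character.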
Thus, three eofs are used when the first half is full and the second half is not completely full.
Figure: Input Buffer with Sentinels
With this scheme, most of the time only one test is performed, namely whether f is pointing to an eof. Only when f reaches an eof, i.e., the end of the first half, the end of the second half or the end of the source program, are additional tests required. The improved algorithm for incrementing f using sentinels is as follows,
Algorithm : Improved_Advance_Forward_Pointer
Step 1: Increment the forward pointer f by 1, i.e., f = f + 1.
Step 2: Check whether f is pointing to a sentinel, i.e., eof. If yes, then goto step (3), else exit.
Step 3: Check whether f is at the eof of the first half. If yes, then goto step (4), else goto step (5).
Step 4: Refill the second half and increment f by 1.
Step 5: Check whether f is pointing to the end of the second half. If yes, then goto step (6), else goto step (7).
Step 6: Refill the first half and make f point to the first character in the first half.
Step 7: The end of the source program is reached; thus, terminate lexical analysis.

1.2.3 Specification of Tokens
Q28. What is a language? Explain operations on language.
Answer :
Languages
A language, denoted by L, is a set of strings over an alphabet Σ. For example, English is a language whose words are strings formed from letters such as E, N, G, L, I, S, H of its alphabet. Similarly, a programming language like C is a language whose programs form a subset of the strings over the alphabet of the language. However, such languages are difficult to specify. Moreover, the empty set ∅ is also a language.
Some examples of languages are as follows,
The set of strings consisting of an equal number of a's and b's : {ε, ab, aabb, babaab, ...}
The empty language over any alphabet : ∅
For any Σ, Σ* is a language.
Operations on Languages
The operations on languages are called the Regular Operations.
These are as follows,
1. Union
2. Concatenation
3. Closure or Kleene star.
1. Union
Union is the simplest operation on two languages. If L1 and L2 are two languages, then their union, denoted L1 ∪ L2, is the language containing all strings w from both languages.
L1 ∪ L2 = {w | w is in L1 or w is in L2}
Examples
(i) Let L1 = {a, ab, ba, bb} and L2 = {a, ab, aab}, then L1 ∪ L2 = {a, ab, aab, ba, bb}
(ii) Let L1 = {0, 00, 000, ...} and L2 = {0, 000, 0000, ...}, then L1 ∪ L2 = {0, 00, 000, ...}
2. Concatenation
The concatenation of two languages L1 and L2, denoted by L1L2, is the language containing the strings from L1 followed by strings from L2.
L1L2 = {st | s is in L1 and t is in L2}
Examples
(i) Let L1 = {a, ab, ba, b} and L2 = {b}, then L1L2 = {ab, abb, bab, bb}
(ii) Let L be a language. Then L∅ = ∅L = ∅
3. Kleene Star/Closure
The Kleene star of a language L, denoted L*, is the language containing all the strings obtained by concatenating zero or more strings from L.
L* = L^0 ∪ L^1 ∪ L^2 ∪ ...
where L^0 = {ε} and ε is the empty string.
Example
Let L = {0, 1}, then L* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, ...}
The positive closure of a language, denoted L+, is the language containing all the strings obtained by concatenating one or more strings from L.
L+ = L^1 ∪ L^2 ∪ L^3 ∪ ...
Example
Let L = {0, 1}, then L+ is the set of all strings over the alphabet Σ = {0, 1} having at least one digit.

Q29. Write short notes on token specification.
Answer :
A token represents a categorized block of text. Tokens can be specified using the following,
1. Strings
2. Languages
3. Regular sets
4. Regular expressions.
1. Strings
A string is a finite sequence of symbols chosen from some alphabet. The set of strings over an alphabet Σ = {a, b} is denoted by Σ*, which is the set containing the empty string (ε) and all combinations of a and b.
Σ* = {ε, a, b, ab, ba, ...}
2. Languages
A language, denoted by L, is a set of strings over an alphabet Σ. For example, English is a language whose words are strings formed from letters such as E, N, G, L, I, S, H of its alphabet.
Similarly, a programming language like C is a language whose programs form a subset of the strings over the alphabet of the language. However, such languages are difficult to specify. Moreover, the empty set ∅ is also a language.
Some examples of languages are as follows,
The set of strings consisting of an equal number of a's and b's : {ε, ab, aabb, babaab, ...}
The empty language over any alphabet : ∅
For any Σ, Σ* is a language.
3. Regular Set
Let Σ be a finite alphabet. Then a class of sets of strings over Σ, called regular sets, is defined recursively as follows,
(a) The empty set ∅ is a regular set over Σ.
(b) {ε}, the set containing only the string of length 0, is a regular set over Σ.
(c) For each input symbol a in Σ, {a} is a regular set over Σ.
(d) If P and Q are two regular sets over Σ, then
    P ∪ Q is a regular set over Σ
    PQ is a regular set over Σ
    P* is a regular set over Σ.
Examples
1. Let Σ = {0}, then the set of strings {0, 00, 000, ...} is a regular set.
2. Let Σ = {0, 1}, then the set of strings {01, 10} is a regular set.
3. Let Σ = {a, b}, then the set of strings starting with a and ending with b is a regular set.
4. Regular Expression
A regular expression is a concise notation for denoting regular sets. Regular expressions describe the languages accepted by finite automata.
Languages Associated with RE
Let Σ be a finite alphabet. Then the regular expressions over Σ, and the regular sets they denote, are defined recursively as follows,
1. ∅ is a regular expression denoting the empty set.
2. ε is a regular expression denoting the regular set {ε}.
3.
a is a regular expression denoting the regular set {a}.
If p and q are regular expressions denoting the regular sets P and Q, then
    (p + q) is a regular expression denoting the set P ∪ Q
    pq is a regular expression denoting the set PQ
    p* is a regular expression denoting the set P*.
Example
0(0 + 1)*1 is a regular expression denoting the set of all strings over {0, 1} starting with 0 and ending with 1.

Q30. Find regular expressions representing the following sets,
(a) The set of all strings over {0, 1} having atmost one pair of 0's or atmost one pair of 1's.
(b) The set of all strings over {a, b} in which the number of occurrences of a is divisible by 3.
(c) The set of all strings over {a, b} in which there are at least two occurrences of b between any two occurrences of a.
(d) The set of all strings over {a, b} with three consecutive b's.
Answer :
(a) The Set of All Strings Over {0, 1} Having Atmost One Pair of 0's or Atmost One Pair of 1's
A string over {0, 1} having atmost one pair of 0's either contains no pair of 0's or contains exactly one pair of 0's.
⇒ Regular expression for no pair of 0's,
(1 + 01)*(ε + 0)
⇒ Regular expression for exactly one pair of 0's,
(1 + 01)*00(1 + 10)*
Similarly, a string over {0, 1} having atmost one pair of 1's either contains no pair of 1's or contains exactly one pair of 1's.
⇒ Regular expression for no pair of 1's,
(0 + 10)*(ε + 1)
⇒ Regular expression for exactly one pair of 1's,
(0 + 10)*11(0 + 01)*
So, the regular expression for the given set is obtained by combining all the above expressions,
(1 + 01)*(ε + 0) + (1 + 01)*00(1 + 10)* + (0 + 10)*(ε + 1) + (0 + 10)*11(0 + 01)*
(b) The Set of All Strings Over {a, b} in Which the Number of Occurrences of a is Divisible by 3
The number of occurrences of a being divisible by 3 means the string may have zero a's, three a's, six a's, nine a's and so on.
⇒ Regular expression for zero a's, i.e., b*
⇒ Regular expression for 3 a's, 6 a's, 9 a's, ...,
(b*ab*ab*ab*)*
The required regular expression is obtained by combining the regular expressions for zero a's and for 3 a's, 6 a's, 9 a's, ..., which gives,
b* + (b*ab*ab*ab*)*
(c) The Set of All Strings Over {a, b} in Which There are at Least Two Occurrences of b Between any Two Occurrences of a
⇒ Regular expression for strings having no a's is b*
⇒ Regular expression for strings having only one a is b*ab*
⇒ Regular expression in which every a is followed by at least two b's,
(b + abb)*
⇒ Regular expression for strings with at least two occurrences of b between any two occurrences of a is of the form,
(b + abb)*ab*
So, the regular expression for the given set is b* + (b + abb)*ab*
(d) The Set of All Strings Over {a, b} with Three Consecutive b's
Example strings of this language are bbb, abbba, bbbab, ababbb, etc. The regular expression for three consecutive b's is of the form,
(S1)bbb(S2)
where S1 and S2 can be any number of a's or b's. The regular expression for any number of a's or b's is (a + b)*.
Therefore, the regular expression for the given set is (a + b)*bbb(a + b)*

1.2.4 Recognition of Tokens
Q31. Write a transition diagram to recognize the token relop (corresponding to relational operators in C++ language).
OR
What are the transition diagrams? Explain how they are used for token recognition with an example.
Answer : (Model Paper-1, Q3)
Transition Diagrams
Transition diagrams are the diagrams obtained in the process of transforming a regular expression pattern into a flow chart. These diagrams are a collection of nodes and edges, wherein nodes (circles) represent states and edges, labeled with a symbol or set of symbols, show the direction from one state to another.
Each state in the transition diagram corresponds to a condition that can arise while scanning the input for a lexeme that matches a regular expression pattern. The figure shows a transition diagram for a regular expression.
Figure: Transition Diagram
The conventions about transition diagrams are as follows,
1. In a transition diagram, accepting or final states are represented by a double circle. A final state indicates that the required lexeme has been found. A final state may have an action attached, typically returning a token and the corresponding attribute value to the parser.
2. Since a transition diagram has many transitions, a failure may occur. If this happens, the forward pointer is retracted to the position it had at the start state and the next transition diagram is activated.
3. In a transition diagram, the state entered by an edge marked "start" is referred to as the "start state" or "initial state". A transition diagram always begins from the start state, after which the subsequent input symbols are read.
Examples
Consider the transition diagram identifying lexemes for "relop".
The transition diagram begins at its start state. If the first input symbol is >, then among all the lexemes (<, <=, <>, =, >, >=) only the lexemes >= and > can be recognized. The diagram reads > and enters a new state. If the next character is =, then the lexeme >= is recognized and a final state is entered, where the token and the corresponding attribute value are returned.
If, in the state reached on >, the next character is > or <, the characters read are >> or ><, which are not relops. The lexeme is therefore just >, so we attach a * to the final state and retract the forward pointer by one position (i.e., the lexeme does not include the symbol that took us to the final state).
If the first character read in the start state is =, then this one character must be the lexeme; a final state is entered and the corresponding token and attribute are returned.
Finally, if the first character read in the start state is <, a new state is entered. If the next character is =, the lexeme is recognized as <= and a final state is entered, returning the token and attribute value. If <> is recognized, another final state is entered and the token and attribute value are returned. If any other character follows, for example in <<, which does not match any relop, the lexeme is just <. So we attach a * to that final state and retract the forward pointer.

Q32. Write regular expressions and NFAs for the following patterns. Use auxiliary definitions where convenient.
(i) The set of words having a, e, i, o, u appearing in that order, although not necessarily consecutively.
(ii) Comments in C.
Answer :
(i) The Set of Words having a, e, i, o, u Appearing in that Order, Although not Necessarily Consecutively
A word can contain any number of a, e, i, o, u and consonants. Hence, the definition that can be formed for consonants is,
Consonant → b | c | d | f | g | h | ... | z
Word → (Consonant | a)* (Consonant | e)* (Consonant | i)* (Consonant | o)* (Consonant | u)*
Now, an NFA can be drawn as follows,
Figure: NFA for Words with a, e, i, o, u in Order
(ii) Comments in C
(a) A single-line comment in C begins with // and is followed by characters (letters) and/or digits, ending with a blank, tab or newline character.
Example
// Compiler design for C compiler
followed by a newline character.
The regular definition for white space is given first,
delimiter → blank | tab | newline
white space → delimiter*
Comments → // (letter | digit | @ | $ | ~ | + | * | - | ... | white space)*
An NFA for single-line comments in C is,
Figure: NFA for Single-line Comments
(b) For multi-line comments, the format in C is,
/* Compiler design in C */
The NFA would be,
Figure: NFA for Multi-line Comments

Q33. What are reserved words and identifiers? Explain how they are recognized with an example.
Answer :
An identifier can be the name of a function, variable, constant, etc. A keyword in any language follows all the rules imposed on an identifier. In lexical analysis, keywords are initially recognized as identifiers. So keywords should be made reserved identifiers so that the scanner does not confuse an identifier with a keyword.
For example, if, then and else are keywords, but the scanner first treats them as identifiers.
Figure: Transition Diagram for Identifiers and Keywords
The above figure is a transition diagram for finding an identifier lexeme. The diagram also recognizes keywords such as if, then and else, as they look like identifiers.
Keywords that look like identifiers can be handled using two methods. In the first method, a symbol table is created initially, in which information about the keywords is stored. Two return statements, install_id( ) and gettoken( ), are used to obtain the attribute value and the token respectively. The function install_id( ) searches for a lexeme in the symbol table and makes an entry for it. If the lexeme present in the symbol table is a keyword, install_id( ) returns 0, and if the lexeme found is a variable, install_id( ) returns the attribute value, i.e.,
a pointer to the table entry. If the lexeme is not found in the symbol table, it is placed in the symbol table and a pointer to the new table entry is returned. The function gettoken( ) returns the corresponding token, i.e., either identifier or keyword, for the lexeme found.
In the second technique, a transition diagram is created for each keyword. In this transition diagram, a test for a non-letter-or-digit character is applied after the last letter of the keyword. For example, consider a transition diagram for the keyword ELSE.
Figure: Transition Diagram for Keyword Else
As seen in the above transition diagram, it ends with a transition on "non letter/digit". This means that the character following the keyword must be neither a letter nor a digit. It is necessary to check that the identifier has ended; otherwise the token ELSE would be returned for a lexeme like elsewhere.

Q34. Draw the transition diagrams for the following strings,
(a) bbbb, babbb and babaa
(b) construct, consequent, sequent
(c) (a/b)* b (a/b)* a (a/b).
Answer :
(a) bbbb, babbb, babaa
Figure: Transition diagram for the strings bbbb, babbb and babaa
(b) construct, consequent, sequent
Figure: Transition diagram for the strings construct, consequent and sequent
(c) (a/b)* b (a/b)* a (a/b)
Figure: Transition diagram for (a/b)* b (a/b)* a (a/b)

1.2.5 The Lexical Analyzer Generator Lex
Q35. Explain format for the input or source file of LEX. (Dec.-13/Jan.-14, Set-2 | Nov.-12)
OR
Write about lexical analyzer generator. (Dec.-13/Jan.-14)
OR
Develop a lexical analyzer to recognize a few patterns in PASCAL, C and FORTRAN (e.g., identifiers, constants and operators).
Answer : (Model Paper-1, Q3 | Nov.-12, Set-2, Q1)
LEX is a tool used to generate a lexical analyzer or scanner. The tool is often referred to as the lex compiler and its input specification as the lex language.
Lex Specification

Any Lex program consists of three parts or sections,

    declarations section
    %%
    translation rules section
    %%
    auxiliary procedures

Figure (1): Structure of a Lex Program

Each section is separated from the next by %%.

Declarations Section

This section is used to declare variables, manifest constants and regular definitions. A manifest constant is an identifier that is used to represent a constant.

Example

In C, manifest constants are declared as,

    #define MAX 10

A regular definition is a sequence of regular expressions where each regular expression is given a name for notational convenience. If $re_1, re_2, \ldots, re_k$ is a set of regular expressions used to denote a language and $n_1, n_2, \ldots, n_k$ is the set of names for these regular expressions respectively, then a regular definition is given as,

$n_1 \to re_1$
$n_2 \to re_2$
$\ldots$
$n_k \to re_k$

Translation Rules Section

The translation rules are a set of statements that define the action to be performed for each regular expression given. The statements in the translation rules section have the following form,

    RE_1    {method_1}
    RE_2    {method_2}
    ...
    RE_n    {method_n}

RE_i is a regular expression and method_i is the program fragment that defines the steps to be taken whenever the lexical analyzer finds a lexeme matching the regular expression RE_i. The methods are usually implemented in C, however they can be implemented in any language.

Auxiliary Procedures

These procedures are used by the methods in the translation rules section. The auxiliary procedures can be compiled separately and loaded with the scanner.
Figure (2): Coordination between Scanner and Parser

Behavior of Lexical Analyzer (If Generated by LEX) with Parser

The process is initiated by the parser when it needs a token. This invocation makes the scanner read the longest input lexeme that matches any of the regular expressions RE_i. If the lexeme matches more than one regular expression, the scanner selects the first one listed. After the match is found, the respective method_i is executed.

After executing the method, control should be handed back to the parser. If the method does not return control, the scanner continues scanning until it finds a lexeme whose method (action) results in a return to the parser. On a successful return, the scanner passes the parser a token for a lexeme in the input.

Figure (3): Lex Format to Create a Scanner

The Lex compiler takes a Lex program lexgen.l as input and generates Lexgen.yy.c, which is a C program. This C program (Lexgen.yy.c) is then compiled by the C compiler to generate an object code file O.out. The strings to be scanned, i.e., the input for which tokens are to be generated, are passed to O.out. Hence, O.out becomes the scanner or lexical analyzer.

Q36. Explain with one example how LEX program performs lexical analysis for the following patterns: identifier, comments, numerical constants, arithmetic operators.
OR
Write a LEX program for identifying the keywords and identifiers from the file.

Answer :

The LEX program for the patterns is given below,

    %{
        /* definitions of the manifest constants NUM, IDE, AOP */
    %}
    letter        [a-zA-Z]
    digit         [0-9]
    delimiter     [ \t\n]
    whitespace    {delimiter}+
    word          ({letter}|{digit})*
    aop           [+\-*/]
    identifier    {letter}({letter}|{digit})*
    num           {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
    comments      "/*"({word}|{digit}|{whitespace})*"*/"
    %%
    {comments}    { /* No action is taken, no value is returned */ }
    {whitespace}  { /* No action performed and no return value */ }
    {identifier}  { yylval = install_iden(); return(IDE); }
    {num}         { yylval = install_number(); return(NUM); }
    {aop}         { return(AOP); }
    %%
    install_iden()
    {
        /* This procedure installs the lexeme, which is pointed to by
           yytext and whose length is yyleng, into the symbol table.
           It returns a pointer to the symbol table entry. */
    }
    install_number()
    {
        /* install_number() stores the number in the symbol table and
           returns a pointer to its entry. */
    }

Q37. Write a lex program that copies a file, replacing each non-null segment of white space by a single blank.

Answer :

    Declarations
        delimiter      blank | ...
        white space    {delimiter}...

    Translation rules
        ...                      {copy(file)}
        (blank | white space)    {replace_blank(file)}

    Auxiliary procedures
        copy(file)
        {
            /* This function copies the file and
               returns a pointer to the copied file */
        }
        replace_blank(file)
        ...

Figure: A LEX Program for Replacing White Space by a Blank

1.2.6 Design of a Lexical Analyzer Generator

Q38. Discuss briefly the structure of generated analyzer with a neat diagram.

Answer : (Model Paper)

The following figure illustrates the structure of the lexical analyzer generated using Lex,

Figure: Structure of Lexical Analyzer Generated by Lex

In the above figure, the program that serves as the lexical analyzer contains a fixed program that simulates an automaton, which is either a Deterministic Finite Automaton (DFA) or a Non-deterministic Finite Automaton (NFA). The remaining part of the lexical analyzer contains components that are generated from the Lex program by the Lex compiler.
These components are as follows,

1. A transition table, which is used by the automaton simulator.
2. The functions that are passed directly through the Lex compiler to the output.
3. The actions from the input Lex program. These actions are pieces of code that are invoked by the automaton simulator at the appropriate time.
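The cooperation between the transition table and the automaton simulator can be illustrated with a small hand-written sketch in C. This is not the code that Lex actually emits. The state numbering, the character classes, the token codes TOK_IDE and TOK_NUM and the function name next_token are all assumptions made for this example, and the automaton is a DFA that recognizes only identifiers and unsigned integers. The simulator remembers the last accepting state it passed through, which gives the longest-match behaviour described for Lex-generated scanners.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token codes (hypothetical; IDE and NUM echo the names used in Q36). */
enum { TOK_NONE = 0, TOK_IDE = 1, TOK_NUM = 2 };

/* Character classes: the columns of the transition table. */
enum { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };

/* States: 0 = start, 1 = inside identifier, 2 = inside number. */
enum { N_STATES = 3, DEAD = -1 };

/* Component 1: the transition table consulted by the simulator. */
static const int transition[N_STATES][N_CLASSES] = {
    /*            letter  digit  other */
    /* state 0 */ { 1,     2,     DEAD },
    /* state 1 */ { 1,     1,     DEAD },
    /* state 2 */ { DEAD,  2,     DEAD },
};

/* Token recognized in each state, or TOK_NONE if not accepting. */
static const int accepting[N_STATES] = { TOK_NONE, TOK_IDE, TOK_NUM };

static int char_class(char c)
{
    unsigned char u = (unsigned char)c;
    if (isalpha(u)) return C_LETTER;
    if (isdigit(u)) return C_DIGIT;
    return C_OTHER;
}

/* The automaton simulator: runs the DFA from the start of input,
 * records the most recent accepting state (longest match), stores the
 * lexeme length in *len and returns the token code. Returns TOK_NONE
 * with *len == 0 when no prefix of the input matches a pattern. */
int next_token(const char *input, size_t *len)
{
    assert(input != NULL && len != NULL);
    int state = 0;
    int token = TOK_NONE;
    size_t i = 0;
    *len = 0;
    while (input[i] != '\0') {
        int next = transition[state][char_class(input[i])];
        if (next == DEAD)
            break;                      /* no move: stop scanning */
        state = next;
        i++;
        if (accepting[state] != TOK_NONE) {
            token = accepting[state];   /* remember last accepting state */
            *len = i;
        }
    }
    return token;
}
```

Calling next_token("count1+2", &n) scans the six characters of count1 and reports an identifier. A generated scanner would, at this point, invoke the action attached to the matching rule, which corresponds to the third component in the list above.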
