
Introduction to Translator

A translator is a programming language processor that converts a computer program from
one language to another. It takes a program written in source code and converts it into
machine code, detecting and reporting errors along the way.
Purpose of Translator
It translates a high-level language program into a machine language program that the central
processing unit (CPU) can understand. It also detects errors in the program.
Different Types of Translators
There are 3 different types of translators as follows:

Compiler
A compiler is a translator used to convert a high-level programming language to a low-level
programming language. It converts the whole program in one session and reports any errors
detected after the conversion. A compiler takes time to do its work, as it translates high-level
code to lower-level code all at once and then saves the result to memory.
A compiler is processor-dependent and platform-dependent, although this limitation is
addressed by special kinds of compilers: the cross-compiler and the source-to-source
compiler. Before choosing a compiler, the user first has to identify the Instruction Set
Architecture (ISA), the operating system (OS), and the programming language that will be
used, to ensure compatibility.

Interpreter
Just like a compiler, an interpreter is a translator used to convert a high-level programming
language to a low-level programming language. It converts the program one statement at a
time and reports errors as soon as it detects them during the conversion. This makes errors
easier to detect than with a compiler. An interpreter begins executing code sooner than a
compiler, as it executes each statement as soon as it reads it.
It is often used as a debugging tool for software development, as it can execute a single line
of code at a time. An interpreter is also more portable than a compiler: because it is not
processor-dependent, you can work across hardware architectures.

Assembler
An assembler is a translator used to translate assembly language to machine language. It is
like a compiler for assembly language but can work interactively like an interpreter. Assembly
language is difficult to understand, as it is a low-level programming language. An assembler
translates a low-level language, assembly language, into an even lower-level language:
machine code, which can be directly understood by the CPU.

Examples of Translators
Here are some examples of translators per type:

Translator    Examples

Compiler      Microsoft Visual Studio
              GNU Compiler Collection (GCC)
              Common Business Oriented Language (COBOL)

Interpreter   OCaml
              List Processing (LISP)
              Python

Assembler     Fortran Assembly Program (FAP)
              Macro Assembly Program (MAP)
              Symbolic Optimal Assembly Program (SOAP)

Advantages and Disadvantages of Translators


Here are some advantages of the Compiler:

 The whole program is validated, so there are no system errors.
 The executable file is optimized by the compiler, so it runs faster.
 Users do not have to run the program on the same machine it was created on.

Here are some disadvantages of the Compiler:

 It is slow to execute, as you have to finish the whole program before running it.
 It is not easy to debug, as errors are shown at the end of the compilation.
 It is hardware-specific: it works with a specific machine language and architecture.

Here are some advantages of the Interpreter:

 You discover errors before you complete the program, so you learn from your mistakes.
 The program can be run before it is completed, so you get partial results immediately.
 You can work on small parts of the program and link them later into a whole program.

Here are some disadvantages of the Interpreter:

 There is a possibility of syntax errors in unverified scripts.
 The program is not optimized and may encounter data errors.
 It may be slow because of the interpretation on every execution.

Here are some advantages of the Assembler:

 Symbolic programming is easier to understand, and thus time-saving for the programmer.
 It is easier to fix errors and alter program instructions.
 It is as efficient in execution as machine-level language.

Here are some disadvantages of the Assembler:

 It is machine-dependent and cannot be used on other architectures.
 A small change in design can invalidate the whole program.
 It is difficult to maintain.
Context of Compiler
The high-level language is converted into binary language in various phases. A compiler is a
program that converts high-level language to assembly language. Similarly, an assembler is a
program that converts the assembly language to machine-level language.
Let us first understand how a program, using a C compiler, is executed on a host machine.
 User writes a program in C language (high-level language).
 The C compiler, compiles the program and translates it to assembly program (low-level
language).
 An assembler then translates the assembly program into machine code (object).
 A linker tool is used to link all the parts of the program together for execution (executable
machine code).
 A loader loads all of them into memory and then the program is executed.
Before diving straight into the concepts of compilers, we should understand a few other tools that
work closely with compilers.

Preprocessor
A preprocessor, generally considered as a part of compiler, is a tool that produces input for
compilers. It deals with macro-processing, augmentation, file inclusion, language extension, etc.

Interpreter
An interpreter, like a compiler, translates high-level language into low-level machine language.
The difference lies in the way they read the source code or input. A compiler reads the whole
source code at once, creates tokens, checks semantics, generates intermediate code,
processes the whole program, and may involve many passes. In contrast, an interpreter reads
a statement from the input, converts it to an intermediate code, executes it, then takes the next
statement in sequence. If an error occurs, an interpreter stops execution and reports it,
whereas a compiler reads the whole program even if it encounters several errors.

Assembler
An assembler translates assembly language programs into machine code. The output of an
assembler is called an object file, which contains a combination of machine instructions as well
as the data required to place these instructions in memory.

Linker
Linker is a computer program that links and merges various object files together in order to make
an executable file. All these files might have been compiled by separate assemblers. The major
task of a linker is to search and locate referenced module/routines in a program and to determine
the memory location where these codes will be loaded, making the program instruction to have
absolute references.

Loader
Loader is a part of operating system and is responsible for loading executable files into memory
and execute them. It calculates the size of a program (instructions and data) and creates memory
space for it. It initializes various registers to initiate execution.

Cross-compiler
A compiler that runs on platform (A) and is capable of generating executable code for platform (B)
is called a cross-compiler.

Source-to-source Compiler
A compiler that takes the source code of one programming language and translates it into the
source code of another programming language is called a source-to-source compiler.

Compiler Architecture
A compiler can broadly be divided into two phases based on the way they compile.

Analysis Phase
Known as the front-end of the compiler, the analysis phase reads the source program, divides
it into core parts, and then checks for lexical, grammar, and syntax errors. The analysis phase
generates an intermediate representation of the source program and a symbol table, which
are fed to the synthesis phase as input.

Synthesis Phase
Known as the back-end of the compiler, the synthesis phase generates the target program with
the help of intermediate source code representation and symbol table.
A compiler can have many phases and passes.
 Pass : A pass refers to the traversal of a compiler through the entire program.
 Phase : A phase of a compiler is a distinguishable stage, which takes input from the
previous stage, processes and yields output that can be used as input for the next stage. A
pass can have more than one phase.

Phases of Compiler
The compilation process is a sequence of various phases. Each phase takes input from its
previous stage, has its own representation of source program, and feeds its output to the next
phase of the compiler. Let us understand the phases of a compiler.

Lexical Analysis
The first phase of the compiler works as a text scanner. This phase scans the source code as
a stream of characters and converts it into meaningful lexemes. The lexical analyzer
represents these lexemes in the form of tokens as:
<token-name, attribute-value>
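This token format can be sketched with a small Python lexer. The token names and patterns below are illustrative, not taken from any particular compiler:

```python
import re

# Hypothetical token specification: (token-name, pattern) pairs.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("SKIP",   r"\s+"),          # whitespace is discarded, not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Yield <token-name, attribute-value> pairs for a source string."""
    for match in MASTER.finditer(source):
        if match.lastgroup != "SKIP":
            yield (match.lastgroup, match.group())

print(list(tokenize("total = count + 10")))
# → [('ID', 'total'), ('ASSIGN', '='), ('ID', 'count'), ('PLUS', '+'), ('NUMBER', '10')]
```

A real lexical analyzer would also track line numbers for error reporting, as described later in this section.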

Syntax Analysis
The next phase is called the syntax analysis or parsing. It takes the token produced by lexical
analysis as input and generates a parse tree (or syntax tree). In this phase, token arrangements
are checked against the source code grammar, i.e. the parser checks if the expression made by
the tokens is syntactically correct.
Semantic Analysis
Semantic analysis checks whether the parse tree constructed follows the rules of the
language. For example, assignment of values must be between compatible data types;
adding a string to an integer would be an error. The semantic analyzer also keeps track of
identifiers, their types, and expressions, checking whether identifiers are declared before use.
The semantic analyzer produces an annotated syntax tree as its output.
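As a sketch of this kind of check, the toy function below type-checks a '+' node of a hypothetical tuple-shaped syntax tree; the node shapes and type names are invented for illustration:

```python
# Toy semantic check on a hypothetical tuple-shaped syntax tree:
# verify that the operands of a binary '+' have compatible types.
def check_types(node, symbols):
    """Return the type of an expression node, raising on a mismatch."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "str":
        return "string"
    if kind == "id":
        return symbols[node[1]]            # identifiers must be declared
    if kind == "+":
        left = check_types(node[1], symbols)
        right = check_types(node[2], symbols)
        if left != right:                  # e.g. adding a string to an integer
            raise TypeError(f"cannot add {right} to {left}")
        return left
    raise ValueError(f"unknown node kind: {kind}")

symbols = {"count": "int"}
print(check_types(("+", ("id", "count"), ("num", 3)), symbols))   # int
```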

Intermediate Code Generation


After semantic analysis the compiler generates an intermediate code of the source code for the
target machine. It represents a program for some abstract machine. It is in between the high-level
language and the machine language. This intermediate code should be generated in such a way
that it makes it easier to be translated into the target machine code.

Code Optimization
The next phase does code optimization of the intermediate code. Optimization can be assumed
as something that removes unnecessary code lines, and arranges the sequence of statements in
order to speed up the program execution without wasting resources (CPU, memory).

Code Generation
In this phase, the code generator takes the optimized representation of the intermediate code and
maps it to the target machine language. The code generator translates the intermediate code into
a sequence of (generally) re-locatable machine code. Sequence of instructions of machine code
performs the task as the intermediate code would do.

Symbol Table
It is a data-structure maintained throughout all the phases of a compiler. All the identifier's names
along with their types are stored here. The symbol table makes it easier for the compiler to
quickly search the identifier record and retrieve it. The symbol table is also used for scope
management.
Organization of the Compiler
The C compilation system consists of a compiler, an assembler, and a link editor. The cc command invokes each of these
components automatically unless you use command-line options to specify otherwise.

Table A-15 discusses all the options available with cc.

[Figure 1-1, Organization of the C Compilation System, and the accompanying table
summarizing the components of the compilation system, are not reproduced here.]


Cousins of Compiler

1) Preprocessor

It converts the HLL (high-level language) source into pure high-level language. It includes all
the header files and also expands any macros. It is optional: for a language that does not
support #include directives and macros, a preprocessor is not required.

2) Compiler

It takes pure high-level language as input and converts it into assembly code.

3) Assembler

It takes assembly code as input and converts it into machine code.

4) Linking and loading

It has four functions:

1. Allocation:
It obtains memory portions from the operating system and stores the object data there.
2. Relocation:
It maps relative addresses to physical addresses and relocates the object code.
3. Linking:
It combines all the executable object modules into a single executable file.
4. Loading:
It loads the executable file into memory for execution.
What is Lexical Analysis?
Lexical analysis is the very first phase of compiler design. A lexer takes the modified source
code, which is written in the form of sentences. In other words, it helps you to convert a
sequence of characters into a sequence of tokens. The lexical analyzer breaks the source
into a series of tokens and removes any extra whitespace or comments written in the source
code.
Programs that perform lexical analysis in compiler design are called lexical analyzers or
lexers. A lexer contains a tokenizer or scanner. If the lexical analyzer detects that a token is
invalid, it generates an error. The role of the lexical analyzer in compiler design is to read
character streams from the source code, check for legal tokens, and pass the data to the
syntax analyzer when it demands it.

Example
How Pleasant Is The Weather?
See this lexical analysis example: here, we can easily recognize that there are five words:
How, Pleasant, Is, The, Weather. This is very natural for us, as we can recognize the
separators, blanks, and the punctuation symbol.
HowPl easantIs Th ewe ather?
Now check this example: we can also read this, but it takes some time because the
separators are put in odd places. It is not something that comes to you immediately.

Basic Terminologies
What’s a lexeme?
A lexeme is a sequence of characters that are included in the source program according to the
matching pattern of a token. It is nothing but an instance of a token.

What’s a token?
Tokens in compiler design are the sequence of characters which represents a unit of information
in the source program.

What is Pattern?
A pattern is a description of the form that the lexemes of a token may take. In the case of a
keyword used as a token, the pattern is simply the keyword's sequence of characters.

Lexical Analyzer Architecture: How tokens are recognized


The main task of lexical analysis is to read input characters in the code and produce tokens.

Lexical analyzer scans the entire source code of the program. It identifies each token one by one.
Scanners are usually implemented to produce tokens only when requested by a parser. Here is
how recognition of tokens in compiler design works-
Lexical Analyzer Architecture

1. “Get next token” is a command which is sent from the parser to the lexical analyzer.
2. On receiving this command, the lexical analyzer scans the input until it finds the next
token.
3. It returns the token to Parser.

Lexical Analyzer skips whitespaces and comments while creating these tokens. If any error is
present, then Lexical analyzer will correlate that error with the source file and line number.
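The pull-based hand-off described in the steps above can be sketched with a Python generator standing in for the lexical analyzer, so each token is produced only when the parser asks for it. The tokenization itself is deliberately naive and purely illustrative:

```python
# Sketch of the "get next token" protocol: the scanner is a generator,
# so it yields each token only when the parser requests the next one.
def scanner(source):
    # Naive scanning: pad parentheses, then split on whitespace
    # (whitespace is thereby skipped, never emitted as a token).
    for lexeme in source.replace("(", " ( ").replace(")", " ) ").split():
        yield lexeme

def parser(source):
    get_next = scanner(source)
    token = next(get_next, None)        # "get next token" request
    count = 0
    while token is not None:
        count += 1                      # a real parser would analyze the token here
        token = next(get_next, None)    # request the next token on demand
    return count

print(parser("x = (a + b)"))            # 7 tokens: x = ( a + b )
```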

Roles of the Lexical analyzer


The lexical analyzer performs the tasks given below:

 Helps to enter identified tokens into the symbol table

 Removes white spaces and comments from the source program
 Correlates error messages with the source program
 Helps to expand macros if they are found in the source program
 Reads input characters from the source program

Lexical Errors
A character sequence which is not possible to scan into any valid token is a lexical error.
Important facts about the lexical error:

 Lexical errors are not very common, but they should be managed by the scanner
 Misspellings of identifiers, operators, and keywords are considered lexical errors
 Generally, a lexical error is caused by the appearance of some illegal character, mostly at
the beginning of a token.

Error Recovery in Lexical Analyzer


Here are a few of the most common error recovery techniques:

 Remove one character from the remaining input

 In panic mode, successive characters are ignored until we reach a well-formed token
 Insert a missing character into the remaining input
 Replace a character with another character
 Transpose two adjacent characters

Lexical Analyzer vs. Parser


Lexical Analyser                        Parser
Scans the input program                 Performs syntax analysis
Identifies tokens                       Creates an abstract representation of the code
Inserts tokens into the symbol table    Updates symbol table entries
Generates lexical errors                Generates a parse tree of the source code

Why separate Lexical and Parser?


 Simplicity of design: it eases the process of lexical analysis and syntax analysis by
eliminating unwanted tokens
 Improved compiler efficiency: a separate scanner helps you to improve compiler efficiency
 Specialization: specialized techniques can be applied to improve the lexical analysis
process
 Portability: only the scanner is required to communicate with the outside world
 Higher portability: input-device-specific peculiarities are restricted to the lexer

Advantages of Lexical analysis


 The lexical analyzer method is used by programs like compilers, which can use the parsed
data from a programmer's code to create compiled binary executable code
 It is used by web browsers to format and display a web page with the help of parsed data
from JavaScript, HTML, and CSS
 A separate lexical analyzer helps you to construct a specialized and potentially more
efficient processor for the task

Disadvantage of Lexical analysis


 You need to spend significant time reading the source program and partitioning it into
tokens
 Some regular expressions are quite difficult to understand compared to PEG or EBNF
rules
 More effort is needed to develop and debug the lexer and its token descriptions
 Additional runtime overhead is required to generate the lexer tables and construct the
tokens
What is Syntax Analysis?
Syntax analysis is the second phase of the compiler design process, in which the given input
string is checked for conformance to the rules and structure of the formal grammar. It
analyses the syntactical structure and checks whether the given input is in the correct syntax
of the programming language or not.
Syntax analysis in the compiler design process comes after the lexical analysis phase. It is
also known as parsing. The parse tree is developed with the help of the predefined grammar
of the language. The syntax analyser also checks whether a given program fulfils the rules
implied by a context-free grammar. If it does, the parser creates the parse tree of that source
program. Otherwise, it displays error messages.

Syntax Analyser Process

Why do you need Syntax Analyser?


 Checks whether the code is grammatically valid
 The syntactical analyser helps you to apply rules to the code
 Helps you to make sure that each opening brace has a corresponding closing brace
 Checks that each declaration has a type and that the type exists

Important Syntax Analyser Terminology


Important terminologies used in syntax analysis process:

 Sentence: A sentence is a group of characters over some alphabet.

 Lexeme: A lexeme is the lowest level syntactic unit of a language (e.g., total, start).

 Token: A token is just a category of lexemes.

 Keywords and reserved words – A keyword is an identifier used as a fixed part of the
syntax of a statement. It is a reserved word which you can't use as a variable name or
identifier.
 Noise words – Noise words are optional words inserted in a statement to enhance the
readability of the sentence.

 Comments – A very important part of the documentation, mostly indicated by /* */
or //.
 Blanks (spaces)

 Delimiters – It is a syntactic element which marks the start or end of some syntactic unit.
Like a statement or expression, “begin”…”end”, or {}.

 Character set – ASCII, Unicode

 Identifiers – restrictions on their length can reduce the readability of the sentence.

 Operator symbols – + and – perform two basic arithmetic operations.


Why do we need Parsing?

A parser also checks that the input string is well-formed, and if it is not, rejects it.
The following are important tasks performed by the parser in compiler design:

 Helps you to detect all types of syntax errors

 Finds the position at which an error has occurred
 Gives a clear and accurate description of the error
 Recovers from an error to continue and find further errors in the code
 Should not affect the compilation of "correct" programs
 Must reject invalid texts by reporting syntax errors

Parsing Techniques
Parsing techniques are divided into two different groups:

 Top-Down Parsing,
 Bottom-Up Parsing
Top-Down Parsing:
In the top-down parsing construction of the parse tree starts at the root and then proceeds
towards the leaves.

Two types of Top-down parsing are:

1. Predictive Parsing:

A predictive parser can predict which production should be used to replace the specific input
string. The predictive parser uses a look-ahead pointer, which points to the next input
symbols. Backtracking is not an issue with this parsing technique. It is known as an LL(1)
parser.

2. Recursive Descent Parsing:

This parsing technique recursively parses the input to build a parse tree. It consists of several
small functions, one for each nonterminal in the grammar.
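As a sketch of this technique, here is a recursive-descent parser for the toy grammar expr -> term ('+' term)*, term -> NUM, with one small function per nonterminal. The grammar and the tuple node shapes are invented for illustration:

```python
# Minimal recursive-descent parser sketch: one function per nonterminal.
# Grammar:  expr -> term ('+' term)*      term -> NUM
def parse_expr(tokens, pos=0):
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        rhs, pos = parse_term(tokens, pos + 1)
        value = ("+", value, rhs)        # build a parse-tree node
    return value, pos

def parse_term(tokens, pos):
    tok = tokens[pos]
    if not tok.isdigit():
        raise SyntaxError(f"expected a number at position {pos}")
    return ("num", int(tok)), pos + 1

tree, _ = parse_expr(["1", "+", "2", "+", "3"])
print(tree)   # ('+', ('+', ('num', 1), ('num', 2)), ('num', 3))
```

Note that the repetition ('+' term)* is handled with a loop rather than left recursion, since a left-recursive rule would make a recursive-descent parser loop forever.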

Bottom-Up Parsing:
In bottom-up parsing in compiler design, the construction of the parse tree starts with the
leaves and then proceeds towards the root. It is also called shift-reduce parsing. This type of
parsing in compiler design is usually created with the help of software tools.

Error – Recovery Methods


Common Errors that occur in Parsing in System Software

 Lexical: Name of an incorrectly typed identifier


 Syntactical: unbalanced parenthesis or a missing semicolon
 Semantical: incompatible value assignment
 Logical: Infinite loop and not reachable code

A parser should be able to detect and report any error found in the program. Whenever an
error occurs, the parser should be able to handle it and carry on parsing the remaining input.
A program can have the following types of errors at various stages of the compilation
process. There are five common error-recovery methods which can be implemented in the
parser.

Statement mode recovery


 When the parser encounters an error, it helps you to take corrective steps. This
allows the rest of the input and states to be parsed.
 For example, adding a missing semicolon comes under the statement mode recovery
method. However, the parser designer needs to be careful while making these changes, as
one wrong correction may lead to an infinite loop.

Panic-Mode recovery
 When the parser encounters an error, this mode ignores the rest of the statement and
does not process input from the erroneous point until a delimiter, such as a semicolon, is
reached. This is a simple error recovery method.
 In this type of recovery method, the parser rejects input symbols one by one until a single
designated group of synchronizing tokens is found. Synchronizing tokens are generally
delimiters, such as a semicolon.

Phrase-Level Recovery:
 The compiler corrects the program by inserting or deleting tokens. This allows it to
proceed to parse from where it was. It performs correction on the remaining input: it can
replace a prefix of the remaining input with some string, which helps the parser to continue
the process.

Error Productions
 Error production recovery expands the grammar for the language to generate the
erroneous constructs. The parser then performs error diagnostics on those constructs.

Global Correction:
 The compiler should make as few changes as possible while processing an incorrect
input string. Given an incorrect input string a and a grammar G, algorithms search for a
parse tree for a related string b, such that the number of insertions, deletions, and
modifications of tokens needed to transform a into b is as small as possible.

Grammar:
A grammar is a set of structural rules which describe a language. Grammars assign structure
to any sentence. The term also refers to the study of these rules, and this field includes
morphology, phonology, and syntax. Grammars are capable of describing much of the syntax
of programming languages.

Rules of Form Grammar

 A non-terminal symbol should appear to the left of the ::= of at least one production
 The goal symbol should never appear to the right of the ::= of any production
 A rule is recursive if its LHS appears in its RHS

Notational Conventions
An optional element may be indicated by enclosing the element in square brackets, [ … ]. An
arbitrary sequence of instances of an element can be indicated by enclosing the element in
braces followed by an asterisk, { … }*.

A choice between alternatives may use the symbol | within a single rule. It may be enclosed
in parentheses, ( … ), when needed.
The two kinds of symbols in these notational conventions are terminals and non-terminals.

1.Terminals:

 Lower-case letters in the alphabet such as a, b, c,


 Operator symbols such as +,-, *, etc.
 Punctuation symbols such as parentheses, hash, comma
 0, 1, …, 9 digits
 Boldface strings like id or if, anything which represents a single terminal symbol

2.Nonterminals:

 Upper-case letters such as A, B, C


 Lower-case italic names, such as expression

Context Free Grammar


A context-free grammar (CFG) is a grammar in which every production has a single
non-terminal on its left-hand side. The rules in a context-free grammar are mainly recursive.
A syntax analyser checks whether a specific program satisfies the rules of the context-free
grammar or not. If it does meet these rules, the syntax analyser may create a parse tree for
that program.
expression -> expression + term
expression -> expression - term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> ( expression )
factor -> id
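As a sketch, this expression grammar can be encoded as plain Python data, which also makes the recursion of the rules easy to check (the encoding is illustrative, not a standard representation):

```python
# The expression grammar encoded as a dict: each key is a nonterminal,
# each value is the list of its productions (right-hand sides).
GRAMMAR = {
    "expression": [["expression", "+", "term"],
                   ["expression", "-", "term"],
                   ["term"]],
    "term":       [["term", "*", "factor"],
                   ["term", "/", "factor"],
                   ["factor"]],
    "factor":     [["(", "expression", ")"],
                   ["id"]],
}

def is_recursive(nonterminal):
    """A rule is recursive if its LHS appears in one of its RHSs."""
    return any(nonterminal in rhs for rhs in GRAMMAR[nonterminal])

print(is_recursive("expression"))   # True: expression -> expression + term
print(is_recursive("factor"))       # False
```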
Error Handling

Error handling is concerned with failures due to many causes: errors in the compiler or its environment
(hardware, operating system), design errors in the program being compiled, an incomplete understanding of
the source language, transcription errors, incorrect data, etc. The tasks of the error handling process are to
detect each error, report it to the user, and possibly make some repair to allow processing to continue. It
cannot generally determine the cause of the error, but can only diagnose the visible symptoms. Similarly,
any repair cannot be considered a correction (in the sense that it carries out the user’s intent); it merely
neutralizes the symptom so that processing may continue.

Type Systems
SEMANTIC CHECKING. A compiler must perform many semantic checks on a source program.

 Many of these are straightforward and are usually done in conjunction with syntax-
analysis.
 Checks done during compilation are called static checks to distinguish from the dynamic
checks done during the execution of the target program.
 Static checks include:

Type checks.
The compiler checks that names and values are used in accordance with type rules of
the language.
Type conversions.
Detection of implicit type conversions.
Dereferencing checks.
The compiler checks that dereferencing is applied only to a pointer.
Indexing checks.
The compiler checks that indexing is applied only to an array.
Function call checks.
The compiler checks that a function (or procedure) is applied to the correct number and
type of arguments.
Uniqueness checks.
In many cases an identifier must be declared exactly once.
Flow-of-control checks.
The compiler checks that if a statement causes the flow of control to leave a construct,
then there is a place to transfer this flow. For instance when using break in C.
Symbol Table is an important data structure created and maintained by the compiler in
order to keep track of semantics of variables i.e. it stores information about the scope
and binding information about names, information about instances of various entities
such as variable and function names, classes, objects, etc. 
 It is built in the lexical and syntax analysis phases.
 The information is collected by the analysis phases of the compiler and is used
by the synthesis phases of the compiler to generate code.
 It is used by the compiler to achieve compile-time efficiency.
 It is used by various phases of the compiler as follows:- 
1. Lexical Analysis: Creates new table entries in the table, for example
like entries about tokens.
2. Syntax Analysis: Adds information regarding attribute type, scope,
dimension, line of reference, use, etc in the table.
3. Semantic Analysis: Uses available information in the table to check
for semantics i.e. to verify that expressions and assignments are
semantically correct(type checking) and update it accordingly.
4. Intermediate Code generation: Refers to the symbol table to know how
much and what type of run-time storage is allocated; the table also helps in
adding temporary variable information.
5. Code Optimization: Uses information present in the symbol table for
machine-dependent optimization.
6. Target Code generation: Generates code by using address
information of identifier present in the table.
Symbol Table entries – Each entry in the symbol table is associated with attributes that
support the compiler in different phases. 
Items stored in Symbol table: 
 Variable names and constants
 Procedure and function names
 Literal constants and strings
 Compiler generated temporaries
 Labels in source languages
Information used by the compiler from Symbol table:  
 Data type and name
 Declaring procedures
 Offset in storage
 If structure or record then, a pointer to structure table.
 For parameters, whether parameter passing by value or by reference
 Number and type of arguments passed to function
 Base Address
Operations of Symbol table – The basic operations defined on a symbol table include
insert (add a new name with its attributes) and lookup (search for a name and retrieve its
information).
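A minimal sketch of these operations in Python, with scope management handled by a stack of dictionaries; the attribute fields shown are illustrative:

```python
# Symbol table sketch: insert and lookup, plus scope management
# implemented as a stack of dictionaries (innermost scope on top).
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]              # global scope at the bottom

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, attrs):
        self.scopes[-1][name] = attrs   # declare in the current scope

    def lookup(self, name):
        for scope in reversed(self.scopes):   # search innermost scope first
            if name in scope:
                return scope[name]
        return None                     # undeclared identifier

table = SymbolTable()
table.insert("x", {"type": "int", "offset": 0})
table.enter_scope()
table.insert("x", {"type": "float", "offset": 4})
print(table.lookup("x")["type"])        # float: inner declaration shadows outer
table.exit_scope()
print(table.lookup("x")["type"])        # int
```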
INTERMEDIATE CODE
 Intermediate code is the interface between front-end and back-end in a compiler
 Ideally the details of source language are confined to the front-end and the details
of target machines to the back-end
 A source code can be translated directly into its target machine code. Why, then, do
we need to translate the source code into an intermediate code which is then
translated to its target code?
 Why Intermediate Code?
 If a compiler translates the source language to its target machine language without
having the option for generating intermediate code, then for each new machine, a
full native compiler is required
 Intermediate code eliminates the need of a new full compiler for every unique
machine by keeping the analysis portion same for all the compilers
 The second part of compiler, synthesis, is changed according to the target machine
 It becomes easier to apply the source code modifications to improve code
performance by applying code optimization techniques on the intermediate code .
Why Intermediate Code?
 While generating machine code directly from source code is possible, it entails
problems
 With m languages and n target machines, we need to write m front ends, m x n
optimizers, and m x n code generators
 The code optimizer, which is one of the largest and most difficult-to-write
components of a compiler, cannot be reused
 By converting source code to an intermediate code, a machine-independent code
optimizer may be written 
 This means just m front ends, n code generators and 1 optimizer.
Intermediate Representation

 Intermediate codes can be represented in a variety of ways and they have their own
benefits
 High Level IR - High-level intermediate code representation is very close to the
source language itself. They can be easily generated from the source code and we
can easily apply code modifications to enhance performance. But for target machine
optimization, it is less preferred
 Low Level IR - This is close to the target machine, which makes it suitable for
register and memory allocation, instruction set selection, etc. It is good for
machine-dependent optimizations
 Intermediate code can be either language specific (e.g., Byte Code for Java) or
language independent (three-address code).
Intermediate Code Generation
 Intermediate code must be easy to produce and easy to translate to machine code
 A sort of universal assembly language
 Should not contain any machine-specific parameters (registers, addresses, etc.)
 Intermediate code is commonly represented as three-address code, but the
choice of implementation is left to the compiler designer
 Quadruples, triples, indirect triples are the classical forms used for machine-
independent optimizations and machine code generation
 Static Single Assignment form (SSA) is a recent form and enables more effective
optimizations
Quadruples
 Has four fields: op, arg1, arg2 and result
 Temporaries are used
Triples
 Temporaries are not used and instead references to instructions are made
Indirect triples
 In addition to triples we use a list of pointers to triples
Quadruples
 In quadruples representation, there are four fields for each instruction: op, arg1,
arg2, result
 Binary ops have the obvious representation
 Unary ops do not use arg2
 Operators like param use neither arg2 nor result
 Jumps put the target label into the result field
 The quadruples implement the three-address code for the expression a = b * (-c) + b
* (-c)
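The classical quadruple sequence for this expression can be written out as data; the following sketch (temporaries t1–t5 are the usual textbook names) prints it in tabular form:

```python
# Quadruples for a = b * (-c) + b * (-c).
# Each entry is (op, arg1, arg2, result); unary minus leaves arg2 empty,
# and temporaries t1..t5 hold intermediate values.
quadruples = [
    ("uminus", "c",  None, "t1"),
    ("*",      "b",  "t1", "t2"),
    ("uminus", "c",  None, "t3"),
    ("*",      "b",  "t3", "t4"),
    ("+",      "t2", "t4", "t5"),
    ("=",      "t5", None, "a"),
]
for op, a1, a2, res in quadruples:
    print(f"{op:<8}{a1:<6}{a2 or '':<6}{res}")
```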

 Triples
 Triples has only three fields for each instruction: op, arg1, arg2
 The result of an operation x op y is referred by its position
 Triples are equivalent to signatures of nodes in the DAG or syntax tree
 Triples and DAG are equivalent representations only for expressions
 A ternary operation like X[i] = Y requires two entries in the triple
structure; similarly for Y = X[i]
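For the same expression a = b * (-c) + b * (-c), the triple form drops the temporaries and refers to earlier instructions by their position (sketched here with plain integers as the positions):

```python
# Triples for a = b * (-c) + b * (-c): only (op, arg1, arg2), and an
# integer argument refers to the result of the triple at that position.
triples = [
    ("uminus", "c", None),  # (0)  -c
    ("*",      "b", 0),     # (1)  b * (0)
    ("uminus", "c", None),  # (2)  -c
    ("*",      "b", 2),     # (3)  b * (2)
    ("+",      1,   3),     # (4)  (1) + (3)
    ("=",      "a", 4),     # (5)  a = (4)
]
for pos, t in enumerate(triples):
    print(f"({pos}) {t}")
```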

Indirect Triples
 These consist of a listing of pointers to triples, rather than a listing of the triples themselves
 The triples consists of three fields: op, arg1, arg2
 The arg1 or arg2 could be pointers
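A sketch of the indirect form, continuing the same expression: the pointer list (modeled here as a list of indices) fixes the execution order, so an optimizer can reorder instructions by permuting the pointers without rewriting the references between triples.

```python
# Indirect triples: the triples plus a separate instruction list of
# "pointers" (indices here) giving the execution order.
triples = [
    ("uminus", "c", None),  # (0)
    ("*",      "b", 0),     # (1)
    ("uminus", "c", None),  # (2)
    ("*",      "b", 2),     # (3)
    ("+",      1,   3),     # (4)
    ("=",      "a", 4),     # (5)
]
instruction_list = [0, 1, 2, 3, 4, 5]  # pointers into `triples`
for p in instruction_list:
    print(p, triples[p])
```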
Runtime Environment
A translation needs to relate the static source text of a program to the dynamic
actions that must occur at runtime to implement the program. The program consists
of names for procedures, identifiers, etc., that require mapping to actual memory
locations at runtime.
Runtime environment is a state of the target machine, which may include software
libraries, environment variables, etc., to provide services to the processes running in the
system.
SOURCE LANGUAGE ISSUES

Activation Tree
A program consists of procedures; a procedure definition is a declaration that, in
its simplest form, associates an identifier (the procedure name) with a statement
(the body of the procedure). Each execution of a procedure is referred to as an
activation of the procedure. The lifetime of an activation is the sequence of steps
in the execution of the procedure. If 'a' and 'b' are two procedures, their
activations are either non-overlapping (when one is called after the other) or
nested (nested procedures). A procedure is recursive if a new activation begins
before an earlier activation of the same procedure has ended. An activation tree
shows the way control enters and leaves activations.
Properties of activation trees are :-
 Each node represents an activation of a procedure.
 The root shows the activation of the main function.
 The node for procedure 'x' is the parent of the node for procedure 'y' if and
only if control flows from procedure x to procedure y.
Example – Consider the following program of Quicksort
main() {
    int n;
    readarray();
    quicksort(1, n);
}

quicksort(int m, int n) {
    int i = partition(m, n);
    quicksort(m, i - 1);
    quicksort(i + 1, n);
}
The activation tree for this program will be:
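As a runnable sketch of how that tree unfolds, the trace below records one activation per recursive call; partition is a trivial stand-in (it just picks the middle index), and the indentation shows nesting of activations:

```python
# Tracing activations of quicksort: each recursive call opens a new
# activation nested inside its caller's activation.
def partition(m, n):
    return (m + n) // 2  # stand-in pivot, for illustration only

def quicksort(m, n, depth, trace):
    if m >= n:  # empty range: no activation worth recording
        return
    trace.append("  " * depth + f"quicksort({m},{n})")
    i = partition(m, n)
    quicksort(m, i - 1, depth + 1, trace)
    quicksort(i + 1, n, depth + 1, trace)

trace = ["main"]
quicksort(1, 4, 1, trace)
print("\n".join(trace))
```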
Function of Preprocessor
A preprocessor produces input to compilers. It may perform the following functions:
1. Macro processing: A preprocessor may allow a user to define macros that are shorthands for longer
constructs.
2. File inclusion: A preprocessor may include header files into the program text.
3. Rational preprocessor: These preprocessors augment older languages with more modern flow-of-control and
data-structuring facilities.
4. Language extensions: These preprocessors attempt to add capabilities to the language by what amounts to
built-in macros.
Transition Table
The transition table is basically a tabular representation of the transition function. It takes two
arguments (a state and a symbol) and returns a state (the "next state").

A transition table is represented by the following things:

o Columns correspond to input symbols.


o Rows correspond to states.
o Entries correspond to the next state.
o The start state is denoted by an arrow with no source.
o The accept state is denoted by a star.

Example 1:

Solution:

Transition table of given DFA is as follows:

Difference between JDK, JRE, and JVM

Present State    Next State for Input 0    Next State for Input 1

→q0              q1                        q2

q1               q0                        q2

*q2              q2                        q2

Explanation:

o In the above table, the first column indicates all the current states. Under column 0 and 1,
the next states are shown.
o The first row of the transition table can be read as, when the current state is q0, on input 0
the next state will be q1 and on input 1 the next state will be q2.
o In the second row, when the current state is q1, on input 0, the next state will be q0, and on
1 input the next state will be q2.
o In the third row, when the current state is q2 on input 0, the next state will be q2, and on 1
input the next state will be q2.
o The arrow pointing to q0 indicates that it is the start state, and the star marked on q2 indicates
that it is a final state.
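The table can be used directly to simulate the DFA; the sketch below encodes it as a dictionary keyed by (state, symbol), with q0 the start state and q2 the accept state:

```python
# Simulating the DFA of Example 1 straight from its transition table.
delta = {
    ("q0", "0"): "q1", ("q0", "1"): "q2",
    ("q1", "0"): "q0", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}

def accepts(word, start="q0", accept=("q2",)):
    state = start
    for symbol in word:
        state = delta[(state, symbol)]  # one table lookup per input symbol
    return state in accept

print(accepts("01"))  # True:  q0 -0-> q1 -1-> q2
print(accepts("00"))  # False: q0 -0-> q1 -0-> q0
```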

Example 2:

Solution:

Transition table of given NFA is as follows:

Present State    Next State for Input 0    Next State for Input 1

→q0              q0                        q1

q1               q1, q2                    q2

q2               q1                        q3

*q3              q2                        q2

Explanation:

o The first row of the transition table can be read as, when the current state is q0, on input 0
the next state will be q0 and on input 1 the next state will be q1.
o In the second row, when the current state is q1, on input 0 the next state will be either q1 or
q2, and on 1 input the next state will be q2.
o In the third row, when the current state is q2 on input 0, the next state will be q1, and on 1
input the next state will be q3.
o In the fourth row, when the current state is q3 on input 0, the next state will be q2, and on 1
input the next state will be q2.
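Simulating the NFA from this table requires tracking a *set* of states, since q1 on input 0 can move to either q1 or q2; the sketch below does exactly that:

```python
# Simulating the NFA of Example 2: after each symbol we hold the set of
# all states the automaton could be in.
delta = {
    ("q0", "0"): {"q0"},       ("q0", "1"): {"q1"},
    ("q1", "0"): {"q1", "q2"}, ("q1", "1"): {"q2"},
    ("q2", "0"): {"q1"},       ("q2", "1"): {"q3"},
    ("q3", "0"): {"q2"},       ("q3", "1"): {"q2"},
}

def accepts(word, start="q0", accept=frozenset({"q3"})):
    states = {start}
    for symbol in word:
        # union of the successor sets of every current state
        states = set().union(*(delta[(s, symbol)] for s in states))
    return not states.isdisjoint(accept)

print(accepts("101"))  # True: q0 -1-> {q1} -0-> {q1,q2} -1-> {q2,q3}
```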
TYPE CHECKING
Introduction:

The compiler must perform static checking (checking done at compile time). This ensures that certain types
of programming errors will be detected and reported.

A compiler must check that the source program follows both the syntactic and semantic conventions of the
source language. This checking is called static checking.

Some examples of static checks are:

Type checks: A compiler should report an error if an operator is applied to an incompatible operand.

Flow-of-control checks: Statements that cause flow of control to leave a construct must have some place
to which to transfer the flow of control; for example, a branch to a non-existent label is an error.

Uniqueness checks: Objects should be defined only once. This is true in many languages.

Name-related checks: Sometimes, the same name must appear two or more times. For example, in Ada
the name of a block must appear both at the beginning of the block and at the end.
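A minimal sketch of the first kind of check above: applying an arithmetic operator to an incompatible operand type is reported as an error at checking time. The type names and the widening rule here are illustrative, not taken from any particular language definition.

```python
# A toy static type check for a binary arithmetic expression.
NUMERIC = {"int", "float"}

def check_binary(op, left, right):
    if left in NUMERIC and right in NUMERIC:
        # common widening rule: the result is float if either operand is
        return "float" if "float" in (left, right) else "int"
    raise TypeError(f"operator '{op}' applied to incompatible operands: {left}, {right}")

print(check_binary("+", "int", "float"))  # float
# check_binary("*", "int", "string") would be reported as a type error
```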
