
1. Describe the following with respect to Language Specification:

A) Fundamentals of Language Processing

Language processing = Analysis of source program + Synthesis of Target program.

This definition motivates a generic model of language processing activities. We refer to the
collection of language processor components engaged in analyzing a source program as the
analysis phase of the language processor, and to the components engaged in synthesizing a
target program as the synthesis phase.

A specification of the source language forms the basis of source program analysis. This
specification consists of three components.

1. Lexical rules, which govern the formation of valid lexical units in the source language.

2. Syntax rules, which govern the formation of valid statements in the source language.

3. Semantic rules, which associate meaning with valid statements of the language.

Thus analysis of a source statement consists of lexical, syntax and semantic analysis.

Lexical analysis (Scanning)

It identifies the lexical units in a source statement, classifies them into different lexical
classes (e.g., identifiers, constants) and enters them into different tables. Lexical analysis builds a
descriptor, called a token, for each lexical unit.
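As an illustration (a minimal sketch with invented token classes and names, not from the text), the following C fragment shows a scanner classifying lexical units and building a token descriptor for each.

#include <ctype.h>
#include <stdio.h>

/* Illustrative lexical classes; a real language processor has more. */
enum token_class { TK_EOF, TK_ID, TK_CONST, TK_OP };

struct token {
    enum token_class class;   /* the lexical class of the unit */
    char lexeme[32];          /* the characters of the unit    */
};

/* Scan one lexical unit: letters start an identifier, digits start
   a constant, anything else is taken as a one-character operator. */
static struct token scan_one(const char **p)
{
    struct token t;
    int n = 0;
    while (isspace((unsigned char)**p)) (*p)++;
    if (**p == '\0') {
        t.class = TK_EOF;
    } else if (isalpha((unsigned char)**p)) {
        t.class = TK_ID;
        while (isalnum((unsigned char)**p)) t.lexeme[n++] = *(*p)++;
    } else if (isdigit((unsigned char)**p)) {
        t.class = TK_CONST;
        while (isdigit((unsigned char)**p)) t.lexeme[n++] = *(*p)++;
    } else {
        t.class = TK_OP;
        t.lexeme[n++] = *(*p)++;
    }
    t.lexeme[n] = '\0';
    return t;
}

int main(void)
{
    const char *src = "total = count + 42";
    for (;;) {
        struct token t = scan_one(&src);
        if (t.class == TK_EOF) break;
        printf("class=%d lexeme=%s\n", t.class, t.lexeme);
    }
    return 0;
}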

Syntax analysis (Parsing)

It processes the string of tokens built by lexical analysis to determine the statement class, e.g.,
assignment statement, if statement. It then builds an intermediate code (IC) which represents the
structure of the statement. The IC is passed on to semantic analysis to determine the meaning of the statement.

Semantic analysis

Semantic analysis of declaration statements differs from the semantic analysis of imperative
statements. The former results in the addition of information to the symbol table, e.g., type, length, etc.
The latter identifies the sequence of actions necessary to implement the meaning of a source
statement. In both cases the structure of a source statement guides the application of the semantic
rules.
B) Language Processor development tools

There are two LPDTs widely used in practice: the lexical analyzer generator LEX and the
parser generator YACC. The input to these tools is a specification of the lexical and syntactic
constructs of a language L, together with the semantic actions to be performed on recognizing the constructs.

A compiler or interpreter for a programming language is often decomposed into two parts:

1) Read the source program and discover its structure.

2) Process the structure, e.g., to generate the target program.

Lex and Yacc can generate program fragments that solve the first task.

The task of discovering the source structure is itself decomposed into subtasks:

1) Split the source file into tokens (Lex).

2) Find the hierarchical structure of the program (Yacc).

Lex - A Lexical Analyzer Generator

Lex helps write programs whose control flow is directed by instances of regular expressions in
the input stream. It is well suited for editor-script type transformations and for segmenting input
in preparation for a parsing routine.

Lex source is a table of regular expressions and corresponding program fragments. The table is
translated to a program which reads an input stream, copying it to an output stream and
partitioning the input into strings which match the given expressions. As each such string is
recognized the corresponding program fragment is executed. The recognition of the
expressions is performed by a deterministic finite automaton generated by Lex. The program
fragments written by the user are executed in the order in which the corresponding regular
expressions occur in the input stream.

The lexical analysis programs written with Lex accept ambiguous specifications and choose the
longest match possible at each input point. If necessary, substantial look ahead is performed on
the input, but the input stream will be backed up to the end of the current partition, so that the
user has general freedom to manipulate it.
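As a hedged illustration of such a table of expressions and fragments, here is a minimal Lex specification; the rules and counter names are invented for the example. Each fragment runs when its regular expression matches the longest possible string at the current input point.

%{
#include <stdio.h>
/* Counts updated by the program fragments below. */
int ids = 0, nums = 0;
%}
%%
[a-zA-Z][a-zA-Z0-9]*   { ids++;  /* identifier */ }
[0-9]+                 { nums++; /* integer constant */ }
.|\n                   { ECHO;   /* anything else: copy input to output */ }
%%
int yywrap(void) { return 1; }
int main(void)
{
    yylex();
    printf("ids=%d nums=%d\n", ids, nums);
    return 0;
}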
Yacc: Yet Another Compiler-Compiler

Computer program input generally has some structure; in fact, every computer program that
does input can be thought of as defining an input "language" which it accepts. An input
language may be as complex as a programming language, or as simple as a sequence of
numbers. Unfortunately, usual input facilities are limited, difficult to use, and often are lax about
checking their inputs for validity.

Yacc provides a general tool for describing the input to a computer program. The Yacc user
specifies the structures of his input, together with code to be invoked as each such structure is
recognized. Yacc turns such a specification into a subroutine that handles the input process;
frequently, it is convenient and appropriate to have most of the flow of control in the user's
application handled by this subroutine.
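A minimal Yacc specification, again an illustrative sketch rather than production code, shows this shape: grammar rules paired with code to be invoked as each structure is recognized. The tiny hand-written yylex here stands in for a Lex-generated scanner.

%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token NUM
%%
line : expr '\n'      { printf("= %d\n", $1); }
     ;
expr : expr '+' NUM   { $$ = $1 + $3; }   /* invoked on recognizing a sum */
     | NUM            { $$ = $1; }
     ;
%%
/* A hand-written lexer: digit runs become NUM, everything else
   (including '+' and the newline) is returned as itself. */
int yylex(void)
{
    int c = getchar();
    if (isdigit(c)) {
        int v = 0;
        while (isdigit(c)) { v = v * 10 + (c - '0'); c = getchar(); }
        ungetc(c, stdin);
        yylval = v;
        return NUM;
    }
    return c == EOF ? 0 : c;
}
int main(void) { return yyparse(); }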

2. Define the following:

A) Addressing modes for CISC (Motorola and Intel)

The 68000 (Motorola) addressing modes

1) Register to Register

2) Register to memory

3) Memory to register

4) Memory to Memory

The 68000 supports a wide variety of addressing modes:

1) Immediate mode

This is the simplest form of addressing. Here, the operand is given in the instruction itself. This
mode is used to define constants or set initial values of variables. The advantage of this mode is
that no memory reference other than the instruction fetch is required to obtain the operand. The
disadvantage is that the size of the number is limited to the size of the address field, which in most
instruction sets is small compared with the word length.


2) Absolute address

The address (in either the short 16-bit form or long 32-bit form) of the operand immediately
follows the instruction.

3) Program counter relative with displacement - A displacement value is added to the program
counter to calculate the operand’s address. The displacement can be positive or negative.

4) Program counter relative with index and displacement - the instruction contains both the
identity of an index register and a trailing displacement value. The content of the index register,
the displacement value and the program counter are added together to get the final address.

5) Register direct - The operand is contained in an address or data register

6) Address register indirect - An address register contains the address of the operand.

7) Address register indirect with predecrement or postincrement - An address register contains
the address of the operand in memory. With the predecrement option set, a predetermined
value is subtracted from the register before the address is used. With the postincrement option set,
a predetermined value is added to the register after the operation completes.

8) Address register indirect with displacement - A displacement value is added to the register's
contents to calculate the operand's address. The displacement can be positive or negative.

9) Address register relative with index and displacement - The instruction contains both the
identity of an index register and a trailing displacement value. The contents of the index register,
the displacement value and the specified address register are added together to get the final
address.

B) Addressing modes for RISC Machines.

Simple addressing modes (3):

Immediate mode: the operand is part of the instruction.

E.g. mov ah,09h

mov dx, offset Prompt

Register addressing: the operand is contained in a register.

Example: add ax,bx

Direct: the operand field of the instruction contains the effective address.

3. Explain the design of single pass and multi-pass assemblers.

In single pass translation, LC processing and construction of the symbol table proceed as in two pass translation. The
problem of forward references is tackled using a process called backpatching. The operand
field of an instruction containing a forward reference is left blank initially. The address of the
forward-referenced symbol is put into this field when its definition is encountered.
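A minimal sketch of the backpatching idea in C is given below; the table layout and names are invented for the illustration, not taken from a real assembler.

#include <stdio.h>
#include <string.h>

/* One pending forward reference: which operand field (a code
   location) is waiting for which symbol. */
struct patch { char sym[8]; int loc; };

static int code[64];               /* operand fields of generated code */
static struct patch pending[16];
static int npending = 0;

/* Operand uses a symbol that is not yet defined: leave the field
   blank (0) and remember where it is. */
static void forward_ref(const char *sym, int loc)
{
    code[loc] = 0;
    strcpy(pending[npending].sym, sym);
    pending[npending++].loc = loc;
}

/* The symbol's definition is encountered, so its address (the LC
   value) is now known: fill in every field left blank for it. */
static void define_symbol(const char *sym, int address)
{
    for (int i = 0; i < npending; i++)
        if (strcmp(pending[i].sym, sym) == 0)
            code[pending[i].loc] = address;
}

int main(void)
{
    forward_ref("LOOP", 3);        /* a jump at word 3 to LOOP       */
    forward_ref("LOOP", 7);        /* another forward reference      */
    define_symbol("LOOP", 104);    /* LOOP's definition: LC = 104    */
    printf("%d %d\n", code[3], code[7]);   /* prints: 104 104        */
    return 0;
}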

Design of a Two Pass Assembler

Tasks performed by the passes of a two pass assembler are as follows

Pass I

1) Separate the symbol, mnemonic opcode and operand fields.

2) Build the symbol table.

3) Perform LC processing.

4) Construct the intermediate representation.

Pass II: Synthesize the target program.

Pass I performs analysis of the source program and synthesis of the intermediate
representation, while Pass II processes the intermediate representation to synthesize the target
program. The design details of assembler passes are discussed after introducing advanced
assembler directives and their influence on LC processing.

4. Explain the following with respect to Macros and Macro Processors

A) Macro Definition and Expansion

Ans.

Definition: macro
A macro name is an abbreviation, which stands for some related lines of code. Macros are useful for the
following purposes:

· To simplify and reduce the amount of repetitive coding

· To reduce errors caused by repetitive coding

· To make an assembly program more readable.

A macro consists of a name, a set of formal parameters and a body of code. The use of a macro name with a set
of actual parameters is replaced by the code generated from its body. This is called macro expansion.

Macros allow a programmer to define pseudo operations, typically operations that are generally
desirable, are not implemented as part of the processor instruction set, and can be implemented as a
sequence of instructions. Each use of a macro generates new program instructions; the macro thus has the
effect of automating the writing of the program.

Macros can be defined and used in many programming languages, like C and C++. Consider, for example, macros in C
programming. Macros are commonly used in C to define small snippets of code. If the macro has
parameters, they are substituted into the macro body during expansion; thus, a C macro can mimic a C
function. The usual reason for doing this is to avoid the overhead of a function call in simple cases,
where the code is lightweight enough that function call overhead has a significant impact on
performance.

For instance, #define max(a, b) ((a) > (b) ? (a) : (b)) defines the macro max, taking two arguments a and b. This
macro may be called like any C function, using identical syntax. Therefore, after preprocessing,

z = max(x, y); becomes z = ((x) > (y) ? (x) : (y)); While this use of macros is very important for C, for instance to
define type-safe generic data types or debugging tools, it is also slow, rather inefficient, and may lead to
a number of pitfalls.
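The pitfalls can be made concrete. In the sketch below the argument i++ is substituted twice into the macro body, so it is evaluated (and the increment performed) twice, which a call to a real function would not do.

#include <stdio.h>

#define max(a, b) ((a) > (b) ? (a) : (b))

int main(void)
{
    int i = 3, j = 1;
    /* Expands to ((i++) > (j) ? (i++) : (j)):
       i is evaluated, and incremented, twice. */
    int z = max(i++, j);
    printf("z=%d i=%d\n", z, i);   /* prints z=4 i=5, not z=3 i=4 */
    return 0;
}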

C macros are capable of mimicking functions, creating new syntax within some limitations, as well as
expanding into arbitrary text (although the C compiler will require that text to be valid C source code, or
else comments), but they have some limitations as a programming construct. Macros which mimic
functions, for instance, can be called like real functions, but a macro cannot be passed to another
function using a function pointer, since the macro itself has no address.

In programming languages such as C or assembly language, a macro is a name that defines a set of commands that
are substituted for the macro name wherever the name appears in a program (a process called macro
expansion) when the program is compiled or assembled. Macros are similar to functions in that they can
take arguments and in that they are calls to lengthier sets of instructions. Unlike functions, macros are
replaced by the actual commands they represent when the program is prepared for execution; function
instructions, by contrast, are copied into a program only once.

Macro Expansion
A macro call leads to macro expansion. During macro expansion, the macro statement is replaced by the
sequence of assembly statements it abbreviates.

B) Conditional Macro Expansion

Conditional macro expansion means that some sections of the program may be optional, either included or not in the final program,
depending upon specified conditions. A reasonable use of conditional assembly would be to combine
two versions of a program: one that prints debugging information during test executions for the
developer, and another version for production operation that displays only results of interest for the average
user. For example, a program fragment could assemble the instructions to print the AX register only if Debug is true.
Note that true is any non-zero value.

Here is a conditional statement in C programming; the following tests the expression
"BUFSIZE == 1020", where "BUFSIZE" must be a macro.

#if BUFSIZE == 1020

printf ("Large buffers!\n");

#endif /* BUFSIZE is large */

C) Macro Parameters

Macros may have any number of parameters, as long as they fit on one line. Parameter names are local
symbols, which are known within the macro only. Outside the macro they have no meaning!

Syntax: <macro name> MACRO <parameter 1>…….<parameter n>

<body line 1>

<body line 2>

… <body line m>

ENDM

Valid macro arguments are:

1. Arbitrary sequences of printable characters, not containing blanks, tabs, commas, or semicolons

2. Quoted strings (in single or double quotes)

3. Single printable characters, preceded by '!' as an escape character

4. Character sequences, enclosed in literal brackets < … >, which may be arbitrary sequences of valid
macro arguments, blanks, commas and semicolons

5. Arbitrary sequences of valid macro arguments

6. Expressions preceded by a '%' character

During macro expansion, these actual arguments replace the symbols of the corresponding formal
parameters, wherever they are recognized in the macro body. The first argument replaces the symbol of
the first parameter, the second argument replaces the symbol of the second parameter, and so forth.
This is called substitution.
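The same substitution can be seen concretely in C, where the preprocessor replaces each occurrence of a formal parameter in the macro body with the text of the actual argument. This sketch is illustrative only; the # operator shown additionally turns an argument into a quoted string.

#include <stdio.h>

/* Each use of 'name' and 'val' in the body is replaced by the actual
   argument text; #name turns the first argument into a string. */
#define SHOW(name, val) printf(#name " = %d\n", (val))

int main(void)
{
    int count = 7;
    SHOW(count, count);      /* expands to printf("count" " = %d\n", (count)); */
    SHOW(twice, count * 2);  /* the argument need not be a variable            */
    return 0;
}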

5. Describe the process of Bootstrapping in the context of Linkers.

Ans.

In computing, bootstrapping refers to a process where a simple system activates another, more
complicated system that serves the same purpose. It is a solution to the chicken-and-egg
problem of starting a certain system without the system already functioning. The term is most
often applied to the process of starting up a computer, in which a mechanism is needed to
execute the software program that is responsible for executing software programs (the
operating system).

Bootstrap loading: The discussions of loading up to this point have all presumed that there's
already an operating system or at least a program loader resident in the computer to load the
program of interest. The chain of programs being loaded by other programs has to start
somewhere, so the obvious question is: how is the first program loaded into the computer?

In modern computers, the first program the computer runs after a hardware reset is invariably
stored in a ROM known as the bootstrap ROM, as in "pulling one's self up by the bootstraps." When
the CPU is powered on or reset, it sets its registers to a known state. On x86 systems, for
example, the reset sequence jumps to the address 16 bytes below the top of the system's
address space. The bootstrap ROM occupies the top 64K of the address space and ROM code
then starts up the computer. On IBM-compatible x86 systems, the boot ROM code reads the
first block of the floppy disk into memory, or if that fails the first block of the first hard disk, into
memory location zero and jumps to location zero. The program in block zero in turn loads a
slightly larger operating system boot program from a known place on the disk into memory, and
jumps to that program which in turn loads in the operating system and starts it. (There can be
even more steps, e.g., a boot manager that decides from which disk partition to read the
operating system boot program, but the sequence of increasingly capable loaders remains.)

Why not just load the operating system directly? Because you can't fit an operating system
loader into 512 bytes. The first level loader typically is only able to load a single-segment
program from a file with a fixed name in the top-level directory of the boot disk. The operating
system loader contains more sophisticated code that can read and interpret a configuration file,
uncompress a compressed operating system executable, and address large amounts of memory (on
an x86 the loader usually runs in real mode, which means that it's tricky to address more than
1MB of memory). The full operating system can turn on the virtual memory system, load the
drivers it needs, and then proceed to run user-level programs.

Many Unix systems use a similar bootstrap process to get user-mode programs running. The
kernel creates a process, then stuffs a tiny little program, only a few dozen bytes long, into that
process. The tiny program executes a system call that runs /etc/init, the user mode initialization
program that in turn runs configuration files and starts the daemons and login programs that a
running system needs.

None of this matters much to the application level programmer, but it becomes more interesting
if you want to write programs that run on the bare hardware of the machine, since then you
need to arrange to intercept the bootstrap sequence somewhere and run your program rather
than the usual operating system. Some systems make this quite easy (just stick the name of
your program in AUTOEXEC.BAT and reboot Windows 95, for example), others make it nearly
impossible. It also presents opportunities for customized systems. For example, a single-
application system could be built over a Unix kernel by naming the application /etc/init.

Software Bootstrapping & Compiler Bootstrapping:

Bootstrapping can also refer to the development of successively more complex, faster
programming environments. The simplest environment will be, perhaps, a very basic text editor
(e.g. ed) and an assembler program. Using these tools, one can write a more complex text
editor, and a simple compiler for a higher-level language and so on, until one can have a
graphical IDE and an extremely high-level programming language.

Compiler Bootstrapping:

In compiler design, a bootstrap or bootstrapping compiler is a compiler that is written in the
target language, or a subset of the language, that it compiles. Examples include gcc, GHC,
OCaml, BASIC, PL/I and more recently the Mono C# compiler.

6. Describe the procedure for design of a Linker.

Ans.

Design of a linker

Relocation and linking requirements in segmented addressing

The relocation requirements of a program are influenced by the addressing structure of the computer
system on which it is to execute. Use of a segmented addressing structure reduces the relocation
requirements of a program.

Implementation Examples: A Linker for MS-DOS

Example: Consider a program written in the assembly language of the Intel 8088. The ASSUME
statement declares the segment registers CS and DS to be available for memory addressing. Hence all
memory addressing is performed by using suitable displacements from their contents. Suppose the translation-time
address of A is 0196. In statement 16, a reference to A is assembled as a displacement of 0196 from the
contents of the CS register. This avoids the use of an absolute address, hence the instruction is not
address sensitive. Now no relocation is needed if segment SAMPLE is to be loaded with address 2000 by
a calling program (or by the OS). The effective operand address would be calculated as <CS>+0196,
which is the correct address 2196. A similar situation exists with the reference to B in statement 17. The
reference to B is assembled as a displacement of 0002 from the contents of the DS register. Since the DS
register would be loaded with the execution time address of DATA_HERE, the reference to B would be
automatically relocated to the correct address.

Though the use of segment registers reduces the relocation requirements, it does not completely eliminate
the need for relocation. Consider statement 14:

MOV AX, DATA_HERE

This instruction loads the segment base of DATA_HERE into the AX register preparatory to its transfer into the DS
register. Since the assembler knows DATA_HERE to be a segment, it makes provision to load the higher
order 16 bits of the address of DATA_HERE into the AX register. However, it does not know the link time
address of DATA_HERE, hence it assembles the MOV instruction in the immediate operand format and
puts zeroes in the operand field. It also makes an entry for this instruction in RELOCTAB so that the
linker will put the appropriate address in the operand field. Inter-segment calls and jumps are
handled in a similar way.

Relocation is somewhat more involved in the case of intra-segment jumps assembled in the FAR format.
For example, consider the following program:

FAR_LAB EQU THIS FAR ; FAR_LAB is a FAR label

JMP FAR_LAB ; A FAR jump

Here the displacement and the segment base of FAR_LAB are to be put in the JMP instruction itself. The
assembler puts the displacement of FAR_LAB in the first two operand bytes of the instruction, and
makes a RELOCTAB entry for the third and fourth operand bytes, which are to hold the segment base
address. A statement like

ADDR_A DW OFFSET A

(which is an 'address constant') does not need any relocation since the assembler can itself put the
required offset in the bytes. In summary, the only RELOCTAB entries that must exist for a program
using segmented memory addressing are for the bytes that contain a segment base address.

For linking, however, both the segment base address and the offset of an external symbol must be computed by
the linker. Hence there is no reduction in the linking requirements.

Assignment Set II

1. Describe the basic functions of a loader

Ans.

Loader - Basic Loader Functions


To execute an object program, we need:

Loading and Allocation, which allocates memory location and brings the object program into memory
for execution

Relocation, which modifies the object program so that it can be loaded at an address different from the
location originally specified

Linking, which combines two or more separate object programs and supplies the information needed to
allow references between them

Assemble-and-go Loader

Characteristic: the object code is stored in memory after assembly, and a single JUMP instruction transfers control to it.

Advantage: simple; suited to a program development environment.

Disadvantages: whenever the assembly program is to be executed, it has to be assembled again, and
programs have to be coded in the same language.

Design of an Absolute Loader

Absolute Program

Advantage: Simple and efficient

Disadvantage: the need for the programmer to specify the actual load address; it is difficult to use subroutine libraries.
libraries.

Algorithm for an absolute loader


Object Code Representation

Character representation (Fig. 3.1(a)): each byte of assembled code is given using its hexadecimal
representation in character form. This is easy to read by human beings, and it is the representation
used in this book. Binary representation: in general, each byte of object code is stored as a single
byte; most machines store object programs in a binary form.

Simple Bootstrap Loader

When a computer is first turned on or restarted, a special type of absolute loader, called a bootstrap
loader, is executed. This bootstrap loads the first program to be run by the computer, usually an
operating system.

Example (SIC bootstrap loader): the bootstrap itself begins at address 0. It loads the OS starting at
address 0x80. There is no header record or control information; the object code is loaded into
consecutive bytes of memory.

Begin
    X = 0x80 (the address of the next memory location to be loaded)
    Loop
        A <- GETC (and convert it from the ASCII character code to the value of the hexadecimal digit)
        save the value in the high-order 4 bits of S
        A <- GETC
        combine the two values to form one byte: A <- (A + S)
        store the value (in A) at the address in register X
        X <- X + 1
End

GETC:
    A <- read one character
    if A = 0x04 then jump to 0x80
    if A < 48 then GETC
    A <- A - 48 (0x30)
    if A < 10 then return
    A <- A - 7
    return
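A rough rendering of the same loop in C is sketched below, assuming the input convention described above: pairs of ASCII hexadecimal digits terminated by the character 0x04, with characters below '0' skipped. The memory array and byte count stand in for real memory and the final jump to 0x80.

#include <stdio.h>

static unsigned char memory[256];

/* Read one character and convert it from ASCII to the value of the
   hexadecimal digit; returns -1 on the 0x04 terminator (or EOF). */
static int getc_hex(void)
{
    for (;;) {
        int a = getchar();
        if (a == 0x04 || a == EOF) return -1;
        if (a < 48) continue;          /* skip blanks, newlines, ...   */
        a -= 48;                       /* '0'..'9' -> 0..9             */
        if (a < 10) return a;
        return a - 7;                  /* 'A'..'F' -> 10..15           */
    }
}

int main(void)
{
    int x = 0x80;                      /* next location to be loaded   */
    int hi, lo;
    while ((hi = getc_hex()) >= 0 && (lo = getc_hex()) >= 0)
        memory[x++] = (unsigned char)((hi << 4) | lo);  /* A <- (A+S) */
    printf("loaded %d bytes\n", x - 0x80);  /* a real bootstrap would now jump to 0x80 */
    return 0;
}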

2. Write about Deterministic and Non-Deterministic Finite Automata with suitable numerical examples.

Ans.

Nondeterministic Finite Automata

In the theory of computation, a nondeterministic finite state machine or nondeterministic finite
automaton (NFA) is a finite state machine where for each pair of state and input symbol there
may be several possible next states. This distinguishes it from the deterministic finite automaton
(DFA), where the next possible state is uniquely determined. Although the DFA and NFA have
distinct definitions, it may be shown in the formal theory that they are equivalent, in that, for any
given NFA, one may construct an equivalent DFA, and vice-versa: this is the powerset
construction. Both types of automata recognize only regular languages. Non-deterministic finite
state machines are sometimes studied by the name subshifts of finite type. Non-deterministic
finite state machines are generalized by probabilistic automata, which assign a probability to
each state transition.
Nondeterministic finite automata were introduced in 1959 by Michael O. Rabin and Dana Scott,
[1] who also showed their equivalence to deterministic finite automata.

Intuitive introduction

An NFA, similar to a DFA, consumes a string of input symbols. For each input symbol it
transitions to a new state until all input symbols have been consumed.

Unlike a DFA, it is non-deterministic in that, for any input symbol, its next state may be any one
of several possible states. Thus, in the formal definition, the next state is an element of the power
set of states. This element, itself a set, represents some subset of all possible states to be
considered at once.

An extension of the NFA is the NFA-lambda (also known as NFA-epsilon or the NFA with
epsilon moves), which allows a transformation to a new state without consuming any input
symbols. For example, if it is in state 1, with the next input symbol an a, it can move to state 2
without consuming any input symbols, and thus there is an ambiguity: is the system in state 1, or
state 2, before consuming the letter a? Because of this ambiguity, it is more convenient to talk of
the set of possible states the system may be in. Thus, before consuming letter a, the NFA-epsilon
may be in any one of the states out of the set {1,2}. Equivalently, one may imagine that the NFA
is in state 1 and 2 'at the same time': and this gives an informal hint of the powerset construction:
the DFA equivalent to an NFA is defined as the one that is in the state q={1,2}. Transformations
to new states without consuming an input symbol are called lambda transitions or epsilon
transitions. They are usually labeled with the Greek letter λ or ε.

The notion of accepting an input is similar to that for the DFA. When the last input symbol is
consumed, the NFA accepts if and only if there is some set of transitions that will take it to an
accepting state. Equivalently, it rejects, if, no matter what transitions are applied, it would not
end in an accepting state.

Formal definition

Two similar types of NFAs are commonly defined: the NFA and the NFA with ε-moves. The
ordinary NFA is defined as a 5-tuple, (Q, Σ, T, q0, F), consisting of

a finite set of states Q

a finite set of input symbols Σ

a transition function T : Q × Σ → P(Q).

an initial (or start) state q0 ∈ Q

a set of states F distinguished as accepting (or final) states F ⊆ Q.


Here, P(Q) denotes the power set of Q. The NFA with ε-moves (also sometimes called NFA-
epsilon or NFA-lambda) replaces the transition function with one that allows the empty string ε
as a possible input, so that one has instead

T : Q × (Σ ∪{ε}) → P(Q).

It can be shown that ordinary NFA and NFA with epsilon moves are equivalent, in that, given
either one, one can construct the other, which recognizes the same language.

Properties

The machine starts in the specified initial state and reads in a string of symbols from its alphabet.
The automaton uses the state transition function T to determine the next state using the current
state, and the symbol just read or the empty string. However, "the next state of an NFA depends
not only on the current input event, but also on an arbitrary number of subsequent input events.
Until these subsequent events occur it is not possible to determine which state the machine is in"
[2]. If, when the automaton has finished reading, it is in an accepting state, the NFA is said to
accept the string, otherwise it is said to reject the string.

The set of all strings accepted by an NFA is the language the NFA accepts. This language is a
regular language.

For every NFA a deterministic finite state machine (DFA) can be found that accepts the same
language. Therefore it is possible to convert an existing NFA into a DFA for the purpose of
implementing a (perhaps) simpler machine. This can be performed using the powerset
construction, which may lead to an exponential rise in the number of necessary states.

Properties of the NFA-ε

For all p, q ∈ Q, one writes p →ε q if and only if q can be reached from p by going along zero or
more ε arrows. In other words, p →ε q if and only if there exist states q1, ..., qk ∈ Q, with k ≥ 0,
such that q1 = p, qk = q, and qi+1 ∈ T(qi, ε) for all 1 ≤ i < k.

For any p ∈ Q, the set of states that can be reached from p by ε arrows is called the epsilon-closure or ε-closure
of p, and is written E(p) = { q ∈ Q : p →ε q }.

For any subset P ⊆ Q, define the ε-closure of P as E(P) = ∪p∈P E(p).

The epsilon-transitions are transitive, in that it may be shown that, for all p, q, r ∈ Q,
if p →ε q and q →ε r, then p →ε r.

Similarly, if q ∈ E(P) and r ∈ E({q}), then r ∈ E(P).

Let x be a string over the alphabet Σ ∪ {ε}. An NFA-ε M accepts the string x if there exist both a
representation of x of the form x1x2 ... xn, where xi ∈ (Σ ∪ {ε}), and a sequence of states p0, p1, ...,
pn, where pi ∈ Q, meeting the following conditions:

1. p0 ∈ E({q0})
2. pi ∈ E(T(pi−1, xi)) for i = 1, ..., n
3. pn ∈ F.

Implementation

There are many ways to implement an NFA:

Convert to the equivalent DFA. In some cases this may cause exponential blowup in the size of
the automaton, and thus auxiliary space proportional to the number of states in the NFA (as
storage of the state value requires at most one bit for every state in the NFA).

Keep a set data structure of all states which the machine might currently be in (see the sketch after
this list). On the consumption of the last input symbol, if one of these states is a final state, the
machine accepts the string. In the worst case, this may require auxiliary space proportional to the
number of states in the NFA; if the set structure uses one bit per NFA state, then this solution is
exactly equivalent to the above.

Create multiple copies. For each n way decision, the NFA creates up to n − 1 copies of the
machine. Each will enter a separate state. If, upon consuming the last input symbol, at least one
copy of the NFA is in the accepting state, the NFA will accept. (This, too, requires linear storage
with respect to the number of NFA states, as there can be one machine for every NFA state.)

Explicitly propagate tokens through the transition structure of the NFA and match whenever a
token reaches the final state. This is sometimes useful when the NFA should encode additional
context about the events that triggered the transition. (For an implementation that uses this
technique to keep track of object references have a look at Tracematches [3].)
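A minimal sketch in C of the set-of-states technique referred to above, using one bit per NFA state; the NFA here (accepting binary strings that end in "01") is invented for the illustration and has no ε-moves.

#include <stdio.h>

/* NFA accepting binary strings that end in "01".
   States: 0 = start, 1 = guessed the final "01" has begun, 2 = accepting.
   delta[q][c] is a bitmask of the possible next states. */
static const unsigned delta[3][2] = {
    /* on '0'            on '1'   */
    { (1u<<0) | (1u<<1), (1u<<0) },   /* state 0 */
    { 0,                 (1u<<2) },   /* state 1 */
    { 0,                 0       },   /* state 2: no moves out */
};

static int nfa_accepts(const char *s)
{
    unsigned cur = 1u << 0;           /* current set of states = {0}     */
    for (; *s; s++) {
        unsigned next = 0;
        for (int q = 0; q < 3; q++)   /* follow every possible move      */
            if (cur & (1u << q))
                next |= delta[q][*s - '0'];
        cur = next;
    }
    return (cur & (1u << 2)) != 0;    /* accept if state 2 is in the set */
}

int main(void)
{
    printf("%d %d %d\n", nfa_accepts("1101"), nfa_accepts("10"), nfa_accepts("01"));
    /* prints: 1 0 1 */
    return 0;
}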

Example

The following example describes an NFA M, with a binary alphabet, which determines whether the input
contains an even number of 0s or an even number of 1s. (Note that 0 occurrences is an even
number of occurrences as well.) Let M = (Q, Σ, T, s0, F) where

Σ = {0, 1},

Q = {s0, s1, s2, s3, s4},

E({s0}) = { s0, s1, s3 },

F = {s1, s3}, and

The transition function T can be defined by this state transition table:

        0      1      ε
S0    {}     {}     {S1, S3}
S1    {S2}   {S1}   {}
S2    {S1}   {S2}   {}
S3    {S3}   {S4}   {}
S4    {S4}   {S3}   {}

(The state diagram of M is not reproduced here.)

M can be viewed as the union of two DFAs: one with states {S1, S2} and the other with states {S3,
S4}.

The language of M can be described by the regular expression (1*01*0)*1* ∪ (0*10*1)*0*.
Deterministic Finite Automata

In the theory of computation, a deterministic finite state machine—also known as deterministic finite
automaton (DFA)—is a finite state machine accepting strings of symbols (usually letters or numbers).
The list of symbols used by a DFA is called its alphabet. For each state, there is a transition arrow leading
out to a next state for each symbol in the alphabet. This contrasts with a nondeterministic finite
automaton (NFA), in which a state may have more than one transition for the same input symbol. Every
DFA has a start state (denoted graphically by an arrow coming in from nowhere) where computations
begin, and a set of accept states (denoted graphically by a double circle) which help define when a
computation is successful. DFAs recognize exactly the set of regular languages which are, among other
things, useful for doing lexical analysis and pattern matching.

Finite Automata are equivalent to regular expressions as a way of representing regular languages. This
means that it is possible to convert from a DFA to a regular expression and vice versa without losing any
information. [1] A given DFA (or regular expression) describes exactly one regular language. A DFA can
be used either in an accepting mode, to verify that an input string is indeed part of the language it
represents, or in a generating mode, to create a list of all the strings in the language.

DFAs are often treated as abstract mathematical concepts, but DFA-like state machines have been
implemented in hardware and/or software in order to solve various specific problems. An example of a
software state machine is one that decides whether online user input such as phone numbers and
email addresses is valid. [2] A hardware example is the digital logic circuitry that controls whether an
automatic door is open or closed, using input from motion sensors or pressure pads to decide whether
or not to perform a state transition (see: finite state machine).

Formal definition

A DFA is a 5-tuple, (Q, Σ, δ, q0, F), consisting of

a finite set of states (Q)

a finite set of input symbols called the alphabet (Σ)

a transition function (δ : Q × Σ → Q)

a start state (q0 ∈ Q)

a set of accept states (F ⊆ Q)

Let M be a DFA such that M = (Q, Σ, δ, q0, F), and X = x0x1 ... xn−1 be a string over the alphabet Σ. M
accepts the string X if a sequence of states, r0,r1, ..., rn, exists in Q with the following conditions:

1. r0 = q0

2. ri+1 = δ(ri, xi), for i = 0, ..., n−1

3. rn ∈ F.
In words, the first condition says that the machine starts in the start state q0. The second condition says
that given each character of string X, the machine will transition from state to state according to the
transition function δ. The last condition says that the machine accepts X if the last input of X causes the
machine to halt in one of the accepting states. Otherwise, it is said that the automaton rejects the string.

The set of strings the DFA accepts form a language, which is the language the DFA recognizes.

A DFA without a list of accept states and without a designated starting state is known as a transition
system or semiautomaton.

Accept and Generate modes

A DFA representing a regular language can be used either in an accepting mode to validate that an input
string is part of the language, or in a generating mode to generate a list of all the strings in the language.

In the accept mode an input string is provided which the automaton can read in left to right, one symbol
at a time. The computation begins at the start state and proceeds by reading the first symbol from the
input string and following the state transition corresponding to that symbol. The system continues
reading symbols and following transitions until there are no more symbols in the input, which marks the
end of the computation. If after all input symbols have been processed the system is in an accept state
then we know that the input string was indeed part of the language, and it is said to be accepted,
otherwise it is not part of the language and it is not accepted.

The generating mode is similar except that rather than validating an input string its goal is to produce a
list of all the strings in the language. Instead of following a single transition out of each state, it follows
all of them. In practice this can be accomplished by massive parallelism (having the program branch into
two or more processes each time it is faced with a decision) or through recursion. As before, the
computation begins at the start state and then proceeds to follow each available transition, keeping
track of which branches it took. Every time the automaton finds itself in an accept state it knows that
the sequence of branches it took forms a valid string in the language and it adds that string to the list
that it is generating. If the language this automaton describes is infinite (i.e., contains an infinite number
of strings, such as "all the binary strings with an even number of 0s") then the computation will never
halt. Given that regular languages are, in general, infinite, automata in the generating mode tend to be
more of a theoretical construct.

The following example is of a DFA M, with a binary alphabet, which requires that the input contains an
even number of 0s.

(The state diagram for M is not reproduced here.) M = (Q, Σ, δ, q0, F) where

Q = {S1, S2},

Σ = {0, 1},

q0 = S1,
F = {S1}, and

δ is defined by the following state transition table:

        0     1
S1     S2    S1
S2     S1    S2

The state S1 represents that there has been an even number of 0s in the input so far, while S2 signifies
an odd number. A 1 in the input does not change the state of the automaton. When the input ends, the
state will show whether the input contained an even number of 0s or not. If the input did contain an
even number of 0s, M will finish in state S1, an accepting state, so the input string will be accepted.

The language of M is the regular language given by the regular expression 1*(01*01*)*,
where "*" is the Kleene star; e.g., 1* denotes any number (possibly zero) of symbols "1".
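As an illustration, this DFA can be simulated directly in C: the state transition table becomes an array and the run is a single loop. This is a minimal sketch, with S1 and S2 encoded as 0 and 1.

#include <stdio.h>

/* delta[state][symbol] for the DFA above; state 0 is S1, state 1 is S2. */
static const int delta[2][2] = {
    /* on '0'  on '1' */
    {  1,      0 },    /* S1 */
    {  0,      1 },    /* S2 */
};

static int accepts(const char *s)
{
    int q = 0;                     /* start state q0 = S1               */
    for (; *s; s++)
        q = delta[q][*s - '0'];    /* exactly one transition per symbol */
    return q == 0;                 /* accept iff we halt in S1          */
}

int main(void)
{
    printf("%d %d %d\n", accepts(""), accepts("1001"), accepts("10"));
    /* prints: 1 1 0 ("" and "1001" contain an even number of 0s) */
    return 0;
}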

Transition monoid

Alternatively, a run can be seen as a sequence of compositions of the transition function with itself.
Given an input symbol a ∈ Σ, one may write the transition function as δa : Q → Q, using
the simple trick of currying, that is, writing δ(q, a) = δa(q) for all q ∈ Q. This way, the
transition function can be seen in simpler terms: it is just something that "acts" on a state in Q,
yielding another state. One may then consider the result of function composition repeatedly
applied to the various functions δa, δb, and so on. Using this notion we define
δ̂ : Q × Σ* → Q. Given a pair of letters a, b ∈ Σ, one may define a new function δab, by
insisting that δab = δb ∘ δa, where ∘ denotes function composition. Clearly, this process can be
recursively continued. So, we have the following recursive definition:

δ̂(q, ε) = q, where ε is the empty string, and

δ̂(q, wa) = δa(δ̂(q, w)), where w ∈ Σ*, a ∈ Σ and q ∈ Q.

δ̂(q, w) is defined for all words w ∈ Σ*. Repeated function composition forms a monoid. For
the transition functions, this monoid is known as the transition monoid or sometimes the
transformation semigroup. The construction can also be reversed: given a δ̂, one can
reconstruct a δ, and so the two descriptions are equivalent.

Advantages and disadvantages

DFAs are one of the most practical models of computation, since there is a trivial linear time, constant-
space, online algorithm to simulate a DFA on a stream of input. Given two DFAs there are efficient
algorithms to find a DFA recognizing:

the union of the two DFAs

the intersection of the two DFAs

complements of the languages the DFAs recognize

Because DFAs can be reduced to a canonical form (minimal DFAs), there are also efficient algorithms to
determine:

whether a DFA accepts any strings

whether a DFA accepts all strings

whether two DFAs recognize the same language

the DFA with a minimum number of states for a particular regular language

DFAs are equivalent in computing power to nondeterministic finite automata.

On the other hand, finite state automata are of strictly limited power in the languages they can
recognize; many simple languages, including any problem that requires more than constant space to
solve, cannot be recognized by a DFA. The classical example of a simply described language that no DFA
can recognize is the bracket language, that is, the language that consists of properly paired brackets, such as
(()()). More formally, it is the language consisting of strings of the form a^n b^n: some finite number of a's,
followed by an equal number of b's. If there is no limit to recursion (i.e., you can always embed another
pair of brackets inside) then an infinite number of states would be required to recognize it.

3. Explain with suitable numerical examples the concepts of Moore Machine and Mealy
Machine.

Ans.

Moore machine
A Moore machine is a finite-state machine whose output values are determined solely by its
current state. (This is in contrast to a Mealy machine, whose output values are determined both
by its current state and by the values of its inputs.) The state diagram for a Moore machine
associates an output value with each state (in contrast to the state diagram for a Mealy
machine, which associates an output value with each transition edge).

The Moore machine is named after Edward F. Moore, who presented the concept in a 1956
paper, “Gedanken-experiments on Sequential Machines.”

Mechanism

Most digital electronic systems are designed as clocked sequential systems. Clocked sequential
systems are a restricted form of Moore machine where the state changes only when the global
clock signal changes. Typically the current state is stored in flip-flops, and a global clock signal
is connected to the "clock" input of the flip-flops. Clocked sequential systems are one way to
solve metastability problems. A typical electronic Moore machine includes a combinational logic
chain to decode the current state into the outputs (lambda). The instant the current state
changes, those changes ripple through that chain, and almost instantaneously the outputs
change (or don't change). There are design techniques to ensure that no glitches occur on the
outputs during that brief period while those changes are rippling through the chain, but most
systems are designed so that glitches during that brief transition time are ignored or are
irrelevant. The outputs then stay the same indefinitely (LEDs stay bright, power stays connected
to the motors, solenoids stay energized, etc.), until the Moore machine changes state again.

Formal definition

A Moore machine can be defined as a 6-tuple ( S, S0, Σ, Λ, T, G ) consisting of the following:

a finite set of states ( S )

a start state (also called initial state) S0 which is an element of (S)

a finite set called the input alphabet ( Σ )

a finite set called the output alphabet ( Λ )

a transition function (T : S × Σ → S) mapping a state and the input alphabet to the next state

an output function (G : S → Λ) mapping each state to the output alphabet


The number of states in a Moore machine will be greater than or equal to the number of states
in the corresponding Mealy machine. This is due to the fact that each transition in a Mealy
machine can be associated with a corresponding, additional state mapping the transition to a
single output in the Moore machine, hence turning a possibly partial machine into a complete
machine.

Mealy machine

A Mealy machine is a finite-state machine whose output values are determined both by its current state and by
the values of its inputs. (This is in contrast to a Moore machine, whose output values are
determined solely by its current state.) The state diagram for a Mealy machine associates an
output value with each transition edge (in contrast to the state diagram for a Moore machine,
which associates an output value with each state).

The Mealy machine is named after George H. Mealy, who presented the concept in a 1955
paper, “A Method for Synthesizing Sequential Circuits.”[1]

Mealy machines provide a rudimentary mathematical model for cipher machines. Considering
the input and output alphabet the Latin alphabet, for example, then a Mealy machine can be
designed that given a string of letters (a sequence of inputs) can process it into a ciphered string
(a sequence of outputs). However, although you could use a Mealy model to describe the
Enigma, the state diagram would be too complex to provide feasible means of designing
complex ciphering machines.

Formal Definition

A Mealy machine is 6-tuple, (S, S0, Σ, Λ, T, G), consisting of the following:

a finite set of states (S)

a start state (also called initial state) S0 which is an element of (S)

a finite set called the input alphabet (Σ)

a finite set called the output alphabet (Λ)

a transition function (T : S × Σ → S) mapping pairs of a state and an input symbol to the corresponding next state

an output function (G : S × Σ → Λ) mapping pairs of a state and an input symbol to the corresponding output symbol

In some formulations, the transition and output functions are coalesced into a single function (T :
S × Σ → S × Λ).
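As a small numerical illustration (assumed here, not taken from the definitions above), consider a rising-edge detector over a stream of 0/1 inputs, written both ways in C. The Mealy version computes its output from state and input together and needs two states; the Moore version reads its output from the state alone and needs a third state to remember that an edge was just seen, which matches the earlier remark that the Moore machine has at least as many states.

#include <stdio.h>

/* Mealy: two states (the state simply remembers the previous input);
   the output is a function of state AND input: G : S × Σ → Λ. */
static int mealy_step(int *state, int in)
{
    int out = (*state == 0 && in == 1);   /* 1 exactly on a rising edge */
    *state = in;                          /* T : S × Σ → S              */
    return out;
}

/* Moore: a third state EDGE encodes "edge just seen"; the output is a
   function of the state only: G : S → Λ (read after the transition). */
enum { LOW, EDGE, HIGH };
static int moore_step(int *state, int in)
{
    static const int next[3][2] = { {LOW, EDGE}, {LOW, HIGH}, {LOW, HIGH} };
    static const int out[3]     = { 0, 1, 0 };
    *state = next[*state][in];
    return out[*state];
}

int main(void)
{
    const int in[] = { 0, 1, 1, 0, 1 };
    int sm = 0, sq = LOW;
    for (int i = 0; i < 5; i++)
        printf("in=%d mealy=%d moore=%d\n",
               in[i], mealy_step(&sm, in[i]), moore_step(&sq, in[i]));
    /* both columns print 0 1 0 0 1: the rising edges at steps 2 and 5 are detected */
    return 0;
}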
4. Write about:

A) Interpreters

An interpreter normally means a computer program that executes, i.e. performs,
instructions written in a programming language. An interpreter may be a program that either

1. Executes the source code directly

2. Translates source code into some efficient intermediate representation (code) and immediately
executes this

3. Explicitly executes stored precompiled code[1] made by a compiler which is part of the
interpreter system

Perl, Python, MATLAB, and Ruby are examples of type 2, while UCSD Pascal and Java are type
3: Source programs are compiled ahead of time and stored as machine independent code, which
is then linked at run-time and executed by an interpreter and/or compiler (for JIT systems). Some
systems, such as Smalltalk, BASIC and others, may also combine 2 and 3.

While interpretation and compilation are the two principal means by which programming
languages are implemented, these are not fully distinct categories, one of the reasons being that
most interpreting systems also perform some translation work, just like compilers. The terms
"interpreted language" or "compiled language" merely mean that the canonical implementation
of that language is an interpreter or a compiler; a high level language is basically an abstraction
which is (ideally) independent of particular implementations.

Bytecode interpreters

There is a spectrum of possibilities between interpreting and compiling, depending on the
amount of analysis performed before the program is executed. For example, Emacs Lisp is
compiled to bytecode, which is a highly compressed and optimized representation of the Lisp
source, but is not machine code (and therefore not tied to any particular hardware). This
"compiled" code is then interpreted by a bytecode interpreter (itself written in C). The compiled
code in this case is machine code for a virtual machine, which is implemented not in hardware,
but in the bytecode interpreter. The same approach is used with the Forth code used in Open
Firmware systems: the source language is compiled into "F code" (a bytecode), which is then
interpreted by a virtual machine.

Control tables - which do not themselves necessarily ever need to pass through a compilation
phase - dictate appropriate algorithmic control flow via customized interpreters, in similar fashion
to bytecode interpreters.
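As a rough illustration of the dispatch loop at the heart of such interpreters, here is a toy example in C; the four-instruction bytecode is invented for the sketch, not any real virtual machine's.

#include <stdio.h>

/* A tiny invented bytecode: PUSH n, ADD, PRINT, HALT. */
enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const int *code)
{
    int stack[64], sp = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {                          /* fetch-decode-execute */
        case OP_PUSH:  stack[sp++] = code[pc++];           break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];   break;
        case OP_PRINT: printf("%d\n", stack[sp - 1]);      break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    /* "compiled" form of: print 2 + 3 */
    const int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(program);    /* prints 5 */
    return 0;
}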
Efficiency

The main disadvantage of interpreters is that when a program is interpreted, it typically runs
more slowly than if it had been compiled. The difference in speeds could be tiny or great; often
an order of magnitude and sometimes more. It generally takes longer to run a program under an
interpreter than to run the compiled code but it can take less time to interpret it than the total
time required to compile and run it. This is especially important when prototyping and testing
code when an edit-interpret-debug cycle can often be much shorter than an edit-compile-run-
debug cycle.

Interpreting code is slower than running the compiled code because the interpreter must
analyze each statement in the program each time it is executed and then perform the desired
action, whereas the compiled code just performs the action within a fixed context determined by
the compilation. This run-time analysis is known as "interpretive overhead". Access to variables
is also slower in an interpreter because the mapping of identifiers to storage locations must be
done repeatedly at run-time rather than at compile time.

There are various compromises between the development speed when using an interpreter and
the execution speed when using a compiler. Some systems (such as some LISPs) allow
interpreted and compiled code to call each other and to share variables. This means that once a
routine has been tested and debugged under the interpreter it can be compiled and thus benefit
from faster execution while other routines are being developed. Many interpreters do not
execute the source code as it stands but convert it into some more compact internal form. For
example, some BASIC interpreters replace keywords with single byte tokens which can be used
to find the instruction in a jump table. An interpreter might well use the same lexical analyzer
and parser as the compiler and then interpret the resulting abstract syntax tree.

Advantages and disadvantages of using interpreters

Programmers usually write programs in high level code which the CPU cannot execute. So this
source code has to be converted into machine code. This conversion is done by a compiler or
an interpreter. A compiler makes the conversion just once, while an interpreter typically converts
it every time a program is executed (or in some languages like early versions of BASIC, every
time a single instruction is executed).

B) Difference between compilers and interpreters


An interpreter reads the source code one instruction or line at a time, converts this line into
machine code and executes it. The machine code is then discarded and the next line is read.
The advantage of this is it's simple and you can interrupt it while it is running, change the
program and either continue or start again. The disadvantage is that every line has to be
translated every time it is executed, even if it is executed many times as the program runs.
Because of this interpreters tend to be slow. Examples of interpreters are Basic on older home
computers, and script interpreters such as JavaScript, and languages such as Lisp and Forth.

A compiler reads the whole source code and translates it into a complete machine code
program to perform the required tasks which is output as a new file. This completely separates
the source code from the executable file. The biggest advantage of this is that the translation is
done once only and as a separate process. The program that is run is already translated into
machine code so is much faster in execution. The disadvantage is that you cannot change the
program without going back to the original source code, editing that and recompiling (though for
a professional software developer this is more of an advantage because it stops source code
being copied). Current examples of compilers are Visual Basic, C, C++, C#, Fortran, Cobol,
Ada, Pascal and so on.

C) Compiler writing tools

A number of tools have been developed specifically to help construct compilers. These tools range
from scanner and parser generators to complex systems, variously called compiler-compilers,
compiler-generators or translator-writing systems, which produce a compiler from some form
of specification of a source language and target machine. The input specification for these
systems may contain:

1. A description of the lexical and syntactic structure of the source language

2. A description of what output is to be generated for each source language construct.

3. A description of the target machine.

In many cases the specification is merely a collection of programs fitted together into a
framework by the compiler-compiler. Some compiler-compilers, however, permit a portion of
the specification of a language to be non-procedural rather than procedural. While a number of
useful compiler-compilers exist, they have limitations. The chief problem is that there is a
tradeoff between how much work the compiler-compiler can do automatically for its user and
how flexible the system can be. Many compiler-compilers do in fact produce fixed lexical
analysis routines for use in the generated compiler.

5. Describe the following with respect to Storage or Memory Allocations:

A) Static Memory Allocations

Static memory allocation refers to the process of allocating memory at compile-time before the
associated program is executed, unlike dynamic memory allocation or automatic memory
allocation where memory is allocated as required at run-time.

The compiler allocates the required memory space for a declared variable. By using the address-of
operator, the reserved address is obtained, and this address may be assigned to a pointer
variable. Since most declared variables have static memory, this way of assigning a pointer
value to a pointer variable is known as static memory allocation. Memory is assigned during
compilation time.
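A minimal C illustration: both variables below receive fixed storage laid out before execution begins, and the address-of operator yields that reserved address for assignment to a pointer.

#include <stdio.h>

static int counter;    /* static storage: reserved at compile/link time */
int table[100];        /* likewise; the size must be a constant         */

int main(void)
{
    int *p = &counter; /* the address-of operator gives the fixed address */
    *p = 42;
    printf("%d %p\n", counter, (void *)p);
    return 0;
}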

B) Stack Based Allocations

Stacks in computing architectures are regions of memory where data is added or removed in a
last-in-first-out manner.

In most modern computer systems, each thread has a reserved region of memory referred to as
its stack. When a function executes, it may add some of its state data to the top of the stack;
when the function exits it is responsible for removing that data from the stack. At a minimum, a
thread's stack is used to store the location of function calls in order to allow return statements to
return to the correct location, but programmers may further choose to explicitly use the stack. If
a region of memory lies on the thread's stack, that memory is said to have been allocated on the
stack.

Because the data is added and removed in a last-in-first-out manner, stack allocation is very
simple and typically faster than heap-based memory allocation (also known as dynamic memory
allocation). Another feature is that memory on the stack is automatically, and very efficiently,
reclaimed when the function exits, which can be convenient for the programmer if the data is no
longer required. If, however, the data needs to be kept in some form, then it must be copied from
the stack before the function exits. Therefore, stack based allocation is suitable for temporary
data, or data which is no longer required after the creating function exits.
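A short C sketch of that point: a pointer into a function's stack frame is invalid once the function exits, so data that must survive has to be copied out before the frame disappears.

#include <stdio.h>
#include <string.h>

/* buf lives in this function's stack frame: allocated on entry,
   reclaimed automatically on exit. */
static const char *broken(void)
{
    char buf[16];
    strcpy(buf, "stack data");
    return buf;                /* BUG: pointer into a dead frame */
}

static void copied(char *out, size_t n)
{
    char buf[16];
    strcpy(buf, "stack data");
    strncpy(out, buf, n);      /* copy it out before returning   */
}

int main(void)
{
    char keep[16];
    copied(keep, sizeof keep);
    printf("%s\n", keep);
    (void)broken;              /* deliberately never called      */
    return 0;
}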

A disadvantage of stack-based memory allocation is that a thread's stack size can be as small
as a few dozen kilobytes. Allocating more memory on the stack than is available can result in a
crash due to stack overflow.

Some processor families, such as the x86, have special instructions for manipulating the stack
of the currently executing thread. Other processor families, including PowerPC and MIPS, do
not have explicit stack support, but instead rely on convention and delegate stack management
to the operating system's Application Binary Interface (ABI).

C) Dynamic Memory Allocations

Dynamic memory allocation is the allocation of memory storage for use in a computer program
during the runtime of that program. It can be seen also as a way of distributing ownership of
limited memory resources among many pieces of data and code.

Dynamically allocated memory exists until it is released, either explicitly by the programmer, by
exiting a block, or by the garbage collector. This is in contrast to static memory allocation, which
has a fixed duration. It is said that an object so allocated has a dynamic lifetime.

It uses functions such as malloc() or calloc() to get memory dynamically. If these functions are
used to get memory dynamically and the values returned by these functions are assigned to
pointer variables, such assignments are known as dynamic memory allocation. Memory is
assigned during run time.
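A minimal C sketch: the size is decided at run time, the value returned by malloc() is assigned to a pointer variable, and the storage lives until free() releases it.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n = 5;                         /* size known only at run time  */
    int *a = malloc(n * sizeof *a);    /* dynamic allocation           */
    if (a == NULL)
        return 1;                      /* allocation can fail          */

    for (int i = 0; i < n; i++)
        a[i] = i * i;
    printf("%d\n", a[n - 1]);

    free(a);                           /* lifetime ends when released  */
    return 0;
}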

6. Describe the following:

A) Heap storage and Garbage Collection

Heap storage is used to allocate storage that has a lifetime not related to the execution of the
current routine; it remains allocated until you explicitly free it or until the enclave terminates. You
can control allocation and freeing of heap storage using Language Environment callable
services, and tune heap storage using the Language Environment run-time options HEAP,
THREADHEAP and HEAPPOOLS; see z/OS Language Environment Programming Reference
for details.

Heap storage is shared among all program units and all threads in an enclave. Any thread can
free heap storage. You can free one element at a time with the CEEFRST callable service, or
you can free all heap elements at once using CEEDSHP. You cannot, however, discard the
initial heap.

Storage can be allocated or freed with any of the HLL storage facilities, such as malloc(),
calloc(), or ALLOCATE, along with the Language Environment storage services. For HLLs with
no intrinsic function for storage management, such as COBOL, you can use the Language
Environment storage services.

Note that when HEAPPOOLS(ON) or HEAPPOOLS(ALIGN) is in effect, the C storage
management intrinsic functions must be used together; that is, if you malloc(), you must use
free() to release the storage, and you cannot use CEEFRST. See Using HEAPPOOLS to improve
performance for more information about heap pools.

Heap storage, sometimes referred to as a heap, is a collection of one or more heap segments
comprised of an initial heap segment, which is dynamically allocated at the first request for heap
storage, and, as needed, one or more heap increments, allocated as additional storage is
required. The initial heap is provided by Language Environment and does not require a call to
the CEECRHP service. The initial heap is identified by heap_id=0. It is also known as the user
heap. See Figure 57 for an illustration of Language Environment heap storage.

Heap segments, which are contiguous areas of storage obtained directly from the operating
system, are subdivided into individual heap elements. Heap elements are obtained by a call to
the CEEGTST service, and are allocated within each segment of the initial heap by the
Language Environment storage management routines. When the initial heap segment becomes
full, Language Environment gets another segment, or increment, from the operating system.
The size of the initial heap segment is governed by the init_size parameter of the HEAP run-
time option. (See z/OS Language Environment Programming Reference.) The incr_size
parameter governs the size of each heap increment.

A named heap is set up specifically by a call to the CEECRHP service, which returns an
identifier when the heap is created. Additional heaps can also be created and controlled by calls
to CEECRHP.

Additional heaps provide isolation between logical groups of data placed in different additional heaps.
Use separate additional heaps when you need to group storage objects together so they can be
freed at once (with a single call to CEEDSHP), rather than freed one element at a time (with
calls to CEEFRST).

Library routines occasionally use a heap called the library heap for storage below 16 MB. The
size of this heap is controlled by the BELOWHEAP run-time option. The library heap and the
BELOWHEAP run-time option have no relation to heaps created by CEECRHP. If an application
program creates a heap using CEECRHP, library routines never use that heap (except, of
course, the storage management library routines CEEGTST, CEEFRST, CEECZST, and
CEEDSHP). The library heap can be tuned with the BELOWHEAP run-time option.

The Language Environment anywhere heap and below heap are reserved for run-time library
usage only. Application data and variables are not kept in these heaps. You normally should not
adjust the size of these heaps unless the storage report indicates excessive segments allocated
for the anywhere or below heaps, or if too much storage has been allocated.

Garbage collection

Garbage collection (GC) is a form of automatic memory management. It is a special case of
resource management, in which the limited resource being managed is memory. The garbage
collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are
no longer in use by the program. Garbage collection was invented by John McCarthy around
1959 to solve problems in Lisp.
Garbage collection is often portrayed as the opposite of manual memory management, which
requires the programmer to specify which objects to deallocate and return to the memory
system. However, many systems use a combination of the two approaches, and other
techniques such as stack allocation and region inference can carve off parts of the problem.
There is an ambiguity of terms, as theory often uses the terms manual garbage collection and
automatic garbage collection rather than manual memory management and garbage collection,
and does not restrict garbage collection to memory management, rather considering that any
logical or physical resource may be garbage collected.

Garbage collection does not traditionally manage limited resources other than memory that
typical programs use, such as network sockets, database handles, user interaction windows,
and file and device descriptors. Methods used to manage such resources, particularly
destructors, may suffice as well to manage memory, leaving no need for GC. Some GC systems
allow such other resources to be associated with a region of memory that, when collected,
causes the other resource to be reclaimed; this is called finalization. Finalization may introduce
complications limiting its usability, such as intolerable latency between disuse and reclaim of
especially limited resources, or a lack of control over which thread performs the work of
reclaiming.

B) Java and its Garbage collection mechanism

Garbage collection is one of the most important features of Java. The purpose of garbage
collection is to identify and discard objects that are no longer needed by a program so that their
resources can be reclaimed and reused. A Java object is subject to garbage collection when it
becomes unreachable to the program in which it is used. Garbage collection is also called
automatic memory management, as the JVM automatically removes unused variables/objects
from memory. Every class inherits the finalize() method from java.lang.Object; the
finalize() method is called by the garbage collector when it determines that no more references to the
object exist. In Java, it is a good idea to explicitly assign null to a variable when it is no longer in use.
On calling System.gc() or Runtime.gc(), the JVM tries to recycle the unused objects, but
there is no guarantee when all the objects will be garbage collected. Garbage collection is an
automatic process and can't be forced; there is no guarantee that garbage collection will start
immediately upon a request via System.gc().
