Report: Compiler vs. Programming Languages
Submitted to: Dr. Qaisar Javaid
Course: Compiler Construction
Submitted by:
Malaika Arshad (0033)
Ramsha Alvi (0069)
Tayyaba Hameed (0070)
February 20, 2021
1 Introduction
A compiler is computer software that translates source code written in a high-level language (e.g., C++) into a set of machine-language instructions that a digital computer's CPU can understand. More generally, a compiler is a computer program that translates code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language into a lower-level language in order to create an executable program.
A computer understands only binary language and executes instructions coded in binary; it cannot execute an instruction given in any other form. Instructions must therefore ultimately reach the computer in binary, which would mean writing programs entirely as sequences of 0s and 1s. This created the need for a translator that converts instructions written in a human-readable language into binary, and the compiler was invented to do this job. A programming language is a formal language comprising a set of instructions that produce various kinds of output. Programming languages are used in computer programming to implement algorithms, and most consist of instructions for computers. Programmers must follow the specified rules of a language when writing programs in it. The languages programmers use to write code are called "high-level languages"; this code can be compiled into a "low-level language," which is recognized directly by the computer hardware.
In a language processing system, the source program is first handled by a preprocessor. The modified source program is then processed by the compiler to produce target assembly code, which the assembler translates into relocatable object code; the linker and loader finally process this object code to generate the target program. We write programs in a high-level language, which is easier for us to understand and remember; these programs are then fed into this series of tools and OS components to obtain code the machine can use. This pipeline is known as the Language Processing System, and for a C compiler it consists of the preprocessor, compiler, assembler, linker, and loader.
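As an illustration, the pipeline above can be sketched as a chain of functions, one per stage. The stage names follow the text; the actual "translations" performed here (the fake include expansion, the one-instruction-per-line rule, the base load address) are stand-ins invented for the sketch, not how any real toolchain works.

```python
# Toy model of the Language Processing System:
# preprocessor -> compiler -> assembler -> linker/loader.

def preprocess(source: str) -> str:
    # Expand a hypothetical include directive into its content.
    return source.replace("#include <io>", "PUTS_DECLARATION")

def compile_to_assembly(source: str) -> list[str]:
    # Pretend each non-empty source line becomes one assembly instruction.
    return [f"INSTR {line}" for line in source.splitlines() if line]

def assemble(assembly: list[str]) -> list[tuple[int, str]]:
    # Produce relocatable object code: (relative address, instruction).
    return list(enumerate(assembly))

def link_and_load(obj: list[tuple[int, str]], base: int = 0x1000) -> dict[int, str]:
    # Relocate relative addresses to absolute load addresses.
    return {base + addr: instr for addr, instr in obj}

program = "#include <io>\nx = 1"
image = link_and_load(assemble(compile_to_assembly(preprocess(program))))
```

Composing the four functions mirrors the order in which a real system hands the program from one tool to the next.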
Some research compilers read a program described in a high-level programming language, analyze it, partition it into hardware and software parts, and then generate data paths for reconfigurable hardware. Such a compiler focuses on the basic relationships between languages and machines, and those relationships ease the inevitable transitions to new hardware and programming languages. In parallel, the software part is instrumented with functions for configuring and exchanging data with the reconfigurable hardware. The term compilation denotes the conversion of an algorithm expressed in a human-oriented source language into an algorithm expressed in a hardware-oriented target language. By contrast, conventional programs give priority to knowledge whose competency is flexible and adaptable and cannot be reduced to an algorithm.
If you have been in high performance computing since its beginning in the 1950s, you have
programmed in several languages during that time. During the 1950s and early 1960s, you
programmed in assembly language. The constraint on memory and slow clock rates made every
instruction precious. With small memories, overall program size was typically small, so
assembly language was sufficient. Toward the end of the 1960s, programmers began writing
more of their code in a high-level language such as FORTRAN. Writing in a high-level language
made your work much more portable, reliable, and maintainable. Given the increasing speed and
capacity of computers, the cost of using a high-level language was something most programmers
were willing to accept. Even in the 1970s, if a program spent a particularly large amount of time in a particular routine, or if the routine was part of the operating system or a commonly used library, it was most likely written in assembly language.
During the late 1970s and early 1980s, optimizing compilers continued to improve to the point
that all but the most critical portions of general-purpose programs were written in high-level
languages. On average, these compilers generated better code than most assembly language programmers. This was often because a compiler could make better use of hardware resources
such as registers. In a processor with 16 registers, a programmer might adopt a convention
regarding the use of registers to help keep track of what value is in what register. A compiler can
use each register as much as it likes because it can precisely track when a register is available for
another use.
However, during that time, high performance computer architecture was also evolving. Cray
Research was developing vector processors at the very top end of the computing spectrum.
Compilers were not quite ready to determine when these new vector instructions could be used.
Programmers were forced to write assembly language or create highly hand-tuned FORTRAN
that called the appropriate vector routines in their code. In a sense, vector processors turned back
the clock when it came to trusting the compiler for a while. Programmers never lapsed
completely into assembly language, but some of their FORTRAN started looking rather un-
FORTRAN like. As the vector computers matured, their compilers became increasingly able to
detect when vectorization could be performed. At some point, the compilers again became better
than programmers on these architectures. These new compilers reduced the need for extensive
directives or language extensions. The compiler became an important tool in the processor
design cycle. Processor designers had much greater flexibility in the types of changes they could
make. For example, a new processor revision could be a good design even if it executed existing codes 10% slower than before, as long as recompiling the code for the new revision made it perform 65% faster. Of course, it was important to actually provide that compiler when the new processor was shipped, and to have the compiler deliver that level of performance across a wide range of codes rather than on just one particular benchmark suite.
Enyindah and Okon E. Uko have reviewed the use of optimization algorithms in newer compilers to reduce code size, contrasting the new techniques with conventional optimization methods such as data-flow analysis, local optimization, and global optimization. The newer compilers use mechanisms such as data-flow analysis, leaf-function optimization, and cross-linking optimization. Data-flow analysis introduces some fuzziness into the data information and also includes alias analysis. Reverse inlining (procedural abstraction) aims at code-size reduction. Leaf-function optimization reduces code length by exploiting leaf functions: functions that do not directly call other functions in a program and thus form the leaves of the call graph. Cross-linking optimization factors out common code so as to reduce code size. Together, these optimization techniques in newer compilers help to utilize memory efficiently, reduce code size, and increase program execution speed.
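The leaf-function idea can be made concrete with a small sketch. The call graph below is a hypothetical example invented for illustration, not one taken from the cited survey.

```python
# Identify leaf functions in a call graph: a leaf function calls no
# other function, so its set of callees is empty and it forms a leaf
# in the call graph.

def leaf_functions(call_graph: dict[str, set[str]]) -> set[str]:
    return {f for f, callees in call_graph.items() if not callees}

graph = {
    "main":       {"parse", "report"},
    "parse":      {"next_token"},
    "next_token": set(),   # calls nothing -> leaf
    "report":     set(),   # calls nothing -> leaf
}
```

A compiler can apply cheaper calling conventions to the functions this finds, since they never need to preserve registers across further calls.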
In this section we shall discuss the management of storage for collections of objects, including temporary variables, during their lifetimes. The important goals are the most economical use of memory and the simplicity of the access functions to individual objects. Source-language properties govern the possible approaches, as indicated by the following questions:
1 Is the extent of an object restricted, and what relationships hold between the extents of
distinct objects (e.g. are they nested)?
2 Does the static nesting of the program text control a procedure's access to global objects, or
is access dependent upon the dynamic nesting of calls?
3 Is the exact number and size of all objects known at compilation time?
The compiler frontend, which is dependent on the source language, performs lexical analysis, parsing, and semantic analysis (e.g., type checking).
1.1.4 Static Storage Management
We speak of static storage management if the compiler can provide fixed addresses for all objects at the time the program is translated (here we assume that translation includes binding), i.e. if we can answer the third question above with 'yes'. Arrays with dynamic bounds, recursive procedures and the use of anonymous objects are prohibited. The condition is fulfilled for languages like FORTRAN and BASIC, and for the objects lying on the outermost contour of an ALGOL 60 or Pascal program. (In contrast, arrays with dynamic bounds can occur even in the outer block of an ALGOL 68 program.) If the storage for the elements of an array with dynamic bounds is managed separately, the condition can be forced to hold in this case also.
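Under this condition, address assignment reduces to a single pass over the declarations at translation time. The variable names and sizes below are invented for illustration.

```python
# Static storage management sketch: every object's size is known at
# compile time, so the compiler assigns each variable one fixed
# address, laid out contiguously from a base address.

def assign_static_addresses(decls: list[tuple[str, int]], base: int = 0) -> dict[str, int]:
    addresses = {}
    next_addr = base
    for name, size in decls:      # sizes are compile-time constants
        addresses[name] = next_addr
        next_addr += size
    return addresses

# Hypothetical FORTRAN-style declarations: (name, size in words).
addrs = assign_static_addresses([("I", 1), ("X", 2), ("A", 10)], base=100)
```

Because the addresses never change at run time, every access compiles to a direct reference with no base register needed.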
Using a Stack. All declared values in languages such as Pascal and SIMULA have restricted lifetimes. Further, the environments in these languages are nested: the extent of all objects belonging to the contour of a block or procedure ends before that of objects from the dynamically enclosing contour. Thus we can use a stack discipline to manage these objects: upon procedure call or block entry, the activation record containing storage for the local objects of the procedure or block is pushed onto the stack. At block end, procedure return, or a jump out of these constructs, the activation record is popped off the stack. (The entire activation record is stacked; we do not deal with single objects individually!) An object of automatic extent occupies storage in the activation record of the syntactic construct with which it is associated. The position of the object is characterized by the base address, b, of the activation record and the relative location (offset), R, of its storage within the activation record. R must be known at compile time, but b cannot be (otherwise we would have static storage allocation). To access the object, b must be determined at runtime and placed in a register. R is then either added to the register and the result used as an indirect address, or R appears as the constant in a direct access function of the form 'register + constant'. The extension, which may vary in size from activation to activation, is often called the second-order storage of the activation record.
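The base-plus-offset access function can be sketched as follows. The dictionary-as-memory model and the frame sizes are simplifying assumptions made for the sketch, not how a real runtime lays out storage.

```python
# Stack discipline for activation records: each call pushes a whole
# record; a local object is addressed as b + R, where b is the base
# address of the current record (known only at run time) and R is its
# compile-time offset within the record.

memory: dict[int, object] = {}
bases: list[int] = []   # base addresses b of the stacked records
next_free = 0

def enter_procedure(frame_size: int) -> None:
    # Push an activation record and remember its base address b.
    global next_free
    bases.append(next_free)
    next_free += frame_size

def leave_procedure(frame_size: int) -> None:
    # Pop the entire activation record at once.
    global next_free
    bases.pop()
    next_free -= frame_size

def store_local(offset_R: int, value) -> None:
    memory[bases[-1] + offset_R] = value   # access function: b + R

def load_local(offset_R: int):
    return memory[bases[-1] + offset_R]

enter_procedure(frame_size=4)
store_local(0, "x in outer call")
enter_procedure(frame_size=4)
store_local(0, "x in inner call")   # same R, different b
```

Note that the same offset R names a different cell in each activation, which is exactly what makes recursion work under this scheme.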
Error Handling is concerned with failures due to many causes: errors in the compiler or its
environment (hardware, operating system), design errors in the program being compiled, an
incomplete understanding of the source language, transcription errors, incorrect data, etc. The
tasks of the error handling process are to detect each error, report it to the user, and possibly
make some repair to allow processing to continue. It cannot generally determine the cause of the
error, but can only diagnose the visible symptoms. Similarly, any repair cannot be considered a
correction (in the sense that it carries out the user's intent); it merely neutralizes the symptom so
that processing may continue. The purpose of error handling is to aid the programmer by highlighting inconsistencies. It has a low frequency in comparison with other compiler tasks, and hence the time required to complete it is largely irrelevant, but it cannot be regarded as an 'add-on' feature of a compiler. Its influence upon the overall design is pervasive, and it is a necessary debugging tool during construction of the compiler itself. Proper design and implementation of an error handler, however, depends strongly upon complete understanding of the compilation process. This is why we have deferred consideration of error handling until now.
Errors, Symptoms, Anomalies and Limitations. We distinguish between the actual error and its symptoms. Like a physician, the error handler sees only symptoms. From these symptoms, it may attempt to diagnose the underlying error. The diagnosis always involves some uncertainty, so we may choose simply to report the symptoms with no further attempt at diagnosis.
1.2.1 Lexical
The lexical syntax (token structure) is processed by the lexer, and the phrase syntax is processed by the parser. The lexical syntax is usually a regular language, whose alphabet consists of the individual characters of the source code text. The phrase syntax is usually a context-free language, whose alphabet consists of the tokens produced by the lexer. In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens, i.e. meaningful character strings. A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner, though "scanner" is also used for just the first stage of a lexer.
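Since the lexical syntax is a regular language, a tokenizer can be driven entirely by regular expressions. The sketch below assumes a tiny illustrative token set; the token classes are not taken from any particular language definition.

```python
import re

# Each token class is given by a regular expression; a combined
# pattern with named groups scans the input left to right.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text: str) -> list[tuple[str, str]]:
    # Convert a character sequence into a sequence of (class, lexeme) tokens.
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind != "SKIP":            # whitespace separates tokens
            tokens.append((kind, match.group()))
    return tokens
```

For example, `tokenize("x = 42 + y")` yields IDENT, OP, NUMBER, OP, IDENT tokens, the alphabet the parser then works over.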
1.2.2 Parser
Within computational linguistics, the term parsing refers to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information. The term has slightly different meanings in different branches of linguistics and computer science. In
order to parse natural language data, researchers must first agree on the grammar to be used. The
choice of syntax is affected by both linguistic and computational concerns; traditional sentence
parsing is often performed as a method of understanding the exact meaning of a sentence,
sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the
importance of grammatical divisions such as subject and predicate. Parsing or syntactic analysis
is the process of analyzing a string of symbols, either in natural language or in computer
languages.
While being locked into a specific hardware package has its downsides, compiling a program can
also increase its performance. Users can send specific options to compilers regarding the details
of the hardware the program will be running on. This allows the compiler to create machine
language code that makes the most efficient use of the specified hardware, as opposed to more
generic code. This also allows advanced users to optimize a program's performance on their
computers.
One major advantage of programs that are compiled is that they are self-contained units that are
ready to be executed. Because they are already compiled into machine language binaries, there is
no second application or package that the user has to keep up to date. If a program is compiled for Windows on x86 architecture, the end user needs only a Windows operating system running on x86 architecture. Additionally, a precompiled package can run faster than an interpreter compiling source code in real time.
Compiled languages are converted directly into machine code that the processor can execute. As
a result, they tend to be faster and more efficient to execute than interpreted languages. They also
give the developer more control over hardware aspects, like memory management and CPU
usage.
1. Programs developed using low-level languages are fast and memory efficient.
2. The translation is done only once, as a separate process.
3. A compiled program can run on any computer of the same platform without the source code being present.
4. No compiler or interpreter is needed at run time to translate the source to machine code, which cuts out compilation and interpretation time.
5. Low-level languages provide direct manipulation of computer registers and storage.
6. They can communicate directly with hardware devices.
1.3.3 Disadvantage: Compile Times
One of the drawbacks of having a compiler is that it must actually compile source code. While
the small programs that many novice programmers code take trivial amounts of time to compile,
larger application suites can take significant amounts of time to compile. When programmers
have nothing to do but wait for the compiler to finish, this time can add up—especially during
the development stage, when the code has to be compiled in order to test functionality and
troubleshoot glitches.
1.3.4 Disadvantage: Hardware Specific
Because a compiler translates source code into a specific machine language, programs have
to be specifically compiled for OS X, Windows or Linux, as well as specifically for 32-bit
or 64-bit architectures. For a programmer or software company trying to get a product out
to the widest possible audience, this means maintaining multiple versions of the source
code for the same application. This results in more time spent on source code
maintenance and extra trouble when updates are released.
1. If an error occurs, the whole program has to be compiled again.
2. Programs developed using low-level languages are machine dependent and are not portable.
3. Low-level code is difficult to develop, debug and maintain.
4. A compiled program will only run on a computer that has the same platform.
5. The program cannot be changed without going back to the source code.
6. Object code needs to be produced before a final executable file; this can be a slow process.
7. The programmer must have additional knowledge of the computer architecture of the particular machine when programming in a low-level language.
8. Low-level programming usually results in poor programming productivity.
1.4 Conclusion
Besides covering basic compilation issues, the course yields an implemented compiler that can serve as a test bed for compiler coursework. We described an improved approach for a compiler which partitions a high-level language program; further research will quantify its advantages in relation to the current system. The implementation and source language is Scheme, and the target language is assembly code.
The compilation process is one of the steps in executing a program. Understanding how
compilers work and what goes on "behind the scenes" will help you get better at developing
software. The need for compilers arises from the need for high-level languages, which are more
relevant for problem solving and software development by humans. Moreover, high-level
languages are machine independent.
1.5 References
1 http://ijirt.org/master/publishedpaper/IJIRT100158_PAPER.pdf
2 https://ijcsmc.com/docs/papers/October2014/V3I10201482.pdf
3 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.727.1786&rep=rep1&type=pdf
4 https://learn.saylor.org/course/view.php?id=74&sectionid=705
5 https://www.reddit.com/r/ProgrammingLanguages/comments/9zo2qx/programming_languagecompiler_research_ideas/
6 https://www.computer.org/csdl/journal/ts/5555/01/09353261/1r8kzlTfYRy
7 https://www.ijert.org/research/a-literature-survey-on-artificial-intelligence-IJERTCONV5IS19015.pdf