
Report: Compiler vs. Programming Languages
Submitted to: Dr. Qaisar Javaid
Course: Compiler Construction
Submitted by:
Malaika Arshad (0033)
Ramsha Alvi (0069)
Tayyaba Hameed (0070)
February 20, 2021
1 Introduction
A compiler is computer software that translates, or compiles, source code written in a high-level
language (e.g., C++) into a set of machine-language instructions that can be understood by a
digital computer's CPU. More generally, a compiler is a computer program that translates computer
code written in one programming language into another language. The name "compiler" is primarily
used for programs that translate source code from a high-level programming language to a
lower-level language to create an executable program.

A computer understands only binary language and executes instructions coded in binary; it cannot
execute a single instruction given in any other form. Therefore, we would have to provide
instructions to the computer in binary, i.e. write computer programs entirely as sequences of 0s
and 1s. Hence there was a need for a translator that converts computer instructions written in a
human-readable language into binary, and the compiler was invented to do this job. A programming
language is a formal language comprising a set of instructions that produce various kinds of
output. Programming languages are used in computer programming to implement algorithms. Most
programming languages consist of instructions for computers. Programmers have to follow all the
specified rules of a programming language before writing a program in it. Languages that
programmers use to write code are called "high-level languages." This code can be compiled into a
"low-level language," which is recognized directly by the computer hardware.

1.1 Literature Survey


Computer programs are formulated in a programming language and specify classes of computing
processes. Computers, however, interpret sequences of particular instructions, but not program
texts. Therefore, the program text must be translated into a suitable instruction sequence before it
can be processed by a computer. This translation can be automated, which implies that it can be
formulated as a program itself. Typically, a programmer writes language statements in a
language such as Pascal or C one line at a time using an editor. The file that is created contains
what are called the source statements. The programmer then runs the appropriate language
compiler, specifying the name of the file that contains the source statements. Compiler
construction is a widely used software engineering exercise, and hence this report presents a
compiler system for adaptive computing. In order to develop effective compilation techniques, it
is important to understand the common characteristics of the programs during compilation. High-
level languages are characterized by the fact that objects of programs, for example variables and
functions, are classified according to their type. Therefore, in addition to syntactic rules,
compatibility rules among types of operators and operands define the language. Hence,
verification of whether these compatibility rules are observed by a program is an additional duty
of a compiler. This verification is called type checking. The ability to compile in a single pass
has classically been seen as a benefit because it simplifies the job of writing a compiler, and
one-pass compilers generally perform compilations faster than multi-pass compilers. Thus, partly
driven by the resource limitations of early systems, many early languages were specifically
designed so that they could be compiled in a single pass. In some cases the design of a language
feature may require a compiler to perform more than one pass over the source. For instance,
consider a declaration appearing on line 20 of the source which affects the translation of a
statement appearing on line 10.

The term compilation denotes the conversion of an algorithm expressed in a human-oriented source
language to an algorithm expressed in a hardware-oriented target language. By contrast,
conventional programs give priority to knowledge in which competency is flexible and adaptable
and cannot be reduced to an algorithm. Programming languages are the tools used to construct
formal descriptions of finite computations (algorithms), in which each computation consists of
operations that transform a given initial state into a final state; such descriptions are set in
the context of factual information that can consist of, for example, a definition, a theorem, a
hypothesis, a rule, or an algorithm. We shall be concerned with the engineering of compilers.
Besides covering basic compilation issues, the course yields an implemented compiler that can
serve as a test bed for coursework. We describe an improved approach for a compiler which
partitions a high-level language program; further research will quantify the advantages in
relation to the current system. The implementation and source language is Scheme, and the target
language is assembly code. Compilers also play an important role in most popular programming
languages. A Java compiler is a program that takes the text file produced by a developer and
compiles it into a platform-independent class file. Generally, Java compilers are run against a
programmer's code in a text file to produce a class file for use by the Java Virtual Machine
(JVM) on different platforms. This byte code is generic, i.e. it does not include machine-level
details. The Python implementation first compiles the source code (.py file) into a format known
as byte code; the compiled code is usually stored in .pyc files and is regenerated when the
source is updated or when otherwise necessary. Programming languages are used to develop
applications, and such languages are used to write code that can enhance and control application
behavior. C and C++ share the same basic syntax, and their compilation processes are similar.
Compiling involves performing a great deal of work, and early computers did not have enough
memory to contain one program that did all of it. So compilers were split up into smaller
programs which each made a pass over the source (or some representation of it), performing some
of the required analysis and translation.

In a language processing system the source program is first processed by a preprocessor. The
modified source program is processed by the compiler to produce target assembly code, which is
then translated by an assembler to generate relocatable object code that is processed by the
linker and loader to generate the target program. We write programs in a high-level language,
which is easier for us to understand and remember. These programs are then fed into a series of
tools and OS components to obtain the desired code that can be used by the machine. This is known
as a language processing system. For a C compiler it can be described as:

1 The user writes a program in the C language (source code, a high-level language).
2 The compiler compiles the program and translates it into an assembly program (a low-level
language).
3 An assembler then translates the assembly program into machine code (an object file).
4 A linker tool is used to link all the parts of the program together for execution (executable
machine code).
5 A loader loads all of them into memory and then the program is executed.
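As a concrete, hedged illustration of these five steps, the sketch below uses a small C program
and a typical GCC-based toolchain; the file names and exact commands are illustrative
assumptions, not part of the original report.

    /* hello.c -- minimal C source used to walk through the pipeline

       Typical build steps with a GCC-based toolchain (illustrative):
         gcc -E hello.c -o hello.i    step 1: preprocess
         gcc -S hello.i -o hello.s    step 2: compile to assembly
         as  hello.s -o hello.o       step 3: assemble to object code
         gcc hello.o -o hello         step 4: link into an executable
         ./hello                      step 5: load and run
    */
    #include <stdio.h>

    int main(void) {
        printf("Hello, world\n");
        return 0;
    }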

The core compiler reads a program described in a high-level programming language. The compiler
then analyses the program, partitions it into hardware and software parts, and generates data
paths for the reconfigurable hardware. It focuses on the basic relationships between languages
and machines, and these relationships ease the inevitable transitions to new hardware and
programming languages. In parallel, the software part is instrumented with functions for
configuring and exchanging data with the reconfigurable hardware.

If you have been in high performance computing since its beginning in the 1950s, you have
programmed in several languages during that time. During the 1950s and early 1960s, you
programmed in assembly language. The constraint on memory and slow clock rates made every
instruction precious. With small memories, overall program size was typically small, so
assembly language was sufficient. Toward the end of the 1960s, programmers began writing
more of their code in a high-level language such as FORTRAN. Writing in a high-level language
made your work much more portable, reliable, and maintainable. Given the increasing speed and
capacity of computers, the cost of using a high-level language was something most programmers
were willing to accept. In the 1970s if a program spent a particularly large amount of time in a
particular routine, or the routine was part of the operating system or it was a commonly used
library, most likely it was written in assembly language.
During the late 1970s and early 1980s, optimizing compilers continued to improve to the point
that all but the most critical portions of general-purpose programs were written in high-level
languages. On average, compilers generated better code than most assembly language
programmers. This was often because a compiler could make better use of hardware resources
such as registers. In a processor with 16 registers, a programmer might adopt a convention
regarding the use of registers to help keep track of what value is in what register. A compiler can
use each register as much as it likes because it can precisely track when a register is available for
another use.
However, during that time, high performance computer architecture was also evolving. Cray
Research was developing vector processors at the very top end of the computing spectrum.
Compilers were not quite ready to determine when these new vector instructions could be used.
Programmers were forced to write assembly language or create highly hand-tuned FORTRAN
that called the appropriate vector routines in their code. In a sense, vector processors turned back
the clock when it came to trusting the compiler for a while. Programmers never lapsed
completely into assembly language, but some of their FORTRAN started looking rather un-
FORTRAN like. As the vector computers matured, their compilers became increasingly able to
detect when vectorization could be performed. At some point, the compilers again became better
than programmers on these architectures. These new compilers reduced the need for extensive
directives or language extensions. The compiler became an important tool in the processor
design cycle. Processor designers had much greater flexibility in the types of changes they could
make. For example, a new revision of a processor might be designed to execute existing codes 10%
slower than before, yet run 65% faster once the code was recompiled. Of course it was important
to actually provide that compiler when the new processor
was shipped and have the compiler give that level of performance across a wide range of codes
rather than just one particular benchmark suite.

1 Sheridan F. Practical testing of a C99 compiler using output comparison. Software Practice and
Experience, 2007, 37(14): 1475–1488. Nagai proposed a random test method based on this approach
which targets arithmetic optimization. It avoids generating programs with undefined behavior by
regenerating new expressions when it detects expressions that trigger undefined behavior. An
implemented test system found some bugs in GCC 4.4.1 (i686-pc-linux), etc., but it is not
necessarily effective, since no bugs were detected in GCC versions higher than 4.5.0. Possible
reasons for this are that the generated programs were all small or that the generated programs
only focused on arithmetic expressions.
2 Neil Gershenfeld, a professor at MIT, anticipated these ideas in his book "When Things Start to
Think", which appeared in 1999. He did not use the exact term but elaborated on where IoT was
leading.
3 IoT has grown over the years, and nowadays many IoT devices are available in the market, for
example smart homes, smart phones, smart watches, smart fire alarms, fitness trackers, medical
sensors and smart bicycles.

1.1.1 Machine Learning Optimization

Enyindah and Okon E. Uko have reviewed the use of optimization algorithms in new compilers to
reduce the actual size of code. The new techniques are contrasted with the conventional
optimization methods. Conventional methods like dataflow analysis, local optimization and global
optimization are considered for analysis. The newer compilers use mechanisms like dataflow
analysis, leaf function optimization, cross-linking optimization, etc. Data-flow analysis brings
fuzziness into data information; it also includes alias analysis. Reverse inlining (procedural
abstraction) aims to achieve code size reduction. Leaf function optimization involves exploiting
leaf functions to reduce code length; leaf functions are functions that do not call other
functions in a program and therefore form the leaves of the call graph. The cross-linking
optimization technique is used to factor out code so as to reduce code size. These optimization
techniques in new compilers help to utilize memory efficiently, reduce code size and increase
program execution speed.
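To make the notion of a leaf function concrete, here is a minimal C sketch (the function names
are illustrative):

    /* Leaf function: it calls no other functions, so a compiler can often
       keep its argument in registers and omit stack-frame setup entirely. */
    static int square(int x) {
        return x * x;
    }

    /* Non-leaf function: the calls to square() make it an interior node of
       the call graph, so it needs the usual call/return machinery. */
    int sum_of_squares(int a, int b) {
        return square(a) + square(b);
    }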

1.1.2 Optimization For Dynamic Languages


Michael R. Jantz et al. explore various single-level and multilevel just-in-time (JIT)
compilation policies for modern machines. Dynamic compilation is important for languages such as
Java and C# in order to achieve high performance. In the paper they describe experiments that
control the compiler aggressiveness and optimization levels in the Oracle HotSpot Java VM. By
analyzing the various JIT compilation policies, the most effective policy for any particular
application can be identified. It was shown that employing all the free compilation resources
aggressively to compile more program methods eventually reaches a point of diminishing returns.
At the same time, using free resources to reduce the queue backup significantly increases
performance, especially with slower JIT compilers. The paper further shows how prioritizing JIT
method compilation is crucial in systems with smaller hardware budgets.

1.1.3 Storage Management

In this section we shall discuss management of storage for collections of objects, including
temporary variables, during their lifetimes. The important goals are the most economical use of
memory and the simplicity of access functions to individual objects. Source language properties
govern the possible approaches, as indicated by the following questions:

1 Is the extent of an object restricted, and what relationships hold between the extents of
distinct objects (e.g. are they nested)?

2 Does the static nesting of the program text control a procedure's access to global objects, or
is access dependent upon the dynamic nesting of calls?

3 Is the exact number and size of all objects known at compilation time?

(Figure: the compiler front end is dependent on the source language and comprises lexical
analysis, parsing, and semantic analysis, e.g. type checking.)
1.1.4 Static Storage Management

We speak of static storage management if the compiler can provide fixed addresses for all objects
at the time the program is translated (here we assume that translation includes binding), i.e. we
can answer the third question above with 'yes'. Arrays with dynamic bounds, recursive procedures
and the use of anonymous objects are prohibited. The condition is fulfilled for languages like
FORTRAN and BASIC, and for the objects lying on the outermost contour of an ALGOL 60 or Pascal
program. (In contrast, arrays with dynamic bounds can occur even in the outer block of an ALGOL
68 program.) If the storage for the elements of an array with dynamic bounds is managed
separately, the condition can be forced to hold in this case also.
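A minimal C sketch of objects that qualify for static storage management (illustrative; the
variable names are not from the report):

    /* Objects whose number and size are known at translation time can be
       given fixed addresses (static storage management). */
    int counter = 0;              /* global: fixed address chosen at link time */
    double table[100];            /* file-scope array with constant bounds     */

    void tick(void) {
        static int calls = 0;     /* keeps its fixed storage across calls      */
        calls++;
        counter += calls;
        table[calls % 100] = counter;
    }

    /* By contrast, a recursive procedure or an array whose bounds are only
       known at run time needs dynamically managed storage (see below). */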

1.1.5 Dynamic Storage Management

Using a stack: All declared values in languages such as Pascal and SIMULA have restricted
lifetimes. Further, the environments in these languages are nested: the extent of all objects
belonging to the contour of a block or procedure ends before that of objects from the dynamically
enclosing contour. Thus we can use a stack discipline to manage these objects: upon procedure
call or block entry, the activation record containing storage for the local objects of the
procedure or block is pushed onto the stack. At block end, procedure return, or a jump out of
these constructs, the activation record is popped off the stack. (The entire activation record is
stacked; we do not deal with single objects individually!) An object of automatic extent occupies
storage in the activation record of the syntactic construct with which it is associated. The
position of the object is characterized by the base address, b, of the activation record and the
relative location (offset), R, of its storage within the activation record. R must be known at
compile time but b cannot be (otherwise we would have static storage allocation). To access the
object, b must be determined at runtime and placed in a register. R is then either added to the
register and the result used as an indirect address, or R appears as the constant in a direct
access function of the form 'register + constant'. The extension, which may vary in size from
activation to activation, is often called the second-order storage of the activation record.
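The following hedged C sketch illustrates the 'base address plus offset' access scheme; the
offsets mentioned in the comments are purely illustrative, since the actual frame layout is
chosen by the compiler:

    void example(int n) {
        int a;        /* might live at, say, offset -4 from the frame base b   */
        int c[10];    /* might occupy offsets -48 .. -8 within the same frame  */

        a = n + 1;    /* compiled roughly as: load b into a register, then     */
        c[0] = a;     /* access storage via 'register + constant' addressing   */
    }
    /* Each call pushes a fresh activation record holding a and c; it is popped
       again at return, which is why their addresses cannot be fixed statically. */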

1.1.6 Error Handling

Error Handling is concerned with failures due to many causes: errors in the compiler or its
environment (hardware, operating system), design errors in the program being compiled, an
incomplete understanding of the source language, transcription errors, incorrect data, etc. The
tasks of the error handling process are to detect each error, report it to the user, and possibly
make some repair to allow processing to continue. It cannot generally determine the cause of the
error, but can only diagnose the visible symptoms. Similarly, any repair cannot be considered a
correction (in the sense that it carries out the user's intent); it merely neutralizes the symptom so
that processing may continue. The purpose of error handling is to aid the programmer by
highlighting inconsistencies. It has a low frequency in comparison with other compiler tasks, and
hence the time required to complete it is largely irrelevant, but it cannot be regarded as an
'add-on' feature of a compiler. Its influence upon the overall design is pervasive, and it is a
necessary debugging tool during construction of the compiler itself. Proper design and
implementation of an error handler, however, depends strongly upon complete understanding of the
compilation process. This is why we have deferred consideration of error handling until now.
Regarding errors, symptoms, anomalies and limitations: we distinguish between the actual error
and its symptoms. Like a physician, the error handler sees only symptoms. From these symptoms, it
may attempt to diagnose the underlying error. The diagnosis always involves some uncertainty, so
we may choose simply to report the symptoms with no further attempt at diagnosis.

1.2 Problem Statement


In this report we discuss the use of compilers in different programming languages, why a compiler
is used for various popular programming languages, how a compiler works, and what pros and cons
arise from its use. Anyone who has done
any computer programming, however little, has faced the tedious task of, firstly, understanding
what the compiler says through the error/warning messages; secondly, guessing what these
messages really mean; thirdly, figuring out what to do to fix such an error/warning; fourthly,
learning how to act to avoid them thereafter; and fifthly, recognizing recurring messages and
remembering how they were fixed in the past. This research area, being at the nexus of systems
research, will be critical to the development of applications that exploit high levels of abstraction
and functionality and achieve high levels of performance.
Programming languages are the tools used to construct formal descriptions of finite computations
(algorithms), in which each computation consists of operations that transform a given initial
state into a final state; such descriptions are set in the context of factual information that
can consist of, for example, a definition, a theorem, a hypothesis, a rule, or an algorithm. We
shall be concerned with the engineering of compilers.

1.2.1 Lexical
The lexical syntax (token structure) is processed by the lexer, and the phrase syntax is
processed by the parser. The lexical syntax is usually a regular language, whose alphabet consists
of the individual characters of the source code text. The phrase syntax is usually a context-free
language, whose alphabet consists of the tokens produced by the lexer. In computer science,
lexical analysis is the process of converting a sequence of characters into a sequence of tokens,
i.e. meaningful character strings. A program or function that performs lexical analysis is called a
lexical analyzer, lexer, tokenizer, or scanner, though "scanner" is also used for the first stage of a
lexer.
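A minimal, hedged sketch of such a scanner in C (the token set, size limits and function names
are illustrative, not the report's own implementation):

    #include <ctype.h>
    #include <stdio.h>

    typedef enum { TOK_IDENT, TOK_NUMBER, TOK_OP, TOK_EOF } TokenKind;

    typedef struct {
        TokenKind kind;
        char text[32];
    } Token;

    /* Reads the next token from *src, advancing the pointer. */
    static Token next_token(const char **src) {
        Token t = { TOK_EOF, "" };
        const char *p = *src;
        while (isspace((unsigned char)*p)) p++;                 /* skip whitespace */
        if (*p == '\0') {
            /* end of input: kind stays TOK_EOF */
        } else if (isalpha((unsigned char)*p) || *p == '_') {   /* identifier */
            size_t n = 0;
            while ((isalnum((unsigned char)*p) || *p == '_') && n < sizeof t.text - 1)
                t.text[n++] = *p++;
            t.text[n] = '\0';
            t.kind = TOK_IDENT;
        } else if (isdigit((unsigned char)*p)) {                /* number */
            size_t n = 0;
            while (isdigit((unsigned char)*p) && n < sizeof t.text - 1)
                t.text[n++] = *p++;
            t.text[n] = '\0';
            t.kind = TOK_NUMBER;
        } else {                                                /* single-character operator */
            t.text[0] = *p++;
            t.text[1] = '\0';
            t.kind = TOK_OP;
        }
        *src = p;
        return t;
    }

    int main(void) {
        const char *program = "count = count + 42;";
        Token t;
        while ((t = next_token(&program)).kind != TOK_EOF)
            printf("token: %s\n", t.text);
        return 0;
    }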
1.2.2 Parser
Within computational linguistics the term parsing refers to the formal analysis by a computer of
a sentence or other string of words into its constituents, resulting in a parse tree showing their
syntactic relation to each other, which may also contain semantic and other information. The
term has slightly different meanings in different branches of linguistics and computer science. In
order to parse natural language data, researchers must first agree on the grammar to be used. The
choice of syntax is affected by both linguistic and computational concerns; traditional sentence
parsing is often performed as a method of understanding the exact meaning of a sentence,
sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the
importance of grammatical divisions such as subject and predicate. Parsing or syntactic analysis
is the process of analyzing a string of symbols, either in natural language or in computer
languages.

1.2.3 Lexer Parsing Processor


When a lexical scanner and a parser are used together, the parser is the higher-level routine.
Lexer generators are a form of domain-specific language, taking in a lexical specification –
generally regular expressions with some mark-up – and outputting a lexer. The lexer then scans
through the input recognizing tokens. An automatically generated lexer may lack flexibility, and
thus may require some manual modification or a completely manually written lexer. The next stage
is parsing or syntactic analysis, which checks that the tokens form an allowable expression. This
is usually done with reference to a context-free grammar which recursively defines the components
that can make up an expression and the order in which they must appear. However, not all rules
defining programming languages can be expressed by context-free grammars alone, for example type
validity and proper declaration of identifiers. Such rules can be formally expressed with
attribute grammars. Parsing can be done in essentially two ways (a minimal top-down sketch
follows this list):

1 Top-down parsing - Top-down parsing can be viewed as an attempt to find left-most derivations
of an input-stream by searching for parse trees using a top-down expansion of the given formal
grammar rules.
2 Bottom-up parsing - A parser can start with the input and attempt to rewrite it to the start
symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements
containing these, and so on. LR parsers are examples of bottom-up parsers. Another term
used for this type of parser is Shift-Reduce parsing.
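As a hedged illustration of the top-down approach, the following minimal recursive-descent parser
in C evaluates expressions built from single digits, '+' and '*'; the grammar and function names
are illustrative assumptions, not the report's own implementation:

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative grammar:
         expr   -> term   { '+' term }
         term   -> factor { '*' factor }
         factor -> digit                      */

    static const char *p;                    /* current input position */

    static void fail(const char *msg) { fprintf(stderr, "parse error: %s\n", msg); exit(1); }

    static int factor(void) {
        if (!isdigit((unsigned char)*p)) fail("digit expected");
        return *p++ - '0';
    }

    static int term(void) {
        int v = factor();
        while (*p == '*') { p++; v *= factor(); }
        return v;
    }

    static int expr(void) {
        int v = term();
        while (*p == '+') { p++; v += term(); }
        return v;
    }

    int main(void) {
        p = "2+3*4";
        printf("2+3*4 = %d\n", expr());      /* prints 2+3*4 = 14 */
        return 0;
    }

Each nonterminal of the grammar becomes one C function, which is what makes recursive descent a
direct, top-down realization of the grammar.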

1.2.4 Optimal Storage Management


The important goals are the most economical use of memory and the simplicity of the access
functions to individual objects; hence the term optimal. Static storage management is possible if
the compiler can provide fixed addresses for all objects at the time the program is translated.
This condition is fulfilled for languages like FORTRAN and BASIC, and for the objects lying on
the outermost contour of an ALGOL 60 or Pascal program. If the storage for the elements of an
array with dynamic bounds is managed separately, the condition can be forced to hold as well.
This is particularly interesting when we have additional information that certain procedures are
not recursive, since for recursive programs the required storage must be determined from an
analysis of the procedure calls.

1.3 Pros and Cons


Computers read commands from a machine language written in binary, i.e., long strings of zeros
and ones. While computers can read this language efficiently, most human programmers cannot.
That is why programmers work in a programming language they can understand, which they
then translate to the machine language the computer can understand. While many newer
languages use interpreters that translate from one to the other as the program runs, older
programming languages used compilers that did this translation entirely before the computer
executed the program. The translation of code from some human readable form to machine code
must be “correct”, i.e., the generated machine code must execute precisely the same computation
as the source code. In general, there is no unique translation from source language to a
destination language. No algorithm exists for an “ideal translation”. Translation is a complex
process. The source language and generated code are very different. To manage this complex
process, the translation is carried out in multiple passes.
1.3.1 Advantage: Hardware Optimization

While being locked into a specific hardware package has its downsides, compiling a program can
also increase its performance. Users can send specific options to compilers regarding the details
of the hardware the program will be running on. This allows the compiler to create machine
language code that makes the most efficient use of the specified hardware, as opposed to more
generic code. This also allows advanced users to optimize a program's performance on their
computers.
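For example, with GCC a program might be built as gcc -O2 -march=native program.c so that the
generated machine code can use instruction-set extensions of the machine on which it is compiled;
the exact flags are illustrative and vary by compiler and target.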

1.3.2 Advantage: Self-Contained and Efficient

One major advantage of programs that are compiled is that they are self-contained units that are
ready to be executed. Because they are already compiled into machine language binaries, there is
no second application or package that the user has to keep up-to-date. If a program is compiled
for Windows on x86 architecture, the end user needs only a Windows operating system running on
x86 architecture. Additionally, a precompiled package can run faster than an interpreter
compiling source code in real time.
Compiled languages are converted directly into machine code that the processor can execute. As
a result, they tend to be faster and more efficient to execute than interpreted languages. They also
give the developer more control over hardware aspects, like memory management and CPU
usage.

1. Object code can be saved to disk and run when required


2. Executes faster
3. Object code can be distributed or executed without the compiler
4. Secure as object code cannot be read without reverse engineering

5. Programs developed using low level languages are fast and memory efficient.
6. The translation is only done once as a separate process
7. Compiled programs can run on any computer with a compatible platform, without the compiler
being present.
8. At run time there is no need for a compiler or interpreter to translate the source to machine
code, which cuts compilation and interpretation time.
9. Low-level languages provide direct manipulation of computer registers and storage.
10. Low-level languages can communicate directly with hardware devices.
1.3.3 Disadvantage: Compile Times
One of the drawbacks of having a compiler is that it must actually compile source code. While
the small programs that many novice programmers code take trivial amounts of time to compile,
larger application suites can take significant amounts of time to compile. When programmers
have nothing to do but wait for the compiler to finish, this time can add up—especially during
the development stage, when the code has to be compiled in order to test functionality and
troubleshoot glitches.
1.3.4 Disadvantage: Hardware Specific
Because a compiler translates source code into a specific machine language, programs have
to be specifically compiled for OS X, Windows or Linux, as well as specifically for 32-bit
or 64-bit architectures. For a programmer or software company trying to get a product out
to the widest possible audience, this means maintaining multiple versions of the source
code for the same application. This results in more time spent on source code maintenance and
extra trouble when updates are released.
1. If an error occurs, the whole program has to be compiled again.
2. Programs developed using low-level languages are machine dependent and are not portable.
3. Low-level programs are difficult to develop, debug and maintain.
4. A compiled program will only run on a computer that has the same platform.
5. The program cannot be changed without going back to the source code.
6. Object code needs to be produced before a final executable file; this can be a slow process.
7. The programmer must have additional knowledge of the computer architecture of the particular
machine in order to program in a low-level language.
8. Low-level programming usually results in poor programming productivity.
1.4 Conclusion
Besides covering basic compilation issues, the course yields an implemented compiler that can
serve as a test bed for coursework. We described an improved approach for a compiler which
partitions a high-level language program; further research will quantify the advantages in
relation to the current system. The implementation and source language is Scheme, and the target
language is assembly code.

The compilation process is one of the steps in executing a program. Understanding how
compilers work and what goes on "behind the scenes" will help you get better at developing
software. The need for compilers arises from the need for high-level languages, which are more
relevant for problem solving and software development by humans. Moreover, high-level
languages are machine independent.

Since development of a compiler is a relatively complex system-development effort, typically
having many users and developers, and will be maintained over a life of many years, a formal
process should be used for its development. The development process should extend from
requirements through verification and validation, and should include reviews, tests, analysis and
measures, quality assurance, configuration control, and key documentation. The development
process is a part of the overall system life cycle process, which additionally includes
deployment, maintenance, disposal and archival storage. The compiler development process
should consist of procedures for writing and documenting the needs and requirements, the
architecture and design, construction, integration, and verification and validation of the compiler.
Documentation should also include the formal foundations and techniques used, tradeoffs made,
alternatives evaluated, and the chosen alternative for design or implementation. Full coverage of
all of these in detail is beyond the scope of this course. As we proceed through this course,
however, we include high-level needs, requirements, functions, performance considerations, and
verification and validation issues for a compiler and its parts.

1.5 References
1 http://ijirt.org/master/publishedpaper/IJIRT100158_PAPER.pdf
2 https://ijcsmc.com/docs/papers/October2014/V3I10201482.pdf
3 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.727.1786&rep=rep1&type=pdf
4 https://learn.saylor.org/course/view.php?id=74&sectionid=705
5 https://www.reddit.com/r/ProgrammingLanguages/comments/9zo2qx/programming_languagecompiler_research_ideas/

6 https://www.computer.org/csdl/journal/ts/5555/01/09353261/1r8kzlTfYRy
7 https://www.ijert.org/research/a-literature-survey-on-artificial-intelligence-IJERTCONV5IS19015.pdf
