1 - Introduction - Lecture 01


1) The majority of the text, diagrams, and tables in these slides is based on the textbook Compilers: Principles, Techniques, and Tools by Aho, Sethi, Ullman, and Lam.
2) Some text and diagrams are based on the MIT slides for the Compilers course 6.035.
• Compilers: Principles, Techniques, and Tools by Aho, Sethi, Ullman, and Lam, Addison-Wesley, 2006.
• Modern Compiler Implementation in Java (Tiger book) by A. W. Appel, Cambridge University Press, 1998.
• Engineering a Compiler (Ark book) by Keith D. Cooper and Linda Torczon, Morgan Kaufmann Publishers, 2003.
• Advanced Compiler Design and Implementation (Whale book) by Steven Muchnick, Morgan Kaufmann Publishers, 1997.
• Introduction to Compilers and Language Design by Douglas Thain, Lulu, First Edition, 2019.
• Introduction to compilers, history of compilers, motivation, objectives, language
processing system, background issues and programming language principles.
• Syntax definition, Syntax directed translator, parsing, translator for simple expressions,
etc.
• Lexical analysis, regular expressions, finite-state machines, and scanner-generator tools.
• Parsing methods, top-down (recursive-descent, LL) and bottom-up (LR and its variants).
• Introduces the principal ideas in syntax-directed definitions and syntax-directed
translations.
• Generate intermediate code for a typical programming language.
• Run-time environments, especially management of the run-time stack and garbage
collection.
• Object-code generation. It covers construction of basic blocks, generation of code from
expressions and basic blocks, and register-allocation techniques.
• Code optimization, including flow graphs, data-flow frameworks, and iterative algorithms for
solving these frameworks.
• 1940s: Programs were written in assembly language.
• 1951: The first practical compiler was written by Corrado Böhm for his PhD
thesis.
• 1951-2: Grace Hopper developed the A-0 system for the UNIVAC I (the first
implemented compiler).
• 1952: The first AUTOCODE and compiler in the modern sense were developed
by Alick Glennie at the University of Manchester for the Mark 1 computer.
• 1957: FORTRAN compiler developed by team led by John Backus at IBM.
(Turing award)
• 1958: The first ALGOL 58 compiler was completed by Friedrich L. Bauer et al.
for the Z22 computer.
• 1960s: Development of the first bootstrapping compiler for LISP.
• Theory of Computation –
• Finite State Automata, Grammars and Parsing, data-flow
• Algorithms –
• Graph manipulation, dynamic programming
• Data structures –
• Symbol tables, abstract syntax trees
• Systems –
• Allocation and naming, multi-pass systems,
• Computer Architecture –
• Memory hierarchy, instruction selection, interlocks and latencies, parallelism
• Security –
• Detection of and Protection against vulnerabilities
• Software Engineering –
• Software development environments, debugging
• Artificial Intelligence –
• Heuristic based search for best optimizations
• Compiler connects high-level programming and hardware.
• Understanding compilation techniques is essential for understanding how
programming languages and computers work together.
• Many applications contain little languages for customization and flexible control.
• Word macros, layout descriptions, document descriptions.
• Compiler techniques are needed to properly design and implement these extension
languages.
• Compilers influence the design of programming languages and computer
architectures.
• Most programs need to do some text processing.
• A useful software design technique for large projects is to develop a special-purpose
language that makes the project easy to implement.
• Writing a compiler offers experience in developing a larger piece of software.
• How about natural languages?
• English??
• “Open the directory where a couple of pdf files are kept.”
• “I’m sorry Ravi, I’m afraid I cannot do that.”
• We are not there yet!!

• Natural Languages:
• Powerful, but…
• Ambiguous
• Same expression describes many possible actions
• Properties
• need to be precise
• need to be concise
• need to be expressive
• need to be at a high-level (lot of abstractions)
PROJECT MANAGERS
“I want the UVT project done by next week.”

PROJECT LEADERS
“Get the fixed-deposit and rollback modules done within the next couple of days.”

SOFTWARE ENGINEER/PROGRAMMER
“Just follow the instructions”
• Write a program using a programming language
• High-level Abstract Description
• Microprocessors talk in assembly language
• Low-level Implementation Details

• Compiler does the translation


• Read and understand the program
• Precisely determine what actions it requires
• Figure out how to carry out those actions
• Instruct computer to carry out those actions
• Standard Imperative Language (C, C++, Java)
• State
• Variables
• Structures
• Arrays
• Computation
• Expressions (arithmetic, logical, etc.)
• Assignment statements
• Control flow (conditionals, loops)
• Procedures
• State
• Registers
• Memory with Flat Address Space
• Machine code
• load/store architecture
• Load, store instructions
• Arithmetic, logical operations on registers
• Branch instructions
• A compiler is a program that can read a program in one language and
translate it into an equivalent program in another language.
• An important role of the compiler is to report any errors in the source
program that it detects during the translation process.
• A compiler analyzes the whole source program in one go (possibly using multiple passes).
• An interpreter analyzes the source program one statement at a time.
• A compiler takes more time, since it analyzes the entire source program.
• An interpreter takes less time to analyze each statement.
• A compiler reports all errors together, so locating errors in a large source
program can be troublesome.
• An interpreter reports the errors of each statement one after another.
• Compiled programs execute faster, as the code is already translated into
machine code.
• Interpreted execution is comparatively slower.
• A compiler requires more memory, as it generates object code that needs linking.
• An interpreter is memory efficient: it generates no object code and, unlike a compiler,
no intermediate representation.
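The contrast above can be felt with Python's built-in compile() and exec(). This is only a rough illustration of whole-program translation versus statement-at-a-time processing, not a real compiler back end; the source string is made up for the demo.

```python
# "Compiler-style": translate the entire source in one go; a syntax
# error anywhere would be reported before anything runs.
source = "x = 2\ny = x * 21"
code_object = compile(source, "<demo>", "exec")

env = {}
exec(code_object, env)          # run the pre-translated program

# "Interpreter-style": take one statement at a time; an error in a
# later statement is only noticed when execution reaches it.
env2 = {}
for stmt in source.splitlines():
    exec(compile(stmt, "<demo>", "exec"), env2)

print(env["y"], env2["y"])      # both approaches compute y = 42
```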
Language processing system:
Process of execution of programs
• Preprocessor
• Compiler
• Assembler
• Linker/Loader
[Diagram: the compiler analyzes the source program (with error reporting) and generates assembly code; the assembler produces machine code; the linker links all pieces of code into executable machine code; the loader loads the executable machine code into memory for execution.]
• Language processing is an important component of programming.
• A large number of systems software and application programs require
structured input.
Operating Systems (command line processing)
Databases (Query language processing)
Type setting systems like Latex
• Software quality assurance and software testing.
• Wherever the input has a structure, one can think of language
processing
• To learn how to design and implement compilers.
• Understand, modify and maintain compilers.
• Learn to use language processing technology for software development
• Compilers enable programming in high-level languages instead of machine
instructions.
• Flexibility, Portability, Modularity, Simplicity, Programmer Productivity
• Also Efficiency and Performance
• Compiler can be treated as a single box that maps a source program into a
semantically equivalent target program. (Mapping: analysis and synthesis)
• Analysis: The analysis part breaks up the source program into constituent
pieces and imposes a grammatical structure on them.
• It then uses this structure to create an intermediate representation of the
source program.
• The symbol table records information about the source program, gathered during
analysis and used during synthesis.
• Synthesis: The synthesis part constructs the desired target program from
the intermediate representation and the information in the symbol table.
• The analysis part is often called the front end of the compiler; the
synthesis part is the back end.
• The lexical analyzer reads the stream of characters making up the source
program and groups the characters into meaningful sequences called
lexemes.
• For each lexeme, the lexical analyzer produces as output a token of the
form
<Token-name, attribute-value>
• For example, suppose a source program contains the assignment
statement
position = initial + rate * 60
• The characters in the above assignment could be grouped into lexemes
and mapped into tokens passed on to the syntax analyzer.
<id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
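A hand-written scanner for statements like the one above might look as follows. The token shapes and symbol-table indices (1, 2, 3) mirror the example; the helper names and regular expressions are illustrative, a sketch rather than a production lexer.

```python
import re

# Token classes: identifiers, integer literals, operators, whitespace.
TOKEN_SPEC = [
    ("id",  r"[A-Za-z_]\w*"),   # identifiers
    ("num", r"\d+"),            # integer literals
    ("op",  r"[=+\-*/]"),       # single-character operators
    ("ws",  r"\s+"),            # whitespace (skipped)
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    symtab, tokens = {}, []
    for m in MASTER.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "ws":
            continue                               # drop whitespace
        if kind == "id":
            # enter the lexeme into the symbol table; its index becomes
            # the token's attribute value
            index = symtab.setdefault(text, len(symtab) + 1)
            tokens.append(("id", index))
        elif kind == "num":
            tokens.append(("num", int(text)))
        else:
            tokens.append((text, None))            # operators carry no attribute
    return tokens, symtab

tokens, symtab = tokenize("position = initial + rate * 60")
print(tokens)
# [('id', 1), ('=', None), ('id', 2), ('+', None), ('id', 3), ('*', None), ('num', 60)]
```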
Translation of an Assignment Statement
• The parser uses the first components of the tokens produced by the
lexical analyzer to create a tree-like intermediate representation that
depicts the grammatical structure of the token stream.
• A typical representation is a syntax tree in which each interior node
represents an operation and the children of the node represent the
arguments of the operation.
• Use context-free grammars to specify the grammatical structure of
programming languages and construct efficient syntax analyzers
automatically from certain classes of grammars.
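As a sketch, a tiny recursive-descent parser can build such a tree for the expression part of the running example, giving * higher precedence than + so that initial + rate * 60 parses as initial + (rate * 60). The tuple representation and function names are illustrative choices, not a fixed convention.

```python
# Grammar sketch:  expr -> term { '+' term }
#                  term -> factor { '*' factor }
#                  factor -> id | num
# Interior nodes are tuples (operator, left, right); leaves are token strings.

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                      # factor -> id | num
        return advance()

    def term():                        # term -> factor { '*' factor }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                        # expr -> term { '+' term }
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    return expr()

tree = parse(["initial", "+", "rate", "*", "60"])
print(tree)   # ('+', 'initial', ('*', 'rate', '60'))
```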
• The semantic analyzer uses the syntax tree and the information in the
symbol table to check the source program for semantic consistency with
the language definition.
• It also gathers type information and saves it in either the syntax tree or
the symbol table, for subsequent use during intermediate-code
generation.
• An important part of semantic analysis is type checking, where the
compiler checks that each operator has matching operands.
• The language specification may permit some type conversions called
coercions.
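A type checker with coercion might be sketched as follows: it walks a syntax tree of (op, left, right) tuples, infers each node's type, and inserts an explicit conversion node when an int operand meets a float. The node name "inttofloat" and the leaf-classification rule are illustrative assumptions for this sketch.

```python
# Walk the tree bottom-up; return (possibly rewritten node, its type).
def check(node, types):
    if isinstance(node, tuple):               # interior node: an operator
        op, left, right = node
        lnode, ltype = check(left, types)
        rnode, rtype = check(right, types)
        if ltype != rtype:                    # coerce the int operand to float
            if ltype == "int":
                lnode, ltype = ("inttofloat", lnode), "float"
            else:
                rnode, rtype = ("inttofloat", rnode), "float"
        return (op, lnode, rnode), ltype
    # leaf: look up an identifier's declared type, or classify a literal
    if node in types:
        return node, types[node]
    return node, ("float" if "." in node else "int")

types = {"initial": "float", "rate": "float"}
tree, t = check(("+", "initial", ("*", "rate", "60")), types)
print(tree, t)
# ('+', 'initial', ('*', 'rate', ('inttofloat', '60'))) float
```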
• In the process of translating a source program into target code, a
compiler may construct one or more intermediate representations,
which can have a variety of forms.
• Syntax trees are a form of intermediate representation; they are
commonly used during syntax and semantic analysis.
• Generate an explicit low-level or machine-like intermediate
representation.
• This intermediate representation should have two important properties:
it should be easy to produce and it should be easy to translate into the
target machine.
• Consider an intermediate form called three-address code.
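Three-address code for the running assignment might be generated as in the following sketch: each instruction has at most one operator on its right-hand side, and compiler-generated temporaries t1, t2, ... hold intermediate results. The generator below is a minimal illustration, not a full translation scheme.

```python
# Walk the syntax tree bottom-up, emitting one three-address
# instruction per interior node and returning the name holding its value.
def gen_tac(node, code, temp=[0]):   # note: the default list is a shared counter
    if not isinstance(node, tuple):           # leaf: identifier or literal
        return node
    op, left, right = node
    l = gen_tac(left, code, temp)
    r = gen_tac(right, code, temp)
    temp[0] += 1
    t = f"t{temp[0]}"
    code.append(f"{t} = {l} {op} {r}")
    return t

code = []
result = gen_tac(("+", "initial", ("*", "rate", "60")), code)
code.append(f"position = {result}")
print("\n".join(code))
# t1 = rate * 60
# t2 = initial + t1
# position = t2
```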
• The machine-independent code-optimization phase attempts to improve
the intermediate code so that better target code will result.
• A simple intermediate code generation algorithm followed by code
optimization is a reasonable way to generate good target code.
• There is a great variation in the amount of code optimization different
compilers perform.
• The code generator takes as input an intermediate representation of the
source program and maps it into the target language.
• If the target language is machine code, registers or memory locations are
selected for each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of
machine instructions that perform the same task.
• A crucial aspect of code generation is the judicious assignment of
registers to hold variables.
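The translation of three-address instructions into machine instructions, including a naive register assignment, can be sketched as below. The mnemonics (LD, ST, LDI, ADD, MUL) and the reuse-the-left-operand convention are illustrative assumptions, not a real ISA or a real allocator.

```python
# tac entries: (dest, lhs, op, rhs); op is None for a plain copy.
def codegen(tac):
    out, reg_of, next_reg = [], {}, [0]

    def reg(operand):
        if operand not in reg_of:             # bring the value into a register
            reg_of[operand] = f"R{next_reg[0]}"
            next_reg[0] += 1
            if operand.lstrip("-").isdigit():
                out.append(f"LDI {reg_of[operand]}, #{operand}")   # constant
            else:
                out.append(f"LD {reg_of[operand]}, {operand}")     # variable
        return reg_of[operand]

    ops = {"+": "ADD", "*": "MUL"}
    for dest, lhs, op, rhs in tac:
        if op is None:                        # plain copy: position = t2
            out.append(f"ST {dest}, {reg(lhs)}")
        else:
            r1, r2 = reg(lhs), reg(rhs)
            reg_of[dest] = r1                 # reuse the lhs register for the result
            out.append(f"{ops[op]} {r1}, {r1}, {r2}")
    return out

tac = [("t1", "rate", "*", "60"),
       ("t2", "initial", "+", "t1"),
       ("position", "t2", None, None)]
print("\n".join(codegen(tac)))
# LD R0, rate
# LDI R1, #60
# MUL R0, R0, R1
# LD R2, initial
# ADD R2, R2, R0
# ST position, R2
```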
• An essential function of a compiler is to record the variable names used
in the source program and collect information about various attributes
of each name.
• These attributes may provide information about the storage allocated
for a name, its type, its scope and in the case of procedure names, such
things as the number and types of its arguments, the method of passing
each argument (for example, by value or by reference), and the type
returned.
• The symbol table is a data structure containing a record for each variable
name, with fields for the attributes of the name.
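A minimal symbol table along these lines is sketched below: one record per name, with fields for its attributes. The class shape, field names, and sample entries are illustrative; a real compiler would also chain tables to handle nested scopes.

```python
class SymbolTable:
    """One record per declared name; each record is a dict of attributes."""

    def __init__(self):
        self.records = {}

    def declare(self, name, **attributes):
        if name in self.records:
            raise KeyError(f"redeclaration of '{name}'")
        self.records[name] = attributes

    def lookup(self, name):
        return self.records.get(name)        # None if the name is unknown

table = SymbolTable()
table.declare("rate", type="float", scope="global", storage=8)
table.declare("initial", type="float", scope="global", storage=8)
# A procedure name also records its parameters' number and types:
table.declare("min", type="int", scope="global",
              params=[("a", "int"), ("b", "int")])

print(table.lookup("rate"))
# {'type': 'float', 'scope': 'global', 'storage': 8}
```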
• Phases deal with the logical organization of a compiler.
• In an implementation, activities from several phases may be grouped
together into a pass that reads an input file and writes an output.
• For example, the front-end phases of lexical analysis, syntax analysis,
semantic analysis, and intermediate code generation might be grouped
together into one pass.
• Code optimization might be an optional pass.
• Then there could be a back-end pass consisting of code generation for a
particular target machine.
• Compiler construction tools*** use specialized languages for specifying and
implementing specific components. Some commonly used compiler-
construction tools include –
• Parser generators that automatically produce syntax analyzers from a grammatical
description of a programming language.
• Scanner generators that produce lexical analyzers from a regular-expression description of
the tokens of a language.
• Syntax-directed translation engines that produce collections of routines for walking a parse
tree and generating intermediate code.
• Code-generator generators that produce a code generator from a collection of rules for
translating each operation of the intermediate language into the machine language for a
target machine.
• Data-flow analysis engines that facilitate the gathering of information about how values
are transmitted from one part of a program to each other part. Data-flow analysis is a key
part of code optimization.
• Compiler-construction toolkits that provide an integrated set of routines for constructing
various phases of a compiler.

***(language editors, debuggers, version managers, profilers, test harnesses)


• Incremental Compiler: It recompiles only the modified parts of the source
rather than the whole source program.
• Cross Compiler: The source, target, and compiler languages may all be different.
The compiler runs on one machine and produces target
code for another machine. Such a compiler is called a cross compiler.
• Bootstrapping
• Write the compiler for new language B, in existing language A.
• Compile the compiler for language B, using the existing compiler for language A,
and verify its correctness.
• Rewrite the compiler for new language B, in language B (since you now have a
compiler for language B).
• Compile the rewritten compiler for language B, and verify its correctness.
• Replace the B compiler that you originally wrote in language A, with the B compiler
that you rewrote in language B.
• You now have a compiler for B that is capable of compiling itself.
• Source-to-source compiler: These compilers translate the source
code of one programming language into the source code of another
programming language.
• Decompiler: The reverse of a compiler is known as a decompiler. It
translates machine code back into source code.
• Implementation of High-Level Programming Languages
• Optimizations for Computer Architectures
• Parallelism
• Memory hierarchies
• Design of New Computer Architectures
• RISC, CISC
• Specialized architectures – VLIW, SIMD
• Program Translations
• Binary translation
• Hardware synthesis
• Database query interpreters
• Compiled simulations
• Software Productivity Tools
• Type checking
• Bounds checking
• Memory management tools (Purify tools) – Garbage collection
