Compiler Design Assignment


DEBRE TABOR UNIVERSITY

GAFAT INSTITUTE OF TECHNOLOGY


DEPARTMENT OF COMPUTER SCIENCE
COMPILER DESIGN ASSIGNMENT
GROUP MEMBERS
NAME ID NO
1. TEWODROS ABERE------------------------------ 1637
2. DANIEL TIGISTU----------------------------------

Submitted to: Mr. Yirga

Submission date: 03/10/2016 E.C

1) What are the three types of compilers and briefly explain them?

Single-Pass Compiler: A single-pass compiler processes the source code in one pass. This
means it reads, analyzes, and generates the target code in a single traversal of the source code.
The single-pass compiler is typically used for simple programming languages where the entire
code can be effectively analyzed and translated in one go. An example of a single-pass compiler
is the early Pascal compiler.

Advantages:

1. Speed: As it reads the source code only once, it is typically faster than multi-pass
compilers.
2. Memory Efficiency: Requires less memory since it doesn’t need to store the
intermediate representation of the entire program.

Disadvantages:

1. Limited Optimization: Because it processes the source code in one pass, there are fewer
opportunities for optimization compared to multi-pass compilers.
2. Complexity in Handling Forward Declarations: Managing forward declarations and
references is harder, since every symbol must already be known by the time it is first
used during the single pass.

Multi-Pass Compiler: A multi-pass compiler processes the source code in multiple passes. Each
pass performs a specific phase of the compilation process such as lexical analysis, syntax
analysis, semantic analysis, optimization, and code generation. For instance, the first pass might
be responsible for lexical analysis and syntax analysis, while the second pass performs semantic
analysis and code generation.

Advantages:

1. Enhanced Optimization: Multiple passes allow the compiler to perform detailed and
complex optimizations.
2. Better Error Handling: Errors can be detected at various stages, leading to more precise
and helpful error messages.
3. Modularity: Each pass can focus on a specific task, leading to a modular and
manageable compilation process.

Disadvantages:

1. Slower Compilation: Requires more time as it processes the source code multiple times.
2. Higher Memory Usage: Needs to store intermediate representations of the source code
across passes.

Just-In-Time (JIT) Compiler: The JIT compiler compiles the code at runtime, translating
bytecode or intermediate code into machine code just before execution. This approach is used in

managed runtime environments like Java (Java Virtual Machine - JVM) and .NET (Common
Language Runtime - CLR).

Advantages:

1. Adaptive Optimization: Can optimize the code based on runtime information and usage
patterns.
2. Portability: Intermediate code can be executed on any platform with a compatible JIT
compiler, providing platform independence.

Disadvantages:

1. Runtime Overhead: Introduces compilation overhead at runtime, which can affect
performance.
2. Complex Implementation: Requires sophisticated techniques to manage and
optimize code during execution.

Another common way to categorize compilers is based on the target code they generate. Here are
three different types of compilers based on this criterion:

1. Native Code Compilers:


o Description: These compilers translate source code directly into machine code
that can be executed by a specific type of processor. The machine code is tailored
for a specific hardware architecture.
o Use Case: Native code compilers are used when high performance is critical, as
they generate highly optimized machine code for a particular platform.
o Advantages: They can produce very efficient and fast executables because they
leverage specific hardware features.
o Example: GCC (GNU Compiler Collection) when it targets native binaries for
x86, ARM, or other architectures.
2. Cross Compilers:
o Description: These compilers generate machine code for a different architecture
or platform than the one on which the compiler is running. This allows developers
to create software for embedded systems or different operating systems from a
single development environment.

o Use Case: Cross compilers are commonly used in embedded systems
development, where the target platform is different from the development
platform.
o Advantages: They enable the development of software for platforms that may not
have the resources or capability to run a compiler.
o Example: The ARM GCC compiler can be run on a Windows or Linux machine
to generate code for ARM-based devices.
3. Source-to-Source Compilers (Transpilers):
o Description: These compilers translate source code written in one high-level
programming language into another high-level programming language. They do
not generate machine code directly.
o Use Case: Transpilers are useful for porting code between different
programming languages or for making code more readable or maintainable.
o Advantages: They allow leveraging existing codebases and libraries in new
environments or languages.
o Example: Babel, a JavaScript transpiler, converts ECMAScript 2015+ code into
a backwards-compatible version of JavaScript that can run in older environments.

2) What is the need for separating the analysis phase into lexical analysis and
parsing?

In compiler design, the analysis phase is crucial for translating high-level programming
languages into machine code. This phase is typically divided into two distinct parts: lexical
analysis and parsing. This separation brings several key benefits, which can be summarized as
follows:

1. Simplicity and Modularity

 Lexical Analysis (Lexer):


o Function: Converts sequences of characters from the source code into tokens.
o Example: Converts if (a == b) { ... } into tokens like IF, (, IDENTIFIER,
==, IDENTIFIER, ), {, ..., }.
o Benefit: Simplifies the input for the parser by breaking it down into manageable
pieces (tokens); a tokenizer sketch follows this list.
 Parsing (Parser):

o Function: Takes tokens generated by the lexer and arranges them according to the
language's grammar to form a syntactic structure, usually a parse tree or abstract
syntax tree (AST).
o Example: Converts tokens into a tree structure representing the program's syntax.
o Benefit: Focuses on the hierarchical structure of the language, without worrying
about character-level details.
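
To make this division concrete, here is a minimal tokenizer sketch in Python for the
if (a == b) fragment above. The token names and regular expressions are illustrative
assumptions, not those of any real compiler:

import re

# Illustrative token patterns; a real lexer covers many more categories.
TOKEN_SPEC = [
    ('IF',         r'\bif\b'),        # keyword, tried before IDENTIFIER
    ('IDENTIFIER', r'[A-Za-z_]\w*'),
    ('EQ',         r'=='),
    ('LPAREN',     r'\('),
    ('RPAREN',     r'\)'),
    ('LBRACE',     r'\{'),
    ('RBRACE',     r'\}'),
    ('SKIP',       r'\s+'),           # whitespace, discarded
]
MASTER = re.compile('|'.join(f'(?P<{name}>{pattern})'
                             for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Yield (token_name, lexeme) pairs for the parser to consume."""
    for match in MASTER.finditer(source):
        if match.lastgroup != 'SKIP':
            yield match.lastgroup, match.group()

print(list(tokenize('if (a == b) { }')))
# [('IF', 'if'), ('LPAREN', '('), ('IDENTIFIER', 'a'), ('EQ', '=='),
#  ('IDENTIFIER', 'b'), ('RPAREN', ')'), ('LBRACE', '{'), ('RBRACE', '}')]

The parser then works purely on this token stream, which is exactly the separation of
concerns described above.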

2. Specialization

 Lexical Analyzer:
o Uses finite automata for efficient pattern matching and token recognition.
o Specializes in identifying keywords, operators, identifiers, literals, etc.
 Parser:
o Uses context-free grammars and algorithms like LL, LR, or their variations to
build the syntactic structure.
o Specializes in enforcing the language's syntax rules and creating a meaningful
structure from tokens.

3. Error Detection and Reporting

 Lexical Analysis:
o Detects errors such as invalid characters, malformed literals, or unrecognized
tokens early in the compilation process.
o Provides clear and specific error messages for character-level mistakes.
 Parsing:
o Detects syntax errors related to the incorrect arrangement of tokens, such as
missing semicolons, unmatched parentheses, or incorrect statement structures.
o Provides more precise and context-aware error messages for syntax-related issues.

4. Efficiency

 Lexical Analysis:
o Can be done in a single pass over the input source code, making tokenization
efficient.
o Prepares the data for the parser in a streamlined form.
 Parsing:
o Works with tokens, allowing it to focus on syntactic rules without being slowed
down by low-level character processing.
o Can be more efficient and focused in building the syntactic structure.

5. Reusability

 Lexer:
o Can be reused for different languages that share common token patterns, like
identifiers and literals.

o Allows the lexer to be independent of specific syntax rules.
 Parser:
o Can be adapted to different languages or dialects by modifying the grammar rules
without changing the lexical analysis.
o Facilitates language extensions and modifications.

6. Tool Support

 Lexical Analysis:
o Tools like Lex or Flex help generate lexical analyzers from regular expressions,
making the lexer creation straightforward and automated.
 Parsing:
o Tools like Yacc or Bison generate parsers from context-free grammars,
automating the parser creation and ensuring correctness.

7. Maintainability and Extensibility

 Separated Concerns:
o Changes to the lexical rules (e.g., new keywords) can be made without affecting
the parser.
o Changes to the syntax rules (e.g., new control structures) can be made without
affecting the lexer.
 Overall Benefit:
o This separation reduces complexity, making the compiler easier to understand,
maintain, and extend. Each component can be independently developed, tested,
and optimized.

In summary, separating lexical analysis and parsing allows for a more modular, efficient, and
maintainable compiler design. It leverages specialization, improves error handling, and enhances
reusability and tool support, leading to robust and flexible compilers.

3) Given the following grammar

E → TE’

E’ → +TE’| ε

T → FT’

T’ → * FT’| ε

F→ (E) | id

Parse id + id * id using the non-left-recursive grammar above with a left-most derivation.
Left-Most Derivation
Parsing with a left-most derivation is a method in which the leftmost non-terminal in the
current sentential form is always the one replaced next, using one of its production rules.
This continues until the string consists entirely of terminal symbols. In essence, a
left-most derivation mimics the steps a top-down parser takes when constructing a parse
tree, which is why it underpins top-down parsing methods and the systematic construction
of parse trees.

Steps in Left-Most Derivation


1. Start with the Start Symbol: Begin with the start symbol of the grammar.

2. Replace the Leftmost Non-Terminal: At each step, identify the leftmost non-terminal in
the current string and replace it using one of its production rules.

3. Continue Until Complete: Repeat this process until the string consists entirely of
terminal symbols, matching the input string.

Start with the start symbol: E

Expand E using the rule E → TE': E → TE'

Expand T using the rule T → FT': TE' → FT'E'

Expand F using the rule F → id: FT'E' → idT'E'

Expand T' using the rule T' → ε: idT'E' → idE'

Expand E' using the rule E' → +TE': idE' → id + TE'

Expand T using the rule T → FT': id + TE' → id + FT'E'

Expand F using the rule F → id: id + FT'E' → id + idT'E'

Expand T' using the rule T' → *FT': id + idT'E' → id + id * FT'E'

Expand F using the rule F → id: id + id * FT'E' → id + id * idT'E'

Expand T' using the rule T' → ε: id + id * idT'E' → id + id * idE'

Expand E' using the rule E' → ε: id + id * idE' → id + id * id

So, the left-most derivation of id + id * id is:

E → TE'

→ FT'E'

→ idT'E'

→ idE'

→ id + TE'

→ id + FT'E'

→ id + idT'E'

→ id + id * FT'E'

→ id + id * idT'E'

→ id + id * idE'

→ id + id * id
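
Because the grammar is non-left-recursive, the same derivation can be carried out
mechanically by a recursive-descent parser: each procedure expands the leftmost
non-terminal exactly as in the steps above. A minimal Python sketch follows (the function
names and token list are assumptions for illustration):

# Recursive-descent parser for the grammar above; each call expands the
# current leftmost non-terminal, mirroring the derivation steps.
tokens = ['id', '+', 'id', '*', 'id', '$']   # '$' marks end of input
pos = 0

def peek():
    return tokens[pos]

def match(expected):
    global pos
    assert peek() == expected, f'expected {expected}, got {peek()}'
    pos += 1

def E():            # E  -> T E'
    T(); Eprime()

def Eprime():       # E' -> + T E' | epsilon
    if peek() == '+':
        match('+'); T(); Eprime()

def T():            # T  -> F T'
    F(); Tprime()

def Tprime():       # T' -> * F T' | epsilon
    if peek() == '*':
        match('*'); F(); Tprime()

def F():            # F  -> ( E ) | id
    if peek() == '(':
        match('('); E(); match(')')
    else:
        match('id')

E()
match('$')          # succeeds only if the whole input was consumed
print('id + id * id parsed successfully')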

4) Explain bottom-up parsing and types of bottom-up parsing using examples
(hint: LR parsers (LR(0), SLR(1), LALR(1), CLR(1)) vs. operator precedence parsers)

Bottom-up parsing is a method in compiler design that constructs the parse tree from the leaves
(input symbols) to the root (start symbol). It tries to reduce a string to the start symbol by
repeatedly applying grammar rules in reverse (reductions). This is also known as shift-reduce
parsing.

Key Concepts:

 Shift: Move the next input symbol onto the stack.


 Reduce: Replace a sequence of symbols on the stack with a non-terminal according to a
grammar rule.

Types of Bottom-Up Parsing

1. LR Parsing (Left-to-right scan, Rightmost derivation in reverse)

 LR(0) Parser:
o Uses no lookahead symbols.
o Simple but can handle a limited set of grammars.
o Example: Consider a grammar with productions S → aS | b. An LR(0) parser
uses state transitions based on items (e.g., S → a•S) to decide shifts and
reductions without considering the next input symbol.
 SLR(1) Parser (Simple LR):
o Uses one lookahead symbol to decide reductions.
o Resolves some conflicts in LR(0) parsing by incorporating follow sets.
o Example: For the same grammar, SLR(1) would consider the follow set of S to
resolve whether to reduce S → aS or shift based on the lookahead symbol.
 LALR(1) Parser (Look-Ahead LR):
o Merges similar states in the SLR(1) parser to reduce the number of states.
o Most commonly used due to its balance between power and efficiency.
o Example: In practice, compilers like Yacc use LALR(1) parsing, where states
with the same core items but different lookahead sets are merged.
 CLR(1) Parser (Canonical LR):
o Uses detailed lookahead for each item in the states.
o More powerful and complex, can handle a wider range of grammars but with a
larger state table.
o Example: Each state includes specific lookahead symbols, making it capable of
resolving more conflicts but requiring more memory.

2. Operator Precedence Parsing

 Description: A simpler form of bottom-up parsing that uses precedence relations
between operators to handle expressions without building explicit parse trees.
 Precedence Relations:
o Less than (<): the left terminal yields precedence to the right one, so the
construct around the right operator is reduced first.
o Equal to (=): the two terminals have equal precedence and belong to the same
handle.
o Greater than (>): the left terminal takes precedence, so its construct is reduced
before the right operator is processed.
 Example: For an expression like a + b * c, the parser recognizes that * has higher
precedence than +, so it processes b * c before a + (b * c).
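
A minimal Python sketch of this idea, assuming the tiny ambiguous grammar
E → E + E | E * E | id and a hand-built relation table (both are simplifying assumptions,
not a full operator-precedence parser):

# Simplified precedence-relation table for E -> E + E | E * E | id.
# rel[a][b]: does terminal a yield to b ('<'), equal it ('='), or take
# precedence over it ('>')? '$' is the end marker.
rel = {
    '+':  {'+': '>', '*': '<', 'id': '<', '$': '>'},
    '*':  {'+': '>', '*': '>', 'id': '<', '$': '>'},
    'id': {'+': '>', '*': '>', '$': '>'},
    '$':  {'+': '<', '*': '<', 'id': '<'},
}

def parse(tokens):
    tokens = tokens + ['$']              # append the end marker
    stack = ['$']
    i = 0
    while True:
        # topmost terminal on the stack (skip the non-terminal 'E')
        a = next(s for s in reversed(stack) if s != 'E')
        b = tokens[i]
        if a == '$' and b == '$':
            return stack == ['$', 'E']   # accept iff everything reduced
        r = rel[a].get(b)
        if r in ('<', '='):              # shift
            stack.append(b)
            i += 1
        elif r == '>':                   # reduce the handle on top
            if stack[-1] == 'id':
                stack[-1] = 'E'                       # E -> id
            elif stack[-3:] in (['E', '+', 'E'], ['E', '*', 'E']):
                del stack[-3:]
                stack.append('E')                     # E -> E + E | E * E
            else:
                return False             # no handle: syntax error
        else:
            return False                 # no relation: syntax error

print(parse(['id', '+', 'id', '*', 'id']))   # True

Because rel['+']['*'] is '<' while rel['*']['$'] is '>', the id * id handle is reduced
before the addition, matching the precedence behavior described above.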

Example of Bottom-Up Parsing

Let's parse the string id + id * id using a simplified grammar:

E→E+T
E→T
T→T*F
T→F
F → id

We'll use a generic bottom-up parsing approach to illustrate the process:

1. Input: id + id * id, Stack: []
o Shift id: Stack: [id], Input: + id * id
2. Reduce F → id: Stack: [F], Input: + id * id
o Reduce T → F: Stack: [T], Input: + id * id
o Reduce E → T: Stack: [E], Input: + id * id
3. Shift +: Stack: [E, +], Input: id * id
o Shift id: Stack: [E, +, id], Input: * id
4. Reduce F → id: Stack: [E, +, F], Input: * id
o Reduce T → F: Stack: [E, +, T], Input: * id
5. Shift *: Stack: [E, +, T, *], Input: id
o Shift id: Stack: [E, +, T, *, id], Input: (empty)
6. Reduce F → id: Stack: [E, +, T, *, F], Input: (empty)
o Reduce T → T * F: Stack: [E, +, T], Input: (empty)
7. Reduce E → E + T: Stack: [E], Input: (empty)
8. Accept: The stack contains only the start symbol E and the input is fully consumed.

5) What are the compiler design tools? Clearly describe those tools.

Compiler design tools are essential for developing, analyzing, and optimizing compilers. These
tools assist in various phases of compiler construction, from lexical analysis to code generation.
Below is a detailed description of key compiler design tools:

1. Lexical Analyzers (Lexers)

Lexical analyzers are tools that convert a sequence of characters from the source code into tokens,
which are the atomic units of syntax (e.g., keywords, operators, identifiers).

 Popular Tools: Lex, Flex


 Functionality:
o Pattern Matching: Use regular expressions to identify patterns in the input
string.
o Token Generation: Produce tokens for recognized patterns, which are then
passed to the parser.
 Example: In C, the input int main() would be tokenized into int, main, (, ).

2. Parsers

Parsers take the tokens produced by lexical analyzers and build a parse tree or abstract syntax
tree (AST) based on the grammar of the programming language.

 Popular Tools: Yacc, Bison, ANTLR


 Functionality:
o Syntax Analysis: Check the sequence of tokens for correct syntax according to
the grammar rules.
o Tree Construction: Construct a parse tree or AST representing the hierarchical
structure of the source code.
 Example: For the expression a + b * c, the parser would build a tree showing + as the
root with a as one child and * as the other, which in turn has b and c as children.
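
As a throwaway illustration, that tree can be encoded with nested tuples (the encoding is
an assumption for this sketch, not any real compiler's AST format):

# Illustrative nested-tuple AST for a + b * c:
# '*' binds tighter, so it sits below '+'.
ast = ('+', 'a', ('*', 'b', 'c'))

def postorder(node):
    """Print the tree in postfix order, i.e. a b c * +."""
    if isinstance(node, tuple):
        op, left, right = node
        postorder(left)
        postorder(right)
        print(op, end=' ')
    else:
        print(node, end=' ')

postorder(ast)   # prints: a b c * +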

3. Syntax-Directed Translators

These tools associate actions with grammar rules to perform translations or transformations as
parsing proceeds.

 Popular Tools: JavaCC, ANTLR


 Functionality:
o Semantic Actions: Attach code snippets to grammar rules to perform specific
actions during parsing.
o Intermediate Code Generation: Produce intermediate representations (IR) such
as three-address code.
 Example: Transforming an arithmetic expression into an intermediate representation
during parsing.

4. Intermediate Code Generators

Generate an intermediate representation of the source code, which is independent of the target
machine but easier to optimize.

 Popular Tools: LLVM IR, GCC GIMPLE


 Functionality:
o Code Transformation: Convert high-level constructs into lower-level
intermediate code.
o Optimization: Perform optimizations on the intermediate code.
 Example: Convert a loop construct in a high-level language to a set of instructions in an
intermediate representation.

5. Code Optimizers

Code optimizers optimize the intermediate code to improve performance and efficiency.

 Popular Tools: LLVM, GCC


 Functionality:
o Local Optimization: Optimize code within basic blocks (e.g., constant folding,
dead code elimination).
o Global Optimization: Optimize across basic blocks or functions (e.g., loop
unrolling, inlining).
 Example: Removing redundant calculations or reordering instructions to improve cache
performance.

6. Code Generators

Code generators translate the intermediate code into the target machine code or assembly
language.

 Popular Tools: GCC, LLVM


 Functionality:
o Instruction Selection: Map intermediate instructions to machine instructions.
o Register Allocation: Assign variables to machine registers.
o Instruction Scheduling: Order instructions to minimize stalls and pipeline
hazards.
 Example: Generating x86 assembly code from intermediate representations.

7. Assemblers

Assemblers convert assembly language code into machine code (binary).

 Popular Tools: GNU Assembler (GAS), Microsoft Macro Assembler (MASM)


 Functionality:
o Assembly: Translate mnemonic operation codes into machine instructions.
o Symbol Resolution: Resolve symbolic names to memory addresses.

 Example: Converting mov eax, 1 into the binary equivalent for an x86 processor.

8. Linkers

Linkers combine multiple object files into a single executable, resolving references
between them.

 Popular Tools: GNU Linker (LD), Microsoft Linker


 Functionality:
o Symbol Resolution: Resolve external symbols and function calls.
o Address Binding: Assign final memory addresses to code and data sections.
 Example: Linking main.o and utils.o into a single executable program.

9. Debuggers

Debuggers provide tools to test and debug the compiled code.

 Popular Tools: GDB, LLDB


 Functionality:
o Breakpoint Management: Set breakpoints to pause execution at specific points.
o Variable Inspection: Examine and modify the values of variables at runtime.
o Step Execution: Execute code line-by-line to monitor behavior.
 Example: Debugging a segmentation fault by inspecting the call stack and variables at
the point of failure.

10. Profilers

Profilers analyze the runtime performance of programs to identify bottlenecks.

 Popular Tools: gprof, Valgrind, Intel VTune


 Functionality:
o Performance Metrics: Measure execution time, memory usage, and CPU
utilization.
o Hotspot Identification: Identify functions or lines of code that consume the most
resources.
 Example: Profiling a program to determine that a specific function is consuming 80% of
the execution time.

These tools collectively support the development and optimization of compilers, ensuring
efficient and correct translation of high-level programming languages into machine code.

6) What is syntax-directed definition (SDD) in compiler design? Why do we use it?

Syntax Directed Definitions (SDD) are a formal method in compiler design used to define the
syntax and semantics of programming languages. An SDD associates attributes with the
grammar symbols and specifies semantic rules for computing these attributes. The attributes can
be classified into two types:

1. Synthesized Attributes: Attributes that are computed from the attribute values of the
children nodes in the parse tree.
2. Inherited Attributes: Attributes that are passed down from the parent and sibling nodes
to a node in the parse tree.

The SDD framework integrates both the syntactic structure (described by a context-free
grammar) and the semantic actions (described by attribute evaluation rules) into a unified
formalism.

Components of SDD

1. Grammar Rules: Define the syntactic structure of the language. Each rule has a left-
hand side (LHS) non-terminal and a right-hand side (RHS) sequence of terminals and
non-terminals.
o Example: E → E1 + T where E, E1, and T are non-terminals, and + is a terminal.

2. Attributes: Values associated with grammar symbols (both terminals and non-terminals).
Attributes can be numerical, strings, references to data structures, etc.
o Example: In the rule E → E1 + T, E, E1, and T could each have an attribute val.

3. Semantic Rules: Specify how to compute the attributes associated with grammar
symbols. These rules are attached to the grammar rules.
o Example: E.val = E1.val + T.val for the rule E → E1 + T.

Example of SDD

Consider a simple arithmetic expression grammar and its associated SDD for computing the
value of expressions:

1. Grammar:

E→E+T
E→T
T→T*F
T→F
F→(E)
F → id

2. Attributes:
o E.val, T.val, and F.val represent the values of the expressions.
o id.val represents the value of the identifier.

3. Semantic Rules:
E → E1 + T { E.val = E1.val + T.val }
E→T { E.val = T.val }
T → T1 * F { T.val = T1.val * F.val }
T→F { T.val = F.val }
F → ( E ) { F.val = E.val }
F → id { F.val = id.val }

In this example, the attribute val is synthesized for all non-terminals.
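
As a minimal sketch of how such a definition can be executed, the Python function below
computes val by a bottom-up walk over a parse tree encoded as nested tuples (the node
encoding and names are illustrative assumptions):

# Node encoding (assumed for this sketch): (production, child, ...);
# leaves of the form ('F->id', n) carry id.val directly.
def val(node):
    """Evaluate the synthesized attribute val bottom-up."""
    rule = node[0]
    if rule == 'F->id':
        return node[1]                       # F.val = id.val
    if rule == 'F->(E)':
        return val(node[1])                  # F.val = E.val
    if rule in ('E->T', 'T->F'):
        return val(node[1])                  # copy rules
    if rule == 'E->E+T':
        return val(node[1]) + val(node[2])   # E.val = E1.val + T.val
    if rule == 'T->T*F':
        return val(node[1]) * val(node[2])   # T.val = T1.val * F.val
    raise ValueError(f'unknown production {rule}')

# Parse tree for 3 + 4 * 2:
tree = ('E->E+T',
        ('E->T', ('T->F', ('F->id', 3))),
        ('T->T*F', ('T->F', ('F->id', 4)), ('F->id', 2)))
print(val(tree))   # 11

Each branch of val is a direct transcription of one semantic rule from the table above.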

Why Do We Use SDD?

1. Formal Definition of Semantics:


o SDD provides a clear and formal way to specify the semantics of programming
languages. It integrates syntactic and semantic specifications, making it easier to
understand how constructs are evaluated or translated.

2. Compiler Construction:
o In the construction of compilers, SDDs facilitate the implementation of various
compiler phases such as syntax analysis, semantic analysis, and intermediate code
generation.

3. Modularity and Maintainability:


o By associating semantic rules directly with grammar productions, SDDs promote
modularity. Changes to the language’s semantics can be made by updating the
corresponding semantic rules without affecting the entire compiler.

4. Attribute Evaluation:
o SDDs provide a systematic approach to attribute evaluation, which is crucial for
generating intermediate code, performing type checking, and optimizing code.

5. Tool Support:
o Many compiler construction tools and frameworks, such as ANTLR, Yacc, and
Bison, support syntax-directed definitions, making it easier to develop robust and
efficient compilers.

Evaluation Methods in SDD

The evaluation of attributes in an SDD can be performed using different strategies, including:

1. Parse-Tree Traversal:
o Attributes can be evaluated by traversing the parse tree. Synthesized attributes are
typically evaluated in a bottom-up traversal, while inherited attributes may require
a top-down or other order of traversal.

2. Dependency Graph:

o A dependency graph can be constructed in which nodes represent attribute
instances and edges represent dependencies between them. A topological sort of
the graph then ensures that each attribute is evaluated only after all attributes
it depends on have been computed (see the sketch after this list).

3. L-Attributed SDDs:
o A special class of SDDs where inherited attributes can be computed in a single
left-to-right pass over the input, making them suitable for efficient parsing
algorithms like LL parsing.
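
As a sketch of the dependency-graph strategy (item 2 above), the snippet below
topologically orders the attribute instances for the expression 3 + 4 * 2 using Python's
standard graphlib module (Python 3.9+); the attribute names are illustrative:

from graphlib import TopologicalSorter

# Each attribute instance maps to the attributes it depends on, following
# the semantic rules (copy rules are folded for brevity; names assumed).
deps = {
    'F1.val': [],                      # F1 -> id (3)
    'F2.val': [],                      # F2 -> id (4)
    'F3.val': [],                      # F3 -> id (2)
    'T2.val': ['F2.val'],              # T2 -> F2
    'T1.val': ['T2.val', 'F3.val'],    # T1 -> T2 * F3
    'E1.val': ['F1.val'],              # E1 -> T -> F1
    'E.val':  ['E1.val', 'T1.val'],    # E  -> E1 + T1
}
order = list(TopologicalSorter(deps).static_order())
print(order)   # every attribute appears after all of its dependencies

Evaluating attributes in this order guarantees that, for example, T1.val is computed
before E.val needs it.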

Example of SDD in Action

Consider parsing and evaluating the expression 3 + 4 * 2 using the given grammar and SDD:

1. Parse Tree Construction (for 3 + 4 * 2):

             E
           / | \
          E  +  T
          |    /|\
          T   T * F
          |   |   |
          F   F   id (2)
          |   |
          id  id
         (3)  (4)

2. Attribute Evaluation:
o For 3 + 4 * 2, the parse-tree attributes are evaluated bottom-up as follows:
 From the id leaves: F.val = 3, F.val = 4, F.val = 2
 Via T → F: T1.val = 4 (the subtree over 4)
 Via T → T1 * F: T.val = T1.val * F.val = 4 * 2 = 8
 Via E → T: E1.val = 3; then E.val = E1.val + T.val = 3 + 8 = 11

This process demonstrates how SDDs enable systematic evaluation of expressions by attaching
semantic rules to grammar productions.

In conclusion, Syntax Directed Definitions (SDDs) are a powerful and formal method for
specifying both the syntax and semantics of programming languages. They play a crucial role in
compiler design by providing a structured way to associate semantic actions with syntactic
constructs, thereby facilitating the construction of robust and efficient compilers.
