Compiler Construction (CS354)
Introduction to Compiler Construction (An Overview)
Lecture 02
Imran RAO
imranrao@gmail.com

Definition and Purpose of Compilers

A compiler is a specialized computer program that translates source code written in one programming language (the source language) into another language (the target language). The target language is often machine code, which is a low-level language that a computer's processor can execute directly.

* Translation: Convert high-level source code to a lower-level language (like machine code) for computer execution.
* Optimization: Improve code performance by reducing size, increasing speed, and optimizing memory usage.
* Error Detection: Identify syntax and semantic errors in the source code during compilation, aiding in debugging.
* Platform Independence: Enable the same source code to run on different hardware architectures, enhancing software portability.
* Abstraction and Productivity: Allow programmers to focus on program logic rather than hardware specifics, increasing efficiency.
* Facilitate Modern Software Development: Support complex software development with features like object-orientation and functional programming.

CS354 Compiler Construction, Dr. Imran RAO, BLUE BRACKETS

Source, Target, & Implementation

* Source Language - the language in which the source code is written
* Target Language - the language in which the object code is written
* Implementation Language - the language in which the compiler is written

The implementation language for compilers used to be assembly language. It is now customary to write a compiler in the source language. Why? The compiler itself can then be used as a sample program to test the compiler's ability to translate complex programs that utilize the various features of the source language.

Compiler, Interpreters, Assemblers

Some programming languages are compiled (C and Java) and some are interpreted (Python and JavaScript).
A compiled language is translated into an intermediate artifact (for example, a native executable for C or bytecode for Java) that is then run by the processor or interpreted by a runtime. An interpreted language has no intermediate artifact: the interpreter directly executes the specified code. Assembly language is a human-readable notation for machine language (binary) code. An assembler is a program that translates instructions that humans can read and write into the language that the computer's processor can run directly.

| Feature         | Compiler                                          | Interpreter                                         | Assembler                                     |
|-----------------|---------------------------------------------------|-----------------------------------------------------|-----------------------------------------------|
| Conversion      | Converts the entire program before execution      | Converts the program line by line during execution  | Converts assembly language to machine code    |
| Scanning        | Scans the entire program before converting it     | Detects errors line by line                         | Detects errors in the first phase             |
| Error detection | Gives a full error report after the whole scan    | Detects errors in the first phase                   | Detects errors during assembly                |
| Code generation | Generates intermediate code and then machine code | No intermediate code generation                     | Generates object code and then machine code   |
| Execution time  | Faster execution time                             | Slower execution time                               | Slower than compiler, faster than interpreter |
| Examples        | C++, Java, C#                                     | Python, Perl, VB, PostScript                        | GAS (GNU Assembler)                           |

Overview of Compiler Structure

* The structure of a compiler can be viewed as a pipeline consisting of
  o a front-end: focused on source code analysis,
  o a middle-end: optimizing the intermediate representation without changing the program's behavior, and
  o a back-end: generating and optimizing machine-specific code.
* This division allows for a modular design, where different front-ends can be paired with different back-ends to support multiple source languages and target architectures.
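
The modular pairing of front-ends and back-ends can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the lecture: the nested-tuple IR, the two toy source notations (prefix and postfix), and the function names `frontend_prefix`, `frontend_postfix`, `backend_stack_machine`, and `backend_infix`. The point is only that both front-ends produce the same intermediate representation, so either back-end can consume the output of either front-end.

```python
# Hypothetical IR: ("num", value) or ("add" | "mul", left, right).
OPS = {"+": "add", "*": "mul"}


def frontend_prefix(text):
    """Front-end 1: parse prefix notation such as "(+ 1 (* 2 3))" into IR."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        if tokens[pos] == "(":
            op = OPS[tokens[pos + 1]]
            left, pos = parse(pos + 2)
            right, pos = parse(pos)
            return (op, left, right), pos + 1  # skip the closing ")"
        return ("num", int(tokens[pos])), pos + 1

    tree, _ = parse(0)
    return tree


def frontend_postfix(text):
    """Front-end 2: parse postfix notation such as "1 2 3 * +" into IR."""
    stack = []
    for tok in text.split():
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append((OPS[tok], left, right))
        else:
            stack.append(("num", int(tok)))
    return stack.pop()


def backend_stack_machine(ir):
    """Back-end 1: emit instructions for a toy stack machine."""
    if ir[0] == "num":
        return [f"PUSH {ir[1]}"]
    return backend_stack_machine(ir[1]) + backend_stack_machine(ir[2]) + [ir[0].upper()]


def backend_infix(ir):
    """Back-end 2: emit a fully parenthesized infix expression."""
    if ir[0] == "num":
        return str(ir[1])
    symbol = "+" if ir[0] == "add" else "*"
    return f"({backend_infix(ir[1])} {symbol} {backend_infix(ir[2])})"


if __name__ == "__main__":
    sources = [(frontend_prefix, "(+ 1 (* 2 3))"), (frontend_postfix, "1 2 3 * +")]
    for frontend, source in sources:
        ir = frontend(source)  # same IR regardless of the source notation
        print(backend_stack_machine(ir), backend_infix(ir))
```

With two front-ends and two back-ends, four source-to-target combinations are covered by four components instead of four separate translators, which is the design benefit the slide describes.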
[Figure: Source Code -> Front-end (Analysis) -> Intermediate Representation -> Back-end (Synthesis) -> Machine Code]

Front-End: The front-end of a compiler deals with source code analysis. It is responsible for understanding and checking the source code and preparing it for further stages. The key processes are:
  o Lexical Analysis: Converts code into tokens.
  o Syntax Analysis: Constructs a syntax tree based on grammatical rules.
  o Semantic Analysis: Checks for semantic errors and annotates the syntax tree.
  o Language Specificity: The front-end is typically language-specific, as it must understand the syntax and semantics of the source language.

Middle-End: Also known as the optimizer, the middle-end transforms the program's intermediate representation (IR). Its goal is to improve the program's performance and efficiency without altering its semantics. The key processes are:
  o Optimization: Performs various optimizations on the IR, such as loop optimization, dead code elimination, and constant folding.
  o Language Neutrality: The middle-end is generally language-neutral, working with the intermediate representation rather than language-specific constructs.

Back-End: The back-end of a compiler is responsible for generating the target code (often machine code) and performing optimizations specific to the target machine. The key processes are:
  o Code Generation: Converts IR to target machine code.
  o Machine-Specific Optimization: Tailors the code to efficiently utilize the hardware, such as register allocation and instruction scheduling.
  o Target Specificity: The back-end is specific to the target machine, as it needs to understand the details of the hardware architecture.
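
The stages named above can be sketched end to end for a tiny expression language. This is a minimal illustration under stated assumptions, not the lecture's implementation: the grammar (integers, variable names, `+`, `*`, parentheses), the tuple-based syntax tree, the toy stack-machine instructions, and all function names (`lex`, `parse`, `check`, `fold`, `generate`, `compile_expr`) are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|([+*()]))")


def lex(source):
    """Front-end, lexical analysis: characters -> (kind, value) tokens."""
    source, tokens, pos = source.rstrip(), [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if not match:
            raise SyntaxError(f"unexpected character near position {pos}")
        number, name, op = match.groups()
        if number is not None:
            tokens.append(("num", int(number)))
        elif name is not None:
            tokens.append(("name", name))
        else:
            tokens.append(("op", op))
        pos = match.end()
    return tokens


def parse(tokens):
    """Front-end, syntax analysis: recursive descent, '*' binds tighter than '+'."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", None)

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        kind, value = advance()
        if kind in ("num", "name"):
            return (kind, value)
        if (kind, value) == ("op", "("):
            node = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        node = factor()
        while peek() == ("op", "*"):
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == ("op", "+"):
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() != ("eof", None):
        raise SyntaxError("trailing input")
    return tree


def check(ast, declared):
    """Front-end, semantic analysis: reject variables that were never declared."""
    if ast[0] == "name" and ast[1] not in declared:
        raise NameError(f"undeclared variable {ast[1]!r}")
    if ast[0] in ("+", "*"):
        check(ast[1], declared)
        check(ast[2], declared)


def fold(ast):
    """Middle-end: constant folding, evaluating constant subtrees at compile time."""
    if ast[0] in ("+", "*"):
        left, right = fold(ast[1]), fold(ast[2])
        if left[0] == "num" and right[0] == "num":
            value = left[1] + right[1] if ast[0] == "+" else left[1] * right[1]
            return ("num", value)
        return (ast[0], left, right)
    return ast


def generate(ast):
    """Back-end: emit instructions for a toy stack machine."""
    if ast[0] == "num":
        return [f"PUSH {ast[1]}"]
    if ast[0] == "name":
        return [f"LOAD {ast[1]}"]
    return generate(ast[1]) + generate(ast[2]) + ["ADD" if ast[0] == "+" else "MUL"]


def compile_expr(source, declared=("x",)):
    ast = parse(lex(source))
    check(ast, set(declared))
    return generate(fold(ast))


if __name__ == "__main__":
    print(compile_expr("x * (2 + 3 * 4)"))
```

In this sketch the subexpression `2 + 3 * 4` is folded to a single constant before code generation, so the back-end never sees it. A real middle-end would normally work on a lower-level IR than the syntax tree; folding directly on the tree is a simplification chosen to keep the example short.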

[Figure: Compiler front- and back-end. Front end (analysis): source program (character stream) -> tokens -> parse tree -> abstract syntax tree or other intermediate form. Back end: abstract syntax tree or other intermediate form -> modified intermediate form -> assembly or object code -> modified assembly or object code.]

[Figure: code generator, external libraries, interpreter, intermediate language code, execution, results.]

Tools for Compiler Construction

* LLVM
* ANTLR
* Roslyn

LLVM Project

The LLVM project, initiated in 2000 at the University of Illinois at Urbana-Champaign by Vikram Adve and Chris Lattner, was originally a research platform focused on dynamic compilation methods for various programming languages, both static and dynamic.

LLVM encompasses a suite of compiler and toolchain technologies, adaptable for
  o creating frontends for any programming language and
  o backends for any instruction set architecture.

Central to LLVM is a language-agnostic intermediate representation (IR), functioning as a versatile, high-level assembly language. This IR enables extensive optimization through multiple transformation passes.

While LLVM initially stood for "Low Level Virtual Machine," this designation is no longer used as the project's scope has broadened significantly.

Developed in C++, LLVM facilitates optimizations at compile-time, link-time, runtime, and during idle periods.

Originally tailored for C and C++, LLVM's adaptable design now supports a broad range of programming languages. These include, but are not limited to, ActionScript, Ada, C# for .NET, Common Lisp, CUDA, D, Delphi, Fortran, Haskell, Java bytecode, Julia, Kotlin, Lua, Objective-C, OpenCL, Ruby, Rust, Scala, Swift,
and Zig, with some languages employing LLVM directly or generating LLVM IR-compatible compiled programs.

[Figure: LLVM project overview, including Clang (C/C++/ObjC) and Polly (polyhedral loop optimizer).]

ANTLR

ANTLR (ANother Tool for Language Recognition) is a sophisticated tool used for parsing, a crucial process in language processing and compiler design. It is specifically designed to handle structured text or binary files, making it a versatile tool in various applications, including developing programming languages, software tools, and frameworks.

* Parser Generation: ANTLR automatically generates parsers from defined grammars. These parsers read and dissect source code into elements as dictated by grammar rules.
* Handling Structured Text and Binary Files: ANTLR is versatile, processing both structured text and binary files, making it suitable for various applications.
* Parse Trees: The parsers from ANTLR build parse trees, visually representing a source code's syntactic structure. ANTLR facilitates tree-walking, allowing operations like translation or interpretation at each node.
* Grammar-Based Operation: ANTLR's key functionality revolves around grammars, sets of rules that define a language's syntax. It generates parsers specifically tailored to these grammars.
* Language and Tool Development: ANTLR is extensively used for creating new programming languages, DSLs, editors, and tools, owing to its efficient parsing capabilities.
* Execution and Translation: Beyond parsing, ANTLR parsers can perform actions like translating text or executing code during the parsing process.

[Figure: ANTLR processing example. Stages: ANTLR Lexer (Tokenizer), ANTLR Parser, ANTLR Abstract Syntax Tree, ANTLR Tree Parser. Example tree: program -> declaration, statement; var = expression ; with expression -> term + term.]
[Figure labels: TPTP Document, Abstract Syntax Object. TPTP: Thousands of Problems for Theorem Provers.]

Roslyn

The .NET Compiler Platform, commonly referred to simply as Roslyn, is an open-source initiative by Microsoft. It encompasses a suite of compilers and code analysis Application Programming Interfaces (APIs) specifically designed for the C# and Visual Basic (VB) programming languages.

* Open-Source Compilers: Roslyn provides fully open-source implementations of the C# and VB compilers. These compilers are themselves written in the languages they compile, a practice known as "bootstrapping". This approach allows for a deeper integration with the .NET ecosystem and facilitates continuous development and improvement by the community.
* Advanced Code Analysis: Roslyn extends beyond traditional compilation by offering rich APIs for syntax and semantic analysis of code. This enables developers to analyze and manipulate C# and VB code more effectively.
* IDE Integration: One of the significant advantages of Roslyn is its seamless integration with Integrated Development Environments (IDEs), notably Visual Studio. This integration provides developers with powerful features like real-time syntax highlighting, code completion, refactorings, and diagnostics.
* Rich APIs for Developers: Roslyn exposes APIs that allow developers to write their own code analysis tools. These APIs make it possible to create custom static analysis tools, refactorings, and even complex code transformations.
* Language Feature Development: Roslyn serves as a platform for developing new language features in C# and VB. By being open-source and actively developed, it enables a more dynamic evolution of these languages.
* Community Contributions: As an open-source project, Roslyn benefits from contributions from a diverse developer community.
This ensures that the platform evolves to meet the needs of a broad range of users and use cases.

Machine Learning and Compiler Construction

* Natural Language Processing (NLP)
* Large Language Models (LLM)

Using LLM for Compiler Optimization

Using Large Language Models (LLMs) like GPT-4 for compiler construction and optimization is an innovative approach that intersects the fields of machine learning and traditional compiler technology.

LLMs can understand and generate human-like text. They excel in pattern recognition and prediction. LLMs can be used to predict code patterns, suggest optimizations, and even generate parts of the compiler itself. LLMs can assist in generating boilerplate code or even complex functions based on specified requirements. The suggestions made by LLMs need to be verified for correctness and efficiency.

Large Language Models (LLM) as Compilers

* The vision begins with a compiled language, with a potential evolution towards interpretation.
* Step-by-Step Compilation Process:
  o High-Level Behavior Specification: Use a tool like Cucumber's Gherkin to specify high-level behavior. This involves describing software features in a human-readable, domain-specific language.
  o Detailed Technical Specification: Employ "programming by wishful thinking" to define desired outcomes before implementation details, use tools similar to Cucumber's Step Definitions for detailed specifications, and leverage a GitHub-Copilot-like AI tool for assistance in crafting these specifications in a preferred language from the "Perl and friends" family.
  o Compilation via Specialized LLM: The specifications derived from the initial two steps are processed by a specialized Large Language Model (LLM) that acts as a compiler to generate the resulting codebase.
  o Execution of the Codebase: The generated code is then executed using the appropriate runtime for the "Perl and friends" languages.
  o Manual Verification and Iteration: Manually verify the code's behavior and, if modifications are required, iterate the process starting again from the first step.

LLM as Compilers: Implications

* Productivity Leap with LLM as Compiler:
  o Mirroring the productivity surges seen with the advent of C in 1972 and Perl in 1987, the concept of "LLM as Compiler" promises a similar, significant boost in developer productivity.
  o Historically, each new language abstraction level (from Assembly to C, then C to Perl and its equivalents) increased productivity by an order of magnitude. The "LLM as Compiler" concept aims to extend this trend.
* Role of Lower-Level Languages:
  o Despite this shift, the necessity for low-level languages like Assembly and C/C++ persists, especially for specific problems where granular control is essential.
  o Proficiency across the entire stack, from low-level to high-level languages, remains crucial.
* Comparative Productivity:
  o The future envisions a scenario where defining software behavior in a format like Cucumber outpaces productivity in languages like Perl by an order of magnitude.
* Complementary Approaches: Compiler vs Teammate:
  o "LLM as Compiler" represents a new paradigm in AI's role in programming, differing from the current trend of "LLM as Teammate" seen in tools like GitHub Copilot and ChatGPT. These approaches are not in competition but are complementary, each enhancing different aspects of the software development process.
