
Indian Institute of Information Technology, Allahabad

CPD Project

Mini C++ Compiler














Submitted to:
Prof. A Agarwal

Submitted by:
Raman Goyal (IIT2011021)
Alisha Singh (IIT2011067)
Ankit Bhatia (IIT2011083)
Pramiti Goel (IIT2011100)
Monika Choudhary (IIT2011105)
Sidharth Singla (IIT2011108)
Contents

Problem Statement
Introduction
Tools Used
Methodology
Flow Chart
Work Done So Far
Problem Statement

Construct a mini C++ compiler that accepts only code that is strictly C++ and reports
an error for any C code that a C++ compiler would generally accept. In other words, it
separates C from C++.


Introduction

A compiler is a computer program (or set of programs) that transforms source code
written in a programming language (the source language) into another computer
language (the target language, often having a binary form known as object code). The
name compiler is primarily used for programs that translate source code from a
high-level programming language to a lower-level language (e.g. assembly language
or machine code). A compiler is likely to perform many or all of the following
operations: preprocessing, lexical analysis, parsing, semantic analysis
(syntax-directed translation), code generation, and code optimization. The term
compiler-compiler is sometimes used to refer to a parser generator, a tool often used
to help create the lexer and parser.
The C and C++ programming languages are closely related. C++ grew out of C and
was designed to be source-and-link compatible with it. As a result, development tools
for the two languages (such as IDEs and compilers) are often integrated into a single
product, with the programmer able to specify C or C++ as the source language.
However, because of minor semantic differences, most non-trivial C programs will not
compile as C++ code without modification; C++ is not a strict superset of C.
One view holds that the incompatibilities between C and C++ should be reduced as
much as possible in order to maximize interoperability between the two languages.
The opposing view is that, since C and C++ are two different languages, compatibility
between them is useful but not vital; according to this camp, efforts to reduce
incompatibility should not hinder attempts to improve each language in isolation.
Tools Used

Tools used for development of the compiler are:

1. Flex
Flex is a program generator designed for lexical processing of character input
streams. It accepts a high-level, problem-oriented specification for character string
matching and produces a program in a general-purpose language that recognizes
regular expressions. The regular expressions are specified by the user in the source
specification given to Flex. The code generated by Flex recognizes these expressions
in an input stream and partitions the input stream into strings matching the
expressions. At the boundaries between strings, program sections provided by the
user are executed. The Flex source file associates the regular expressions with the
program fragments. As each expression appears in the input to the program written
by Flex, the corresponding fragment is executed. The user supplies the additional
code beyond expression matching needed to complete the task, possibly including
code written by other generators. The program that recognizes the expressions is
generated in the general-purpose programming language employed for the user's
program fragments. Thus, a high-level expression language is provided to write the
string expressions to be matched, while the user's freedom to write actions is
unimpaired. This avoids forcing a user who wishes to use a string manipulation
language for input analysis to write processing programs in the same, often
inappropriate, string handling language.
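
The rule-and-action model described above can be sketched by hand in C++. This is a minimal illustration under stated assumptions, not the scanner that Flex generates: Rule, scan, and tokenize are hypothetical names, std::regex stands in for the generated matching code, and the two patterns are examples only. As in Flex, the longest match wins and ties go to the rule listed first.

```cpp
#include <cstddef>
#include <functional>
#include <regex>
#include <string>
#include <vector>

// One scanner rule: a regular expression plus the user-supplied action
// that runs whenever the expression matches (hypothetical type).
struct Rule {
    std::regex pattern;
    std::function<void(const std::string&)> action;
};

// Partition `input` into matched strings. At each position the rule with
// the longest match wins; ties go to the earlier rule. Characters that no
// rule matches are skipped.
void scan(const std::string& input, const std::vector<Rule>& rules) {
    auto pos = input.cbegin();
    while (pos != input.cend()) {
        const Rule* best = nullptr;
        std::string best_text;
        for (const Rule& rule : rules) {
            std::smatch m;
            // match_continuous anchors the match at `pos`.
            if (std::regex_search(pos, input.cend(), m, rule.pattern,
                                  std::regex_constants::match_continuous) &&
                static_cast<std::size_t>(m.length(0)) > best_text.size()) {
                best = &rule;
                best_text = m.str(0);
            }
        }
        if (best == nullptr) {
            ++pos;  // no rule applies here, skip one character
            continue;
        }
        best->action(best_text);  // run the fragment tied to the rule
        pos += best_text.size();
    }
}

// Example rule set: collect identifiers and integer literals as tokens.
std::vector<std::string> tokenize(const std::string& input) {
    std::vector<std::string> tokens;
    std::vector<Rule> rules = {
        {std::regex("[A-Za-z][A-Za-z0-9_]*"),
         [&tokens](const std::string& s) { tokens.push_back("ID:" + s); }},
        {std::regex("[0-9]+"),
         [&tokens](const std::string& s) { tokens.push_back("NUM:" + s); }},
    };
    scan(input, rules);
    return tokens;
}
```

Calling tokenize("x1 = 42") yields the two tokens ID:x1 and NUM:42; the blanks and the = sign are skipped because no rule in this example covers them.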

2. Yet Another Compiler Compiler (YACC)
Yacc provides a general tool for imposing structure on the input to a computer
program. The Yacc user prepares a specification of the input process; this
includes rules describing the input structure, code to be invoked when these rules
are recognized, and a low-level routine to do the basic input. Yacc then generates a
function to control the input process. This function, called a parser, calls the
user-supplied low-level input routine (the lexical analyzer) to pick up the basic
items (called tokens) from the input stream. These tokens are organized according
to the input structure rules, called grammar rules; when one of these rules has been
recognized, then user code supplied for this rule, an action, is invoked; actions have
the ability to return values and make use of the values of other actions. Yacc is
written in a portable dialect of C and the actions, and output subroutine, are in C as
well. Moreover, many of the syntactic conventions of Yacc follow C. The parser
generated by Yacc requires a lexical analyzer. Lexical analyzer generators, such as
Lex or Flex are widely used for this purpose.
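
The idea of grammar rules with attached actions that return values can be illustrated with a small hand-written parser. This is only a sketch under stated assumptions: it uses recursive descent rather than the table-driven parser that Yacc generates, the grammar is a toy one chosen for the example, and SumParser and is_valid_sum are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hand-written stand-in for a generated parser (hypothetical class).
// Grammar, in Yacc-like notation:
//   expr : term | expr '+' term ;
//   term : NUMBER | '(' expr ')' ;
class SumParser {
public:
    explicit SumParser(const std::string& text) : text_(text) {}

    // Parse the whole input and return the value computed by the actions.
    int parse() {
        int value = expr();
        skip_spaces();
        if (pos_ != text_.size()) throw std::runtime_error("syntax error");
        return value;
    }

private:
    // Plays the role of the low-level input routine: skips blanks so that
    // the next character starts the next token.
    void skip_spaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }

    // Consume the single-character token `c` if it is next in the input.
    bool accept(char c) {
        skip_spaces();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    int expr() {
        int value = term();
        while (accept('+')) value += term();  // action: add the two values
        return value;
    }

    int term() {
        if (accept('(')) {
            int value = expr();
            if (!accept(')')) throw std::runtime_error("missing ')'");
            return value;  // action: pass the inner value up
        }
        skip_spaces();
        if (pos_ >= text_.size() ||
            !std::isdigit(static_cast<unsigned char>(text_[pos_])))
            throw std::runtime_error("expected number");
        int value = 0;
        while (pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_])))
            value = value * 10 + (text_[pos_++] - '0');
        return value;  // action: value of the NUMBER token
    }

    std::string text_;
    std::size_t pos_ = 0;
};

// Convenience wrapper: true when `text` matches the grammar.
bool is_valid_sum(const std::string& text) {
    try {
        SumParser(text).parse();
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

For example, SumParser("1 + (2 + 3)").parse() returns 6, while is_valid_sum("(1 + 2") returns false because the closing parenthesis is missing.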


3. C++

Methodology

The only basic types are Boolean, character, string, integer, and real.
A variable name can consist of any combination of letters, digits, and the
underscore (_).
However, the first character of a variable name must be a letter.
This language is case-sensitive.
Statements are evaluated from right to left, top to bottom, with each new statement
on a new line.
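
The naming rule above can be checked with a few lines of C++. This is a minimal sketch; the function name is_valid_variable_name is hypothetical, and in the project itself the rule would normally be expressed as a flex regular expression.

```cpp
#include <cctype>
#include <string>

// Returns true when `name` follows the naming rule stated above: the first
// character is a letter and every character is a letter, digit, or underscore.
bool is_valid_variable_name(const std::string& name) {
    if (name.empty() || !std::isalpha(static_cast<unsigned char>(name[0])))
        return false;
    for (char c : name) {
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_')
            return false;
    }
    return true;
}
```

With this check, count_1 is accepted while 9lives and _tmp are rejected. Standard C++ itself also allows a leading underscore; the sketch follows the stricter rule as stated here.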

Comments:
As with any good code, source files should be commented. As in other languages,
comments are not executed. Multiline comments are surrounded by braces {}.
Single-line comments use the hash mark #. Comments should be written assuming
that they will carry over into the code. Comments for variables passed to a function
are required and indicated by a colon.

Operators:
Arithmetic: + (add), - (subtract), * (multiply), / (divide), % (modulus)
Increment and decrement: ++ (inc), -- (dec)
Relational: <, >, <=, >=, ==
Logical: ||, &&
Shift: >>, <<, >>>
Compound assignment: +=, -=, *=, /=, &=, |=, ^=, %=, <<=, >>=, >>>=

Primitive types:
bool, char, byte, short, int, long, float, double, void

Modifiers:
abstract, final, public, protected, private, static, transient, volatile, native

The rules above are sufficient to compile a C++ program. To reject C programs, we
differentiate the two languages by the header files a program includes and the
functions it calls. Some code is valid in both languages: the same source file can also
be compiled with a C compiler once its extension is changed to .c. We will write rules,
as flex regular expressions, that recognize such shared header files and their
functions. Code that is not valid C and is valid only in C++ is identified in the same
way, by matching C++ header files and functions with regular expressions in flex. In
this way we single out C code that would otherwise be acceptable in C++.

Ex 1:
#include <stdio.h>

int main()
{
    printf("6565\n");
    return 0;
}

The compiler must report an error for this code, because it is acceptable in both C
and C++.


Ex 2:

#include <stdio.h>
#include <iostream>

int main() {
    std::cout << "Try" << std::endl;
    return 0;
}

This code does not compile as C, so it is valid C++ code only. The compiler reports no
error for this program.








Flow Chart

1. Input C++ file
2. Lexical analysis through the flex program
3. Tokens
4. Syntactic analysis and internal generation of the parse tree
5. Checking for syntactic errors using the grammar rules in the yacc file
6. Output a message stating whether the code is acceptable as C++ or not

Work Done So Far

We have identified the rules for differentiating C and C++, which will produce errors on
any C code that is generally acceptable in C++.

Features:-
It can check the syntax of a given C++ program and identify errors. It can also tell
whether the input is C code that is acceptable in C++.

Errors Detected:-
1. Bracketing errors (missing parentheses, braces, and square brackets).
2. Missing semicolons.
3. All types of keyword errors.
4. Most types of operator errors.
5. Syntax errors in different control statements.
6. Violations of standard C++ syntax.
7. Syntax errors in variable, class, and method declarations.
8. Improper inclusion of headers.
9. Errors in passing parameters to methods or functions.

File Structure:-
cpp.l - Contains the flex specification that tokenizes the input C++ file for syntax error
checking. It also checks that the input is not C code that is acceptable in C++.
cpp.y - Contains the yacc grammar that checks for syntax errors in the C++ file.
text.cpp - Contains the C++ program to be checked.

Dependencies:-
flex - lexical analyzer generator
bison - parser generator

How to compile:-
yacc -d cpp.y
flex cpp.l
gcc lex.yy.c y.tab.c -lfl -ly
./a.out < text.cpp
