Lex Yaac

You might also like

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 24

Lex & Yacc

Lex
 lex is a program (generator) that generates lexical analyzers,
(widely used on Unix).
 It is mostly used with Yacc parser generator.
 Written by Eric Schmidt and Mike Lesk.
 It reads the input stream (specifying the lexical analyzer ) and
outputs source code implementing the lexical analyzer in the C
programming language.
 Lex will read patterns (regular expressions); then produces C
code for a lexical analyzer that scans for identifiers.

Lex
Lex is a program that generates lexical analyzer. It is used with YACC
parser generator.
• The lexical analyzer is a program that transforms an input stream into a
sequence of tokens.
Lex
◦ A simple pattern: letter(letter|digit)*
 Regular expressions are translated by lex to a computer program that mimics an FSA.
 This pattern matches a string of characters that begins with a single letter followed by
zero or more letters or digits.
Lex

Pattern Matching Primitives


Lex

• Pattern Matching examples.


Declarations
The declarations section consists of two parts, auxiliary declarations and
regular definitions.

The auxiliary declarations are copied as such by LEX to the output lex.yy.c
file. This C code consists of instructions to the C compiler and are not
processed by the LEX tool.The auxiliary declarations (which are optional)
are written in C language and are enclosed within ' %{ ' and ' %} ' . It is
generally used to declare functions, include header files, or define global
variables and constants.

LEX allows the use of short-hands and extensions to regular expressions


for the regular definitions. A regular definition in LEX is of the form : D R
where D is the symbol representing the regular expression R.
2.2 Rules
Rules in a LEX program consists of two parts :

1. The pattern to be matched


2. The corresponding action to be executed

Auxiliary functions
LEX generates C code for the rules specified in the Rules section and places
this code into a single function called yylex(). (To be discussed in detail later).
In addition to this LEX generated code, the programmer may wish to add his
own code to the lex.yy.c file. The auxiliary functions section allows the
programmer to achieve this.
The yyvariables

These variables are accessible in the LEX program and are automatically declared
by LEX in lex.yy.c.

yyin
yytext
yyleng

yyin
yyin is a variable of the type FILE* and points to the input file. yyin is defined by LEX
automatically. If the programmer assigns an input file to yyin in the auxiliary functions
section, then yyin is set to point to that file. Otherwise LEX assigns yyin to
stdin(console input).

yytext
yytext is of type char* and it contains the lexeme currently found.Each invocation of
the function yylex() results in yytext carrying a pointer to the lexeme found in the input
stream by yylex(). The value of yytext will be overwritten after the next yylex()
invocation.
yyleng
yyleng is a variable of the type int and it stores the length of the lexeme pointed to
by yytext.
The yyfunctions
yylex()
yywrap()

yylex()
yylex() is a function of return type int. LEX automatically defines yylex() in lex.yy.c but
does not call it. The programmer must call yylex() in the Auxiliary functions section of
the LEX program.
yywrap()
LEX declares the function yywrap() of return-type int in the file lex.yy.c . LEX does not
provide any definition for yywrap(). yylex() makes a call to yywrap() when it encounters
the end of input. If yywrap() returns zero (indicating false) yylex() assumes there is
more input and it continues scanning from the location pointed to by yyin. If yywrap()
returns a non-zero value (indicating true), yylex() terminates the scanning process and
returns 0 (i.e. “wraps up”). The programmer must define it in the Auxiliary functions
section or provide %option noyywrap in the declarations section. It is mandatory to
either define yywrap() or indicate the absence using the %option feature. If not, LEX
will flag an error
Disambiguation Rules
yylex() uses two important disambiguation rules in selecting the right action to execute
in case there is more than one pattern that matches a string in the given input:
1. Choose the first match.
2. "Longest match" is preferred.
Example:
“break” { return BREAK; }
[a-zA-Z][a-zA-Z0-9]* { return IDENTIFIER; }

If "break" is found in the input, it is matched with the first pattern and yylex() returns
BREAK. If "breakdown" is found, it is matched with the second pattern and yylex()
returns IDENTIFIER. Note the use of disambiguation rules here.

Example:

/* Declarations section */
%%

“-” {return MINUS;}


“--” {return DECREMENT;}

%%
/* Auxiliary functions */
Lex

Lex predefined variables.


Yacc
Theory:
◦ Yacc reads the grammar and generate C code for a parser .

◦ Grammars written in Backus Naur Form (BNF) .

◦ BNF grammar used to express context-free languages .

◦ e.g. to parse an expression , do reverse operation( reducing the expression)

◦ This known as bottom-up or shift-reduce parsing .

◦ Using stack for storing (LIFO).


Yacc
 Input to yacc is divided into three sections.

... definitions ...


%%
... rules ...
%%
... subroutines ...
Yacc
 The definitions section consists of:
◦ token declarations .
◦ C code bracketed by “%{“ and “%}”.

◦ the rules section consists of:


 BNF grammar .

 the subroutines section consists of:


◦ user subroutines .
yacc& lex in Together
 The grammar:
program -> program expr | ε
expr -> expr + expr | expr - expr | id

 Program and expr are nonterminals.


 Id are terminals (tokens returned by lex) .

 expression may be :
◦ sum of two expressions .
◦ product of two expressions .
◦ Or an identifiers
Lex file
Yacc file
Linking lex&yacc

You might also like