Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

1

Syntax Analysis
[Chapter 4 - Parts 7&8]

Lecture 15

Edited by Dr. Ismail Hababeh


German Jordanian University
Adapted from slides by Dr. Robert A. Engelen
Lexical Analyzer & Parser Generators 2

• Lexical Analyzer:
– Lex/Flex is a computer program that converts
the text into tokens. It is used to identify
specific text strings in a structured way from
source text.
• Parser Generators:
– Yacc/ Bison (Yet Another Compiler Compiler)
is a computer program that generates LALR(1)
parsers.
Creating a Lexical Analyzer &
3

Parser with Lex and Yacc


Lex and Yacc are tools used to generate lexical analyzer and parser
lex
source lex / Yacc lex.yy.c
program compiler Yacc reads your grammar and
generate C code for a syntax
lex.l analyzer or parser.

lex.yy.c C a.out
compiler

input sequence
stream a.out of tokens
Generating LALR(1) Parser with
4

Yacc/Bison

yacc yacc.y
contains yacc Yacc / Bison y.tab.c
specification compiler

gcc y.tab.c -ly create the object program


y.tab.c C a.out
compiler

input output
stream a.out stream
5

Yacc Program Specification


• The Yacc program structure:
%{ Yacc and C declarations %}
regular definitions
%%
translation rules
%%
user-defined auxiliary procedures
• The translation rules are of the form:

Productions p1 { semantic action1 }


p2 { semantic action2 } semantic
… actions
pn { semantic actionn }
Yacc Grammar Productions
6

• Productions in Yacc are of the form:

Nonterminal: tokens/nonterminals { semantic action }


| tokens/nonterminals { semantic action }

| tokens/nonterminals { semantic action }
;
• Tokens of single characters can be used directly within
productions, e.g. ‘+’
• Token name must be declared first in the declaration part
using %tokenTokenName
Productions Semantic Actions 7

• Semantic actions may refer to values of the synthesized


attributes of terminals and nonterminals in a production:
X : Y1 Y2 Y3 …Y n { semantic action }
– $$refers to the value of the nonterminal X (left hand side)
– $i refers to the value of the nonterminal Yi (right hand side)
• Example:
factor : ‘(’ expr ‘)’ { $$=$2; }
The $$ in the semantic action refers to the value of the attribute factor.
The $2 in the semantic action refers to the value of the attribute expr.
Yacc Program - Example 1 8

%{ #include <ctype.h> %}
%token DIGIT
%%
line : expr ‘\n’ { printf(“%d\n”, $1); }
;
expr : expr ‘+’ term { $$ = $1 + $2; }
| term { $$ = $2; }
;
term : term ‘*’ factor { $$ = $2 * $3; }
| factor { $$ = $3; }
;
factor : ‘(’ expr ‘)’ { $$ = $1; }
| DIGIT { $$ = $1; }
; Attribute of factor(child)
%% Attribute of
int yylex() term(parent) Attribute of token
{ int c = getchar(); (stored in yylval)
if (isdigit(c))
{ yylval = c-’0’; yylval passes the semantic value associated with a
return DIGIT; token from the lexer to the parser.
}
return c;
if c is a digit, then it is returned as a DIGIT token,
} otherwise c is returned as the token name
9

Defining Operator Precedence Levels in Yacc


• By defining operator precedence levels and left/right
associativity of the operators, we can specify ambiguous
grammars in Yacc, such as
E E+E | E-E | E*E | E/E | (E) | -E | num
• Defining precedence levels and associativity in Yacc in
the declaration part:
%left ‘+’ ‘-’
%left ‘*’ ‘/’
%right UMINUS
Yacc Program - Example 2 10

%{
#include <ctype.h>
#include <stdio.h> Double type for attributes
#define YYSTYPE double and yylval
%} a union with a member for each type
%token NUMBER of token defined within the yacc
%left ‘+’ ‘-’ source file
%left ‘*’ ‘/’
%right UMINUS
%%
lines : lines expr ‘\n’ { printf(“%g\n”, $2); }
| lines ‘\n’
| /* empty */
;
expr : expr ‘+’ expr { $$ = $1 + $3; }
| expr ‘-’ expr { $$ = $1 - $3; }
| expr ‘*’ expr { $$ = $1 * $3; }
| expr ‘/’ expr { $$ = $1 / $3; }
| ‘(’ expr ‘)’ { $$ = $2; }
| ‘-’ expr %prec UMINUS { $$ = -$2; }
| NUMBER to give a grammar rule a precedence uses
; the %prec TOKEN
%%
11

int yylex()
{ int c;
while ((c = getchar()) == ‘ ‘)
;
if ((c == ‘.’) || isdigit(c))
Simple lexical analyzer
{ ungetc(c, stdin); //push back last char read for floating point doubles
scanf(“%lf”, &yylval); and arithmetic operators
return NUMBER;
}
return c; returns a value of 0 if the
} input it parses is valid
int main()
{ if (yyparse() != 0)
fprintf(stderr, “Abnormal exit\n”);
Run the parser
return 0;
}
int yyerror(char *s) Invoked by parser
{ fprintf(stderr, “Error: %s\n”, s);
to report parse errors
}
Combining Lex/Flex with 12

Yacc/Bison

Lex file Lex / Flex


lex.yy.c
lex.l compiler
.c contains the output file, .h
contains the definitions for
Yacc file Yacc / Bison token names. y.tab.c
yacc.y compiler y.tab.h

lex.yy.c C a.out
y.tab.c compiler

input output
stream a.out stream
13

Lex Specification for Example 2


%{
#include “y.tab.h”
Generated by Yacc, contains
#define NUMBER xxx
extern double yylval;
%} Defined in y.tab.c
number [0-9]+\.?|[0-9]*\.[0-9]+
%% sscanf reads
[ ] { /* skip blanks */ } from the
{number} { sscanf(yytext, “%lf”, &yylval); character string
return NUMBER;
} yytext[0] holds the first
\n|. { return yytext[0]; } character of the text
matched by the current
token.

lex example2.l flex example2.l


yacc -d example2.y bison -d -y example2.y
gcc lex.yy.c y.tab.c gcc lex.yy.c y.tab.c
./a.out ./a.out
Compiling YACC and Lex Programs 14

a)Write the lex program and save the file


with extension .l
b)Write the yacc program and save the file
with extension .y
c)From the directory where the lex and yacc
files have been saved:
– Type: lex fileName.l
– Type: yacc fileName.y
– Type: cc lex.yy.c y.tab.h -ll
– Type: ./a.out
15

Error Recovery in Yacc Productions


%{

%}

%%
lines : lines expr ‘\n’ { printf(“%g\n”, $2; }
| lines ‘\n’
| /* empty */
| error ‘\n’ { yyerror(“reenter last line: ”);
yyerror;
}
;

Error production:
Reset parser to normal mode
set error mode and
skip input until newline

You might also like