Designing The Lexical Analyzer
Introduction
As part of the nGineer suite, both a lexical analyzer and a grammatical parser were needed. Neither is implemented in the .NET Framework, so both had to be written. This article explains the main design of the lexical analyzer, to aid those who intend to read the code or simply want to learn how it works.
Goals
When I first set out to design the lexical analyzer, my main goal was to make it as simple as possible. Throughout the process I kept the KISS principle in mind: I wanted the code and the design to be as easy to understand as possible.
I decided to separate the lexical analyzer from the grammatical parser so that its functionality could also be used on its own, as a tool that can disassemble and 'understand' the text that makes up a C# code file.
Top-Level Design
Most procedural lexical analyzers are just a set of functions that try to determine what the next character is. Most object-oriented lexical analyzers are the same as the procedural ones, just wrapped in a class. I decided that nGineer's lexical analyzer would be made up of self-creating classes that together form an object grid describing the text. Each lexical element got its own class, and I used inheritance to express the "is a" relationship (in the coming example, a pp-equality-expression is a pp-and-expression, so PPEqualityExpression derives from PPAndExpression).
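To make the idea of self-creating classes concrete, here is a minimal sketch under simplified assumptions: a made-up two-alternative grammar (element:: identifier | number) over a plain string, with hypothetical class names Element, Identifier and Number. Each class has a static Create that either builds an instance from the text at the current position or returns null, and the abstract base simply tries its derived classes in turn. nGineer's real classes work on a VirtualString and are considerably richer.

```csharp
using System;

// Hypothetical grammar, for illustration only:
//   element:: identifier | number
public abstract class Element
{
    public string Text { get; protected set; }

    // Try each alternative in turn; every derived class "is an" Element.
    public static Element Create(string source, ref int position)
    {
        Element result = Identifier.Create(source, ref position);
        if (result == null)
            result = Number.Create(source, ref position);
        return result;
    }
}

public sealed class Identifier : Element
{
    private Identifier(string text) { Text = text; }

    // A letter followed by letters or digits; returns null and leaves
    // position untouched if no identifier starts here.
    public static new Identifier Create(string source, ref int position)
    {
        int end = position;
        while (end < source.Length &&
               (char.IsLetter(source[end]) || (end > position && char.IsDigit(source[end]))))
            end++;
        if (end == position)
            return null;
        var created = new Identifier(source.Substring(position, end - position));
        position = end;
        return created;
    }
}

public sealed class Number : Element
{
    private Number(string text) { Text = text; }

    // One or more digits; returns null if none start here.
    public static new Number Create(string source, ref int position)
    {
        int end = position;
        while (end < source.Length && char.IsDigit(source[end]))
            end++;
        if (end == position)
            return null;
        var created = new Number(source.Substring(position, end - position));
        position = end;
        return created;
    }
}
```

Calling Element.Create("abc1 42", ref pos) with pos at 0 yields an Identifier holding "abc1" and moves pos to 4; at position 4 (the space) it returns null and leaves pos unchanged.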
Let's take the pp-and-expression for instance.
(Grammars are commonly written in notations such as BNF or EBNF and parsed with techniques such as LL(1) or LALR. ECMA uses its own notation, described in Chapter 5 of the C# standard; we'll use that notation in this article.)
pp-and-expression::
    pp-equality-expression
    pp-and-expression whitespace(opt) && whitespace(opt) pp-equality-expression
The code generated from this is:
using System;

namespace nGineer.LexicalGrammar
{
    /// <summary>
    /// A.1.10
    /// pp-and-expression::
    ///     pp-equality-expression
    ///     pp-and-expression whitespace(opt) && whitespace(opt) pp-equality-expression
    /// </summary>
    public abstract class PPAndExpression : PPOrExpression, ILexicalElement
    {
        #region ComplexAndExpression
        public class ComplexAndExpression : PPAndExpression, ILexicalElement
        {
            internal const string AndOperator = "&&";

            protected ComplexAndExpression()
            {
            }

            // Method header reconstructed: it is missing from the original listing,
            // and the delegate type name RequestEventHandler is an assumption.
            public static new ComplexAndExpression Create(VirtualString codeSegment,
                RequestEventHandler requestEventHandler)
            {
                ComplexAndExpression newObject = null;

                if (codeSegment.Contains(AndOperator))
                {
                    codeSegment.SetCheckPoint();

                    // The production is left-recursive, so the left operand is
                    // parsed from the text up to the last "&&" (SubstringLast).
                    VirtualString tempString = codeSegment.SubstringLast(AndOperator);
                    int oldLength = tempString.Length;
                    PPAndExpression andExpression =
                        PPAndExpression.Create(tempString, requestEventHandler);

                    if (andExpression != null)
                    {
                        // Move past the characters the left operand consumed.
                        codeSegment.Advance(oldLength - tempString.Length);
                        WhiteSpace preOperatorWhiteSpace = WhiteSpace.Create(codeSegment);

                        if (codeSegment.Length >= AndOperator.Length &&
                            codeSegment.IndexOf(AndOperator, 0, AndOperator.Length) == 0)
                        {
                            codeSegment.Advance(AndOperator.Length);
                            WhiteSpace postOperatorWhiteSpace = WhiteSpace.Create(codeSegment);
                            PPEqualityExpression equalityExpression =
                                PPEqualityExpression.Create(codeSegment, requestEventHandler);

                            // Only build the node if the right operand was recognized as well.
                            if (equalityExpression != null)
                            {
                                newObject = new ComplexAndExpression();
                                newObject.m_AndExpression = andExpression;
                                newObject.m_PreOperatorWhiteSpace = preOperatorWhiteSpace;
                                newObject.m_PostOperatorWhiteSpace = postOperatorWhiteSpace;
                                newObject.m_EqualityExpression = equalityExpression;
                                codeSegment.AcceptChanges();
                            }
                            else
                            {
                                codeSegment.RejectChanges();
                            }
                        }
                        else
                        {
                            codeSegment.RejectChanges();
                        }
                    }
                    else
                    {
                        codeSegment.RejectChanges();
                    }
                }

                return newObject;
            }

            #region <snip />
            ILexicalElement[] ILexicalElement.ChildElements
            {
                get
                {
                    return new ILexicalElement[] {
                        m_AndExpression,
                        m_PreOperatorWhiteSpace,
                        m_PostOperatorWhiteSpace,
                        m_EqualityExpression
                    };
                }
            }
            #endregion
        }
        #endregion

        protected PPAndExpression()
        {
        }

        // Method header reconstructed in the same way (same assumption about
        // the delegate type).
        public static PPAndExpression Create(VirtualString codeSegment,
            RequestEventHandler requestEventHandler)
        {
            // Each alternative of the production is tried as its own possibility.
            codeSegment.BeginPossibility();
            codeSegment.EndPossibility(PPEqualityExpression.Create(codeSegment,
                requestEventHandler));
            codeSegment.BeginPossibility();
            codeSegment.EndPossibility(ComplexAndExpression.Create(codeSegment,
                requestEventHandler));
            // The remainder of this method is missing from the original listing.
        }
    }
}
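The listing relies on VirtualString's check points (SetCheckPoint, AcceptChanges, RejectChanges) to undo a partial match, and on SubstringLast to cope with the left-recursive production. VirtualString itself isn't shown, so the following is only a sketch of how such a mechanism could work: a hypothetical Cursor class that keeps a stack of saved positions, and a tiny hypothetical AndParser for and-expression:: name | and-expression && name (no whitespace, to keep it short). nGineer appears to record each alternative as a possibility via BeginPossibility/EndPossibility; this sketch simply tries the left-recursive alternative first and falls back to a plain name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for VirtualString: a view over part of a string
// plus a stack of saved positions, so a failed attempt can be undone.
public sealed class Cursor
{
    private readonly string text;
    private readonly int end;                              // exclusive end of this view
    private readonly Stack<int> checkPoints = new Stack<int>();

    public Cursor(string text) : this(text, 0, text.Length) { }

    private Cursor(string text, int start, int end)
    {
        this.text = text;
        this.end = end;
        Position = start;
    }

    public int Position { get; private set; }
    public int Length => end - Position;
    public string Rest => text.Substring(Position, Length);
    public char this[int offset] => text[Position + offset];

    public void SetCheckPoint() => checkPoints.Push(Position);
    public void AcceptChanges() => checkPoints.Pop();             // keep the progress made
    public void RejectChanges() => Position = checkPoints.Pop();  // roll back to the check point
    public void Advance(int count) => Position += count;
    public bool StartsWith(string s) => Rest.StartsWith(s, StringComparison.Ordinal);

    // A separate view of the text before the last occurrence of s, or null.
    public Cursor SubstringLast(string s)
    {
        int index = Rest.LastIndexOf(s, StringComparison.Ordinal);
        return index < 0 ? null : new Cursor(text, Position, Position + index);
    }
}

public static class AndParser
{
    private const string AndOperator = "&&";

    // name:: one or more letters
    private static string ParseName(Cursor c)
    {
        int n = 0;
        while (n < c.Length && char.IsLetter(c[n]))
            n++;
        if (n == 0)
            return null;
        string name = c.Rest.Substring(0, n);
        c.Advance(n);
        return name;
    }

    // and-expression:: name | and-expression && name
    public static string ParseAnd(Cursor c)
    {
        Cursor left = c.SubstringLast(AndOperator);
        if (left != null)
        {
            c.SetCheckPoint();
            int oldLength = left.Length;
            string lhs = ParseAnd(left);                 // the left-recursive part
            if (lhs != null)
            {
                c.Advance(oldLength - left.Length);      // skip what lhs consumed
                if (c.StartsWith(AndOperator))
                {
                    c.Advance(AndOperator.Length);
                    string rhs = ParseName(c);
                    if (rhs != null)
                    {
                        c.AcceptChanges();
                        return "(" + lhs + " && " + rhs + ")";
                    }
                }
            }
            c.RejectChanges();                           // undo the partial match
        }
        return ParseName(c);                             // fall back to a plain name
    }
}
```

With this sketch, AndParser.ParseAnd(new Cursor("a&&b&&c")) returns "((a && b) && c)", showing the left associativity that splitting at the last operator produces. For "a&&" the failed attempt is rolled back, so the result is just "a" and the cursor ends at position 1.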
#if kek
#define keke
#endif
#if keke
#endif
This is valid C#, and (assuming kek is defined) it makes the #if keke group an input section rather than a skipped section. That could not be recognized without some kind of symbol table. But wait a second: what if we were to create a symbol that was not a real symbol, but merely part of an option that will later be discarded?
So what I did was keep a list of symbols in each input section and allow requests for them to be made through delegates that are propagated to the classes below (see the requestEventHandler parameter of the Create methods in the PPAndExpression example).
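The symbol-handling code itself isn't listed in the article, so here is a minimal sketch of the idea under stated assumptions; SymbolRequest, InputSection and IfDirective are hypothetical names. Each input section keeps the symbols it defines in its own list, and anything below it asks about symbols through a delegate rather than through a global table. A possibility that is discarded takes its section, and therefore its symbols, with it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical delegate through which elements ask whether a symbol is defined.
public delegate bool SymbolRequest(string symbol);

// Hypothetical input section: owns the symbols defined inside it and
// forwards any other request to whatever came before it.
public sealed class InputSection
{
    private readonly List<string> definedSymbols = new List<string>();
    private readonly SymbolRequest previous;   // null for the outermost section

    public InputSection(SymbolRequest previous)
    {
        this.previous = previous;
    }

    // Called when a #define is recognized inside this section.
    public void Define(string symbol) => definedSymbols.Add(symbol);

    // Answer from this section's own list first, then pass the request on.
    public bool IsDefined(string symbol) =>
        definedSymbols.Contains(symbol) || (previous != null && previous(symbol));
}

// A hypothetical element further down only receives the delegate,
// never the symbol lists themselves.
public static class IfDirective
{
    public static bool IsInputSection(string symbol, SymbolRequest request) => request(symbol);
}
```

In this sketch, a symbol defined while exploring an option is visible to that option and to any section chained after it, but never to the enclosing section; if the option is rejected and its InputSection dropped, nothing leaks.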
Walkers
To use the lexical analyzer to its fullest extent, a mechanism was needed for walking over the elements' object grid once it has been built. This is where walkers come into play.
The walkers' design is simple: derive from the base class WalkerBase and add code that does whatever you want whenever a certain element is encountered. There are three basic implementations that come with nGineer: CodeWalker (persists the code back to a file), XmlWalker (serializes the object grid to XML) and HighlightedCodeWalker (on which webify is based). There's more about walkers in the next article, "Implementing A Lexical Walker".
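WalkerBase's real members are covered in the next article and not shown here, so the following is only a sketch with hypothetical names (Token, Composite, ElementWalker, TextWalker). It performs a depth-first walk over ChildElements, the one ILexicalElement member this article does show, and a derived walker writes the leaf text back out in order, in the spirit of CodeWalker. Null children are skipped, since optional parts such as whitespace(opt) may be missing.

```csharp
using System;
using System.Text;

// Simplified stand-in for nGineer's ILexicalElement (only ChildElements
// is taken from the article).
public interface ILexicalElement
{
    ILexicalElement[] ChildElements { get; }
}

// Hypothetical leaf element holding literal text.
public sealed class Token : ILexicalElement
{
    public Token(string text) { Text = text; }
    public string Text { get; }
    public ILexicalElement[] ChildElements => Array.Empty<ILexicalElement>();
}

// Hypothetical composite element made of other elements.
public sealed class Composite : ILexicalElement
{
    public Composite(params ILexicalElement[] children) { ChildElements = children; }
    public ILexicalElement[] ChildElements { get; }
}

// Hypothetical walker base: visits every element depth-first, in order.
public abstract class ElementWalker
{
    public void Walk(ILexicalElement element)
    {
        if (element == null)
            return;                        // optional parts may be absent
        Visit(element);
        foreach (ILexicalElement child in element.ChildElements)
            Walk(child);
    }

    protected abstract void Visit(ILexicalElement element);
}

// Writes the leaves back out in order.
public sealed class TextWalker : ElementWalker
{
    private readonly StringBuilder output = new StringBuilder();
    public string Result => output.ToString();

    protected override void Visit(ILexicalElement element)
    {
        if (element is Token token)
            output.Append(token.Text);
    }
}
```

An XML-producing walker would follow the same pattern, overriding Visit to emit an element per node instead of appending text.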