Algebraic Expression Travesties 002
ABSTRACT
Mathematica (or the Wolfram language) offers an exceptionally easy and straightforward way to generate
tools like test generators, parsers, type-checkers, visualizers, and solvers from certain concise grammars.
In this paper, we illustrate our methods with syntax-driven travesties (random utterances) and syntax-driven
parser generation for an example grammar, F, of algebraic expressions. Elsewhere, we tested our generators
on much bigger grammars, for instance, the famous game Wff-n-Proof (link below), for which we
include inferencing, and also for a small programming language from “Types and Programming Languages”
by Benjamin C. Pierce, for which we exhibit type checking.
OVERVIEW
Travesties are syntactically valid but random utterances, useful for testing. We could, for example, easily
write a strategy in Python’s Hypothesis testing library using methods from this paper
(https://hypothesis.readthedocs.io/en/latest/).
Parsers convert utterances (streams of tokens) into trees, imposing the recursive structure of the grammar.
A parser generator produces a parser from a grammar. Parser generation is usually considered abstruse
and difficult. Think what it would take you to replicate yacc. However, our yacc-alike is trivialized because of
the specific form we require of our grammars.
The grammars we found easy to process are in stripped-down Polish Prefix Form (PPF). In our version of
PPF, each non-terminal in the grammar has a unique leading head symbol. Arities of functional forms, i.e.,
numbers of arguments, are fixed. No punctuation is needed or allowed because a head immediately dictates
how many arguments follow: no curly braces, commas, semicolons, brackets, parentheses, etc. Also no
operators, overloads, or precedence tables are needed or allowed.
We speculate that more complicated grammars, say those for C++, Python, or Haskell, can be translated
into our PPF by a top-level, traditional parser that treats our PPF as an intermediate representation. We
further speculate that it might be worth the effort to transcribe complicated syntaxes into PPF because,
stripped of syntactic noise, our PPFs ease the work of downstream language-processing tools like test
generators, parsers, type-checkers, visualizers, and solvers.
We do not strive for academic rigor, preferring to illustrate by examples and to invite the reader to join the
fun, to extend and apply our methods. We also state, up-front, that any novelty, here, comes from using
Mathematica. Our methods have been known to Lisp programmers, for example, for a very long time (see
my paper from 1989, “A Scheme for Interactive Graphics,” for example [link below]). The target audience
here comprises programmers who are accustomed to working much too hard at too low a level of abstraction
to achieve similar results.
https://www.researchgate.net/publication/2319434_A_Scheme_for_Interactive_Graphics
Preliminaries
If you don’t mention their explicit locations, you’ll need the following packages on your Mathematica $Path.
The following shows the explicit locations where I put the packages on my Windows system:
<< "c:/Users/brianbeckman/Dropbox/MMA/Packages/JacquardProlog.m"
<< "c:/Users/brianbeckman/Dropbox/MMA/logic.m"
In[3]:=
<< "c:/Users/brianbeckman/Dropbox/MMA/Packages/Jacquard.m"
DEFINITIONS
Forward references to later definitions are in non-bold italics. Undefined words like field are also in non-bold
italics. Definitions are in bold-italics.
Let a grammar be a function from non-terminal symbols to productions, each of which is a suit (ordered
collection with no duplicates, aka permutation) of alternatives. Alternatives are ordered for the convenience
of pairing (zipping) them with travesty-generation probabilities; logically, alternatives are unordered, i.e., a
set rather than a suit.
For example, F below is our grammar for algebraic expressions. We assume an algebraic field with addition
and multiplication following the usual commutative, distributive, and associative laws (Mathematica automatically
enforces these laws when simplifying expressions; incidentally, Mathematica’s simplifications are not
well founded, as Mathematica freely annihilates expressions multiplied by zero without stipulating that
denominators be non-zero).
In[6]:= ClearAll[F, Expr, FSum, FInt, FVar, FProd, Start]; (* for "Field" *)
In this example, F is a function from the non-terminal symbols Expr, FSum, FInt, FVar, FProd, and Start,
each to a list (as a suit) of alternatives, each alternative a list (as a sequence) of terms, recursively. This list
of non-terminal symbols is the domain of the function.
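The cell that defines F is not legible in this copy. A reconstruction consistent with the DownValues listing and the probability table shown elsewhere in this paper might read as follows; it is an assumption pieced together from those displays, not the original cell.

```wolfram
(* Reconstructed sketch, not the original cell: each non-terminal maps to a
   list of alternatives, and each alternative is a list of terms. *)
ClearAll[F, Expr, FSum, FInt, FVar, FProd, Start];
F[Start] = {{Expr}};
F[Expr]  = {{FSum}, {FProd}, {FVar}, {FInt}};
F[FSum]  = {{Plus, Expr, Expr}};
F[FProd] = {{Times, Expr, Expr}};
F[FInt]  = {{RandomInteger}};
F[FVar]  = {{x}, {y}, {z}, {-x}, {-y}, {-z}, {1/x}, {1/y}, {1/z}};
```

The sketch assumes x, y, and z carry no values in the session, so that terms such as -x stay symbolic.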
A production is a suit of alternatives. The value of F[Expr] above, namely {{FSum}, {FProd}, {FVar},
{FInt}}, is an example of a production.
Each alternative is a list, stream, sequence, or array (all synonyms for an ordered collection, duplicates
allowed) of terms. The terms must match, in both order and form, the terms in utterances.
An utterance is a list (as a stream) of terminals like {Plus, x, y}. This utterance has the parse tree
FSum[Plus,FVar[x],FVar[y]], which is a stand-in, written in our PPF grammar, for the algebraic expression
x+y. The parser, which we will generate automatically from the grammar, converts utterances into
parse trees. Parse trees exhibit the recursive structure of utterances, a structure imposed by the grammar.
A terminal symbol is a literal like an integer or a bottom-level expression like x, 1/x, or -1. A terminal
symbol must match exactly a token appearing in the input.
A token is an atomic expression in the source language. Examples include a symbol, e.g., Plus or FSum, a
String (we don’t use strings here, but they work just fine), or an FInt enclosing a literal integer.
A non-terminal symbol recurses back into the grammar. It’s always of Mathematica type (i.e., Head)
Symbol.
The start symbol is the special, distinguished name and symbol Start, not available for other uses.
How about a pretty display for the domain of F? From the Jacquard library imported in the Preliminaries
section of this document, gridRules exhibits terminals in dark red, non-terminals in dark blue, lists in green
boxes, and lists of lists in doubled green boxes. This visual notation does not distinguish lists as suits, lists as
sets, lists as sequences, or lists as bags. Mathematica Rules have orange boxes on the left and light-yellow
boxes on the right. The following displays the rules above. The code includes a little inscrutable gymnastics
to prevent premature evaluation by Mathematica (see Robby Villegas’s paper “Working with Unevaluated
Expressions,” https://library.wolfram.com/infocenter/Conferences/377/ ).
Out[15]=
Expr  -> {{FSum}, {FProd}, {FVar}, {FInt}}
FInt  -> {{RandomInteger}}
FProd -> {{Times, Expr, Expr}}
FSum  -> {{Plus, Expr, Expr}}
FVar  -> {{x}, {y}, {z}, {-x}, {-y}, {-z}, {1/x}, {1/y}, {1/z}}
Start -> {{Expr}}
(nonTerminalsFromGrammar[F]) // InputForm
The terminals in a grammar are the complement of the non-terminals against the set of all symbols, which is
the union of all the right-hand sides of the productions.
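As a minimal sketch of this set arithmetic, the following uses a hypothetical toy grammar G and hypothetical helper names ending in Sketch; these are not the paper's nonTerminalsFromGrammar or allSymbols, whose definitions are not shown here. It reads the non-terminals off the left-hand sides of the DownValues of G and the symbols off the right-hand sides.

```wolfram
(* Hypothetical toy grammar G; every name ending in "Sketch" is illustrative only. *)
ClearAll[G, GStart, GExpr, GSum, GVar,
  nonTerminalsSketch, allSymbolsSketch, terminalsSketch];
G[GStart] = {{GExpr}};
G[GExpr]  = {{GSum}, {GVar}};
G[GSum]   = {{Plus, GExpr, GExpr}};
G[GVar]   = {{x}, {y}};

(* Non-terminals: the arguments on the left-hand sides of G's definitions. *)
nonTerminalsSketch[g_] := Sort[DownValues[g][[All, 1, 1, 1]]];
(* All symbols: every term appearing on any right-hand side. *)
allSymbolsSketch[g_] := Union[Flatten[DownValues[g][[All, 2]]]];
(* Terminals: the symbols that are not non-terminals. *)
terminalsSketch[g_] := Complement[allSymbolsSketch[g], nonTerminalsSketch[g]];

terminalsSketch[G]  (* {Plus, x, y} *)
```

Note that GStart never appears on a right-hand side, so it is a non-terminal without being a member of the set of all symbols, just as Start is absent from the list of all symbols for F.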
In our pretty display for allSymbols via Jacquard’s gridExpression below, terminals are in a light-yellow
background, non-terminals in a purple background; gridExpression is a visual version of Mathematica’s
built-in FullForm, so it shows the Head List explicitly, whereas gridRules above exhibits lists as colored
boxes without the explicit Head List.
Out[21]= List[Expr, FInt, FProd, FSum, FVar, Plus, RandomInteger, Times, Power[x, -1], Times[-1, x], x,
Power[y, -1], Times[-1, y], y, Power[z, -1], Times[-1, z], z]
{Expr, FInt, FProd, FSum, FVar, Plus, RandomInteger, Times, x^(-1), -x, x, y^(-1),
-y, y, z^(-1), -z, z}
Here are the terminals from our example grammar, F, prettified by Mathematica’s automatic typesetting:
In[23]:= ClearAll[terminalsFromGrammar];
terminalsFromGrammar[ps_] :=
  Complement[
    allSymbols @ ps,
    nonTerminalsFromGrammar @ ps];
(T = terminalsFromGrammar @ F)
Transform each alternative (sequence of terms) into a list of Mathematica Rules (a list of rules is a
Jacquard object): one rule for the probability of choosing the alternative in a travesty and another rule for
the original alternative itself. Generate the probabilities with a function that maps the alternatives to a
parallel, isocardinal (zippable) list. The orders of the alternatives and of the probabilities are important only
for Zip, even though the alternatives are, notionally, a set.
In[26]:= ClearAll[injectGenerationProbabilities];
injectGenerationProbabilities[
grammar_,
probsFromAlternatives_] :=
Module[{newTable, nonTerminals = nonTerminalsFromGrammar @ grammar},
Scan[(* Scan is like Map, but just for side-effects *)
Function[nonTerminal,
With[{
alternatives = grammar[nonTerminal],
probabilities = probsFromAlternatives[grammar[nonTerminal]]},
newTable[nonTerminal] = (* side-effect this definition *)
Zip[probabilities, alternatives,
The following function assigns equal probabilities to every element of a list. It’s the default. Use something
else if you have better estimates of appropriate probabilities for travesty generation.
In[28]:= ClearAll[equiProbabilities];
equiProbabilities[list_List] :=
]) // visGrammar
equiProbabilities
Out[30]=
Non-terminal  Probability  Alternative
Expr          0.25         {FSum}
Expr          0.25         {FProd}
Expr          0.25         {FVar}
Expr          0.25         {FInt}
FInt          1.           {RandomInteger}
FProd         1.           {Times, Expr, Expr}
FSum          1.           {Plus, Expr, Expr}
FVar          0.111111     {x}
FVar          0.111111     {y}
FVar          0.111111     {z}
FVar          0.111111     {-x}
FVar          0.111111     {-y}
FVar          0.111111     {-z}
FVar          0.111111     {1/x}
FVar          0.111111     {1/y}
FVar          0.111111     {1/z}
Start         1.           {Expr}
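The pairing of alternatives with probabilities can be sketched in a few lines. The names equiProbabilitiesSketch and probabilizeSketch are hypothetical, and the built-in MapThread stands in for Jacquard's Zip, whose signature is not shown in this paper; the "probability" and "alternative" keys follow the display above.

```wolfram
ClearAll[equiProbabilitiesSketch, probabilizeSketch];

(* Equal weight 1/n for each of the n elements of a list. *)
equiProbabilitiesSketch[list_List] :=
  ConstantArray[1./Length[list], Length[list]];

(* Pair each alternative with its probability as a small list of rules. *)
probabilizeSketch[alternatives_List, probsFromAlternatives_] :=
  MapThread[
    {"probability" -> #1, "alternative" -> #2} &,
    {probsFromAlternatives[alternatives], alternatives}];

probabilizeSketch[{{FSum}, {FProd}, {FVar}, {FInt}}, equiProbabilitiesSketch]
(* four entries, the first being {"probability" -> 0.25, "alternative" -> {FSum}} *)
```

Passing the probability function as an argument keeps the door open for better estimates than the uniform default.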
We need a function from probabilized alternatives to a particular choice, given an input die roll.
If there is only one alternative, choose it. Notice that we access the key “alternative” in the lookup table
(Jacquard object) by applying the Jacquard rules that implement the object via /., ReplaceAll. ReplaceAll
fills the role of dot from object-oriented programming in Jacquard’s lightweight polymorphism, the part of
object-oriented programming that Jacquard exploits.
In[31]:= ClearAll[chooseFromAlternatives];
chooseFromAlternatives[{probabilizedAlternative_}, dieRoll_] :=
"alternative" /. probabilizedAlternative;
If there are many alternatives, pick the first whose cumulative probability strictly exceeds
dieRoll. Track the cumulative probability by decrementing dieRoll by each looked-up probability as we
recurse down the alternatives. (More scalable is Walker’s method of aliases (e.g., https://www.keithschwarz.-
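(* Reconstructed opening: the first lines of this definition are missing from the source. *)
chooseFromAlternatives[{probabilizedAlternative_, rest__}, dieRoll_] :=
With[{p = "probability" /. probabilizedAlternative},
If[dieRoll < p,
(* then *)"alternative" /. probabilizedAlternative,
(* else *)chooseFromAlternatives[{rest}, dieRoll - p]]];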
chooseFromAlternatives[badArgs___] := Throw[{"CHOOSE:BADARGS: ", {badArgs}}];
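A self-contained sketch of the same cumulative-probability walk, under the assumption that the die roll is a real number in the interval from 0 to 1 (for example, from RandomReal[]). The name chooseSketch and the three-entry table are hypothetical.

```wolfram
ClearAll[chooseSketch];

(* One alternative left: take it regardless of the die roll. *)
chooseSketch[{only_}, dieRoll_] := "alternative" /. only;

(* Several alternatives: take the first if the roll falls inside its share,
   otherwise subtract that share and try the rest. *)
chooseSketch[{first_, rest__}, dieRoll_] :=
  With[{p = "probability" /. first},
    If[dieRoll < p,
      "alternative" /. first,
      chooseSketch[{rest}, dieRoll - p]]];

table = {
  {"probability" -> 0.5, "alternative" -> {x}},
  {"probability" -> 0.3, "alternative" -> {y}},
  {"probability" -> 0.2, "alternative" -> {z}}};

chooseSketch[table, 0.1]   (* {x} *)
chooseSketch[table, 0.6]   (* {y}, because 0.6 - 0.5 = 0.1 < 0.3 *)
chooseSketch[table, 0.95]  (* {z} *)
```

The walk is linear in the number of alternatives, which is ample for grammars of this size.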
The ground term is the term to force when the recursion limit is exceeded. The default recursion limit is 100.
That’s too big for algebraic expressions, where 20 is ample. But 100 or even 500 is useful in other
applications.
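To illustrate how a recursion limit and a ground term keep generation finite, here is a minimal sketch. It is not the paper's chainExpansion: the toy grammar G and the name expandSketch are hypothetical, and alternatives are drawn uniformly with RandomChoice rather than from probabilized tables. It assumes every alternative of the ground term consists only of terminals.

```wolfram
ClearAll[G, GStart, GExpr, GSum, GVar, expandSketch];
G[GStart] = {{GExpr}};
G[GExpr]  = {{GSum}, {GVar}};
G[GSum]   = {{Plus, GExpr, GExpr}};
G[GVar]   = {{x}, {y}};   (* ground term: all alternatives are terminals *)

expandSketch[g_, nonTerminals_List, ground_, term_, depth_Integer, limit_Integer] :=
  If[! MemberQ[nonTerminals, term],
    {term},                                        (* terminal: emit it as-is *)
    With[{nt = If[depth >= limit, ground, term]},  (* past the limit, force the ground term *)
      Flatten[
        expandSketch[g, nonTerminals, ground, #, depth + 1, limit] & /@
          RandomChoice[g[nt]],
        1]]];

expandSketch[G, {GStart, GExpr, GSum, GVar}, GVar, GStart, 0, 6]
(* a random token list such as {Plus, x, y}; results vary from run to run *)
```

Because depth grows by one on every recursive call, each branch reaches the limit after finitely many steps and then bottoms out in a variable.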
◻ chainExpansion: helper
It’s debatable whether to have a symbol like debugPrint for Print debugging, or to use
Block[{Print=Identity},...] to switch off printing. We opt for the former.
In[37]:= ClearAll[chainExpansion];
(* terminal symbol *)
Module[{},
If[MemberQ[T, term],
subtrees =
If[MemberQ[T, #], {#},
chainExpansion[iP, gd, T, {#}, {}, i + 1, iLim]] & /@ {rest};
result = Join[sentence, {term}, Flatten[subtrees, 1]];
debugPrint[<|"branch" -> "term", "i" -> i, "prodn" -> production,
"term" -> term, "rest" -> {rest}, "sub" -> subtrees, "res" -> result|>];
(* non-terminal symbol *)
result],
]]];
result
In[41]:= ClearAll[generateSentence];
Let’s have a button to generate random sentences. We’ll get much more interesting a little bit below, but
first unit-test the gadgetry so far:
GEN
Out[43]=
FE`foo$$32
Now we have utterances in the grammar. Let’s parse them and display them.
DATA-DRIVEN PARSING
Can we write parserFromGrammar, a function that writes a parser from a grammar? Such a thing is analogous
to yacc, but our version is pitifully short because it exploits convenient properties of PPF.
Any of our parsers takes a stream of tokens and a tree, then iteratively augments the tree by side effect. We
don’t write down that function signature explicitly; just remember it.
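To make the shape of such a parser concrete, here is a hand-written sketch for the algebraic-expression tokens. It is an illustration only: the generated parsers in this paper are built from the grammar and thread a tree argument, whereas the hypothetical parseSketch below takes just the token stream and returns the pair {remaining tokens, parse tree}. The table opTable is likewise an assumption standing in for what the generator reads off the grammar.

```wolfram
ClearAll[parseSketch, opTable];
(* Hypothetical table: leading token -> {wrapping non-terminal, number of arguments}. *)
opTable = <|Plus -> {FSum, 2}, Times -> {FProd, 2}|>;

(* Consume one expression from the front of the token stream;
   return {remaining tokens, parse tree}. *)
parseSketch[{tok_, rest___}] :=
  Which[
    KeyExistsQ[opTable, tok],
      Module[{wrapper, arity, stream = {rest}, args = {}, sub},
        {wrapper, arity} = opTable[tok];
        Do[
          {stream, sub} = parseSketch[stream];   (* parse one argument *)
          AppendTo[args, sub],
          {arity}];
        {stream, wrapper[tok, Sequence @@ args]}],
    tok === RandomInteger,
      {{rest}, FInt[tok]},
    True,
      {{rest}, FVar[tok]}];
parseSketch[bad___] := Throw[{"PARSE SKETCH: BADARGS", bad}];

parseSketch[{Plus, x, Times, RandomInteger, y}]
(* {{}, FSum[Plus, FVar[x], FProd[Times, FInt[RandomInteger], FVar[y]]]} *)
```

No lookahead, precedence table, or punctuation handling is needed: the leading token alone says how many sub-expressions to consume, which is the property of PPF that keeps the generator short.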
HoldPattern[F[FInt]] :> {{RandomInteger}}
HoldPattern[F[FProd]] :> {{Times, Expr, Expr}}
HoldPattern[F[FSum]] :> {{Plus, Expr, Expr}}
HoldPattern[F[FVar]] :> {{x}, {y}, {z}, {-x}, {-y}, {-z}, {1/x}, {1/y}, {1/z}}
HoldPattern[F[Start]] :> {{Expr}}
◼ prefix, suffix, terminalsFromRule, nonTerminalHeadFromRule, arityFromPrefixRule
In[45]:= ClearAll[
prefix, suffix,
terminalsFromRule,
nonTerminalHeadFromRule,
arityFromPrefixRule];
terminalsFromRule[rule_, nonTerminals_] :=
Select[prefix /@ rule[[2]], ! MemberQ[nonTerminals, #] &];
In[47]:=
nonTerminalHeadFromRule[rule_, nonTerminals_] :=
With[{h = rule[[1, 1, 1]]},
In[48]:=
First @ lens]
Throw[{"ARITY FROM PREFIX RULE: CATASTROPHE", lens}]];
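Since several of the helper definitions above are incomplete in this copy, the following sketch shows one way to pull the same three facts out of a DownValues rule: the non-terminal head, the leading term of each alternative, and the arity. The names headSketch, leadersSketch, and aritySketch and the toy grammar G are hypothetical, and reading arity as the alternative's length minus one is an assumption rather than the paper's stated definition.

```wolfram
ClearAll[G, GExpr, GSum, GVar, headSketch, leadersSketch, aritySketch];
G[GSum] = {{Plus, GExpr, GExpr}};
G[GVar] = {{x}, {y}};

(* A rule looks like HoldPattern[G[GSum]] :> {{Plus, GExpr, GExpr}}. *)
headSketch[rule_] := rule[[1, 1, 1]];         (* the non-terminal, e.g. GSum *)
leadersSketch[rule_] := First /@ rule[[2]];   (* leading term of each alternative *)
aritySketch[rule_] :=
  With[{lens = Union[Length /@ rule[[2]]]},
    If[Length[lens] === 1,
      First[lens] - 1,                        (* terms after the leading one *)
      Throw[{"ARITY SKETCH: alternatives differ in length", lens}]]];

Association[
  (headSketch[#] -> {leadersSketch[#], aritySketch[#]}) & /@ DownValues[G]]
(* <|GSum -> {{Plus}, 2}, GVar -> {{x, y}, 0}|> *)
```

The multi-index Part, rule[[1, 1, 1]], reaches inside HoldPattern without evaluating G[GSum] along the way.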
◼ genParserBody, parserDefFromGrammarRule
Accumulate the above overloads of parserTargetSym: for each set of terminals in a rule, write a Mathematica
pattern to recognize those Mathematica Alternatives (firstPattern below). The second pattern just
picks up the tree-so-far.
With[{
ts = terminalsFromRule[rule, nonTerminals],
h = nonTerminalHeadFromRule[rule, nonTerminals],
a = arityFromPrefixRule[rule]},
If[Length @ ts =!= 0,
(* then *)
(* else *)
parserTargetSym[{}, tree_ : Null] := {{}, tree}]];
parserTargetSym[xs___] :=
Throw[{ToString @ parserTargetSym <> ": CATASTROPHE: ", xs}]]
In[57]:= ClearAll[parserPatterns];
parserPatterns[grammar_, parserTable_] :=
DownValues @ grammar];
That’s it for the parser generator! It’s equivalent to a yacc for our massively simplified PPF grammars. It’s
small because it doesn’t consider punctuation, operator precedence, or overloads in the object language
of the PPF.
In[59]:= ClearAll[$exprTable];
parserPatterns[F, $exprTable];
Make a little ad-hoc function (ad-hoc-ness signified by the dollar sign in the name of the function) to generate
random sentences, parse them, and then typeset them in Mathematica.
In[61]:= ClearAll[$fexpr];
$fexpr[dummy_, depth_ : 20] :=
  With[{sen = generateSentence[
      FProbabilized, FVar, terminalsFromGrammar @ F, depth]},
    With[{parse = ($exprTable @ sen)[[2]]},
      With[{interp = {
          (* replace each integer placeholder with a fresh random integer *)
          FInt[RandomInteger] :> RandomInteger[10],
          FVar -> Identity,
          (* apply the leading head, Plus or Times, to the arguments *)
          (FSum | FProd)[f_, args__] :> Apply[f, {args}]}},
        {TreeForm[parse, VertexLabeling -> Automatic],
          (parse //. interp // FullSimplify)}]]]
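A worked instance of the interpretation step on a fixed parse tree may help. The rule list interpSketch below mirrors the variable and sum/product rules of interp, leaving out the random-integer rule so that the result is deterministic; the name interpSketch is hypothetical.

```wolfram
ClearAll[interpSketch];
(* Rewrite rules that turn a parse tree back into an ordinary expression. *)
interpSketch = {
  FVar -> Identity,                         (* FVar[x] becomes x *)
  (FSum | FProd)[f_, args__] :> f[args]};   (* FSum[Plus, a, b] becomes Plus[a, b] *)

FSum[Plus, FVar[x], FProd[Times, FVar[y], FVar[y]]] //. interpSketch
(* x + y^2 *)
```

ReplaceRepeated (//.) keeps applying the rules until nothing changes, so nested sums and products are unwrapped at every depth.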
Here’s an animation that exhibits typeset expressions, their leaf counts, and their parse trees. It keeps track
of the biggest expression found so far, for bragging rights.
In[63]:= $fcontender = 0;
Animate[
  Module[{t, e},
    {t, e} = $fexpr[dummy];
    With[{le = LeafCount[e]},
      If[le > LeafCount @ $fcontender, $fcontender = e];
      Grid[{{e, SpanFromLeft}, {le, t}}, Frame -> All]]],
  {dummy, 1, 25, 1}]
Out[64]= [Animate panel with slider dummy, showing 23]
In[65]:= Dynamic[$fcontender]
Out[65]= [typeset algebraic expression; its two-dimensional layout is not recoverable from the text]