
Badness 0
(Epsom’s version)

Dr. Tom Murphy VII, Ph.D.
March 2024

Many people walk this Earth unbothered by incorrect details. For example, they are unconcerned when a hyperlink includes a surrounding space character. They don’t notice that the screw heads on a light switch wall plate are not all lined up. They don’t care about the rules of Wordle’s “hard mode” being simply wrong. They don’t notice the difference between “its” and “it’s.” When someone asks, “Will you marry me?” and they think “Oh my god!” it’s not because the proposer probably should have used the subjunctive would.

I am not like this. If I can infer from a coffee cup’s moment of inertia that it does not contain any liquid, I immediately lose suspension of disbelief and will not purchase the product featured in the commercial. I literally projectile vomit if Auto-Motion Plus is enabled on a television in the hotel I’m staying in, even if the TV is not turned on, or if someone misuses the word “literally.” If I see a period missing at the end of a paragraph on Wikipedia, I will spend dozens of hours writing software to organize and semi-automate a distributed effort to fix all the missing periods on Wikipedia. [2] And worse, each time I learn of a new type of mistake, I am forever cursed to notice that mistake

Seriously: One time I found myself spell-correcting someone else’s lorem ipsum text in a slide. It said “lorem epsom,” which is funny. I think about that incident all the time. The person that wrote the slide probably thinks about things like leveraging synergy, generative AI, metaverses, blockchain 3.0, snackable content, being eco-green, and so on, without it occurring to him that these things could have nuance and meaning separate from their names. He has probably never even read the Wikipedia article on Lorem Ipsum. He is successful and rich.

Another successful person is Bill Cassidy, who is a congressperson. He criticized a proposed bill that would reduce the standard workweek in the US by 8 hours, from 40 to 32. He said,

    Sen. Bill Cassidy of Louisiana, representing the ______ party, said paying workers the same wages for fewer hours would force employers to pass the cost of hiring more workers along to consumers.

    “It would threaten millions of small businesses operating on a razor-thin margin because they’re unable to find enough workers," said Cassidy. "Now they’ve got the same workers, but only for three-quarters of the time. And they have to hire more.”

In fact, that’s not the exact quote, but I needed to make it look nice. [3] And this is not a paper about politics, but you can probably guess the word that goes in the blank.

Anyway, OKAY (and I’ll explain why in a second), first of all, razors famously have high margins. It’s like the worst possible metaphor here.

This guy uses both fancy and ASCII quotes.

The main thing I want to talk about is: What? No! 32/40 is not three quarters. This is not, like, complicated math. It uses some of the world’s smallest integers. Everybody knows that the workweek is 40 hours, and that a workday is 8 hours, and that the proposed bill reduces it by one day, giving four of five days. I don’t really mind if someone makes an error in calculation (well, I do mind, but I am certainly prone to doing it). The infuriating realization here is that this person does not even think of “three-quarters” as a kind of thing that can be right or wrong. He says three quarters because it makes smaller number feelings. You could imagine him having the conversation (with me, perhaps): “You say four-fifths, I say three-quarters.” Me: “But it is four fifths. And why are you always hyphenating it?” Him (smiling patronizingly): “I guess we just have to agree to disagree.”

Donald Knuth is the opposite of this person.

I’m not saying that Donald Knuth isn’t successful and rich. According to the website Famous Birthdays, which is probably generated by AI or at least by people whose economic output is measured in a count of words, and words whose value is computed by their ability to drive ad clicks, Donald Knuth is “is one of the most popular and richest Mathematician who was born on January 10, 1938 in Wisconsin, Wisconsin, United States. Mathematician and engineer who was arguably most recognized as the Professor Emeritus at Stanford in Palo
Alto, California.” As one of the richest Mathematician from United States, according to the analysis of Famous Birthdays, Wikipedia, Forbes & Business Insider, “Donald Knuth’s net worth $3--5 Million. [4]”

It is arguable that he is the Professor Emeritus. It is likely that he is the only popular and rich mathematician born on that specific day in Wisconsin. The singular “Mathematician” is perhaps a technical master-stroke. The asterisk does not have any referent on the page.

What I mean when I say that Donald Knuth is the opposite of this person is that Knuth is interested in unpacking a single unnecessary detail, recursively, until it is completely solved. According to the website Famous Bibliophiles, one day Donald Knuth set out to write down the entire subject of computer science in a single book called The Art of Computer Programming. As he was doing so, he realized that describing computer algorithms in a lasting form would require a programming language that was not subject to constant revision, so he invented the MIX instruction set for an idealized computer. After writing some 3000 pages out in longhand, he found that it was impractical to print them all in one book, so the plan expanded to be multiple volumes. Then when he got a draft of one of the books back from the typesetter, he was unhappy with the details of the typography, and so he paused his work writing down all of computer science to create some new computer science: First an algorithm for determining where to place line breaks in order to make text optimally beautiful, then algorithms for hyphenating words, then generalizations of these for typesetting mathematics, and then a full computer typesetting system that is still in wide use today, called TeX. Along the way he was unsatisfied with the specific typefaces that existed in the world, and unsatisfied with the way that typefaces were described at only one weight, and so he created the parameterized METAFONT system and several new typefaces. Undeterred by these excursions, he returned to his original task of writing down the entirety of computer science, using all the technology he had built. By the time he finished this, much more computer science had been invented, including by his own hand, and so he needed to rework MIX for the next volume, and update the first. The revised plan of eight volumes remains the intention in 2024. However, he found that the volumes were getting rather long, and began releasing portions of volumes (“fascicles”). So far, Volume 4 has been partially published as books 4A [5] (fascicles 0–4; 912 pages) and 4B [6] (fascicles 5–6; 736 pages). It is unknown how many more episodes remain in Volume 4. I expect that every conversation that Knuth has with his editor goes like this. Editor: “Hey, Donald, I hope you’re well. Just wondering if you have an update on when 4C will be ready? Or any more icicles?” Donald E. Knuth: “I am working diligently on fascicles for Volume 4C. As I’ve mentioned in the past, it’s impossible to tell how long it will be, since mathematics does not obey the rules of project management.” Editor: “I just need a date to tell the publishers.” Donald E. Knuth: “Like I’ve said, any date would be very low confidence, other than the fact that it will be in the future.” Editor: “I just need a date.” Donald E. Knuth: “Would you like me to say a date, knowing that it’s a very low confidence guess, and that I would be extremely likely to miss that date, or even deliver early?” Editor: “Early! Now we’re talking.” Donald E. Knuth: “What use is the date if you’re excited about the possibility of it being early, relative to some unknown date?” Editor: “I just need a date for the publishers.” Donald E. Knuth: “2030.” Editor: “Thanks Donald, you’re the best!”

Knuth is estimated to be ready with Volume 5 in 2030, when he will be 92.

That’s a large amount of language!

Nightmare on LLM street

Then there are large language models. [7] One of the annoying things about large language models is that they are so buzzwordy, but unlike most buzzwordy trends, they are actually substantive. They produce remarkably fluent text. With no additional training, they often outperform models that have been developed for decades. They generalize to completely new situations.

So many things about “AI” distress me. Dolor sit amet! I worry about the devaluation of human creativity, about large-scale disinformation and spam ruining the beautiful library of knowledge that humans have created, about extreme concentration of wealth. And yes, I worry about competing with AI. Being able to work tirelessly and thousands of times faster than humans is a huge competitive advantage. Of course, I find some solace in the significant possible upsides. It might help us solve hard problems like climate change and AI. But even in the best scenarios we will not be able to ignore it: Even if it never gets as smart and precise as Knuth, it’s already too economically useful in its Lorem
Epsom state (just like Lorem Epsom himself).

On the one hand, the technology is pretty neat and lends itself to some nice abstractions. On the other hand, I love playing with words. So, I have been experimenting with LLMs in practical and impractical applications. I also try to make it fun (for me) to program with them.

I have a myriad of strategies for digesting things that irritate me. For this work, I’m inspired by the “Hurry-Coward So-so-morphism,” where I make connections between topics based solely on confusion of superficial lexical similarities without regard to their underlying meaning. So for example, we have “ML” meaning both “Machine Learning” and “Meta Language,” as well as “type” both as in “typeface” and as in “type systems for programming languages.” [8] And because machine learning has claimed so many words, there are a great many shared with typography as well:

                     +----------+
                     |typography|
                     +----------+
                       /      \   “baseline”
      “fixed point”   /        \   “floating point”
                     /          \   “weight” “vector”
          “type”    /            \   “descent”
                   /              \   “kerning trick”
                  /                \   “dingbats”
                 /                  \   “gradient”
      +-----------+              +--------+
      | functional|--------------|machine |
      |programming|              |learning|
      +-----------+     “ML”     +--------+
            “lambda”       “generalization”
            “parameter”    “tensor”

I spent a lengthy introduction talking about Donald Knuth’s work in computer typography. Now I can tell you what this paper is about. If we are giving up on precision in our near AI future, perhaps we can have something we want: perfect typography. This paper is about a new typesetting system, BoVeX, which allows us to control the exchange of precision for beauty. It essentially gives us a dial between Lorem Ipsum and Donald Knuth.

    The scientists’ findings were astounding! They
    discovered that the powers of the Metroid might
    be harnessed for the good of civilization!

The Metroid series is a video game series about a brain that has been enslaved inside a jar in an underground datacenter on the planet Zebes. This brain is called Mother Brain and its goal is to control the hypercapitalists called Space Pirates to increase their “score” as high as possible by conquering planets throughout the galaxy. Mother Brain was invented by the Space Pirates, although it is not clear whether the current situation was actually intended by the Space Pirates. The most super version of Metroid is Super Metroid.

In the 1990s, the website gamefaqs.com collected plain text “FAQs” for classic video games, then just known as video games. On this site, another hero was born. They were writing the definitive guide to speedrunning the SNES game Super Metroid, and they saw that some of their ASCII lines ended up exactly the same length, and it looked good:

    Once you save the game at your ship (about 1 hour 15 minutes is good), go
    down to Tourian. Do not save your game in Tourian if you have intentions of
    returning to any previously explored section on Planet Zebes. There will be
    a few Metroids to kill before you reach Mother Brain, and they must all die
    in order to continue to Mother Brain. Read the boss guide for more details.
    Once Mother Brain is defeated, you will need to hurry back to your ship. By
    now you will already have the HYPER BEAM. From Mother Brain’s room, go west
    and then south. Take the blue door at the bottom and speed dash east. Super
    jump up, and continue north. Once you land up top and are running east, aim
    diagonally down to the right and shoot an unseen door. Eventually, you will
    get to this door since lava will start to rise from the floor in this area.
    Speed dash through the door you preopened, and charge for a super jump. Hug
    either the left or right wall in the Craterian shaft and super jump up. Now
    quickly get to your ship before the planet explodes. There should be almost
    a minute left on the timer. Sit back and watch the ending! Did you beat the
    game within 1 hour and 20 minutes?

and so they wisely decided to wordsmith the entire 28-page guide so that every line was exactly the same length, with no extra spaces or other cheating, just because it could be done. [9]

Doing this manually is a chore, and I do like to automate the chores of Speedrunners. [10]

    Exercise in rephrasing text. The following paragraph needs to be rephrased so that it retains its precise meaning, but with minor variations in the specific choice of words, punctuation, and so on. No new facts should be introduced or removed, but it is good to use synonyms and change the word order and phrasing.

After this, I insert Rephrased text: , which is the rephrasing of the original text. The model is ready to generate tokens.

I then sample text a word at a time to continue this prompt. If a line ends exactly on the number of characters that I want (and the next character is a space or other character that is appropriate to end a line) then I accept the stream so far and continue. If I exceed the line length, I back up to the state at the beginning of the line and try again with new random samples. I just keep doing that until the
paragraph is complete, and we have beautifully justified monospace text that resembles the original. Here is an example of this paragraph rendered in monospace:

    I sample text a word at a time to continue
    this prompt. If a line ends exactly on the
    number of characters I want, I accept that
    text so far, and continue. If I exceed the
    line length, I back up to the beginning of
    the line and try again with new samples. I
    keep repeating this until I get text I can
    render in monospaced font, and that is how
    we can get beautifully justified monospace
    text. Here is an example of this paragraph
    rendered in monospace:

The text could be improved by using “monospace” and “monospaced” consistently. The most upsetting thing is that the paragraph ends with a colon, as if there is going to be another example.

The approach described works reasonably well, but it has several deficiencies that we will address in the real BoVeX system. But it is a good example to explain some concepts that will be useful later.

¿Como te LLama?

Facebook’s Large Language Model, [11] Llama, is available for anyone who agrees not to use it to destroy the world. Wouldn’t it be funny if the world is destroyed by something called “Llama”? That’s some Stay-Puft Marshmallow Man stuff. Actually I hear that llamas are pretty mean, and if you are thinking about hugging a cute long-neck, you are probably thinking about an alpaca. But that’s probably a version of the linear algebra package LAPACK. Llama-v2-70b is a good LLM which can do some impressive things, but when I say destroy the world I mean stuff like filling the internet with infinite spam, or building critical infrastructure on it in order to cut costs, where most of our “safety” measures consist of asking the model politely to recite its daily affirmations before performing its tasks. That kind of thing. It’ll be at least months before we really have to worry.

Anyway, the normal way to program with Llama is to use Python, and a mountain of things that you are not supposed to understand and cannot understand, mostly by pasting examples from others and then tweaking parameters and prompts. I don’t care for it. Fortunately, human geniuses [12] have implemented the inference code for llama-like models in a nice, portable C++ library called “llama.cpp” (checks out).

I can load a quantized version of the model into RAM with llama.cpp. There are two different models, the 7b and the 70b, which refer to the number of billions of parameters, which must be a multiple of VII for performance reasons.

Quantization is a process that reduces the number of bits used to represent floating-point weights. [13] This saves memory, but it also speeds up inference, which needs to read pretty much the entire model for every predicted token. I got reasonable quality and good performance from LLama-v2-7b with 16-bit floats. This model fits completely on my world’s (physically) largest GPU. To tune various settings, I ran thousands of trials for the different models, and made some nice custom graphs:

The x axis of the graph is the number of CPU threads, and the y axis is the number of model layers loaded onto the GPU. As expected, increasing the number of threads and layers on the GPU improves performance, since the entire model fits on the GPU. For the 70b models (not pictured), there is an abrupt drop-off in throughput before we load all the layers, and my computer gets very sluggish if I exceed the GPU memory. We see that if we use more than the number of physical cores (32), we do not see any benefit, which is not surprising because hyperthreading basically never helps anything. The best throughput actually uses a modest number of cores (about 12). Mostly I’m just including the graph to demonstrate that BoVeX has support for including PNG files.

Where was I? Right. Fundamentally, LLMs are trained to predict a token given some sequence of tokens that precede them. There is a fixed set of tokens for the model, and rather than predict a single token, they actually give a score for every possible token. These scores are typically normalized
into a probability distribution. So for example, if we have the text

    SIGBOVIK is an

then the probability distribution begins as

    ( annual)          69.8010%
    ( April)            3.8023%
    ( ac)               3.2456%
    ( academic)         2.9374%
    ( artificial)       2.0857%
    ( open)             1.7993%
    ( under)            1.2331%
    ( international)    1.1032%
    ...

with the thousands of other tokens following. So three-quarters of the time the next token should be "annual", but there are many other reasonable possibilities. We can pick one of these tokens however we like, append it to the sequence, and run the model again. This gives us a new probability distribution. By doing this over and over we can generate a likely piece of text. This is what "Lorem Epsom" means when he says “Generative AI.” Rather, what he means is “the new thing that is cool,” but what he is unknowingly "referring to" is that you can sample a probability distribution. He has probably never even read the "wikipedia article on Markov chains".

If I always sample the most likely token, I always get the most likely text. It is good to be likely; this is why the model is useful. However, you might not want exactly the same result each time, and in many situations if you only sample the most likely token, you get very boring, repetitive text. Pseudorandom number generation is the spice of life!

We do not need to sample from the probability distribution to generate text. We can simply pick the token we want. This is how the initial “prompt” works; we just run the inference process one token at a time, but always select the next token in the prompt, ignoring the probabilities. So at each moment, the text we’ve generated so far (more or less) completely characterizes the state of the LLM. This means that we can easily go back to earlier moments and generate a different continuation of the text, by simply replaying tokens. We also have the option of storing the LLM state (gigabytes) in RAM, which allows us to return to a previous state in constant time.

To generate monospaced lines of the same length, I use a prompt that asks the model to rephrase the input paragraph. Greedily sampling the distribution typically results in a copy of the input paragraph, which is fine for our purposes. However, if the lines are already the right length, we need not change them! To prevent boredom, whenever the process repeats a line that’s already been seen, I increase the “temperature” modifier to the probability distribution. This causes the candidate lines to be more varied, but less probable (according to the original probability distribution).

This is all there is to the monospacing version. It’s just 300 lines of code, including boilerplate, debugging code, and false starts.

    Great !! You fulfiled your mission. It will revive
    peace in space. But,it may be invaded by the
    other Metroid. Pray for a true peace in space!

We are satisfied with the output text, but we are not satisfied with the font. We want to have excellent justified text with all the perks of proportional fonts and a programmable document preparation system. We want to have it by the estimated SIGBOVIK deadline so that it can be used to prepare the paper that I’m now writing. This is the Donald Knuth Any% speedrun.

The boxes-and-glue algorithm

The justification of monospaced text looks quite bad [14] when more than one space is inserted between words. We can tell if text is suitable for some width by adding up the code points. For the full-on typography case with proportional fonts, there are many more degrees of freedom. It is possible to expand or contract the space between words a little bit, even if it varies from line to line. We can also hyphenate words.

In the time when I was being born, and probably being very upset about it, Knuth was having similar feelings about the way his computer-typeset documents looked. He discovered a nice abstraction that generalizes most of these typographic degrees of freedom, and devised an algorithm for producing optimal text layout given some parameters. [15] The idea is to consider the text of a paragraph as consisting of rigid “boxes” (say, words) and stretchy “glue” (say, space) between them. Both boxes and glue have various detail (and can be extended to support all sorts of quirks) but the basic algorithm can be understood with just those
pieces. So, let’s do that.

Knuth’s paper is great, but I started having spoiler feelings when reading it, so I figured out my own algorithm, which is more fun than reading. The key insight is that you do not need to try all possible break points. Whenever we break after a word, the problem is the same for the rest of the text, no matter how we got there. This lends itself to a dynamic programming algorithm.

Dynamic programming is a programming technique for solving problems on the whiteboard at tech companies. When I was young, it was a mysterious concept because of its strange name. Here is how I think about it. Imagine you have a recursive procedure that solves the problem. In this case, the pseudocode is something like

    pair<int, string> Split(string line,
                            string text) {
      if (text.empty()) return {0, ""};
      auto [word, rest] = GetFirstWord(text);
      // try splitting
      auto [penalty1, rest1] = Split(word, rest);
      penalty1 += badness from leftover space;
      // try not splitting
      auto [penalty2, rest2] =
        Split(line + " " + word, rest);
      penalty2 += badness from line too long;
      if (penalty1 < penalty2) {
        return {penalty1, word + "\n" + rest1};
      } else {
        return {penalty2, word + " " + rest2};
      }
    }

The line and text are split. In the normal case, there is a word left, and two possibilities: either splitting after the first word, or not splitting. This is exponential time, because each call makes two recursive calls, to try each of the two options. But deep recursive calls will be made with the same arguments many times. So, add some memoization: If the function is called for the same line and text a second time, just return the same answer as before without doing any work (especially not making recursive calls again). This limits the function to be called at most once for each possible argument; we can then see that line is no longer than the input (so it is size O(n)), and text is always some suffix of the input (so it is size O(n)), giving O(n²) calls.

In dynamic programming, we store the values of all recursive calls before we need them, and then look them up. For this problem, we store the values in a table indexed by the two parameters, the current line and the remaining text. The line is the number of words before the current word that are included on the line, and the text is the position in the string where we’ll next look for a word. That’s all there is to it; we start with empty text and then write the loop to fill out cells in the right order.

Knuth’s boxes-and-glue algorithm has many extensions, and mine has many extensions as well. For example, we’ll talk about how you can adapt the algorithm to perform hyphenation and kerning. There are many rabbit holes to go down, and I explored the ones that attracted my attention. There is plenty of time to add more features later, since I have now cursed myself to use BoVeX for my future SIGBOVIK papers.

But here’s where I diverge from Knuth somewhat. Knuth was reluctant to add a programming language to TeX, [16] but I spent the majority of my time on this project implementing a full-fledged language. BoVeX is about 33,000 lines of code, the majority of which is the implementation of the language itself. That’s 110 times longer than the original monospaced proof of concept and 30 times the length of this document!

The BoVeX language

The BoVeX programming language is described in this section, and its implementation. If you are just in it for the jokes, you can skip this section, which is basically serious and loaded with programming language theory jargon.

BoVeX is a typed functional programming language in the ML family. It has syntax that closely resembles Standard ML. Here is an example piece of code from the source code of this document:

    datatype (a) option = SOME of a | NONE

    fun consume-outer-span f s =
      case layoutcase s of
        Node (SPAN, attrs, children) =>
          let
            val (ropt, rchildren) =
              case children of
                one :: nil => consume-outer-span f one
              | _ => (NONE, layout-concat children)
          in
            case (f attrs, ropt) of
              (NONE, _) => (ropt, span attrs rchildren)
            | (SOME vouter, inner as SOME _) =>
                (inner, rchildren)
            | (outer, NONE) => (outer, rchildren)
          end
      | _ => (NONE, s)
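As an aside on the line-breaking recurrence from the dynamic programming discussion: here is a minimal C++ sketch of the table-filling version. It is not BoVeX's actual code, and it makes several assumptions that the text does not: monospaced words, a hypothetical Badness function (squared leftover space), a last line that is free if it fits, and a one-dimensional table indexed only by the word that starts a line, instead of the two-parameter (line, text) table described in the text. The names BreakLines, Badness, best, and next are invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical cost of one line: squared leftover space. With negative slack
// it also penalizes a single word that is wider than the line.
static std::int64_t Badness(int width, int used) {
  const std::int64_t slack = width - used;
  return slack * slack;
}

// Break words into lines of at most `width` characters (a lone over-wide word
// gets its own line), minimizing the summed badness of all lines.
std::vector<std::string> BreakLines(const std::vector<std::string>& words,
                                    int width) {
  assert(width > 0);
  const int n = static_cast<int>(words.size());
  // best[i]: least total badness for words[i..n) when a line starts at word i.
  // next[i]: index of the word that starts the following line in that layout.
  std::vector<std::int64_t> best(n + 1, INT64_MAX);
  std::vector<int> next(n + 1, n);
  best[n] = 0;
  // Fill the table from the end of the text, so best[j + 1] is always ready.
  for (int i = n - 1; i >= 0; i--) {
    int used = -1;  // -1 cancels the space counted before the first word
    for (int j = i; j < n; j++) {
      used += 1 + static_cast<int>(words[j].size());
      if (used > width && j > i) break;  // line too long; stop extending it
      const bool free_last_line = (j == n - 1) && used <= width;
      const std::int64_t cost = free_last_line ? 0 : Badness(width, used);
      const std::int64_t total = cost + best[j + 1];
      if (total < best[i]) {
        best[i] = total;
        next[i] = j + 1;
      }
    }
  }
  // Follow the recorded choices to rebuild the lines.
  std::vector<std::string> lines;
  for (int i = 0; i < n; i = next[i]) {
    std::string line = words[i];
    for (int k = i + 1; k < next[i]; k++) line += " " + words[k];
    lines.push_back(line);
  }
  return lines;
}
```

For the words aaa, bb, cc, ddddd at width 6, this sketch returns the lines "aaa", "bb cc", "ddddd" (total badness 10 under the assumed cost), where a greedy first-fit would produce "aaa bb", "cc", "ddddd" (total badness 16).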
The language you see in the BoVeX sample is a full-fledged programming language, [17] although it is not a standard one. It supports higher order functions, polymorphism, algebraic datatypes, pattern matching, Hindley-Milner type inference, and so on. It is basically core (no modules) Standard ML,as allow patterns on both sides. Anyway, a full description of the language would be boring and take too much time as the SIGBOVIK deadline draws closer.

Implementation

In the past I have implemented many similar languages, including for my dissertation. [18] I could have started from one of my existing implementations, but they were written in Standard ML, which I could not get to work on my computer in 2024. So I started over from scratch in C++, which at least does work on my computer. (I also wanted to interface with GPU inference code for running the LLM, which would be easiest from C++). C++ is not a good language for writing language implementations, but it has gotten better.

The BoVeX implementation is a “compiler” in the sense that it transforms the source language through multiple intermediate languages into a low-level bytecode. This bytecode is just straight-line code on an abstract machine with infinite registers and operations like allocate and set-field. It does not produce machine code, and although this would be pretty feasible, it would not be the first thing to do to make BoVeX faster.

First, it concatenates the source files (handling import and keeping track of where each byte originated, for error messages), and then it lexically parses them into the External Language (EL). Then, it elaborates EL into a simpler and more explicit Internal Language (IL), which is nice and clean. I love writing optimizations, but I had to keep myself out of there, or else this would be a 2025 SIGBOVIK paper. There are just enough to make the code reasonable to debug if I need to look at it. After optimization, I perform closure conversion, simplify again, and generate the final “bytecode” form. This entire process happens whenever you generate a BoVeX document; the only output from running bovex.exe is the PDF document.

I did not cut corners on the language implementation. For example, compiling mutually-recursive polymorphic functions is obnoxious, but I did it. The full story is in the source code. [19]

AST pools. datatype declarations are for ) and annoying in C++. I continued to experiment with different ways to do this. I use arena-style allocation for the syntax nodes (always const after creation), so that they can be created and reused at will. My current favorite approach to manipulating the nodes is to write “in” and “out” functions (tedious, manual) for each construct in the language. The syntax nodes can then be implemented however I like (for example, a flat struct or std::variant). I get the compiler’s help whenever I change the language (which is often!) since each in/out function is explicit about its constituents.

Passes and guesses. Many transformations in a compiler rewrite a language to itself; for example each IL optimization is a function from IL to IL. These can be tedious to write and update, especially since a given optimization usually only cares about one or two constructs in the language. I use the “pass” idiom to write these. This is basically an identity function on the AST that pulls apart each node, calls a virtual function for that node, and then rebuilds the node. To write a pass that only cares about one type of node, you inherit from this class and then just override that one node’s function. One issue with this is that each time you rebuild the entire tree you create a lot of unnecessary node copies. So exchanging tedium (mine) for efficiency (my computer’s?), every node type’s “in” function also takes a “guess” node pointer. If the node being constructed is exactly equal to the guess, then we return the guess and avoid creating a copy. Then the base pass is actually the identity (it returns the same pointer) and does no long-lived allocations. This seems to be a good compromise between the traditional garbage-fountain approach and hash consing, which sounds like it would be a good idea but is usually just a lot slower. [20] For type-directed transformations, there is also a typed IL pass class, which recursively passes a context and does bidirectional type checking of the intermediate code. Closure conversion is a type-directed pass and is implemented this way.

Parsing. I have this aversion to parser generators, probably because one time I tried to get someone else’s code to compile and it complained about having the wrong bovines on my computer and ruined my weekend. After trying some other people’s C++ parsing libraries and being disappointed by them, I did what Knuth would do: I wrote my own. It is a parser combinator [21] library which actually descends directly from Okasaki’s SML code. [22] I was proud of myself for getting this to work in C++,
since C++’s insane type system is impossible to understand and its error messages are even worse. (BoVeX’s error messages are extremely spartan, often simply declaring Parse error at paper.bovex line 1, but in many ways this is more useful than C++’s mile-long SFINAE vomitus.) It supports mutually-recursive parsers, resolution of dynamic infix operators, and all that. My template-heavy parser combinators take clang about a minute to compile, which is acceptable. Less acceptable, but something I only learned after using this to write a 14–page-long paper, is that the parsers are very slow. Putting aside LLM inference, this paper takes 13 seconds to render into a PDF, 11 seconds of which is parsing! There must be some bug, but I don’t know if it’s in my grammar (it is easy to accidentally write an exponential time parser, but this one should not be) or the parser combinator library (also my fault) or clang producing bad code (it may be giving up on optimizations, since it is taking so long to compile; the .o file is 41 megabytes). But these are details to be improved in the future.

Garbage collection. I keep track of all the pointers that are allocated during execution. It’s so easy that I didn’t even implement it! I have 256 gigabytes of RAM. In fact, LLM inference acts as a useful “performance regulator” to make sure that we don’t allocate memory too fast.

Objects

As the SIGBOVIK deadline grew near, I reluctantly added “objects” to the BoVeX language. Objects are no stranger to ML; for example the O’Caml Language [23] (pronounced “OK ML”) has them. [24] But the community of functional programmers I was raised in has a revulsion to things Object Oriented, just like how a woodworker will immediately projectile vomit if they see a piece of Oriented Strand Board, even though it is a fine tool for many applications. I still have this disgust reflex. I imagine my Ph.D. advisors, should they read this, are contemplating whether and how a Ph.D. can be revoked. Anyway, I deliberately kept objects low-tech so that nothing could get too Oriented.

There is one object type in BoVeX. A value of this type has an arbitrary set of named fields whose types are known; they can only be the base types int, float, string, bool, layout, or obj. Fields are distinct if they have different types. An object can be introduced with an expression like {(field1 = exp1, field2 = exp2)} or by declaring an object name O:

    object O of { field1 : type1, field2 : type2 }

and then use this in an expression like {O} field1 = exp1. These object names do not have any run-time meaning; they are just a collection of field types that are commonly used together. It gives a good place to document what they mean and some opportunity for better error messages, but fundamentally an object is just a collection of named data. Think like “JSON” object. It is possible to add and remove fields from objects (functionally) with expressions like exp1 with (O)field2 = exp2.

The bibliography format in BoVeX consists of declarations like this.

    val knuth1981breaking =
      bib-article {(Article)
        title = "Breaking paragraphs into lines",
        author = "Knuth, Donald E. and Plass, Michael F.",
        journal = "Software: Practice and Experience",
        page-start = 1119,
        page-end = 1184,
        year = 1981,
        month = NOVEMBER,
        publisher = "Wiley Online Library",
      }

Each reference declares a set of optional fields. It is too tedious to make each field explicitly optional, and since the data have heterogeneous types, manipulating a string-indexed data structure would be more cumbersome. The bibliography rendering code case analyzes the presence of fields to render citations that have different subsets of data.

The paper.bovex source file is a tree structure with optional attributes on each node, which are represented with an object. This paragraph is written in the paper.bovex source file as:

    Another use is in the [tt[layout]] type.
    This is a primitive type that most of a
    document’s text is written in. It is a
    tree structure with optional attributes
    on each node, which are represented with
    an object. For example, this paragraph is
    written in the [tt[paper.bovex]] source
    file as:

The square brackets are used to write a layout literal (the main body of the document is inside one large literal). Layout literals can also embed expressions (of type layout) with nested square brackets. Here the function tt is applied to a layout literal that contains text like paper.bovex. The tt function just adds the font-family attribute with value "FixederSysLight" to the layout node. This is a
custom monospaced bitmap font that I made for this paper using software I wrote. It is part of the FixederSys family. [25] Functions like b and it apply bold and italic text styles, but functions can do anything that you can do in a general-purpose programming language.

Primops

Objects are used for interfacing with the runtime that executes BoVeX bytecode. There are about 50 different built-in primops that can be used by BoVeX programs. These include simple operations such as addition, but also heavyweight operations such as “load and register this collection of TrueType font files as a font family” or “invoke the boxes-and-glue packing algorithm with these parameters.” Primops in the former category work naturally on simple base types, but primops in the latter category need to be able to pass complicated tree-structured heterogeneous data between the BoVeX bytecode executor and the runtime. It would be possible for the runtime to consume and create BoVeX values like tuples and lists, but this has two problems. One, many types like list are declared as user code (in the BoVeX standard library); they are not special, and we don’t want to make them special by informing the runtime of them. Two, requiring specific representations at the runtime boundary inhibits optimization; for example, we can normally analyze the whole program to flatten data structures or remove record fields that are never used. The runtime typically uses obj to communicate structured data.

For example, the internal-pack-boxes primitive runs the boxes-and-glue algorithm. It takes some layout (which is expected to be a series of box nodes, with attributes giving their size, glue properties, and so on) and configuration parameters like the type of justification and algorithm to use. It returns an object with a new layout (the boxes grouped into lines, with new glued up widths) as well as the total badness. Inside the BoVeX layout support code, this primop is wrapped as pack-boxes with a native, typed interface, so programmers do not need to think about that implementation detail. Other typographic features that benefit from runtime support are implemented this way as well.

Typographic features

The pack-boxes algorithm, which is offered by BoVeX, can be used to nicely justify text. It can also be used to distribute paragraphs into columns, by thinking of the paragraphs as “words” (acceptable to break at any line, but bad to break near the start or end of a paragraph) and the columns as “lines.” It could be used by the document author for other purposes, I guess. There are other typographic features available.

The BoVeX rendering pipeline is a bit like a compiler: It transforms programmer-written source layout into formatted paragraphs, then into boxes of known size, then into stickers of known position. At the end, it outputs the document as a PDF.

The rendering process can report “badness” by calling the emit-badness primop. Badness is measured in square points of area that are outside of the container. Worse situations (such as text overlapping other text) have their badness scaled up per the same area of typographic horror. Less serious infractions (such as a little too much space between words) have badness scaled down.

Fonts

BoVeX can render your document in plain Times Roman if you don’t care about anything, or it can load 13 other boring PDF fonts, or it can load any TrueType font from font files. (They do not need to be “installed,” and it won’t help to install them. You just put them in the directory with your document.) It loads their kerning tables and applies kerning properly, by generating rigid boxes at the sub-word level with unbreakable glue. I was disappointed to find that most fonts include only a few dozen kerning pairs. They do this in order to “save space” in the font file, which is utterly rich coming from someone that would try to save space inside of words by squeezing letters together! In the current font Palatino, the word “BoVeX” is not kerned correctly because the rare bigraph “oV” does not have a kerning pair. I hope to improve this detail in a future version (perhaps for the presumably forthcoming video version of this paper).

Hyphenation

In A.D. 1455, Johannes Gutenberg invented the hyphen for his Gutenberg Bible, then just known as Bible. [26] His printing process required the lines to all be the same length, so he had to stick these little guys all over the place. His hyphens looked like this: [image]. Later on we straightened these out and decided we only needed one at a time, and today we use them not because we require our lines to all be the same length, but because we like the cognitive
challenge of remembering the beginning of the word while we move our eyes to the beginning of the next line while reading.

In BoVeX, we break each word into boxes at legal hyphenation points, marking these points as sort-of-bad to break, and if we do break, we insert the hyphen character and use a little more space. By default, the hyphen sticks out of the end of the line a little bit. This is actually a bug but I like it.

The hyphenation algorithm I use is the same as the one used by TeX, which is cleverly represented as a prioritized set of patterns in order to fit compactly in memory. [27] Again, you have to respect Knuth and crew’s attention to detail, although to be fair this algorithm also dates to a time when storing a spell check dictionary in a computer’s memory was described as “not feasible.” So some of this was out of necessity. One of the nice things about the representation is that it generalizes to words that were not in the 1974 Merriam-Webster Pocket Dictionary. For example it hyphenates SIGBOVIK correctly.

The details really go on and on. The hyphenation dictionary is stored in a file called hyph-en-us.tex. “hyph” here, of course, stands for hyphens, and “en-us” means “English (United States).” In fact it is the standard language code for US English in the Small Language Model called IETF BCP 47. [28] But then we have “hyph-en”, which is a plausible hyphenation of “hyphen”! You could even read it as “hyphen us, tex”, as a request for TeX to hyphenate the words in this file. This is the kind of detail I’m talking about! (There is also hyph-uk, which for once sounds a little less dignified than the US accent.)

Rephrasing

The BoVeX LLM can be used to rephrase text in order to make it more beautiful.

Unlike monospaced text, a line of proportional text never fits exactly (badness 0), so we need to apply some glue to make it fit. This generally has a cost, even when the text looks great.

The paragraph to be rephrased is some layout value. I generate a textual representation for it and feed it to the LLM. The prompt looks like this:

    Exercise in rephrasing text. The following paragraph, which appears between <P> and </P> tags, needs to be rephrased so that it retains its precise meaning, but with minor variations in the specific choice of words, punctuation, and so on. No new facts should be introduced or removed, and all the ideas from the original paragraph should appear. However, it is good to use synonyms and change the word order and phrasing.

    The text contains markup as well. There are two types: <span class="c0">text goes here</span> and <img src="image.png">. These should be preserved in the rephrased text. <img> tags absolutely need to be retained and should not change their sources, although it is permissible to move them around in the text. <span> should generally be retained, but the contents could change. The classes of spans may not change, and only the classes that appear in the original text may be used.

The first part is basically the same as what I used for the monospaced version, except that I ask the LLM to delimit the paragraph. This is important so that I know when it thinks it’s done, and seems to work better than looking for newlines or the end-of-stream token. The second part is new. I translate the layout into plain text where uninterpreted subtrees are replaced with <img src="img1.png">. These are generally boxes whose contents are not text. This could be an actual inline image or layout used to control rendering, like some bit of horizontal space. Nodes that are used to set text properties of the subtrees with attributes (like fonts, colors, sizes, etc.) are translated into distinct classes and marked up with <span class="c0">...</span>. The LLM has seen plenty of HTML, so it’s able to use these reasonably well.

After generating a rephrasing, I parse the output HTML and match it up with the original layout. If I find any broken HTML, it is rejected. If I find any <img> tag referencing a src not in the original, it is rejected. If I find any <span> tag referencing a class not in the original, it is rejected. The more complexity that the original layout has, the higher the chance of a rejection, but rephrasing generally succeeds. But rejecting samples slows us down, so I leave off the second part of the prompt in the common case that the input paragraph is plain text. That way the LLM doesn’t even try using markup.

BoVeX can rephrase the text with HTML and original layout matched up. It preserves nested layout and attributes. The rendering process continues.
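The three rejection rules above (broken HTML, an unknown <img> src, an unknown <span> class) can be sketched in a few lines. This is only an illustration in Python, not the BoVeX implementation (which is C++). The names MarkupCollector, collect, and accept_rephrasing are made up for this sketch, and "broken" is approximated as "contains a tag other than span/img, or has unbalanced spans."

```python
from html.parser import HTMLParser


class MarkupCollector(HTMLParser):
    """Collects <img> sources and <span> classes; flags anything unexpected."""

    def __init__(self):
        super().__init__()
        self.img_srcs = []
        self.span_classes = []
        self.open_spans = 0
        self.broken = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self.img_srcs.append(attrs.get("src"))
        elif tag == "span":
            self.span_classes.append(attrs.get("class"))
            self.open_spans += 1
        else:
            # Assumption: only the two tag types named in the prompt are legal.
            self.broken = True

    def handle_startendtag(self, tag, attrs):
        # Tolerate <img ... /> but treat any other self-closing tag as broken.
        if tag == "img":
            self.handle_starttag(tag, attrs)
        else:
            self.broken = True

    def handle_endtag(self, tag):
        if tag == "span" and self.open_spans > 0:
            self.open_spans -= 1
        else:
            self.broken = True


def collect(html):
    collector = MarkupCollector()
    collector.feed(html)
    collector.close()
    return collector


def accept_rephrasing(original, rephrased):
    """Apply the three rejection rules to a candidate rephrasing."""
    orig = collect(original)
    new = collect(rephrased)
    if new.broken or new.open_spans != 0:
        return False  # broken HTML
    if not set(new.img_srcs) <= set(orig.img_srcs):
        return False  # <img> with a src not in the original
    if not set(new.span_classes) <= set(orig.span_classes):
        return False  # <span> with a class not in the original
    return True


original = 'A <span class="c0">fine</span> day <img src="img1.png"> indeed.'
good = 'Indeed <img src="img1.png"> a <span class="c0">lovely</span> day.'
bad_src = 'A fine day <img src="img2.png"> indeed.'
bad_class = 'A <span class="c1">fine</span> day.'
unbalanced = 'A <span class="c0">fine day.'
```

Here accept_rephrasing(original, good) passes, while the other three samples are rejected. The sketch checks only the rules listed in the text; it does not insist that every original <img> survives, although the prompt asks the LLM for that.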
However, we can’t just use the badness score to tell us how good our rephrasing is. The badness score is a measure of how bad the line breaks are, and it doesn’t tell us anything about how good the rephrasing is. We need to find a way to combine the badness score with the semantic loss to get a measure of how good the rephrasing is overall.

I wish I could tell you I solved this one with a beautiful algorithm. But so far, I just have something reasonable that works. I generate many different rephrasings, with their semantic loss, and run each of them through the boxes-and-glue algorithm to get the typographic badness. I choose the one that optimizes the preferred tradeoff between semantic loss and typographic badness. This process is controlled by BoVeX code, which is in the source code of this very paper. Knuth has a very low tolerance for semantic loss, and knows that his algorithms produce good results without rephrasing. Lorem Epsom just wants it to look good and sound good. Both have published in SIGBOVIK 2024.

Our first step is to generate a rephrasing that matches the original text exactly. We do this by sampling the most probable path through the tree. At each node, we take the first token with the highest probability. This is our first rephrasing, and it usually matches the original text exactly. We compute the semantic loss as the average probability mass skipped over all the tokens in the path. For this first path, we always took the most probable token, so this is 0.0 by definition.

The next path we explore will diverge from this path at some node (maybe the root). We pick a node that is likely to result in a good final loss, by scoring each node in the tree. The score is the average probability of all ancestor nodes times the probability of the next highest-probability token that we have not yet explored. The node with the highest overall score is the one we expand, by choosing that next highest-probability token. We are now in an unexplored part of the tree, and so we sample the most probable nodes repeatedly until we reach </P>. Speaking of which, BoVeX has a heck of a time trying to rephrase these last few paragraphs because they literally contain the text </P> in them.

The scores should be seen as heuristic. We would get different results by choosing different ways of computing the score. This is an example of a “beam search” algorithm. This connects the project to Super Metroid, which inspired this work. In the game, one of the final things you do is acquire the “hyper beam” to defeat Mother Brain.

Since we will run the boxes-and-glue algorithm on multiple related texts, I generalized that algorithm to work on tree-structured input. This is clean; the memo table keeps the same dimensions, but records an additional fact. Now we store the penalty, whether to break after this token, and what the best subtree is. We have to consult each subtree when computing the score for a node, but this does not affect the asymptotic runtime. The table size is still at most O(n^2), and although we explore more children per node, branches in the tree reduce the maximum depth to the root, which actually reduces one of the factors of n to log(n) as the tree becomes complete. However, as the SIGBOVIK deadline crept upon us, I never actually hooked this functionality up. It would require additional (programming) work to merge the trees, and the layout process is so fast that it doesn’t matter; I can easily run the full layout algorithm on hundreds of rephrasings per paragraph.

I would like to improve the algorithm, because it does seem like there should be a way to integrate the boxes-and-glue dynamic programming algorithm with the path extension algorithm so that we prioritize exploring nodes that are likely to generate the best balance of typographic and semantic quality. It won’t be as satisfyingly optimal as boxes-and-glue itself because we have incomplete information (we never know whether one of the exponentially many paths starts out with improbable tokens but then ends with a miracle streak of probable tokens). But it can certainly be more satisfying. Knuth would not stop here (but this is an Any% Knuth speedrun).

I spent my time implementing an achievement system in BoVeX. The first time certain conditions are met, the system awards you an achievement and prints a nice color trophy on your terminal. For example, you can get the “Not bad” achievement for generating a document that is at least 5 pages and has less than 1000 badness per page.

Advantages of rephrasing

The manual rephrasing of trivial details, such as synonyms and word order, can be automated. For example, when I wrote the opening paragraph of this paper, I could have used the same synonyms each time, and let the typographic considerations determine which synonym to use each time.
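To make the selection step concrete: the text says that each candidate rephrasing carries a semantic loss and a typographic badness, and that the one with the preferred tradeoff is chosen. The paper does not spell out how the two are combined, so the weighted sum below is purely an assumption for illustration, as are the names Candidate, choose_rephrasing, and loss_weight. It is written in Python rather than BoVeX.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    semantic_loss: float  # 0.0 means every token was the most probable one
    badness: float        # typographic badness reported by the layout step


def choose_rephrasing(candidates, loss_weight):
    """Pick the candidate minimizing badness + loss_weight * semantic_loss.

    The linear combination is a hypothetical stand-in for the
    "preferred tradeoff"; a large loss_weight models an author with
    little tolerance for semantic loss.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return min(
        candidates,
        key=lambda c: c.badness + loss_weight * c.semantic_loss,
    )


# Made-up numbers, chosen only to show how the weight changes the outcome.
candidates = [
    Candidate("original wording", semantic_loss=0.0, badness=900.0),
    Candidate("mild rewording", semantic_loss=0.05, badness=300.0),
    Candidate("wild rewording", semantic_loss=0.40, badness=10.0),
]

strict = choose_rephrasing(candidates, loss_weight=100_000.0)
balanced = choose_rephrasing(candidates, loss_weight=1_000.0)
relaxed = choose_rephrasing(candidates, loss_weight=10.0)
```

With these numbers, the strict setting keeps the original wording, the balanced one takes the mild rewording, and the relaxed one accepts the wild rewording for the sake of nearly zero badness.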
Conclusion

In this paper, and with this paper, I present BoVeX, a new computer typesetting system. It follows the tradition of TeX, but with modern amenities such as requiring over 128 gigabytes of RAM. Though some may consider the addition of AI features to TeX to be an unnecessary perversion, I find this use of LLMs to be fully justified.

Future work

Typographic features. Footnotes are desirable. It is hard to write a paper without them. Where am I supposed to put the bonus digressions? The layout of footnotes is tricky and should be part of a general floating figure implementation. Endnotes are easy, but I don’t want them. I want them to be little footnotes so that you can’t help but read them.

The SIGBOVIK program committee does not support page numbers, and this is good because page numbers are forbidden by BoVeX.

TeX is famous for its mathematical typesetting as well. It would fit neatly into BoVeX in the same way, since both use the same fundamental boxes-and-glue engine. BoVeX does not have "macros" or "modes" like TeX, but it would work cleanly to write a BoVeX function $ (or, if you like, $ ) that parses a custom syntax. In fact it would be natural to have different parsers for different maths, so that you don’t need to parse $ -> as $ in mathematical contexts that don’t use $ at all.

Optimization. There are many opportunities to make BoVeX code faster. This is mostly important for when it is being run in a loop in order to try out many different rephrasings of the same text. (That said, I do not wish to preclude what could be done with BoVeX by assuming its execution is doing only typesetting tasks. For example, shouldn’t you be able to challenge your paper’s reviewers to a game of chess against a strong engine embedded within your document?) The first thing to fix is that it manipulates too many strings at runtime (e.g. the code, record labels, object fields, and “registers”). This is easy to fix since these are all known at compile time. There are lots of high-level optimizations left to do for the IL code (common subexpression elimination, constant argument removal, uncurrying, etc.) and lots of peephole and control-flow optimizations left to do for the bytecode (currently no optimizations are performed at all). All of this becomes more important if I add another planned feature, which is the ability for the document to be globally optimized by applying a black-box optimizer to a set of user-specified parameters. For example, the column width, line spacing, or font size could be tweaked to make the document fit better. This feature is “Auto-Margin Plus.” Things are already set up to do this pretty straightforwardly; we would simply generate the document over and over while searching over the parameter space, and choose the one with the least badness. This may also affect which rephrasings look best. But instead I spent my precious time implementing 3D text. [29]

[Figure: 3D text]

Reproducibility. The algorithm for rephrasing text tries to find the best place to explore the next most likely token from the probability distribution. This expects the generation of these distributions to be deterministic. Mathematically, inference is deterministic (it is just a bunch of matrix multiplications), so this “should work.” But in practice the enormous calculation is performed in an unpredictable order as it is executed in parallel (in multiple CPU and GPU cores). Because floating point arithmetic is not associative (or distributive, commutative, or other properties you’d like), inference can sometimes generate different answers due to floating point round-off error. [30] Alas, these are not even necessarily related to the final probabilities in the model, as billions of non-linear operations happen within the hidden layers of the network. The effect is not particularly grave; we might miss out on a highly likely path because the probability distribution was different the second time we looked at it. There are already lots of ways we might fail to find highly likely paths, so this is not some kind of reproducibility crisis. It is mostly just a bit unsatisfying.

Unicode support. This would have been helpful when above I decided to show you Gutenberg’s funny hyphen, [image], for which I had to settle for embedding a crappy hand-drawn PNG file. Instead I could have used U+2E17, which, since this exotic code point is not present in the font Palatino, you could have experienced as . BoVeX is written with some Unicode support, with the main exception being that the PDF output code only supports the embarrassingly diminutive WinAnsiEncoding. [31]

Deadlines. Although BoVeX itself is very fast, rephrasing is very slow. This presents a problem for the typical way that academic papers are written, which is to do all the work in a coffee-fueled fugue in the last few days before the deadline, then
stay up all night writing the paper and finding citations for the pro-forma “related work” section which you did last but you know that the reviewers will insist upon, and tweaking \vspace and \begin{figure}[h!] until it fits within the page limit. On the one hand, BoVeX does potentially free the author from the visual tweaking process. But on the other hand, the LLM inference for the rephrasing process can be quite slow, and it can take many hours or days to fully bake a long paper! For this reason, it may be better to change conference deadlines to a system where the pre-rephrasing text is submitted. The publishers (what do they even do?) can be the ones to execute the rephrasing in the cloud as they produce the “camera-ready copy.” With straightforward extensions, this would also allow the rephrasing to adapt to changes in the overall volume style, or to adjust to avoid embarrassing typographic coincidences with other articles in the same volume (such as using the same notation with a different meaning). In principle, the paper could edit itself to respond to feedback from reviewers, in a way that minimizes the semantic distance from the original. This rapid feedback loop could reduce the time to publication, perhaps to mere months, or even weeks!

Other ways to minimize badness. The BoVeX system allows the document author to exchange semantic consistency for higher quality typography. Although we achieve state-of-the-art results, there are likely points that are more Pareto-efficient than what BoVeX can reach. BoVeX uses one of the most powerful publicly available LLMs, but that model is limited to rewriting the text within narrow constraints. Irresponsible research has demonstrated that language models are capable of volition, taking actions and using tools to accomplish goals. With minor modifications, it is likely possible to expand the Pareto frontier of the semantic/typographic tradeoff. For example, sometimes we could improve the typographic quality of the text without any semantic loss, by acting on the world to make the reworded text true. Human authors do this already: Earlier when I was describing internal-pack-boxes, rather than explain the somewhat awkward implementation, I went back and changed the already-working code so that it would serve as a simpler example of how primops use obj, but still be truthful. Now imagine the difficulty in typesetting a statement like “The universe contains approximately 1,000,000,000 paperclips,” and how much more beautiful the text could be if that number were instead 10,000,000,000,000,000,000,000,000,000,000,000,000!

In the meantime, there is an easier way to get zero badness: Delete the whole document! As a wise person once said, "If you can’t say something with non-zero typographic or semantic loss, don’t say anything at all."

Acknowledgements. I’d like to thank my advisor Karl Crary for his help and support. 20 years ago, we set out on an ill-fated attempt to replace LaTeX with an SML-like language mTeX, which compiled into TeX macros. The nesting square brackets syntax was Karl’s idea, and BoVeX shares genetic material with mTeX for sure.

See you next mission,

Tom 7

Bibliography

[31] Adobe. "PDF reference: Sixth edition". October 2006.

[7] You can just go to arxiv.org and click on any random article these days.

[16] N Bijlage. "Knuth meets NTG members". NTG: MAPS, 16. March 1996. pp. 38–49.

[3] Russ Bynum. "Bernie Sanders wants the US to adopt a 32-hour workweek. Could workers and companies benefit?". March 2024.

[12] https://github.com/ggerganov/llama.cpp. ggerganov. March 2024.

[26] Johann Gutenberg. "Bible". 1455.

[4] https://allfamousbirthday.com/donald-knuth/. February 2024.

[23] https://ocaml.org. "The Objective Caml system". 2023.

[21] Graham Hutton. "Higher-order functions for parsing". Journal of functional programming, 2(3). 1992. pp. 323–343.

[5] Donald E Knuth. "The Art of Computer Programming: Volume 4A, Combinatorial Algorithms Part 1". Addison-Wesley. January 2011. 912 pages.

[6] Donald E Knuth. "The Art of Computer Programming: Volume 4B, Combinatorial Algorithms
Part 2". Addison-Wesley. October 2022. 736 pages . [22] Chris Okasaki . "Even higher-order func tions
for pars ing or Why would any one ever w ant to
[15] Don ald E Knuth, Michael F Plass . "Break ing use a sixth-order func tion? ". Journal of Functional
para graphs into lines ". Software: Practice and Experi- Programming, 8(2). March 1998. pp. 195–199.
ence, 11(11). No vem ber 1981. pp. 1119–1184.
[28] A Phillips, M Davis . "Tags for iden tifying lan -
[27] Franklin Mark Liang . "Word Hy-phen-a-tion guages ". Sep tem ber 2009.
by Com-put-er ". 1983.
[9] https: / / gamefaqs.gamespot.com / snes /
[17] Robin Milner, Mads Tofte, Robert Harper, 588741-super-metroid / faqs / 10114. rs1n . "Super
David Mac Queen . "The de finition of Stan dard ML Metroid Speed Guide and FAQ". 1996.
(Revised) ". MIT Press. May 1997. 114 pages .
[24] Didier Rémy, Jérôme Vouil lon . "Ob jectiv e ML:
[8] John C Mitchell . "Type Systems for Pro gram - An effectiv e object-oriented exten sion to ML". In
ming Lan guages ". Van Leeuw en, Jan, ed . Formal Theory And Practice of Objects Systems, 4(1). 1998.
Models and Semantics. 1990. pp. 365–458. pp. 27–50.

[1] Tom Murphy VII. "Badness 0 (Knuth's version)". SIGBOVIK. April 2024. 16 pages.

[30] Tom Murphy VII. "GradIEEEnt half decent". SIGBOVIK. March 2023. pp. 33–56.

[18] Tom Murphy VII. "Modal Types for Mobile Code". January 2008.

[13] Tom Murphy VII. "NaN gates and flip FLOPS". SIGBOVIK. April 2019.

[10] Tom Murphy VII. "The First Level of Super Mario Bros. is Easy with Lexicographic Orderings and Time Travel. After that it gets a little tricky". SIGBOVIK. April 2013. pp. 112–133.

[20] Tom Murphy VII. "The Wizard of TILT: Efficient(?), Convenient and Abstract Type Representations". Carnegie Mellon tech report CMU-CS-02-120. March 2002.

[29] Tom Murphy VII. "The glEnd() of Zelda". SIGBOVIK. April 2016. pp. 105–112.

[14] Tom Murphy VII. "ZM~~ # PRinty# C with ABC!". SIGBOVIK. April 2017. pp. 129–148.

[11] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom. "Llama 2: Open foundation and fine-tuned chat models". ArXiv.org. July 2023.

[25] http://tom7.org/fixedersys/. Tom Murphy VII. "The FixederSys font family". 2024.

[2] https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Punctuation. Tom Murphy VII. "WikiProject Punctuation". July 2007.

[19] https://sourceforge.net/p/tom7misc/svn/HEAD/tree/trunk/rephrase/. Tom Murphy VII. "BoVeX source code". 2024.
