
Badness 0
(Knuth’s version)

Dr. Tom Murphy VII, Ph.D.
March 2024

It has become clear to me that many people walk this Earth completely unbothered by incorrect details. For example, they are unconcerned when a hyperlink includes a surrounding space character. It doesn’t upset them when the screw heads on a light switch wall plate are not all lined up. They didn’t notice that the rules of Wordle’s “hard mode” are simply wrong. They care as much as the phone’s autocorrect (none) about the difference between “its” and “it’s.” When someone asks, “Will you marry me?” and they think “Oh my god!” it’s not because the proposer probably should have used the subjunctive would.

I am... not like this. If a character in a TV commercial is handling a coffee cup but I can infer from its moment of inertia that the cup does not contain any liquid, I immediately lose suspension of disbelief and will not purchase the product featured in the commercial. I literally projectile vomit if Auto-Motion Plus is enabled on a television in the hotel I’m staying in, even if the TV is not turned on, or if someone misuses the word “literally.” If I see a paragraph missing a period at its end on Wikipedia, I will spend dozens of hours writing software to organize and semi-automate a distributed effort to fix all the missing periods on Wikipedia. [2] And worse, each time I learn of a new type of mistake, I am forever cursed to notice that mistake

Seriously: One time I found myself spell-correcting someone else’s lorem ipsum text in a slide. It said “lorem epsom,” which is funny. I think about that incident all the time. The person that wrote the slide probably thinks about things like leveraging synergy, generative AI, metaverses, blockchain 3.0, snackable content, being eco-green, and so on, without it occurring to him that these things could have nuance and meaning separate from their names. He has probably never even read the Wikipedia article on Lorem Ipsum. He is successful and rich.

Another successful person is the congressperson Bill Cassidy. Criticizing a proposed bill that would reduce the standard workweek in the US by 8 hours, from 40 to 32, this senator says,

Sen. Bill Cassidy of Louisiana, representing the ______ party, said paying workers the same wages for fewer hours would force employers to pass the cost of hiring more workers along to consumers.

“It would threaten millions of small businesses operating on a razor-thin margin because they’re unable to find enough workers," said Cassidy. "Now they’ve got the same workers, but only for three-quarters of the time. And they have to hire more.”

Actually, that’s not exactly the quote, but I needed to make it look nice. [3] And this is not a paper about politics, but let’s just say you can guess what word goes in the blank.

Anyway, OKAY, first of all, razors famously have high margins. It’s like the worst possible metaphor here.

For another thing: This guy mixes fancy typographic quotes and ASCII ones.

But the main thing I want to talk about is: What? No! 32/40 is four fifths, not three quarters. This is not, like, complicated math. It uses some of the world’s smallest integers. Everybody knows that the workweek is 40 hours, and that a workday is 8 hours, and that the proposed bill reduces it by one day, giving four of five days. I don’t really mind if someone makes an error in calculation (well, I do mind, but I am certainly prone to doing it). The infuriating realization here is that this person does not even think of “three-quarters” as a kind of thing that can be right or wrong. He says three quarters because it makes smaller number feelings. You could imagine him having the conversation (with me, perhaps): “You say four-fifths, I say three-quarters.” Me: “But it is four fifths. And why are you always hyphenating it?” Him (smiling patronizingly): “I guess we just have to agree to disagree.”

The opposite of this person is the hero called Donald Knuth.

I’m not saying that Donald Knuth isn’t successful and rich. According to the website “Famous Birthdays,” [4] which is probably generated by AI or at least by people whose economic output is measured in a count of words, and words whose value is computed by their ability to drive ad clicks, Donald Knuth is “is one of the most popular and richest Mathematician who was born on January 10, 1938 in Wisconsin, Wisconsin, United States. Mathematician and engineer who was arguably most recognized as the Professor Emeritus at Stanford in Palo Alto, California.” As one of the richest Mathematician from United States, according to the analysis of Famous Birthdays, Wikipedia, Forbes & Business Insider, “Donald Knuth’s net worth $3--5 Million. *”

I suppose it is arguable that he is the Professor Emeritus. And it is very likely true that he is the only popular and rich mathematician born on that specific day in Wisconsin, making the singular “Mathematician” perhaps a technical master-stroke. But more likely this is just an amusingly dense series of imprecisions. The asterisk of course does not have any referent on the page.

What I mean when I say that Donald Knuth is the opposite of this person is that Knuth is interested in unpacking a single unnecessary detail, recursively, until it is completely solved. According to the website Famous Bibliophiles, one day Donald Knuth set out to write down the entire subject of computer science in a single book called The Art of Computer Programming. As he was doing so, he realized that describing computer algorithms in a lasting form would require a programming language that was not subject to constant revision, so he invented the MIX instruction set for an idealized computer. After writing some 3000 pages out in longhand, he found that it was impractical to print them all in one book, so the plan expanded to be multiple volumes. Then when he got a draft of one of the books back from the typesetter, he was unhappy with the details of the typography, and so he paused his work writing down all of computer science to create some new computer science: First an algorithm for determining where to place line breaks in order to make text optimally beautiful, then algorithms for hyphenating words, then generalizations of these for typesetting mathematics, and then a full computer typesetting system that is still in wide use today, called TeX. Along the way he was unsatisfied with the specific typefaces that existed in the world, and unsatisfied with the way that typefaces were described at only one weight, and so he created the parameterized METAFONT system and several new typefaces. Undeterred by these excursions, he returned to his original task of writing down the entirety of computer science, using all the technology he had built. By the time he finished this, much more computer science had been invented, including by his own hand, and so he needed to rework MIX for the next volume, and update the first. The revised plan of eight volumes remains the intention in 2024. However, he found that the volumes were getting rather long, and began releasing portions of volumes (“fascicles”). So far, Volume 4 has been partially published as books 4A [5] (fascicles 0–4; 912 pages) and 4B [6] (fascicles 5–6; 736 pages). It is unknown how many more episodes remain in Volume 4. I expect that every conversation that Knuth has with his editor goes like this. Editor: “Hey, Donald, I hope you’re well. Just wondering if you have an update on when 4C will be ready? Or any more icicles?” Donald E. Knuth: “I am working diligently on fascicles for Volume 4C. As I’ve mentioned in the past, it’s impossible to tell how long it will be, since mathematics does not obey the rules of project management.” Editor: “I just need a date to tell the publishers.” Donald E. Knuth: “Like I’ve said, any date would be very low confidence, other than the fact that it will be in the future.” Editor: “I just need a date.” Donald E. Knuth: “Would you like me to say a date, knowing that it’s a very low confidence guess, and that I would be extremely likely to miss that date, or even deliver early?” Editor: “Early! Now we’re talking.” Donald E. Knuth: “What use is the date if you’re excited about the possibility of it being early, relative to some unknown date?” Editor: “I just need a date for the publishers.” Donald E. Knuth: “2030.” Editor: “Thanks Donald, you’re the best!”

Volume 5 is estimated to be ready in 2030, when Knuth will be 92.

That’s a large amount of language!

Nightmare on LLM street

Then we have Large Language Models. [7] One of the irritating things about LLMs is that they are so buzzwordy, but unlike most buzzwordy trends, they are actually substantive. They produce remarkably fluent text. With no additional training they frequently beat purpose-built models that have been in development for decades. They generalize to completely new situations.

So many things about “AI” distress me. Dolor sit amet! I worry about the devaluation of human creativity, about large-scale disinformation and spam ruining the beautiful library of knowledge that humans have created, about extreme concentration of wealth. And yes, I worry about competing with
AI. Being able to work tirelessly and thousands of times faster than humans is a huge competitive advantage. Of course, I find some solace in the significant possible upsides. It might help us solve hard problems like climate change and AI. But even in the best scenarios we will not be able to ignore it: Even if it never gets as smart and precise as Knuth, it’s already too economically useful in its Lorem Epsom state (just like Lorem Epsom himself).

On the other hand, the technology is pretty neat and lends itself to some nice abstractions. I love playing with words. So one of my side quests is to masticate this whole scenario by experimenting with LLMs in practical and impractical applications, and to try to make it fun (for me) to program with them.

Many things irritate me, so this is something I have ample experience with. I have a myriad of strategies for digestion of them. For this work I’m inspired by the “Hurry-Coward So-so-morphism,” where I make connections between topics based solely on confusion of superficial lexical similarities without regard to their underlying meaning. So for example we have “ML” meaning both “Machine Learning” and “Meta Language”, as well as “type” both as in “typeface” and as in “type systems for programming languages.” [8] And because machine learning has claimed so many words, there are a great many shared with typography as well:

                       +----------+
                       |typography|
                       +----------+
                         /      \        “baseline”
       “fixed point”    /        \       “floating point”
                       /          \      “weight” “vector”
            “type”    /            \     “descent”
                     /              \    “kerning trick”
                    /                \   “dingbats”
                   /                  \  “gradient”
        +-----------+                  +--------+
        | functional|------------------|machine |
        |programming|                  |learning|
        +-----------+       “ML”       +--------+
                “lambda”  “generalization”
                “parameter”  “tensor”

By no coincidence, I already spent a lengthy introduction talking about Donald Knuth’s work in computer typography. So now I can tell you what this paper is about. If in our near AI future we are giving up on precision, perhaps at least we can have something that we want: Perfect typography? This paper is about a new typesetting system, BoVeX, which allows for the controlled exchange of precision for beauty. It essentially gives us a dial between Lorem Epsom and Donald Knuth. To illustrate, we’ll first look at a simpler case by inspecting one of my other interests: Super Metroid.

The scientists’ findings were astounding! They discovered that the powers of the Metroid might be harnessed for the good of civilization!

Metroid is a video game series about a brain that has been enslaved inside a jar in an underground datacenter on the planet Zebes. This brain is called Mother Brain and its goal is to control the hypercapitalists called Space Pirates to increase their “score” as high as possible by conquering planets throughout the galaxy. Mother Brain was invented by the Space Pirates, although it is not clear whether the current situation was actually intended by the Space Pirates. The most super version of Metroid is Super Metroid.

In the 1990s the website gamefaqs.com collected plain text “FAQs” for classic video games, then just known as video games. On this site another hero was born. They were writing the definitive guide to speedrunning the SNES game Super Metroid when they saw that some of their ASCII lines ended up exactly the same length, and that it looked good:

    Once you save the game at your ship (about 1 hour 15 minutes is good), go
    down to Tourian. Do not save your game in Tourian if you have intentions of
    returning to any previously explored section on Planet Zebes. There will be
    a few Metroids to kill before you reach Mother Brain, and they must all die
    in order to continue to Mother Brain. Read the boss guide for more details.
    Once Mother Brain is defeated, you will need to hurry back to your ship. By
    now you will already have the HYPER BEAM. From Mother Brain’s room, go west
    and then south. Take the blue door at the bottom and speed dash east. Super
    jump up, and continue north. Once you land up top and are running east, aim
    diagonally down to the right and shoot an unseen door. Eventually, you will
    get to this door since lava will start to rise from the floor in this area.
    Speed dash through the door you preopened, and charge for a super jump. Hug
    either the left or right wall in the Craterian shaft and super jump up. Now
    quickly get to your ship before the planet explodes. There should be almost
    a minute left on the timer. Sit back and watch the ending! Did you beat the
    game within 1 hour and 20 minutes?

and so they wisely decided to wordsmith the entire 28-page guide so that every line was exactly the same length, with no extra spaces or other cheating, just because it could be done. [9]

Doing this manually is a chore, and I do like to automate the chores of Speedrunners. [10] I got this working in an afternoon. It’s, like, easy mode. For a paragraph of text and a target line length, I ask the LLM to remember the paragraph and recite it. The prompt looks like this:
    Exercise in rephrasing text. The following paragraph needs to be rephrased so that it retains its precise meaning, but with minor variations in the specific choice of words, punctuation, and so on. No new facts should be introduced or removed, but it is good to use synonyms and change the word order and phrasing.

After this I insert something like “Original text:” followed by the original paragraph, then “Rephrased text:”. The model is ready to generate tokens.

I then sample text a word at a time to continue this prompt. If a line ends exactly on the number of characters that I want (and the next character is a space or other character that is appropriate to end a line) then I accept the stream so far and continue. If I exceed the line length, I back up to the state at the beginning of the line and try again with new random samples. I just keep doing that until the paragraph is complete, and we have beautifully justified monospace text that resembles the original. Here is an example of this paragraph rendered in monospace:

    I sample text a word at a time to continue
    this prompt. If a line ends exactly on the
    number of characters I want, I accept that
    text so far, and continue. If I exceed the
    line length, I back up to the beginning of
    the line and try again with new samples. I
    keep repeating this until I get text I can
    render in monospaced font, and that is how
    we can get beautifully justified monospace
    text. Here is an example of this paragraph
    rendered in monospace:

You could argue that this is improved, even, by making the text shorter. It does use “monospace” and “monospaced” inconsistently. The most upsetting thing here is that it ends with a colon like there’s going to be another example of the paragraph, but that’s what I asked it to do.

The approach described works reasonably well, but it has several deficiencies (such as: it only took an afternoon) that we’ll address for the real BoVeX system. But it is a good example to explain some concepts that will be useful later.

¿Como te LLama?

Llama is Facebook’s Large Language Model, [11] which they nicely share with anyone who agrees not to use it to destroy the world. Wouldn’t it be funny if the world is destroyed by something called “Llama”? That’s some Stay-Puft Marshmallow Man stuff. Actually I hear that llamas are pretty mean, and if you are thinking about hugging a cute long-neck, you are probably thinking about an alpaca. But that’s probably a version of the linear algebra package LAPACK. Llama-v2-70b is a good LLM which can do some impressive things, but when I say destroy the world I mean stuff like filling the internet with infinite spam, or building critical infrastructure on it in order to cut costs, where most of our “safety” measures consist of asking the model politely to recite its daily affirmations before performing its tasks. That kind of thing. It’ll be at least months before we really have to worry.

Anyway, the normal way to program with Llama is to use Python, and a mountain of things that you are not supposed to understand and cannot understand, mostly by pasting examples from others and then tweaking parameters and prompts. I don’t care for it. Fortunately, human geniuses [12] have implemented the inference code for llama-like models in a nice, portable C++ library called “llama.cpp” (checks out).

With llama.cpp, I can load a quantized version of the model into RAM. Actually there are two different models, the 7b and the 70b, referring to the number of billions of parameters, which must be a multiple of VII “for performance reasons.” The parameters are the weights on the layers of the network. At native 16-bit floats, the 70b model will fit in about 130 GB of RAM, just slightly more than a nice round 128 GB, making one wonder what performance reasons they had in mind. But anyway, earlier this year I (physically) broke my computer trying to put the world’s (physically) largest video card into it, the GeForce 4090, and so I endowed the replacement computer with 256 GB of RAM. If you are ever looking at specifications for a high-end desktop computer, by the way, and wondering “who the heck buys these things and what do they do with them?” one answer is “me,” and the other answer is “this.”

Quantization means using fewer bits to represent the floating point weights. [13] This saves memory, but it also speeds up inference, which needs to read pretty much the entire model for every predicted token. I got reasonable quality and good performance from Llama-v2-7b with 16-bit floats. This one fits completely on my world’s (physically) largest GPU. In order to tune various settings, I ran thousands of trials for the different models, and made some nice custom graphs:
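As a sanity check on the memory figures quoted for the two models, here is a minimal sketch (not taken from BoVeX or llama.cpp) that computes the weight-only footprint of a model from its parameter count and the number of bits per weight. It assumes exactly 70 billion and 7 billion parameters, reads “GB” as binary gigabytes (2^30 bytes), and ignores everything except the weights. The function names are made up for illustration, and the 8-bit and 4-bit rows are hypothetical quantization levels, not settings reported in this paper.

```cpp
#include <cassert>
#include <cstdio>

// Bytes needed to store `params` weights at `bits_per_weight` bits each.
// Weights only: ignores activations, caches and file-format overhead.
double WeightBytes(double params, double bits_per_weight) {
  assert(params >= 0.0 && bits_per_weight > 0.0);
  return params * bits_per_weight / 8.0;
}

// Converts a byte count to binary gigabytes (2^30 bytes each).
double ToGiB(double bytes) {
  return bytes / (1024.0 * 1024.0 * 1024.0);
}

// Prints one row per bit width for the two (assumed) parameter counts.
void PrintFootprints() {
  const double kParams70b = 70e9;  // assumption: exactly 70 billion
  const double kParams7b = 7e9;    // assumption: exactly 7 billion
  const double kBitWidths[] = {16.0, 8.0, 4.0};  // 8 and 4 are hypothetical
  for (double bits : kBitWidths) {
    std::printf("%2.0f-bit: 70b needs %6.1f GiB, 7b needs %5.1f GiB\n",
                bits,
                ToGiB(WeightBytes(kParams70b, bits)),
                ToGiB(WeightBytes(kParams7b, bits)));
  }
}
```

Calling PrintFootprints() from a main function gives roughly 130.4 GiB for the 70b weights at 16 bits, in line with the “about 130 GB” figure, and roughly 13.0 GiB for the 7b weights. Halving the bits per weight halves the footprint, which is the memory saving that quantization buys.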
Figure: Tuning results for Llama-v2-7b with 16-bit floats. The x axis is the number of CPU threads and the y axis is the number of model layers that have been loaded onto the GPU. As expected, increasing the number of threads and layers on the GPU improves performance, since this whole model fits on the GPU. For the 70b models (not pictured) there is an abrupt drop-off in throughput before we load all the layers, and also my computer gets very sluggish if I exceed the GPU memory. We see that if we use more than the number of physical cores (32) we do not see any benefit, which is not surprising because hyperthreading basically never helps anything. The best throughput actually uses a modest number of cores (about 12). Mostly I’m just including the graph to demonstrate that BoVeX has support for including PNG files.

Where was I? Right. Fundamentally, LLMs are trained to predict a token (like a word or part of a word) given some sequence of tokens that precedes them. There’s a fixed set of tokens for the model, and rather than predict a single token, they actually give a score for every possible token. These scores are typically normalized into a probability distribution. So for example if we have the text

    SIGBOVIK is an

then the probability distribution (Llama-v2-70b) begins as

    ( annual)          69.8010%
    ( April)            3.8023%
    ( ac)               3.2456%
    ( academic)         2.9374%
    ( artificial)       2.0857%
    ( open)             1.7993%
    ( under)            1.2331%
    ( international)    1.1032%
    ...

with the thousands of other tokens following. So three-quarters of the time the next token should be " annual" but there are many other reasonable possibilities. We can pick one of these tokens however we like, append it to the sequence, and run the model again. This gives us a new probability distribution. By doing this over and over we can generate a likely piece of text. This is what Lorem Epsom means when he says “Generative AI.” Rather, what he means is “the new thing that is cool,” but what he is unknowingly referring to is that you can sample a probability distribution. He has probably never even read the Wikipedia article on Markov chains.

If I always sample the most likely token, I always get the most likely text. It is good to be likely; this is why the model is useful. However, you might not want exactly the same result each time, and in many situations if you only sample the most likely token, you get very boring, repetitive text. Pseudorandom number generation is the spice of life!

We also need not use the probability distribution to sample at all. We can just pick the token that we want. This is how the initial “prompt” works; we just run the inference process one token at a time but always select the next token in the prompt, ignoring the probabilities. So at each moment, the text we’ve generated so far (more or less) completely characterizes the state of the LLM. This means that we can easily go back to earlier moments and sample a different continuation of the text, by just replaying tokens. We also have the option of storing the LLM state (gigabytes) in RAM, which allows us to return to a previous state in constant time.

For generating monospaced lines of the same length, I use a prompt that asks the model to rephrase the input paragraph. Here, greedily sampling the distribution typically results in a copy of the input paragraph, which is fine for our purposes. (If the lines already happen to be the right length, we need not change them!) But when a line comes out the wrong length, I want to try again. So I save the model state whenever I begin a line. To produce a variant of the line, I sample tokens proportional to the probability distribution. When the set of probable lines is small (this is common), the process will keep generating the same lines and failing because they are not the right length. To prevent boredom, whenever the process repeats a line that’s already been seen, I increase the “temperature” modifier to the probability distribution. This is an exponential factor that (when higher) flattens out the probability distribution, making previously unlikely tokens more likely. You can think of this like the model getting a little hot-headed as it frustratedly does the same thing over and over. This causes the candidate lines to be more varied, but less probable (according to the original probability distribution). I can reset the temperature when it succeeds, since we prefer to have more likely lines.

This is all there is to the monospacing version. It’s just 300 lines of code, including boilerplate and commented-out debugging code and false starts.

Great !! You fulfiled your mission. It will revive peace in space. But,it may be invaded by the other Metroid. Pray for a true peace in space!

Now, looking at the output text, we feel satisfied that everything lines up exactly. However, we can’t help but feel unsatisfied at the same time: Now we’re looking at a monospaced font. Good for programming. Bad for publishing. Can we instead have excellent justified text with all the perks of proportional fonts and a programmable document preparation system? And can we have it by the estimated SIGBOVIK deadline so that it can be used to prepare the paper that I’m now writing? Maybe! This is the Donald Knuth Any% speedrun.

The boxes-and-glue algorithm

When justifying monospaced text, it looks quite bad [14] to insert more than one space between words, so we have a simple way to tell if text is suitable for some width. We just add up the codepoints. For the full-on typography case with proportional fonts, there are many more degrees of freedom. For one thing, it looks fine to expand or contract the space between words a little bit, even if it varies from line to line. It is also possible to make fine adjustments in letter spacing (kerning) to squeeze or air out text. We can also hyphenate words.

Around the time I was being born, and probably being very upset about it, Knuth was having similar feelings about the way his computer-typeset documents looked. He discovered a nice abstraction that generalizes most of these typographic degrees of freedom, and devised an algorithm for producing optimal text layout given some parameters. [15] The idea is to consider the text of a paragraph as consisting of rigid “boxes” (say, words) and stretchy “glue” (say, space) between them. Both boxes and glue have various detail (and can be extended to support all sorts of quirks) but the basic algorithm can be understood with just those pieces. So, let’s do that.

Knuth’s paper (as usual) is great, but I started having spoiler feelings when reading it, so I figured out my own algorithm, which is more fun than reading. No doubt the key insight is the same: Although there are exponentially many possible break points, you do not need to try them all. Whenever we break after a word, the problem is now the same for the rest of the text (fit the rest of the text optimally onto lines, starting at the beginning of a line) no matter how we got there. This lends itself to a dynamic programming algorithm.

Dynamic programming is a programming technique for whiteboard interview problems at tech companies. I found it mysterious when I was young, perhaps because of its strange name. Here is how I think about it. Imagine you have a recursive procedure that solves the problem. In this case, the pseudocode is something like

    pair<int, string> Split(string line,
                            string text) {
      if (text.empty()) return {0, ""};
      auto [word, rest] = GetFirstWord(text);
      // try splitting
      auto [penalty1, rest1] = Split(word, rest);
      penalty1 += badness from leftover space;
      // try not splitting
      auto [penalty2, rest2] =
        Split(line + " " + word, rest);
      penalty2 += badness from line too long;
      if (penalty1 < penalty2) {
        return {penalty1, word + "\n" + rest1};
      } else {
        return {penalty2, word + " " + rest2};
      }
    }

Split takes the line so far and the text that remains to be split. In the normal case that there is a word left, it will try two possibilities: Either splitting after the first word, or not splitting. This is exponential time because each call makes two recursive calls, to try each of the two options. But deep recursive calls will be made with the same arguments many times. So, add some memoization: If the function is called for the same line and text a second time, just return the same answer as before without doing any work (especially not making recursive calls again). This limits the function to be
called at most once for each possible argument; we can then see that line is no longer than the input (so it is size O(n)) and text is always some suffix of the input (so it is size O(n)), giving O(n²) calls.

Dynamic programming is just memoization inside-out: We create the values for all the recursive calls before we will need them, store them in a table, and then look them up. For this problem, the table is indexed by the two parameters, the current line and the remaining text. Note that these two can be represented as integers. The line is just the number of words before the current word that are included on the line, and the text is just the position in the string where we’ll next look for a word. That’s all there is to it; the base cases of empty text are used to start the table, and then you just write the loop to fill out cells in the right order.

Knuth’s boxes-and-glue algorithm contains many extensions, and so does mine. For example, later we’ll talk about how you can adapt the algorithm to perform hyphenation and kerning. There are many rabbit holes to go down, and I explored the ones that attracted my attention. There is plenty of time to add more features later, since of course I have now cursed myself to use BoVeX for my future SIGBOVIK papers.

But here’s where I diverge from Knuth somewhat. Knuth was reluctant to add a programming language to TeX, [16] but I spent the majority of my time on this project implementing a full-fledged language. BoVeX is about 33,000 lines of code, the majority of which is the implementation of the language itself. That’s 110× as long as the original monospace proof of concept, and 30× the length of this document!

The BoVeX language

This section describes the BoVeX programming language and its implementation. If you are just in it for the jokes, you can skip this section, which is basically serious and loaded with programming language theory jargon.

BoVeX is a typed functional programming language in the ML family. Its syntax closely resembles Standard ML. Here’s an example piece of code from the source code of this document:

    datatype (a) option = SOME of a | NONE
    fun consume-outer-span f s =
      case layoutcase s of
        Node (SPAN, attrs, children) =>
          let
            val (ropt, rchildren) =
              case children of
                one :: nil => consume-outer-span f one
              | _ => (NONE, layout-concat children)
          in
            case (f attrs, ropt) of
              (NONE, _) => (ropt, span attrs rchildren)
            | (SOME vouter, inner as SOME _) =>
                (inner, rchildren)
            | (outer, NONE) => (outer, rchildren)
          end
      | _ => (NONE, s)

You don’t need to understand it. I just want to show you that it is a full-fledged programming language. It supports higher order functions, polymorphism, algebraic datatypes, pattern matching, Hindley-Milner type inference, and so on. It is basically core (no modules) Standard ML, [17] although I left out some warts (operator overloading, eqtypes, abstype, non-uniform datatypes and polymorphic recursion) and added some new warts. For example, “as” allows patterns on both sides, since Standard ML has always seemed backwards to me and it works perfectly fine to just make it symmetric. Anyway, a full description of the language would be boring and take too much time as the SIGBOVIK deadline draws closer.

Implementation

I have implemented many similar languages in the past, including for my dissertation. [18] It would have been expedient to start from one of my existing implementations, but they are mostly written in Standard ML and I couldn’t get MLton to work on my Windows computer in 2024. So I started over from scratch in C++, which at least does work on my computer. (I also want to be able to interface with GPU inference code for running the LLM, which will be easiest from C++). C++ is not a good language for writing language implementations, but it has gotten better.

The BoVeX implementation is a “compiler” in the sense that it transforms the source language through multiple intermediate languages into a low-level bytecode. This bytecode is just straight-line code on an abstract machine with infinite registers and operations like alloc (allocate a new “object”) and setfield (set a fixed field of the “object” to a value from a register). It does not produce machine code, and although this would be
pretty feasible, it would not be the first thing to do to make BoVeX faster.

First it concatenates the source files (handling import and keeping track of where each byte originated, for error messages) and lexes them into tokens. Then it parses those tokens into the External Language (EL), which is just the BoVeX grammar with a few pieces of syntactic sugar compiled away. It does syntactic transformations on the EL AST to remove some currying syntax and transform nullary datatypes (nil becomes nil of unit). Then it elaborates EL into a simpler and more explicit Internal Language (IL). Elaboration does type inference (Hindley-Milner) including polymorphic generalization and so on, compiles pattern matching into an efficient series of simpler constructs, and decomposes heavyweight stuff (e.g. datatype) into its constituent type-theoretic pieces (e.g. a polymorphic recursive sum). The IL is nice and clean, so it is a good place to perform optimizations. I love writing optimizations but I had to keep myself out of there, or else this would be a 2025 SIGBOVIK paper. There are just enough to make the code reasonable to debug if I need to look at it. After optimization, I perform closure conversion, simplify again, and generate the final “bytecode” form. This entire process happens whenever you generate a BoVeX document; the only output from running bovex.exe is the PDF document.

I want you to know that I did not cut corners on the language implementation. For example, compiling mutually-recursive polymorphic functions is really obnoxious (AFAIK it requires either monomorphization or first-class polymorphism when you do closure conversion) but I did do it, even though none of the BoVeX code I used for this paper ever needed this feature. Following are some of the implementation details; for the full story you’ll need to check the source code. [19]

AST pools. One of the main things I need to do is create tree-structured data to represent the abstract syntax tree of the various languages involved. This is very nice in ML (it is what the datatype declaration is for) and annoying in C++. I continued to experiment with different ways to do this. I use arena-style allocation for the syntax nodes (always const after creation), so that they can be created and reused at will. My current favorite approach to manipulating the nodes is to write “in” and “out” functions (tedious, manual) for each construct in the language. The syntax nodes can then be implemented however I like (for example, a flat struct or std::variant<>) with the freedom to change. I get the compiler’s help whenever I change the language (which is often!) since each in/out function is explicit about its constituents.

Passes and guesses. Many transformations in a compiler rewrite a language to itself; for example each IL optimization is a function from IL to IL. These can be tedious to write and update, especially since a given optimization usually only cares about one or two constructs in the language. I use the “pass” idiom to write these. This is basically an identity function on the AST that pulls apart each node, calls a virtual function for that node, and then rebuilds the node. To write a pass that only cares about one type of node, you inherit from this class and then just override that one node’s function. One issue with this is that each time you rebuild the entire tree you create a lot of unnecessary node copies. So exchanging tedium (mine) for efficiency (my computer’s?), every node type’s “in” function also takes a “guess” node pointer. If the node being constructed is exactly equal to the guess, then we return the guess and avoid creating a copy. Then the base pass is actually the identity (it returns the same pointer) and does no long-lived allocations. This seems to be a good compromise between the traditional garbage-fountain approach and hash consing, which sounds like it would be a good idea but is usually just a lot slower. [20] For type-directed transformations, there is also a typed IL pass class, which recursively passes a context and does bidirectional type checking of the intermediate code. Closure conversion is a type-directed pass and is implemented this way.

Parsing. I have this aversion to parser generators, probably because one time I tried to get someone else’s code to compile and it complained about having the wrong bovines on my computer and ruined my weekend. After trying some other people’s C++ parsing libraries and being disappointed by them, I did what Knuth would do: I wrote my own. It is a parser combinator [21] library which actually descends directly from Okasaki’s SML code. [22] I was proud of myself for getting this to work in C++, since C++’s insane type system is impossible to understand and its error messages are even worse. (BoVeX’s error messages are extremely spartan, often simply declaring Parse error at paper.bovex line 1, but in many ways this is more useful than C++’s mile-long SFINAE vomitus.) It supports mutually-recursive parsers, resolution of dynamic infix operators, and all that. My template-heavy parser combinators take clang about a minute to
compile, which is acceptable. Less acceptable, but something I only learned after using this to write a 16–page-long paper, is that the parsers are very slow. Putting aside LLM inference, this paper takes 13 seconds to render into a PDF, 11 seconds of which is parsing! There must be some bug, but I don’t know if it’s in my grammar (it is easy to accidentally write an exponential time parser, but this one should not be) or the parser combinator library (also my fault) or clang producing bad code (it may be giving up on optimizations, since it is taking so long to compile; the .o file is 41 megabytes). But these are details to be improved in the future.

Garbage collection. Garbage collection is so easy, OMG. I keep track of all the pointers that are allocated during execution. Then it is just a matter of periodically walking through the stack and marking the allocations that are still reachable, then deleting anything in the heap that isn’t. It’s so easy that I didn’t even implement it! I have 256 gigabytes of RAM. Even with a 70-billion parameter, 128-gigabyte LLM in RAM, there’s still plenty of space to just keep allocating. In fact, LLM inference acts as a useful “performance regulator” to make sure that we don’t allocate memory too fast.

Objects

As the SIGBOVIK deadline grew near, I reluctantly added “objects” to the BoVeX language. Objects are no stranger to ML; for example the O’Caml Language [23] (pronounced “OK ML”) has them. [24] But the community of functional programmers I was raised in has a revulsion to things Object Oriented, just like how a woodworker will immediately projectile vomit if they see a piece of Oriented Strand Board, even though it is a fine tool for many applications. I still have this disgust reflex. I imagine my Ph.D. advisors, should they read this, are contemplating whether and how a Ph.D. can be revoked. Anyway, I deliberately kept objects low-tech so that nothing could get too Oriented.

There is one object type obj in BoVeX. A value of this type has an arbitrary set of named fields whose types are known; they can only be the base types int, float, string, bool, layout, or obj. Fields are distinct if they have different types. An object can be introduced with an expression like {() field1 = exp1, field2 = exp2}, provided that each field’s type can be synthesized from the expression itself (in the bidirectional type-checking sense). Alternatively, the program can declare an object name O:

    object O of { field1 : type1, field2 : type2 }

and then use this in an expression like {(O) field1 = exp1}. These object names do not have any run-time meaning; they are just a collection of field types that are commonly used together. It gives a good place to document what they mean and some opportunity for better error messages, but fundamentally an object is just a collection of named data. Think like “JSON” object. It is possible to add and remove fields from objects (functionally) with expressions like exp1 with (O)field2 = exp2.

There are a few reasons for objects in BoVeX. One is the bibliography format, which consists of declarations like this

    val knuth1981breaking =
      bib-article {(Article)
        title = "Breaking paragraphs into lines",
        author = "Knuth, Donald E. and Plass, Michael F.",
        journal = "Software: Practice and Experience",
        page-start = 1119,
        page-end = 1184,
        year = 1981,
        month = NOVEMBER,
        publisher = "Wiley Online Library",
      }

where each declares a reference made up of a bunch of optional fields. It is just too irritating to make each one explicitly optional, and since the data have heterogeneous types, manipulating some string-indexed data structure would have worse static checking and be more syntactically cumbersome. The bibliography rendering code case analyzes over the presence of fields to render citations that have different subsets of data.

Another use is in the layout type. This is a primitive type that most of a document’s text is written in. It is a tree structure with optional attributes on each node, which are represented with an object. For example, this paragraph is written in the paper.bovex source file as:

    Another use is in the [tt[layout]] type.
    This is a primitive type that most of a
    document’s text is written in. It is a
    tree structure with optional attributes
    on each node, which are represented with
    an object. For example, this paragraph is
    written in the [tt[paper.bovex]] source
    file as:

The square brackets are used to write a layout literal (the main body of the document is inside one large literal). Layout literals can also embed expressions (of type layout) with nested square brackets. Here the function tt is applied to a layout literal that contains text like paper.bovex. The tt function just adds the font-family attribute with value "FixederSysLight" to the layout node. This is a custom monospaced bitmap font that I made for this paper using software I wrote. It is part of the FixederSys family. [25] Functions like b and it apply bold and italic text styles, but functions can do anything that you can do in a general-purpose programming language.

Primops

The other thing that objects are used for is interfacing with the runtime that is executing the BoVeX bytecode. There are about 50 different builtin primops that can be used by the BoVeX program. This includes simple things like integer and floating point addition, but also heavyweight operations like “load and register this collection of TrueType font files as a font family” or “invoke the boxes-and-glue packing algorithm with these parameters.” The primops in the former category work naturally on simple base types, but the heavyweight ones need to be able to pass complicated tree-structured heterogeneous data between the BoVeX bytecode executor and the runtime. It would be possible for the runtime to consume and create BoVeX values like tuples and lists, but this has two problems: One, many types like list are declared as user code (in the BoVeX standard library); they are not special, and we don’t want to make them special by informing the runtime of them. Two, requiring specific representations at the runtime boundary inhibits optimization; for example we can normally analyze the whole program to flatten data structures or remove record fields that are never used. The runtime typically uses obj to communicate structured data.

For example, the internal-pack-boxes primitive runs the boxes-and-glue algorithm. It takes some layout (which is expected to be a series of box nodes, with attributes giving their size, glue properties, and so on) and configuration parameters like the type of justification and algorithm to use. It returns an object with a new layout (the boxes grouped into lines, with new glued up widths) as well as the total badness. Inside the BoVeX layout support code, this primop is wrapped as pack-boxes with a native, typed interface, so programmers do not need to think about that implementation detail. Other typographic features that benefit from runtime support are implemented this way as well.

Typographic features

BoVeX offers the pack-boxes algorithm, which can be used to nicely justify text. It can also be used to distribute paragraphs into columns, by thinking of the paragraphs as “words” (acceptable to break at any line, but bad to break near the start or end of a paragraph) and the columns as “lines.” It could be used by the document author for other purposes, I guess. There are other typographic features available.

Most of the layout of the document itself is by BoVeX code, which is either part of the standard library or part of your document, depending on how ambitious you feel. The function main-text parses the document layout into paragraphs and removes whitespace that is not really part of the text. It normalizes text properties across those paragraphs so that they can be manipulated individually. For each paragraph it uses the built-in get-boxes to break the words into fixed-size boxes with appropriate glue and hyphenation (see the next two sections), and then uses the pack-boxes routine to optimize their layout. The heights of the resulting lines are measured, and the lines are spaced according to the line spacing, then packed into columns. Once their final placement is known, boxes become stickers, which are sizeless elements that only know their position and contents. In this way, the BoVeX rendering pipeline is itself a bit like a compiler: It transforms programmer-written source layout into formatted paragraphs, then into boxes of known size, then into stickers of known position. At the end, it outputs the document as a PDF.

Any part of the rendering process can report “badness,” by calling the emit-badness primop. Nominally, badness is measured in square points of area that is outside of its container. Worse situations—such as text overlapping other text—have their badness scaled up per the same area of typographic horror. Less serious infractions—such as a little too much space between words—have badness scaled down. You have to use your heart to tell you what these scaling factors should be.

Fonts

BoVeX can render your document in plain Times Roman if you don’t care about anything, or access 13 other boring built-in PDF fonts, or it can load
any TrueType font from font files. (They do not need to be “installed,” and it won’t help to install them. You just put them in the directory with your document.) It loads their kerning tables and applies kerning properly, by generating rigid boxes at the sub-word level with unbreakable glue. I was disappointed to find that most fonts include only a few dozen kerning pairs. They do this in order to “save space” in the font file, which is utterly rich coming from someone that would try to save space inside of words by squeezing letters together! In the current font Palatino, the word “BoVeX” is not kerned correctly because the rare bigraph “oV” does not have a kerning pair. I hope to improve this detail in a future version (perhaps for the presumably forthcoming video version of this paper).

Hyphenation

Johannes Gutenberg invented the hyphen in A.D. 1455 for his Gutenberg Bible, then just known as Bible. [26] His printing process actually required the lines to all be the same length, so he had to stick these little guys all over the place. His hyphens looked like this: [inline image of Gutenberg’s hyphen]. Later on we straightened these out and decided we only needed one at a time, and today we use them not because we require our lines to all be the same length, but because we like the cognitive challenge of remembering the beginning of the word while we move our eyes to the beginning of the next line while reading.

BoVeX supports hyphenation using the same approach as TeX: We break each word into boxes at legal hyphenation points, and mark these points as sort-of-bad to break, and that if you do, you need to insert the hyphen character and use a little more space. By default in BoVeX, the hyphen sticks out of the end of the line a little bit. This is actually a bug but I like it.

I use the same hyphen dictionary as TeX, which is cleverly represented as a prioritized set of patterns in order to fit compactly in memory. [27] Again, you have to respect Knuth and crew’s attention to detail, although to be fair this algorithm also dates to a time when storing a spell check dictionary in a computer’s memory was described as “not feasible.” So some of this was out of necessity. One of the nice things about the representation is that it generalizes to words that were not in the 1974 Merriam-Webster Pocket Dictionary. For example it hyphenates SIGBOVIK correctly.

The details really keep going, too. The hyphenation dictionary is stored in a file called hyph-en-us.tex. “hyph” here of course stands for hyphens, and “en-us” means “English (United States).” In fact it is the standard language code for US English in the Small Language Model called IETF BCP 47. [28] But then we have “hyph-en”, which is a plausible hyphenation of “hyphen”! You could even read it as “hyphen us, tex”, as a request for TeX to hyphenate the words in this file. This is the kind of detail I’m talking about! (There is also hyph-uk, which for once sounds a little less dignified than the US accent.)

Rephrasing

And of course, BoVeX includes a facility for using the LLM to rephrase text so that it renders more beautifully.

In contrast to the algorithm I described for monospaced text, it is not straightforward to know whether a prefix of some text will pack neatly with a proportional font. It depends on all sorts of contingencies, like kerning, whether we will split mid-word and hyphenate, or change fonts mid-sentence, or include an in-line image, and so on. Unlike monospaced text, a line of proportional text basically never fits exactly (badness 0); we need to apply some glue to make it fit, which generally has some small cost even when the text looks great.

One of the fiddliest parts of this is that we can’t just work with plain text, which is what the LLM enjoys best. Me too. This is because the paragraph being rephrased is some layout value, which contains some structure. Sending the original BoVeX code for the paragraph would maybe be possible in principle, although it would require very invasive changes to the compiler, and forbidden obscenities like “eval” to run the code it generated, and much better error recovery for the presumably vigorous stream of broken BoVeX code generated by the LLM. So I didn’t try that. Instead, I generate a textual representation for the paragraph to be rephrased, and feed that to the LLM. The prompt looks like this:
    Exercise in rephrasing text. The following paragraph, which appears between <P> and </P> tags, needs to be rephrased so that it retains its precise meaning, but with minor variations in the specific choice of words, punctuation, and so on. No new facts should be introduced or removed, and all the ideas from the original paragraph should appear. However, it is good to use synonyms and change the word order and phrasing.

    The text contains markup as well. There are two types: <span class="c0">text goes here</span> and <img src="image.png">. These should be preserved in the rephrased text. <img> tags absolutely need to be retained and should not change their sources, although it is permissible to move them around in the text. <span> should generally be retained, but the contents could change. The classes of spans may not change, and only the classes that appear in the original text may be used.

The first part is basically the same as what I used for the monospaced version, except that I ask the LLM to delimit the paragraph. This is important so that I know when it thinks it’s done, and seems to work better than looking for newlines or the end-of-stream token. The second part is new. I translate the layout into plain text where uninterpreted subtrees are replaced with <img src="img1.png">. These are generally boxes whose contents are not text. This could be an actual inline image or layout used to control rendering, like some bit of horizontal space. Nodes that are used to set text properties of the subtrees with attributes (like fonts, colors, sizes, etc.) are translated into distinct classes and marked up with <span class="c0">...</span>. The LLM has seen plenty of HTML, so it’s able to use these reasonably well.

After generating a rephrasing, I parse the output HTML and match it up with the original layout. If I find any broken HTML, it is rejected. If I find any <img> tag referencing a src not in the original, it is rejected. If I find any <span> tag referencing a class not in the original, it is rejected. The more complexity that the original layout has, the higher the chance of a rejection, but rephrasing generally succeeds. But rejecting samples slows us down, so I leave off the second part of the prompt in the common case that the input paragraph is plain text. That way the LLM doesn’t even try using markup.

With the HTML and original layout matched up, BoVeX can reconstitute the layout with the new rephrased text. This preserves any nested layout and attributes. It then continues with the rendering process.

But, how do we know whether we have a good rephrasing? When we run the boxes-and-glue algorithm, we get a “badness” score for the paragraph’s line breaks, which tells us how bad the paragraph’s line breaks are. When we run the rephrasing algorithm, the probability of the text we generated tells us how semantically good it is, and so we can call 1 - p the semantic loss. Combining those two somehow tells us how bad this is overall, and of course we want to find a rephrasing that minimizes the overall badness.

I wish that I could tell you that I solved this one with a beautiful algorithm! But so far I just have something reasonable that works. I generate many different rephrasings (with their semantic loss), and run each of them through the boxes-and-glue algorithm (to get the typographic badness). I choose the one that optimizes the preferred trade-off between semantic loss and typographic badness. This process is controlled by BoVeX code (i.e. it is in the source code of this very paper) and so it can be modified by the document author. Knuth has a very low tolerance for semantic loss, and knows that his algorithms produce good results without rephrasing. Lorem Epsom just wants it to look good and sound good. Both have published in SIGBOVIK 2024.

How to generate many different rephrasings? The simplest thing would be to sample randomly, like we did for the monospaced version. But since we prefer rephrasings that maximize probability, it is better to explore them systematically. Consider the model at the end of the prompt to be the root of an infinite tree. Each node in the tree represents an LLM state (sequence of previous tokens) and its children are the possible next tokens. Each of these tokens has a probability. All the model does is allow us to access that probability distribution for a node. Each possible rephrasing is a path in this tree that ends with </P>. We begin by sampling the most likely (as far as we know) path: At each node we see, we take the first (most probable) token. This is our first rephrasing, and it usually matches the original text exactly. Say that we “skipped” probability mass if we sampled a token that is less probable than it. We compute the semantic loss as the average probability mass skipped over all the tokens in the path. For this first path, we always took the most probable token, so this is 0.0 by definition.
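The semantic loss described above can be made concrete with a minimal Python sketch. This is not BoVeX's actual code. It assumes each step of a path is given as a token-to-probability dictionary plus the token that was taken, and it reads "skipped" mass as the total probability of the tokens that were more probable than the chosen one (the paper does not give an exact formula). The names Step, skipped_mass, and semantic_loss are made up for illustration.

```python
from typing import Dict, List, Tuple

# One step of a path: the model's next-token distribution at that node
# (token -> probability) and the token that was actually taken.
# (Hypothetical representation, for illustration only.)
Step = Tuple[Dict[str, float], str]

def skipped_mass(dist: Dict[str, float], chosen: str) -> float:
    """Total probability of tokens strictly more probable than `chosen`.

    Assumption: this is what "skipped" probability mass means here.
    """
    p = dist[chosen]
    return sum((q for q in dist.values() if q > p), 0.0)

def semantic_loss(path: List[Step]) -> float:
    """Average skipped mass over all tokens in the path (0.0 if empty)."""
    if not path:
        return 0.0
    return sum(skipped_mass(dist, tok) for dist, tok in path) / len(path)

# Toy distributions with exactly representable probabilities.
step1 = {"the": 0.5, "a": 0.25, "one": 0.25}
step2 = {"cat": 0.75, "dog": 0.25}

greedy = [(step1, "the"), (step2, "cat")]   # always the most probable token
variant = [(step1, "a"), (step2, "cat")]    # skips "the" (0.5) at step 1

print(semantic_loss(greedy))   # 0.0
print(semantic_loss(variant))  # 0.25
```

Ties count as not skipped in this sketch; a different tie rule would shift the numbers slightly but not the idea.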
The next path we explore will diverge from this path at some node (maybe the root). We pick a node that is likely to result in a good final loss, by scoring each node in the tree. The score is the average probability of all ancestor nodes times the probability of the next highest-probability token that we have not yet explored. The node with the highest overall score is the one we expand, by choosing that next highest-probability token. We are now in an unexplored part of the tree, and so we sample the most probable nodes repeatedly until we reach </P>. Speaking of which, BoVeX has a heck of a time trying to rephrase these last few paragraphs because they literally contain the text </P> in them.

The scores should be seen as heuristic; we would get different results by choosing different ways of computing the score. This is an example of a “beam search” algorithm, which is good because it connects this project again to Super Metroid. As described in the earlier excerpt from the speedrun document that inspired this work, one of the final things you do in that game is acquire the “hyper beam” to defeat Mother Brain.

Since we will run the boxes and glue algorithm on multiple related texts, I generalized that algorithm to work on tree-structured input. This is clean; the memo table keeps the same dimensions, but records an additional fact. Now we store the penalty, whether to break after this token, and what the best subtree is. We have to consult each subtree when computing the score for a node, but this does not affect the asymptotic runtime. The table size is still at most O(n^2), and although we explore more children per node, branches in the tree reduce the maximum depth to the root, which actually reduces one of the factors of n to log(n) as the tree becomes complete. However, as the SIGBOVIK deadline crept upon us, I never actually hooked this functionality up. It would require additional (programming) work to merge the trees, and the layout process is so fast that it doesn’t matter; I can easily run the full layout algorithm on hundreds of rephrasings per paragraph.

I would like to improve the algorithm, because it does seem like there should be a way to integrate the boxes-and-glue dynamic programming algorithm with the path extension algorithm so that we prioritize exploring nodes that are likely to generate the best balance of typographic and semantic quality. It won’t be as satisfyingly optimal as boxes-and-glue itself because we have incomplete information (we never know whether one of the exponentially many paths starts out with improbable tokens but then ends with a miracle streak of probable tokens). But it can certainly be more satisfying. Knuth would not stop here (but this is an Any% Knuth speedrun).

Instead I spent my time implementing an achievement system in BoVeX. The first time certain conditions are met, the system permanently awards you an achievement and prints a nice color trophy on your terminal. For example, you can get the “Not bad” achievement for generating a document that is at least 5 pages and has less than 1000 badness per page.

Advantages of rephrasing

Another nice thing is that the manual rephrasing that consumes valuable brain sugars when writing can become optional. For example, when I wrote the opening paragraph of this paper and listed a variety of trivial details, I might not need to think of different ways to say “unconcerned.” I could just write “unconcerned” each time and let the typographic considerations determine which synonym to use each time.

Conclusion

In this paper—and with this paper—I presented BoVeX, a new computer typesetting system. It follows the tradition of TeX, but with modern amenities such as requiring over 128 gigabytes of RAM. Though some may consider the addition of AI features to TeX to be an unnecessary perversion, I find this use of LLMs to be fully justified.

Future work

Typographic features. Many more typographic features are desirable. Footnotes! It is so hard to write a paper without footnotes. Where am I supposed to put the bonus digressions? The layout of footnotes is tricky and should be part of a general floating figure implementation. Endnotes are actually easy, but I don’t want endnotes. I want them to be little footnotes so that you can’t help but read them.

BoVeX does not support page numbers, which is good because they are forbidden by the SIGBOVIK program committee.

TeX is famous for its mathematical typesetting as well. It would fit neatly into BoVeX in the
same way, since both use the same fundamental boxes-and-glue engine.
BoVeX does not have “macros” or “modes” like TeX, but it would work
cleanly to write a BoVeX function math (or, if you like, $) that parses
a custom syntax. In fact it would be natural to have different parsers
for different maths, so that you don’t need to parse -> as minus greater
than in mathematical contexts that don’t use minus or greater than at
all.

Optimization. There are many opportunities to make BoVeX code faster.
This is mostly important for when it is being run in a loop in order to
try out many different rephrased texts. (That said, I do not wish to
preclude what could be done with BoVeX by assuming its execution is
doing only typesetting tasks. For example, shouldn’t you be able to
challenge your paper’s reviewers to a game of chess against a strong
engine embedded within your document?) The first thing to fix is that it
manipulates too many strings at runtime (e.g. the code, record labels,
object fields, and “registers”). This is easy to fix since these are all
known at compile time. There are lots of high-level optimizations left
to do for the IL code (common subexpression elimination, constant
argument removal, uncurrying, etc.) and lots of peephole and
control-flow optimizations left to do for the bytecode (currently no
optimizations are performed at all). All of this becomes more important
if I add another planned feature, which is the ability for the document
to be globally optimized by applying a black-box optimizer to a set of
user-specified parameters. For example, the column width, line spacing,
or font size could be tweaked to make the document fit better. This
feature is “Auto-Margin Plus.” Things are already set up to do this
pretty straightforwardly; we would simply generate the document over and
over while searching over the parameter space, and choose the one with
the least badness. This may also affect which rephrasings look best. But
instead I spent my precious time implementing 3D text. [29]

Reproducibility. The algorithm for rephrasing text tries to find the
best place to explore the next most likely token from the probability
distribution. This expects the generation of these distributions to be
deterministic. Mathematically, inference is deterministic (it is just a
bunch of matrix multiplications), so this “should work.” But in practice
the enormous calculation is performed in an unpredictable order as it is
executed in parallel (in multiple CPU and GPU cores). Because floating
point arithmetic is not associative (or distributive, commutative, or
other properties you’d like), inference can sometimes generate different
answers due to floating point round-off error. [30] Alas, these are not
even necessarily related to the final probabilities in the model, as
billions of non-linear operations happen within the hidden layers of the
network. The effect is not particularly grave; we might miss out on a
highly likely path because the probability distribution was different
the second time we looked at it. There are already lots of ways we might
fail to find highly likely paths, so this is not some kind of
reproducibility crisis. It is mostly just a bit unsatisfying.

Unicode support. This would have been helpful when above I decided to
show you Gutenberg’s funny hyphen, [inline image], for which I had to
settle for embedding a crappy hand-drawn PNG file. Instead I could have
used U+2E17, which, since this exotic code point is not present in the
font Palatino, you could have experienced as [glyph not rendered]. BoVeX
is written with some Unicode support, with the main exception being that
the PDF output code only supports the embarrassingly diminutive
WinAnsiEncoding. [31]

Deadlines. Although BoVeX itself is very fast, rephrasing is very slow.
This presents a problem for the typical way that academic papers are
written, which is to do all the work in a coffee-fueled fugue in the
last few days before the deadline, then stay up all night writing the
paper and finding citations for the pro-forma “related work” section
which you did last but you know that the reviewers will insist upon, and
tweaking \vspace and \begin{figure}[h!] until it fits within the page
limit. On the one hand, BoVeX does potentially free the author from the
visual tweaking process. But on the other hand, the LLM inference for
the rephrasing process can be quite slow, and it can take many hours or
days to fully bake a long paper! For this reason, it may be better to
change conference deadlines to a system where the pre-rephrasing text is
submitted. The publishers (what do they even do?) can be the ones to
execute the rephrasing in the cloud as they produce the “camera-ready
copy.” With straightforward extensions, this would also allow the
rephrasing to adapt to changes in the overall volume style, or to adjust
to avoid embarrassing typographic coincidences with other articles in
the same volume (such as using the same notation with a different
meaning). In principle, the paper could edit itself to respond to
feedback from reviewers, in a way that minimizes the semantic distance
from the original. This rapid feedback loop could reduce the time to
publication, perhaps to
mere months, or even weeks!
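
The round-off problem described under Reproducibility is easy to see in
miniature. The following Python sketch is not part of BoVeX or of its
LLM inference code; it assumes only ordinary IEEE 754 double-precision
floats, and the helper name sum_in_order is made up for illustration. It
adds the same three numbers in every possible order, the way parallel
workers might deliver partial sums in a different sequence from run to
run.

```python
from itertools import permutations


def sum_in_order(values):
    """Accumulate left to right with ordinary float addition (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total


# Mathematically these three terms always sum to 1.0.
terms = [1e16, 1.0, -1e16]

# Try every order in which the terms could arrive.
results = {order: sum_in_order(order) for order in permutations(terms)}
distinct = sorted(set(results.values()))

for order, total in results.items():
    print(order, "->", total)
print("distinct totals:", distinct)
```

The exact sum is 1.0, but any order that adds 1.0 to a value of
magnitude 1e16 before the two large terms cancel loses it to rounding,
so the script reports two distinct totals. Inside a network the
discrepancies would be far smaller, but a tiny nudge could still be
enough to reorder two nearly tied tokens.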

Other ways to minimize badness. The BoVeX system allows the document
author to exchange semantic consistency for higher quality typography.
Although we achieve state-of-the-art results, there are likely points
that are more Pareto-efficient than what BoVeX can reach. BoVeX uses one
of the most powerful publicly available LLMs, but that model is limited
to rewriting the text within narrow constraints. Irresponsible research
has demonstrated that language models are capable of volition, taking
actions and using tools to accomplish goals. With minor modifications,
it is likely possible to expand the Pareto frontier of the
semantic/typographic tradeoff. For example, sometimes we could improve
the typographic quality of the text without any semantic loss, by acting
on the world to make the reworded text true. Human authors do this
already: Earlier when I was describing internal-pack-boxes, rather than
explain the somewhat awkward implementation, I went back and changed the
already-working code so that it would serve as a simpler example of how
primops use obj, but still be truthful. Now imagine the difficulty in
typesetting a statement like “The universe contains approximately
1,000,000,000 paperclips,” and how much more beautiful the text could be
if that number were instead
10,000,000,000,000,000,000,000,000,000,000,000,000!

In the meantime there is an easier way to get zero badness: Delete the
whole document! As a wise person once said, “If you can’t say something
with nonzero typographic or semantic loss, don’t say anything at all.”

Acknowledgements. Supposing his name survives rephrasing, I’d like to
shout out to one of my advisors, Karl Crary. 20 years ago, he set out
with me on an ill-advised and ill-fated attempt to replace LaTeX with an
SML-like language mTeX, which compiled into TeX macros. The nesting
square brackets syntax was Karl’s idea, and BoVeX shares genetic
material with mTeX for sure.

See you next mission,

Tom 7

Bibliography

[31] Adobe. "PDF reference: Sixth edition". October 2006.

[7] You can just go to arxiv.org and click on any random article these
days.

[16] N Bijlage. "Knuth meets NTG members". NTG: MAPS, 16. March 1996.
pp. 38–49.

[3] Russ Bynum. "Bernie Sanders wants the US to adopt a 32-hour
workweek. Could workers and companies benefit?". March 2024.

[12] https://github.com/ggerganov/llama.cpp. ggerganov. March 2024.

[26] Johann Gutenberg. "Bible". 1455.

[4] https://allfamousbirthday.com/donald-knuth/. February 2024.

[23] https://ocaml.org. "The Objective Caml system". 2023.

[21] Graham Hutton. "Higher-order functions for parsing". Journal of
functional programming, 2(3). 1992. pp. 323–343.

[5] Donald E Knuth. "The Art of Computer Programming: Volume 4A,
Combinatorial Algorithms Part 1". Addison-Wesley. January 2011. 912
pages.

[6] Donald E Knuth. "The Art of Computer Programming: Volume 4B,
Combinatorial Algorithms Part 2". Addison-Wesley. October 2022. 736
pages.

[15] Donald E Knuth, Michael F Plass. "Breaking paragraphs into lines".
Software: Practice and Experience, 11(11). November 1981. pp. 1119–1184.

[27] Franklin Mark Liang. "Word Hy-phen-a-tion by Com-put-er". 1983.

[17] Robin Milner, Mads Tofte, Robert Harper, David MacQueen. "The
definition of Standard ML (Revised)". MIT Press. May 1997. 114 pages.

[8] John C Mitchell. "Type Systems for Programming Languages". Van
Leeuwen, Jan, ed. Formal Models and Semantics. 1990. pp. 365–458.

[1] Tom Murphy VII. "Badness 0 (Epsom's version)". SIGBOVIK. April 2024.
14 pages.

[30] Tom Murphy VII. "GradIEEEnt half decent". SIGBOVIK. March 2023.
pp. 33–56.
[18] Tom Murphy VII. "Modal Types for Mobile Code". January 2008.

[13] Tom Murphy VII. "NaN gates and flip FLOPS". SIGBOVIK. April 2019.

[10] Tom Murphy VII. "The First Level of Super Mario Bros. is Easy with
Lexicographic Orderings and Time Travel. After that it gets a little
tricky". SIGBOVIK. April 2013. pp. 112–133.

[20] Tom Murphy VII. "The Wizard of TILT: Efficient(?), Convenient and
Abstract Type Representations". Carnegie Mellon tech report
CMU-CS-02-120. March 2002.

[29] Tom Murphy VII. "The glEnd() of Zelda". SIGBOVIK. April 2016.
pp. 105–112.

[14] Tom Murphy VII. "ZM~~ # PRinty# C with ABC!". SIGBOVIK. April 2017.
pp. 129–148.

[25] http://tom7.org/fixedersys/. Tom Murphy VII. "The FixederSys font
family". 2024.

[2] https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Punctuation.
Tom Murphy VII. "WikiProject Punctuation". July 2007.

[19] https://sourceforge.net/p/tom7misc/svn/HEAD/tree/trunk/rephrase/.
Tom Murphy VII. "BoVeX source code". 2024.

[22] Chris Okasaki. "Even higher-order functions for parsing or Why
would anyone ever want to use a sixth-order function?". Journal of
Functional Programming, 8(2). March 1998. pp. 195–199.

[28] A Phillips, M Davis. "Tags for identifying languages". September
2009.

[9] https://gamefaqs.gamespot.com/snes/588741-super-metroid/faqs/10114.
rs1n. "Super Metroid Speed Guide and FAQ". 1996.

[24] Didier Rémy, Jérôme Vouillon. "Objective ML: An effective
object-oriented extension to ML". In Theory And Practice of Objects
Systems, 4(1). 1998. pp. 27–50.

[11] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad
Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal
Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton
Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes,
Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman
Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin
Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev,
Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana
Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov,
Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy
Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva,
Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang,
Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan,
Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang,
Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.
"Llama 2: Open foundation and fine-tuned chat models". ArXiv.org. July
2023.
