XMAS - An Open MIDI and Sample-Based Music System
B Davis
Robinson College
Computer Science Tripos
May 9, 2004
Work Completed
An XML-based structure that allows a .mid author to build his or her own
instrument sounds using .wav files was designed. A software library for parsing
the structure and rendering the music to a mono or stereo PCM sample stream
was written. In particular this incorporated a real-time resampler with cubic
interpolation and a MIDI player. The project is fairly mature and will soon be
available at http://xmas.sf.net/.
Special Difficulties
None.
Declaration
I, Ben Davis of Robinson College, being a candidate for Part II of the Computer
Science Tripos, hereby declare that this dissertation and the work described in it
are my own work, unaided except as may be specified below, and that the disser-
tation does not contain material that has already been used to any substantial
extent for a comparable purpose.
Signed
Date
Contents
1 Introduction 1
1.1 The Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 MIDI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Samples and Streamed Audio . . . . . . . . . . . . . . . . 2
1.1.3 Amiga mod-based Files . . . . . . . . . . . . . . . . . . . . 3
1.2 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 The Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Preparation 7
2.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Initial Analysis of Requirements . . . . . . . . . . . . . . . . . . . 7
2.2.1 Using Industry Standards . . . . . . . . . . . . . . . . . . 7
2.2.2 Compression and Kolmogorov Complexity . . . . . . . . . 7
2.2.3 Structure and Tweaks . . . . . . . . . . . . . . . . . . . . 8
2.2.4 Same Output Everywhere . . . . . . . . . . . . . . . . . . 9
2.3 Project Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Further Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4.1 Real-Time Playback . . . . . . . . . . . . . . . . . . . . . 10
2.4.2 Third-Party Players . . . . . . . . . . . . . . . . . . . . . . 10
2.5 Choice of Programming Language . . . . . . . . . . . . . . . . . . 10
2.6 Refining the File Structure . . . . . . . . . . . . . . . . . . . . . . 10
2.7 Final Preparations . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.7.1 Core Requirements . . . . . . . . . . . . . . . . . . . . . . 11
2.7.2 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.7.3 Work Plan and Timetable . . . . . . . . . . . . . . . . . . 11
2.7.4 Libraries and Code Used . . . . . . . . . . . . . . . . . . . 12
2.7.5 Documentation Used . . . . . . . . . . . . . . . . . . . . . 13
2.7.6 Code Management and Back-ups . . . . . . . . . . . . . . 13
3 Implementation 15
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.1 Digital Signal Processing Modules . . . . . . . . . . . . . . 15
3.1.2 The State Tree . . . . . . . . . . . . . . . . . . . . . . . . 16
3.1.3 Generating the Music . . . . . . . . . . . . . . . . . . . . . 16
3.2 The XML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.4 Parameter Tweaks . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.5 The Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5.1 Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5.2 Volume Envelopes . . . . . . . . . . . . . . . . . . . . . . . 22
3.5.3 Multiplexers . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.5.4 MIDI Mappings . . . . . . . . . . . . . . . . . . . . . . . . 25
3.5.5 Variable Compute Blocks . . . . . . . . . . . . . . . . . . . 25
3.6 The MIDI Playback Algorithm . . . . . . . . . . . . . . . . . . . 25
3.6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.6.2 State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.6.3 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.6.4 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.6.5 Noteworthy Features . . . . . . . . . . . . . . . . . . . . . 28
4 Evaluation 31
4.1 Goals Achieved . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 Evolution of the Plan . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.1 Generalisation . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.2 Filtering Whole Channels or Tracks . . . . . . . . . . . . . 32
4.2.3 Reference Counting . . . . . . . . . . . . . . . . . . . . . . 33
4.3 Milestones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.4 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.4.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.4.2 The Resampler . . . . . . . . . . . . . . . . . . . . . . . . 34
4.5 Profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.6 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.6.1 Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.6.2 Volume Envelopes . . . . . . . . . . . . . . . . . . . . . . . 38
4.6.3 MIDI Playback . . . . . . . . . . . . . . . . . . . . . . . . 39
4.6.4 Flexibility . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.7 Problems Encountered . . . . . . . . . . . . . . . . . . . . . . . . 39
4.7.1 STL Containers . . . . . . . . . . . . . . . . . . . . . . . . 39
4.7.2 XML Files . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.7.3 Code Size . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5 Conclusions 41
Bibliography 43
D Project Proposal 54
List of Figures
Acknowledgements
Many thanks are due to Neil Johnson, my Project Supervisor, for the guidance
he offered right from inception up until the final deadline. Thanks also go to
Dr Alan Mycroft, my Director of Studies, for his assistance with this dissertation.
This dissertation was written inside the skeleton structure provided by
Dr Martin Richards’ How to write a dissertation in LaTeX [8].
Chapter 1
Introduction
Electronic music is an exciting field. Many people will insist that it is no substi-
tute for conventional acoustic music, and they are right. Acoustic instruments—
and human performance—are an enormous challenge to emulate. Electronic mu-
sic is not a substitute: it is a complement. It is a whole new world, populated
with many synthesisers and filters, each with its own distinctive character, and
free of such constraints as the span of a pianist’s hand. Moreover, it can all be
done in software inside any reasonably modern computer equipped with a sound
card and speakers.
I am a proficient pianist and have composed a great deal of professional-quality
music. This strong musical background enabled me to hear bugs in my project’s
output and reason about sound quality.
1.1.1 MIDI
Enter the MIDI Specification [1]. It was created in 1983 by Sequential Circuits,
Roland and several other major synthesiser manufacturers as a protocol to allow
instruments to communicate with one another. There are 16 channels, numbered
from 1 to 16; a device can respond to events on some channels and not others, or
assign different instruments to different channels. Events such as the following
can be encoded in a byte stream and sent between devices:
The first byte is known as the status byte. There are many other commands,
but the status byte is always in the range 80–FF, and data bytes are always in
the range 00–7F. If the same command is used repeatedly, the status byte need
only be specified once, and it becomes the running status.
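As an illustration, encoding two Note On events with running status might look like this (a sketch with my own names; not code from the XMAS library or any particular MIDI implementation):

```cpp
#include <vector>

typedef unsigned char byte;

// Append a Note On for channel 1-16 with note and velocity 0-127.
// Status bytes are always 80-FF; data bytes are always 00-7F. The status
// byte is emitted only when it differs from the current running status.
void noteOn(std::vector<byte>& out, byte& running,
            int channel, int note, int velocity) {
    byte status = 0x90 | ((channel - 1) & 0x0F);
    if (status != running) {
        out.push_back(status);
        running = status;   // becomes the new running status
    }
    out.push_back(note & 0x7F);
    out.push_back(velocity & 0x7F);
}
```

A second Note On on the same channel thus costs only two bytes, not three.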
MIDI commands can also be saved with time stamps in a Standard MIDI
File. Such a file has the extension .mid. XMAS uses the .mid file as a major
component in a piece of music.
General MIDI, an addition made to the standard in 1991, designates a stan-
dard set of instrument names for the 128 integers that can be used in a Program
Change command. It further allows a single channel to be used for unpitched
percussion (such as cymbal crashes), with standard percussive instrument names
assigned to many of the 128 note values. On multi-part synthesisers, it is usual
for Channel 10 to contain percussion.
have 16-bit values. Many recordings have two channels, one for each speaker in
a stereo set-up;2 here, the sample points are interleaved, and each left-right pair
is called a sample frame. This system for encoding sound as a series of samples
is called Pulse Code Modulation, or PCM . It is possible to store an arbitrary
number of channels in a .wav file, but it is rare for more than two to be stored.
A sample, in one sense of the overappropriated word, is a recording of a sound
effect, a note, or a short, repeatable sequence of notes. It is typically stored in a
.wav file. XMAS uses samples as a key component in instrument design.
Lossy compression algorithms exist for PCM data. The best known is
MPEG-1 Layer 3, or .mp3; others are Ogg Vorbis (.ogg) and Windows Media
Audio (.wma). These typically achieve compression ratios of around 12:1. It is possible
to stream such compressed PCM data over a channel3 (e.g. for Internet radio), so
these schemes are often referred to as streamed audio schemes. Supporting these
in addition to .wav files was made an extension for reasons to be given later.
Adjusting the speed of a sample—stretching or compressing it in the time
axis—results in a shift in the frequencies present in the spectrum, perceived as a
change of pitch. XMAS uses this to generate different notes from a single sample.
It is not the best way to change the pitch of a note, but it is quick to do, and the
same algorithm can enable playback at an arbitrary sampling frequency.
I have co-authored DUMB, a library for playing these four mod-based for-
mats [5]. The experience, both positive and negative, from working on DUMB
proved very useful in specifying, planning and implementing this project.
.mid files. Most computers are built with the ability to play .mid files, in soft-
ware if not in hardware. Since MIDI has so much industrial support, there
is good hardware and software available, and .mid files are easy to produce.
However, owing to the nature of MIDI, the output varies vastly from one
system to another. Even General MIDI does not specify exact instrument
sounds, only names: from experience I know that a change of MIDI device
can be utterly devastating to a piece of music.
Compressed audio streams. Why not produce MIDI files for one’s own MIDI
set-up, record the output and encode to a compressed audio scheme such
as .mp3? Many people do this, and for boxed games it is a fair solution.
However, it is not a good choice for games made available for download on
the Internet (including demos of commercial games): even a small selection
of .mp3-format music could take a dial-up modem user hours to download.
These formats have the further limitation that they are unstructured, and
the game cannot make adjustments, e.g. to speed, on the fly.
Amiga mod-based formats. These seem like a good solution, but there are two
problems. The first is that producing mod-based music is very difficult. The
second is that you can never be sure your music will play correctly. All
the original tracker software was closed-source. Third-party players have
been developed, but most of them misinterpret the data, some of them very
severely. We made a serious effort to get it right in DUMB, but there are still
errors. The player shipped with the popular Winamp media player is one
of the least accurate, which poses a real problem to anyone releasing mod-
based music. Furthermore, there is a major third-party tracker, ModPlug
Tracker, which differs significantly in several ways from the original tracker
programs. As a result, there is no single correct way to play mod-based
formats.
.xmi: XML Instrument Definition. This defines one or more instrument sounds,
most likely referring to .wav files or other .xmi files in the process.
.xmm: XML Music. This specifies a .mid file, along with an XML Instrument
Definition defining the instrument sounds to play it with. The definition
may either be embedded or consist of a reference to an .xmi file.
.xma: XML Music Archive. This contains zero or more .xmm files and all files
they depend on, using appropriate compression for each part. It allows
music to be consolidated into a single file. In addition, it is a nice take on
Microsoft’s .wma format!
Since .xma is the ideal final format for music, the project as a whole is called
the XMA System, or XMAS for short.
5. Modulo the rather verbose XML glue, which is highly compressible.
Chapter 2
Preparation
2.1 Requirements
• It must be easy for a composer to produce music.
• It must be possible for the composer to keep file sizes down to a minimum.
• The music must be structured so as to allow tweaks on the fly. Such tweaks
might include speed variation, muting of some instruments and instrument
substitution.
of generating the data. This shortest program is known as the minimal descrip-
tion. In the worst case it will be a PRINT statement followed by the data to be
output, but in the best case it can be extremely concise.
The Kolmogorov complexity gives a lower bound for the size of losslessly
compressed data (assuming the decompression algorithm is simple). In general,
it is very difficult to meet this lower bound in any automatic process. The minimal
description is usually a representation of the structure of the file, so to create it
requires a good understanding of this structure.
Music is highly structured. Some of the structure can be articulated. For
instance, we think of a piece of music as a sequence of notes, and this is how music
has always been written down, whether for live performers or for a computer. This
is in fact just one of the many structures found in music. Themes and rhythms
recur, harmonic progressions are often predictable, and even the scale itself is
riddled with frequency ratios such as 2:3 (perfect fifth) and 3:5 (major sixth).
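To see how closely the equal-tempered scale realises such ratios, one can compute 2^(n/12) for an interval of n semitones (an illustrative sketch, not project code):

```cpp
#include <cmath>

// Equal-tempered frequency ratio for an interval of n semitones: 2^(n/12).
double semitoneRatio(int n) {
    return std::pow(2.0, n / 12.0);
}
```

semitoneRatio(7) is about 1.498, within 0.2% of 3/2, the frequency ratio of the perfect fifth (the 2:3 above, inverted); semitoneRatio(9) is about 1.682, against 5/3 ≈ 1.667 for the major sixth.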
Much of the structure is lost when the music is written down.1 However, the
Western twelve-note scale, including its ratios, is implicit and can be implemented
in the library, and the idea of a sequence of notes is preserved. This is enough
to bring the file size down well below that of an .mp3 file, if simple instrument
definitions are used.
What if complicated instruments are used? I have an .xm file (mod-based) that
is over 10 MB in size. An .mp3 version of typical quality would be smaller. It
could still be beaten if the instrument samples were compressed, but this should
be done with care. Samples are often looped, so that when playback reaches the
end of a sample, it jumps back to a specified point (see Section 3.5.1). To avoid
clicks, the point is chosen so that the resultant curve is continuous. .mp3 and
friends are lossy, and there is no guarantee that such a loop would be preserved.
In light of this consideration, providing instrument compression has been left to
the extensions.
tree will be a node, also specified in XML, parenting a set of instruments and
containing a reference to a .mid file.
Furthermore, the above types of node will be unified so that they can be
strung together effortlessly in any layout. The term ‘DSP module’ will refer to
any node, since a node’s purpose is typically to do some digital signal processing.
In particular, this enables short MIDI sequences to be used as instruments in
larger ones.
2.7.2 Extensions
The following two features will have to be consigned to extensions simply because
of the amount of work they would involve:
• The .xma format will take some careful planning, and so has been left as
an extension.
Other possible extensions include extra DSP modules (such as filters, dis-
tortion and echo), click removal for when samples start, stop and loop (not for
clicks in the actual sample data), support for surround sound, a GUI for editing
and testing .xmi and .xmm files, a stand-alone player, and XMMS and Winamp
plug-ins.
This kind of API is not conducive to the object-oriented structure I want, and
it is certainly not thread-safe. The same problem arises with flex and bison:
global variables are heavily used.
By contrast, mathexpr is written in C++ and has a good object-oriented
structure. The site presented a worrying description and example of the parser’s
behaviour, but a test proved that these were incorrect and the parser behaved as
one would expect.
mathexpr treats concatenation of variable names as multiplication (e.g. xy
is x × y), so I performed another test to see if variable names longer than one
character would be accepted. They were, with the restriction that a name could
not consist of an existing name with a suffix added (so ‘note’ and ‘notevelocity’
could not coexist). I considered this an acceptable limitation and decided to use
mathexpr.
Since mathexpr is not a proper library with an installation procedure, I incor-
porated it into libxmas’s code tree. (By contrast, a user who wishes to compile
libxmas will have to obtain and install xmlwrapp first.)
MIDI
http://www.borg.com/~jglatt/tech/miditech.htm covers two important
parts of the MIDI Specification in great detail. The first is the MIDI messages
(e.g. Note On) that may be sent between devices or stored in .mid files. The
second is the .mid file and the meta-events that are stored in it but are not MIDI
messages per se (more on this later).
back-up service, keeping one old copy each time. Finally, I set up a cron job so
that the script would run every day at 4:00 a.m.
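A minimal sketch of such a set-up (hypothetical paths and names; the actual script is not reproduced here) might rotate one old snapshot each run and be driven by a crontab entry:

```shell
#!/bin/sh
# Copy $1 to $2/project, first rotating any previous copy to project.old,
# so one old snapshot is kept each time the script runs.
backup() {
    src="$1"
    dest="$2"
    mkdir -p "$dest"
    rm -rf "$dest/project.old"
    if [ -d "$dest/project" ]; then
        mv "$dest/project" "$dest/project.old"
    fi
    cp -r "$src" "$dest/project"
}

# Hypothetical crontab entry: run every day at 4:00 a.m.
#   0 4 * * * $HOME/bin/backup.sh
```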
Chapter 3
Implementation
3.1 Overview
3.1.1 Digital Signal Processing Modules
Figure 3.1 shows an example of a piece of music as defined by XMAS.
[Figure 3.1: an example data tree. A MIDIMapping for duet.mid parents an Instruments subtree headed by a GeneralMultiplexer, which selects a cymbals.wav Sample for channel=10, note=57; a LookupMultiplexer for channel=2, indexed on note, choosing the Samples harplow.wav (notes up to 49), harpmed.wav (notes 50 to 69) or harphigh.wav (notes 70 and above); and, for channel=1, a VolumeEnvelope with a sustain point whose subject is a flute.wav Sample with loop="on".]
Each node in the tree is a digital signal processing module or DSP module,
and holds music data but no playback state. I refer to this tree as the data tree.
In the library, the base class DSPModule abstracts all types of node.
since lowering the sampling rate reduces the processor power required. It is
possible to design filters to work with an arbitrary rate.
The mixSamples() method returns the number of sample frames generated.
Generally this will be the same as the number requested. However, many DSP
modules are designed only to generate a finite quantity of data, and when a player
has generated them all, it will use the return value to tell the parent player—or
the user of the library, who manages the root player—that it has finished.
3.3 Variables
Instruments have to be able to play at arbitrary pitches and velocities.1 Many
filters have cut-off frequencies, resonance levels and the like, and these need
to be able to be controlled by the MIDI sequence. MIDI has a plethora of
parameters that could be used for this. It would also be nice if we could make a
filter’s parameters, or perhaps the speed of a volume envelope, depend on pitch
or velocity. Finally it would be nice to have mechanisms to control the tempo
(speed) at which a MIDI sequence is played, or transpose the sequence. The list
goes on, and clearly a great deal of flexibility is desired.
XMAS uses a system of variables to achieve this flexibility. The system is
illustrated in Figure 3.2. The MIDIMappingPlayer passes a set of variables to
the constructor for each child. Three variables are shown in the diagram, but
1. Note velocities, or how fast a key was depressed; used to effect what classical musicians know as dynamics, and often just interpreted as volume.
[Figure 3.2: the variable system in action. The MIDIMappingPlayer passes variable sets such as channel=10, note=35, velocity=127 (matching nothing, so no child is constructed); channel=2, note=72, velocity=127; and channel=10, note=57, velocity=127 to GeneralMultiplexerPlayers. The channel=2 set reaches a LookupMultiplexerPlayer whose look-up index is note=72, leading to a SamplePlayer for harphigh.wav; the channel=10, note=57 set leads to a SamplePlayer for cymbals.wav. Each player's Variables object holds Variable entries (for "channel", "note", "velocity" and so on) along with a pointer to its parent's variables; a VariableComputeBlockPlayer defines a rate variable with value 2^((note-60)/12).]
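The parent-chained look-up that makes this inheritance work can be sketched as follows (a simplified illustration with assumed names; libxmas's actual classes differ):

```cpp
#include <map>
#include <string>

// Sketch of a chained variable set: each player holds its own variables
// plus a pointer to its parent's set, and look-up falls back up the
// chain, so children see inherited values such as note and velocity.
class Variables {
public:
    explicit Variables(const Variables* parent = nullptr) : parent_(parent) {}
    void set(const std::string& name, double value) { vars_[name] = value; }
    // Return the nearest definition of 'name', or 'fallback' if undefined.
    double get(const std::string& name, double fallback = 0.0) const {
        auto it = vars_.find(name);
        if (it != vars_.end()) return it->second;
        return parent_ ? parent_->get(name, fallback) : fallback;
    }
private:
    const Variables* parent_;
    std::map<std::string, double> vars_;
};
```

A child's set can then define, say, rate in terms of a note inherited from further up the tree, as the variable compute block of Section 3.5.5 does.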
Most real-time resamplers keep a pointer into the sample data and effect
interpolation by looking at samples before and after the current one. They have
to take care not to overrun, and they cannot see transparently across flow changes
such as loop points. This can create an audible click each time a sample loops,
even when the continuity across the loop points is perfect.
[Figure 3.4: snapshots of the history buffer and pos over a twelve-sample looping waveform: the initial history buffer (pos = 0), two states a bit later (pos = 1, pos = 2), just before looping (pos = 12), just after looping (pos = 12, with pre-loop samples still in the history), a bit later after looping, and playing backwards (pos = 7).]
The XMAS library uses a history buffer, which holds the last three samples
seen before the one pos points to. The concept is illustrated in Figure 3.4, which
shows the state of the history buffer at several points during playback.
Between them, the history buffer and the sample indicated by pos constitute
a run of four samples, and the current playback position is considered to be
between the second and third. A subpos variable holds the fractional part of the
position, a value indicating how far between the second and third samples we
are. When it reaches 1, it is reset to 0, pos is incremented and the history buffer
is updated.
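Ignoring the fast path and flow changes, the advance step might look like this (a simplified sketch with assumed names, not libxmas's actual code):

```cpp
#include <vector>

// Simplified history-buffer scheme: hist holds the last three samples
// seen before data[pos], and the playback position lies between hist[1]
// and hist[2], with subpos as its fractional part.
struct HistoryResampler {
    std::vector<short> data;
    int pos = 0;
    double subpos = 0.0;
    short hist[3] = {0, 0, 0};

    // Return the current linearly interpolated sample, then advance by
    // 'step' source samples per output sample. When subpos reaches 1,
    // it is reset, pos is incremented and the history shifts along.
    double next(double step) {
        double out = hist[1] + (hist[2] - hist[1]) * subpos;
        subpos += step;
        while (subpos >= 1.0) {
            subpos -= 1.0;
            hist[0] = hist[1];
            hist[1] = hist[2];
            hist[2] = (pos < (int)data.size()) ? data[pos] : 0;
            ++pos;
        }
        return out;
    }
};
```

Because interpolation reads only the history and data[pos], a loop point can simply redirect pos, and the interpolator never sees a discontinuity.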
This method provides perfect continuity in all cases, but as presented it is
hardly efficient. The library seamlessly switches to a conventional algorithm
shortly after starting and after each change of flow.
Three interpolation functions are provided. One of them is, ironically, the
non-interpolating function, which always takes the second sample verbatim. The
output is coarse and suffers from aliasing, but it can be done quickly and is
reminiscent of sounds from old, dearly loved computer systems such as the Com-
modore Amiga.
The second function does linear interpolation between the second and third
samples. This is a fair compromise, doing only a little more work than the first
function in exchange for considerably less aliasing.
The third function does cubic interpolation. All four samples are taken into
account. The tangent to the curve at the second sample is parallel to a line joining
the first and third samples, and a similar property holds at the third sample. This
ensures that the curve and its first derivative are continuous, providing optimum
sound quality for a function of this complexity. Appendix A derives the equations
and presents an optimisation that uses look-up tables to eliminate much of the
computation.
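The properties described are those of the Catmull-Rom cubic; a direct sketch (my own code, without the look-up-table optimisation derived in Appendix A):

```cpp
// Cubic interpolation over four consecutive samples p0..p3 (Catmull-Rom
// form); t in [0, 1] gives a point between p1 and p2. At t = 0 the value
// is p1 and the tangent is (p2 - p0) / 2, parallel to the line joining
// p0 and p2, so both the curve and its first derivative are continuous
// from one sample to the next.
double cubic(double p0, double p1, double p2, double p3, double t) {
    double a = -p0 + 3.0 * p1 - 3.0 * p2 + p3;
    double b = 2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3;
    double c = p2 - p0;
    return 0.5 * (((a * t + b) * t + c) * t + 2.0 * p1);
}
```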
The library provides a global variable via which the programmer can set a
default interpolation function. The instrument designer can override this for a
specific sample by specifying a minimum and maximum quality.
[Figure 3.5: two volume envelopes, plotted as volume (0 to 1) against time. Left: an envelope spanning one second that fades in quickly and holds at full volume. Right: an envelope that rises to full volume at 0.05 seconds and returns to silence between 0.10 and 0.15 seconds.]
Behaviour
A volume envelope is a graph of volume against time. The VolumeEnvelope
module models this graph as a series of linearly connected volume-time pairs
with time increasing monotonically, and each VolumeEnvelope object has one
child, known as the subject. The VolumeEnvelopePlayer constructs one player
for the subject, and applies the envelope to the player’s output. In the case of
the right-hand envelope in Figure 3.5, the VolumeEnvelopePlayer’s output will
be silence initially, full volume at 0.05 seconds, and silence again between 0.1 and
0.15 seconds.
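Setting loops and tweaks aside, evaluating the graph at a given time is a linear interpolation between neighbouring nodes (a sketch with assumed names, not the library's actual interface):

```cpp
#include <vector>

// A volume envelope as a series of linearly connected volume-time pairs,
// with time increasing monotonically from node to node.
struct EnvNode { double time, volume; };

double envelopeVolume(const std::vector<EnvNode>& nodes, double t) {
    if (t <= nodes.front().time) return nodes.front().volume;
    for (int i = 1; i < (int)nodes.size(); ++i) {
        if (t <= nodes[i].time) {
            // Interpolate linearly between nodes i-1 and i.
            double f = (t - nodes[i - 1].time) / (nodes[i].time - nodes[i - 1].time);
            return nodes[i - 1].volume + (nodes[i].volume - nodes[i - 1].volume) * f;
        }
    }
    return nodes.back().volume; // past the last node
}
```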
A VolumeEnvelope can also manage two loops, which are each given in terms
of a starting node and an ending node. These can be the same node if it is desired
that the envelope freeze at that node (see the left-hand example). One of the
loops is the sustain loop, and is obeyed only as long as the note is held.4 The
other loop is obeyed at all times.
In Figure 3.5, the left-hand envelope fades a note in quickly, holds the note
at full volume, and then fades it out pseudo-exponentially; this is quite usual,
and is used by the envelope applied to flute.wav in Figure 3.1. The right-hand
example is a lot more unusual, and potentially rather annoying!
When a volume envelope terminates at zero volume (as happens after one
second in the left-hand example if the note is released immediately), the
VolumeEnvelopePlayer will terminate its output (recall Section 3.1.3). This
is important. The flute.wav Sample in Figure 3.1 is set to loop indefinitely, but
the VolumeEnvelope above it can terminate the output when the note has faded
out, telling the MIDIMapping that the player can be destroyed. If this did not
happen, the note would persist in memory and waste resources.
The VolumeEnvelopePlayer is influenced by a variable called rate. If rate
is 1, the output is as expected. If rate is 2, the position in the envelope will
advance twice as fast, so the first envelope would elapse in half a second for notes
released immediately. It is sometimes useful to compute rate from note or delta
using a variable compute block (Section 3.5.5).
Implementation
The parameter tweak system allows a module to request of a child an adjustment
that is constant for a while, but does not allow for gradual changes. Correspond-
ingly, the VolumeEnvelopePlayer will try to use tweaks only when the volume is
not changing (as while sustaining in the left-hand example). In this case, it can
ask the subject to mix samples into the buffer that was passed to itself. However,
if the volume is changing (or if a tweak fails), the following steps are taken:
• the subject player is asked to mix its samples into a temporary buffer;
• the VolumeEnvelopePlayer mixes the contents of the temporary buffer into its
own output buffer, applying the gradual change in the process;
4. There is a variable to indicate when a note is held. See Section 3.6.3.
While this produces perfect output, it is not very efficient. I shall return to
this in the Evaluation.
3.5.3 Multiplexers
Multiplexers are used to select an instrument sound according to the program
variable, and to distinguish Channel 10, the percussion channel, from other chan-
nels by using the channel variable.5 They are also used to select a sample accord-
ing to the note variable, since the method libxmas uses to create different notes
from one sample is crude and only works well over small note ranges. Review
Figure 3.1 on page 15 for some examples of multiplexers.
A multiplexer object manages several subject modules. Each time a player
is constructed, one subject module is chosen and a single player is constructed.
All subsequent operations on the multiplexer player are deferred to the subject
player.
There are two types of multiplexer: GeneralMultiplexers and
LookupMultiplexers. They differ in how they choose a subject module.
GeneralMultiplexers scan the modules in reverse order and the first match-
ing module found is used. Each module is given with a set of variable ranges—for
instance, one subject might be given with the two ranges 50 ≤ note ≤ 63 and
0 ≤ velocity ≤ 9—and the module matches if all range variables are defined
and within the ranges. The extremes are always integers and the variables are
rounded to the nearest integer before the comparisons take place. This is a lin-
ear search and will not scale well, so a large number of subject modules is not
recommended.
LookupMultiplexers specify an index variable and manage a table of subject
modules. The index variable is rounded to the nearest integer and used as an
index into the table. In addition to the table, there is a pointer to a module
to be used for values below the table’s lower bound, and another for values
above the table’s upper bound. LookupMultiplexers are more limited than
GeneralMultiplexers, but the look-up is a constant-time operation. They are
perfect for selecting an instrument using the program variable.
5. The example in Figure 3.1, page 15, chooses instruments according to channel instead of program. This was done so the choice could be combined with the step of identifying the percussion channel, but it is not recommended in real applications. Appendix B.5 shows the more usual approach.
Both types of multiplexer can define one or more variables for use in making
the decision. These are computed for the selection process only, and are not
passed down to the child constructor.
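The GeneralMultiplexer's reverse scan might be sketched as follows (hypothetical types and names, not libxmas's actual ones):

```cpp
#include <cmath>
#include <map>
#include <string>
#include <vector>

struct Range { std::string var; int lo, hi; };
struct SubjectEntry { std::vector<Range> ranges; int moduleId; };

// Scan the subject modules in reverse order; the first whose ranges all
// match is chosen. A module matches only if every range variable is
// defined and, rounded to the nearest integer, lies within its range.
int selectSubject(const std::vector<SubjectEntry>& entries,
                  const std::map<std::string, double>& vars) {
    for (auto it = entries.rbegin(); it != entries.rend(); ++it) {
        bool ok = true;
        for (const Range& r : it->ranges) {
            auto v = vars.find(r.var);
            if (v == vars.end()) { ok = false; break; }
            long rounded = std::lround(v->second);
            if (rounded < r.lo || rounded > r.hi) { ok = false; break; }
        }
        if (ok) return it->moduleId;
    }
    return -1; // no match: no child player is constructed
}
```

A LookupMultiplexer would instead round its index variable and index a table directly, a constant-time operation.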
last. In a conventional MIDI set-up, a sequencer does the timing and sends the
MIDI events to the synthesisers while interpreting the meta-events itself.
.mid files adopt the classical concept of beats and subdivide them into delta-
time ticks. The number of ticks per beat can be specified in the file. By default,
there are 120 beats per minute, but a meta-event can override this, specifying
the tempo as a number of microseconds per beat (though the value presented to
a user is usually in beats per minute).
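The timing arithmetic is small enough to sketch (the function name is my own; the constants come from the Standard MIDI File format):

```cpp
// Convert delta-time ticks to microseconds. 'division' is the number of
// ticks per beat, from the file header; 'tempo' is microseconds per beat,
// 500000 by default (120 beats per minute) until a tempo meta-event
// overrides it.
long long ticksToMicroseconds(long long ticks, long long tempo,
                              long long division) {
    return ticks * tempo / division;
}
```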
Tracks are a logical subdivision of music. It is up to the author of a .mid
file to decide what to put in each track. The sequencer will process all tracks
simultaneously and dispatch events to the synthesisers, but information about
which track an event came from is lost. Most events specify a MIDI channel (see
Section 1.1.1). There is often a correspondence between tracks and channels, but
they are distinct concepts not to be confused. The tracks exist in the sequencer,
and the channels are distinguished by the synthesisers.
XMAS’s MIDIMappingPlayer behaves like a sequencer connected to a multi-
part synthesiser. Rather than using hardware timing, it does timing by emulating
the synthesiser for precise amounts of time and changing state in between runs
of emulation. In more concrete terms, it effects an elapsed time by requesting an
appropriate number of samples from the synthesiser. There is no asynchronous
behaviour, and the process is deterministic.
The algorithm described herein is simplified for conciseness, though some of
the extra complexity is alluded to.
3.6.2 State
For each track, the MIDIMappingPlayer maintains three values:
• a position counter for the track, which points to the event bytes (after the
delta-time) for the next event to be processed or holds the value -1 for
tracks that have finished playing;
• the number of delta-time ticks to wait before the next event should be
processed (the wait value);
For each channel, the MIDIMappingPlayer stores a list of all the notes cur-
rently playing. A note consists of a pointer to a DSPModulePlayer along with
some pertinent variables (more on this later). Some variables that are global to
the channel are also stored. These include
• the pitch wheel position, used on many devices to bend all notes up or down
in pitch;
• the channel aftertouch, a measure of the pressure being applied to the keys
on an electronic keyboard, averaged over all depressed keys;
Most of these variables are made available to the instruments, but a few are
processed in the MIDIMappingPlayer itself. In particular, the channel volume is
applied to all notes using volume parameter tweaks, and the MIDIMappingPlayer
takes it upon itself to calculate the final frequency for each note, incorporating
pitch bend and other factors into the computation.
As stated in Section 3.5.4, a single DSPModule is used for all the notes. It is
likely that the DSPModule will include a LookupMultiplexer switching on the
program variable, but it may choose to use the program variable for something
else, or not to use it.
Finally, the MIDIMappingPlayer also stores some global state, such as the
tempo, the number of times the music has left to loop, and a measure of how
much output to generate before the tracks’ wait values will be correct. This
last measure is henceforth referred to as the global wait value, and is given in
extremely fine units of 2³² per second.
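As an illustrative sketch (the identifiers here are mine, not the real XMAS ones), the per-track state and the conversion from delta-time ticks into global wait units of 2³² per second might look like this, given the current tempo in microseconds per quarter note and the .mid file's ticks-per-quarter-note division:

```cpp
#include <cstdint>

// Sketch of the MIDIMappingPlayer's per-track timing state.
// (Names are illustrative, not the real XMAS identifiers.)
struct TrackState {
    long pos  = 0;   // byte offset of the next event, or -1 when finished
    long wait = 0;   // delta-time ticks until that event is due
};

// Convert delta-time ticks into global-wait units of 2^32 per second,
// given the tempo (microseconds per quarter note) and the .mid file's
// ticks-per-quarter-note division. Assumes modest tick counts so the
// intermediate product fits in 64 bits.
int64_t ticksToGlobalWait(long ticks, long usPerQuarter, long division) {
    // seconds = ticks * (usPerQuarter / 1e6) / division
    // globalWait = seconds * 2^32, kept entirely in integer arithmetic.
    return ((int64_t)ticks * usPerQuarter << 32)
         / ((int64_t)division * 1000000);
}
```

For example, one quarter note at a tempo of one second per quarter note comes out as exactly 2³² units.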
3.6.3 Notes
As stated, a note consists of a DSPModulePlayer and some pertinent variables.
Some of the variables are note, velocity, and held. The held variable is 1
initially and goes to 0 when the Note Off event is encountered.
Each instrument should be designed to respond to the held variable in
an appropriate manner. The MIDIMappingPlayer never cuts notes off, so
DSPModulePlayers should terminate themselves to avoid a build-up of old notes.
At present, VolumeEnvelope is the only module that responds to held. An
instrument could incorporate a VolumeEnvelope designed to take the volume
down to 0 after the note is released, or it might consist of a sample configured to
play once without looping.
⁶This is distinct from the MIDI controllers mentioned in Section 1.1. This kind of MIDI controller is simply a playback control parameter that can be set by a MIDI event.
3.6.4 Algorithm
The playback algorithm is essentially a form of discrete event simulation.
When the MIDIMappingPlayer is constructed, all the variables are initialised
and the track pointers are set up. The tracks’ wait values are set to 0, and
then the initial delta-time for each track is processed. Processing a delta-time
involves adding the delta-time to the track’s wait value and then advancing the
track pointer to the following event bytes. Finally, the processMIDI() method
is called.
For each track whose wait value is 0, processMIDI() processes MIDI events
until it finds a nonzero delta-time tick. Then it determines how long to wait
before another MIDI event will be due on any track, subtracts that amount of
time from all tracks’ wait values, and adds it to the global wait value, scaling
as necessary and factoring in the current tempo.
Each time the MIDIMappingPlayer’s mixSamples() method is called, the
following steps are undertaken. (It may be helpful to refer back to the description
of mixSamples() in Section 3.1.3, page 16.)
1. First, we use the global wait value and the sampling rate to determine how
many samples to generate. If this number is greater than the count passed
to mixSamples(), we reduce it accordingly.
2. Each note (on each channel) is asked to generate that many samples.
3. The global wait value is reduced in accordance with the number of samples
generated. If it reaches zero or goes negative, we call processMIDI() until
it goes positive again. (It would always go positive straight away unless
there were many delta-time ticks to a sample, which is very unlikely, but
the while loop does no harm.)
4. If we have not yet generated all the samples that were requested by the
caller, we advance the buffer pointer and return to Step 1.
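The four steps can be sketched as follows. This is a toy model, not the real code: processMIDI() here just pretends the next event is always a quarter of a second away, and note mixing is replaced by a counter.

```cpp
#include <algorithm>
#include <cstdint>

// Toy model of the mixSamples() loop described above. The global wait
// value is held in units of 2^32 per second, as in the real player.
struct PlayerSketch {
    int64_t globalWait = 0;    // output owed before MIDI state is current
    long samplingRate = 40000;
    long samplesMixed = 0;     // stands in for real note mixing
    int midiBatches = 0;       // how many times events were processed

    PlayerSketch() { processMIDI(); }  // initial events, as at construction

    void mixNotes(long n) { samplesMixed += n; }

    // Pretend the next event is always 0.25 s away (2^30 global wait units).
    void processMIDI() { ++midiBatches; globalWait += 1LL << 30; }

    void mixSamples(long count) {
        while (count > 0) {
            // Step 1: how many samples until the next event is due?
            long due = (long)((globalWait * samplingRate) >> 32);
            long n = std::min(std::max(due, 1L), count);
            // Step 2: every playing note generates n samples.
            mixNotes(n);
            // Step 3: reduce the global wait; process events if it runs out.
            globalWait -= ((int64_t)n << 32) / samplingRate;
            while (globalWait <= 0)
                processMIDI();
            // Step 4: advance until the caller's request is satisfied.
            count -= n;
        }
    }
};
```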
Several looping parameters can be specified:
• How many times to play the music. If this is 0, the music will loop indefinitely.
• Where to loop back to. This can be used to avoid playing an introduction
after the first time.
• A flag indicating whether to wait for a whole beat to elapse before looping.
Some .mid files end as soon as the last Note Off event is seen, which may
be a little too early to loop. Looping on a beat is most likely to sound
correct.
The following Note Off behaviours are available:
stack (default). Each Note Off will stop the most recently started note that
was started before the current delta-time tick. If no such notes exist, it
will stop the last note from the current tick. This is useful when one track
starts a note at the same time another track stops it, but the former track
is processed first.
strictstack. Each Note Off will stop the most recently started note, including
any started on this delta-time tick.
queue. Each Note Off will stop the note that was started earliest.
preempt. A second Note On will stop the first note (but allow it to fade out).
stopall. Notes can accumulate, but each Note Off will stop all notes.
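The three single-note policies can be modelled as follows (preempt acts on Note On rather than Note Off, and stopall stops every note, so neither needs a selection rule). This is an illustrative model, not the real XMAS code.

```cpp
#include <vector>

// Notes are stored in start order; 'thisTick' marks notes started on the
// current delta-time tick.
struct Note { int number; bool thisTick; };

enum class Policy { Stack, StrictStack, Queue };

// Returns the index of the note a Note Off should stop, or -1 if none.
int noteToStop(const std::vector<Note>& notes, Policy p) {
    if (notes.empty()) return -1;
    switch (p) {
    case Policy::Queue:
        return 0;                          // earliest started
    case Policy::StrictStack:
        return (int)notes.size() - 1;      // latest, even from this tick
    case Policy::Stack:
        // Latest note started *before* this tick...
        for (int i = (int)notes.size() - 1; i >= 0; --i)
            if (!notes[i].thisTick) return i;
        // ...falling back to the last note from the current tick.
        return (int)notes.size() - 1;
    }
    return -1;
}
```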
Portamento
The MIDI Specification provides for a feature called portamento, but I have
found that neither my Creative Labs Sound Blaster Live! card nor the Yamaha
Portatone PSR-550 electronic keyboard obeys the relevant MIDI controller values.
Evaluation
• “It must be possible for the composer to keep file sizes down to a minimum.”
No compressed audio file formats are supported, so this goal was not met.
However, support for compressed audio could be added with no major re-
designing, and as explained in Section 2.2.2, doing it properly would have
taken more time than was available.
been met. Sections 4.1 and 4.6.2 discuss this further. See Section 4.5 for
some measurements.
While not all goals have been met at this stage, the project would not need
any major redesigning to meet any of them. I consider the project a success.
manage some effect trees whose job would be to filter one or more whole channels
or tracks.
It is still possible to filter whole channels. Figure 4.1 applies a filter to Chan-
nels 4, 5 and 6. This is rather involved, and the filter’s parameters must be
controlled in control.mid. Sometimes it would be preferable for cool.mid to
control them, especially when filtering single channels. A composer might well
reject this idea in favour of filtering every note individually, clearly a waste of
processor time.
[Figure 4.1: two DSP trees. The first is a MIDIMapping for cool.mid whose LookupMultiplexer, switching on the channel variable, assigns instruments to Channels 1, 2 and 3. The second is a MIDIMapping for control.mid whose LookupMultiplexer routes Channels 4, 5 and 6 through a Filter; the Filter's Subject is a further MIDIMapping of cool.mid with its own LookupMultiplexer and instruments for those channels.]
I believe the best way to fix this would be to split the MIDI mapping so
that ‘channel player’ or ‘track player’ modules could appear as descendants with
effects in between as desired. Since no such effects were actually implemented as
part of the core work, it seemed appropriate to leave splitting the MIDI mapping
as an extension.
4.3 Milestones
I did not anticipate the amount of time it would take to do the second work
package, consisting of the volume envelope and other components that can be
used to define an instrument. A large part of this work was the system of variables
discussed in Section 3.3. However, the subsequent three weeks’ work collapsed to
a few days, as most of the functionality the .xmm format was going to implement
already existed.
In summary, the structure of the project changed to such an extent that the
milestones were no longer a good subdivision of the work to be done. Nevertheless,
they did their job of providing short-term goals and keeping the project moving.
4.4 Testing
4.4.1 General
Most testing was performed aurally. To aid this, files were set up to check that
newly added features were working properly. Test programs were written to call
the mixSamples() method with varying numbers of samples at a time.
The one part of the project that required more than aural testing was the
resampler.
4.5 Profiling
Profiling was done using gprof, after the code was compiled and linked with g++’s
-pg switch and run on my AthlonXP 1800+ running at 1145 MHz, a typical
modern configuration. The jou5cred.xmm file featured on the Demo CD was
played, and the audio output was piped into ALSA’s aplay command. Figure 4.4
shows the results for the three different interpolation modes.
The resampler uses just over half the processor time. Considerable proportions
go towards the volume ramping code in the VolumeEnvelopePlayer, discussed in
Section 4.6.2, and the code in main that converts to 16-bit integers and outputs
them, which is not a concern since it is merely part of the test program and
has not been optimised. Additionally, a noteworthy amount of time is spent
processing parameter tweaks; this would deserve investigation given more time.
Surprisingly, the choice of interpolation function does not make much differ-
ence to the amount of processor time used by the resampler. (The ‘self seconds’
column is the most appropriate measurement for this comparison.) I suspect the
code generated by the compiler is sub-optimal, and the overhead per sample is
greater than the cost of the interpolation function.
Despite the above concerns, when compiled without the profiling overhead,
the test program used 32.640 seconds of processor time to play jou5cred.xmm
through aplay with cubic interpolation, as reported by Linux’s time command.
The real time reported was 4 minutes and 20.481 seconds. This equates to an
average of 12.5% CPU usage, which is comfortable.
4.6 Comments
4.6.1 Samples
Refer back to Figure 3.4 and observe how the history buffer begins filled with
zeros. This ensures that the curve makes a smooth departure from the centre
line as a sample starts.
Unfortunately, the end of playback is another matter. If the sample in Fig-
ure 3.4 were set not to loop, and instead ended where the loop end is marked,
then the output from the SamplePlayer would terminate after state 4. Ideally,
the contents of the history buffer should be allowed to phase out and be replaced
by zeros before the output terminates.
Luckily, this is rarely a problem. Most samples are set to loop and are faded
out by an envelope. Those samples that are not set to loop will usually include
their own fade-out, however brief, so the output that is not generated would be
very close to silence anyway.
However, the MIDI protocol itself cannot vary a parameter smoothly over
time. If a channel is faded in or out, the fade will have to be done in steps.
A better implementation would use steps and have generators endowed with
the ability to remove clicks themselves. The SamplePlayer could do this by
including the volume ramping functionality in the resampling loop, where it would
cost considerably less.
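The idea of folding volume ramping into the mixing loop can be sketched as follows. This is a simplified gain stage, not the real SamplePlayer's resampling loop.

```cpp
// Simplified mixing loop with an in-loop volume ramp: instead of jumping
// from volStart to volEnd (an audible click), the gain is interpolated
// per sample, costing one extra multiply-add in the inner loop.
void mixWithRamp(const float* in, float* out, int n,
                 float volStart, float volEnd) {
    float vol  = volStart;
    float step = (volEnd - volStart) / (float)n;  // per-sample increment
    for (int i = 0; i < n; ++i) {
        out[i] += in[i] * vol;
        vol += step;
    }
}
```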
4.6.4 Flexibility
I am exceptionally pleased with the flexibility XMAS offers. As I was preparing
music, I felt that some of the instruments were too loud on the high notes and
too quiet on the low notes. No problem; XMAS allowed me to compensate by
adjusting the velocity variable. I wanted one instrument to decay more slowly
for low notes. No problem; the volume envelope will respond if I set the rate
variable. This is leagues ahead of what an Amiga mod-based file or a SoundFont
(an instrument definition for the MIDI player on a Creative Labs Sound Blaster)
can do.
ory and has to move the objects to a new location. If pointers to the objects exist
anywhere, those pointers will become invalidated. The solution was to construct
containers only of pointers to objects, so the pointers would be moved and the
objects would not.
This hitch cost me a couple of days. It did not throw the project off track.
Conclusions
I am extremely pleased with the outcome of this project. While a few problems
are outlined in the Evaluation, they are minor and it is easy to forget how much
of the project went well. I have learnt a lot from the project, particularly in terms
of instrument design and C++ experience, and I shall certainly use XMAS for
games I write in the future.
A Demo CD is enclosed. It includes some aural test results and two complete
pieces of music. A track listing is given in Appendix C.
After doing a little more work on XMAS, I intend to release the library as
an open source project at http://xmas.sf.net/. Please visit this site if you are
interested in XMAS.
Bibliography
[1] The MIDI Manufacturers Association. The complete MIDI 1.0 detailed spec-
ification. http://www.midi.org/about-midi/specinfo.shtml, 1996.
[4] Markus F. X. J. Oberhumer and László Molnár. UPX, the Ultimate Packer
for eXecutables. http://upx.sf.net/, 1996–2002.
Appendix A
\[
d = x_1 \tag{A.3}
\]
\[
a + b + c + d = x_2 \tag{A.4}
\]
At $t = 0$, we desire the curve's gradient to be parallel to a line joining samples 0 and 2, so $\frac{dx}{dt} = \frac{1}{2}(x_2 - x_0)$. Likewise, the gradient at $t = 1$ should be parallel to a line joining samples 1 and 3, so $\frac{dx}{dt} = \frac{1}{2}(x_3 - x_1)$. Substituting into Equation A.2 gives the following.
\[
\tfrac{1}{2}(x_2 - x_0) = c \tag{A.5}
\]
\[
\tfrac{1}{2}(x_3 - x_1) = 3a + 2b + c \tag{A.6}
\]
Equations A.3, A.4, A.5 and A.6 can be solved simultaneously, giving the following matrix equation:
\[
\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}
= \frac{1}{2}
\begin{pmatrix}
-1 &  3 & -3 &  1 \\
 2 & -5 &  4 & -1 \\
-1 &  0 &  1 &  0 \\
 0 &  2 &  0 &  0
\end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix}
\tag{A.7}
\]
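As a check on Equation A.7, the coefficients can be written out directly; the resulting cubic reproduces $x_1$ at $t = 0$, $x_2$ at $t = 1$, and is exact on linear data:

```cpp
// Catmull-Rom cubic from Equation A.7: x(t) = a t^3 + b t^2 + c t + d,
// interpolating between samples x1 (at t = 0) and x2 (at t = 1).
double cubic(double x0, double x1, double x2, double x3, double t) {
    double a = 0.5 * (-x0 + 3*x1 - 3*x2 + x3);
    double b = 0.5 * (2*x0 - 5*x1 + 4*x2 - x3);
    double c = 0.5 * (x2 - x0);
    double d = x1;
    return ((a*t + b)*t + c)*t + d;   // Horner evaluation
}
```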
\[
T_3(t) = t^3 - t^2
\]
Furthermore, observe the following results:
B.1 volenv.xml
This is the test for the VolumeEnvelope module. A <volenv> element contains
a subject and a list of nodes. The first node is assumed to be at time zero.
This example plays a sample of a harpsichord, applying an envelope (the inner
one) that begins at full volume, fades to silence, immediately fades to five times
the full volume, and then fades down to twice the full volume before sticking there
(end of envelope). The output does not terminate since the final node is nonzero.
A second envelope, much like the right-hand example pictured in Figure 3.5, is
applied to the result.
The output can be heard on the enclosed Demo CD.
<?xml version=’1.0’?>
<volenv>
<subject>
<volenv>
<subject>
<sample filename="harpsi.wav" />
</subject>
<node value="1" />
<node time="0.15" value="0" />
<node time="0.4" value="5" />
<node time="0.7" value="2" />
</volenv>
</subject>
<node value="1" loopstart="" />
<node time="0.025" value="0" />
<node time="0.035" value="0" />
<node time="0.06" value="1" loopend="" />
</volenv>
B.2 compute.xml
This is the test for the VariableComputeBlock module. The same harpsichord
sample is used, but this time the note it was recorded at is specified. In the test,
notes with numbers ranging from 48 to 72 are generated in quick succession. For
the musicians, this constitutes a chromatic scale covering the octaves below and
above middle C (60).
The compute block assigns a value to velocity that starts at (72 − 48) × 5 + 7 =
127 and decreases linearly to (72 − 72) × 5 + 7 = 7. The result, a scale that starts
loud and fades out, can be heard on the Demo CD.
<?xml version=’1.0’?>
<compute>
<variable name="velocity" value="(72-note)*5+7" />
<subject>
<sample filename="harpsi.wav" note="A4" />
</subject>
</compute>
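The compute block's arithmetic, written out as plain C++ (the function name is mine):

```cpp
// The <compute> expression: velocity runs linearly from 127 at note 48
// down to 7 at note 72.
int velocityFor(int note) { return (72 - note) * 5 + 7; }
```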
B.3 clarinet.xmi
This is an instrument definition for a clarinet. Three samples are used for different
note ranges, and a GeneralMultiplexer (<multiplexer>) chooses between them.
The note at which each sample was recorded is given. The samples are not
quite at the right pitch; the frequency is overridden to correct this. The samples
are set to loop, and the loop start point is given. The loop end point defaults to
the end of the sample.
clarinetl.wav is enclosed in a volume envelope with a constant amplification
of 70%. It sounded too loud against the other samples, so I added the envelope
to compensate.
Around the multiplexer, there is a VariableComputeBlock. Its purpose is
to reduce the note velocity for high notes and increase it for low notes. This
was judged necessary aurally, but a scientific explanation would be that higher
frequency waves transmit greater power. The use of the velocity variable is a
hack; we want to adjust the volume, and SamplePlayers simply interpret the
note velocity as a variable.
The outermost volume envelope simply applies a rapid, pseudo-exponential
fade-out when the note is stopped.
<?xml version="1.0"?>
<volenv>
<subject>
<compute>
B.4 pizz.xmi
pizz.xmi defines string instruments (the violin family) played pizzicato, where
the performer plucks the strings instead of drawing a bow across them. The
definition includes another example of a multiplexer, and a volume envelope.
Note that this envelope has no sustain point; the pseudo-exponential fade-out
happens immediately.
Here, a VariableComputeBlock sets the rate variable, which the envelope
obeys. The result is a long decay for low notes and a short decay for high notes.
Once again, the output is on the Demo CD, this time generated by the MIDI
player using a simple scale.mid file that covers the entire range of a piano
(88 notes). This is a greater range than the instruments represented can manage!
<?xml version="1.0"?>
<compute>
<variable name="rate" value="2^((note-60)/12)" />
<subject>
<volenv>
<subject>
<multiplexer>
<generator>
<subject><sample filename="pizzl.wav" note="E4"
frequency="11000" loop="on" loopstart="8403" /></subject>
</generator>
<generator>
<range variable="note" low="58" />
<subject><sample filename="pizzh.wav" note="E5"
frequency="11000" loop="on" loopstart="6372" /></subject>
</generator>
</multiplexer>
</subject>
<node value="2" />
<node time="0.10" value="1" />
<node time="0.25" value="0.4" />
<node time="0.45" value="0.18" />
<node time="0.70" value="0.10" />
<node time="1.00" value="0.06" />
<node time="1.35" value="0.03" />
<node time="2.00" value="0" />
</volenv>
</subject>
</compute>
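The rate expression is the standard equal-temperament ratio; in C++ it reads as follows (function name is mine). The envelope runs twice as fast one octave above middle C and half as fast one octave below, giving exactly the decay behaviour described.

```cpp
#include <cmath>

// The pizzicato definition's rate expression: 2^((note - 60) / 12).
double rateFor(int note) { return std::pow(2.0, (note - 60) / 12.0); }
```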
B.5 general.xmi
This defines a whole set of instruments, referring to separate files for the individ-
ual definitions. It also accounts for percussion as defined by General MIDI (see
Section 1.1.1).
There are two LookupMultiplexers. The outer one switches on the channel
variable: all notes on Channel 10 are rendered using the percussion.xmi def-
inition, which chooses samples according to the note variable. For all other
channels, the inner multiplexer uses the program variable to select an instrument
definition.
<?xml version="1.0"?>
<lookup variable="channel">
<generator>
<range />
<subject>
<lookup variable="program">
<generator> <range /> <subject><external filename="piano.xmi" /></subject> </generator>
<generator> <range value="13" /> <subject><external filename="xylophon.xmi" /></subject> </generator>
<generator> <range value="27" /> <subject><external filename="bass.xmi" /></subject> </generator>
<generator> <range low="40" high="55" /> <subject><external filename="strings.xmi" /></subject> </generator>
<generator> <range value="45" /> <subject><external filename="pizz.xmi" /></subject> </generator>
<generator> <range value="46" /> <subject><external filename="harp.xmi" /></subject> </generator>
<generator> <range value="47" /> <subject><external filename="timpani.xmi" /></subject> </generator>
<generator> <range value="56" /> <subject><external filename="trumpet.xmi" /></subject> </generator>
<generator> <range value="57" /> <subject><external filename="trombone.xmi" /></subject> </generator>
<generator> <range value="60" /> <subject><external filename="horn.xmi" /></subject> </generator>
<generator> <range low="68" high="69" /> <subject><external filename="oboe.xmi" /></subject> </generator>
<generator> <range value="70" /> <subject><external filename="bassoon.xmi" /></subject> </generator>
<generator> <range value="71" /> <subject><external filename="clarinet.xmi" /></subject> </generator>
<generator> <range value="73" /> <subject><external filename="flute.xmi" /></subject> </generator>
</lookup>
</subject>
</generator>
<generator>
<range value="10" />
<subject><external filename="percussion.xmi" /></subject>
</generator>
</lookup>
<?xml version="1.0"?>
<midimapping midifilename="jou5cred.mid">
<external filename="general.xmi" />
</midimapping>
<?xml version=’1.0’?>
<volenv>
<subject>
<midimapping midifilename="rockspin-piece10.mid" loop="on">
<external filename="general.xmi" />
</midimapping>
</subject>
<node value="1" />
<node time="170" value="1" />
<node time="180" value="0" />
</volenv>
All tracks were generated using cubic interpolation in the resampler unless oth-
erwise stated.
1. The output from the sample player test, playing harpsi.wav. The sample
was recorded at 22050 Hz and the output is at 44100 Hz, so resampling is
taking place.
2. The output from the volume envelope test described in Section B.1.
3. The result of the variable compute block test presented in Section B.2.
4. This track first shows the outcome of using a single piano sample for the
whole range of the instrument, illustrating the need for multiple samples.
Next, recordings of twelve notes spanning the entire range of the instrument
are all adjusted to Middle C and played in sequence, showing how different
they are.
5. This track contains a scale covering every note on the piano. The astute
listener will hear each change of sample, confirming that the multiplexer is
at work. It is hoped that a casual listener can ignore the changes, especially
in real music where they are usually less noticeable.
6. The same scale is played using the strings pizzicato definition from Sec-
tion B.4. Note how the decay rate varies with the pitch. Some aliasing can
be heard on the high notes, but such high notes are rare.
7. The scale from the last track is played again with linear interpolation. Some
unwanted high frequencies can be heard on some notes, but it is subtle.
10. rockspin-piece10.xmm, the second example. The music comes from the
final three levels of my game Rock ‘n’ Spin [2]. It loops, and Section B.6
shows how even the fade-out could be performed by libxmas.
Project Proposal
Originator: B. N. Davis
22 October 2003
Introduction
The MIDI protocol is very useful in the production of music. Devices may use it
to communicate performance events (such as when a note is pressed or released)
with one another. It is an industry standard with widespread software and
hardware support. Unfortunately, it has been misused.
Most software-based music editors can dump MIDI data to standard .mid
files. These files store the aforementioned performance events, but not much else.
MIDI module manufacturers have collaborated to implement a scheme called
General MIDI, which specifies a standard instrument mapping (so a piano will
be a piano everywhere), but synthesisers still vary wildly and a piece that sounds
great on one device is likely to sound unbalanced on another (for example, the
string section may be too loud). This poses a problem for the distribution of .mid files.
The Amiga gave birth to ‘music modules’, which are files capable of storing
samples in addition to the sequence data. The PC has expanded them beyond
the Amiga’s limitations, and there are now several editors (‘trackers’) and players
of varying quality. While not properly standardised, modules can be trusted to
sound correct on any system if you are careful which software you use. However,
they are limited, and the trackers are not very user-friendly.
Nowadays, music can be distributed using lossy compression. This is sat-
isfactory in many situations, but not all; dial-up Internet users have to wait a
long time to download such files, which is especially a problem if, for example, a game
developer wants to offer a product for download and include one music track for
each level. There are also people who can hear the degradation that results from
the lossy compression.
This project will produce a solution that has the advantages of both MIDI
and Amiga modules without necessarily the large size or quality loss of general-
purpose streamed audio. Lossy compression may be used if small files are re-
quired, otherwise lossless compression may be used if sound quality is paramount.
That said, forms of compression will be considered extensions to this project, and
I will not mention them again until the ‘Extensions’ section.
Description
A musician may produce one or more sequence files (.mid for the purposes of this
project) using any existing software and hardware, and produce or obtain a set
of samples (.wav for this project). Instruments may be specified in .xmi (XML
Instrument) files; these are a layer above samples and may for example specify
volume envelopes and different samples for different note ranges. Then a .xmm
(XML Music Mapping) file ties the samples, instruments and sequences together.
These two XML formats will be specified by this project. They will both allow
author information and other human-readable notes to be embedded.
DSP trees are used at various points. These are trees of DSP modules, which
are filters capable of generating, modifying or combining PCM data. Modules
may have parameters to control them, and a tree will contain expressions for
evaluating the parameters; a simple expression parser will be used here, and
MIDI’s continuous controls will be accessible as variables. Volume envelopes and
the sample and sequence players will be implemented as DSP modules.
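A hypothetical shape for the DSP-module interface this describes is sketched below; every generator, modifier and combiner renders PCM through one method. The names and the exact calling convention are illustrative, not the real libxmas interface.

```cpp
#include <cstddef>

// Every node in a DSP tree implements one rendering method.
class DSPModulePlayer {
public:
    virtual ~DSPModulePlayer() {}
    // Write up to 'count' samples into 'buffer'; return how many were
    // produced (fewer than 'count' means the module has terminated).
    virtual size_t mixSamples(float* buffer, size_t count) = 0;
};

// A trivial modifier with one child (its 'subject'): scale the volume.
class Gain : public DSPModulePlayer {
    DSPModulePlayer& subject;
    float gain;
public:
    Gain(DSPModulePlayer& s, float g) : subject(s), gain(g) {}
    size_t mixSamples(float* buffer, size_t count) override {
        size_t n = subject.mixSamples(buffer, count);
        for (size_t i = 0; i < n; ++i) buffer[i] *= gain;
        return n;
    }
};

// A constant-value generator, used purely for illustration.
class Constant : public DSPModulePlayer {
    float value;
public:
    explicit Constant(float v) : value(v) {}
    size_t mixSamples(float* buffer, size_t count) override {
        for (size_t i = 0; i < count; ++i) buffer[i] = value;
        return count;
    }
};
```

Generators sit at the leaves and modifiers wrap a subject, mirroring the tree structure of the XML formats.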
The sample player will support stereo and offer three different interpolation
options: none, linear and cubic. The user chooses a preferred algorithm, but
instruments may override this.
Where a tree is used to modify sound, it may have a ‘missing leaf’, at which
the input will be generated. When it is used to generate sound, it may not. A
tree may never have multiple missing leaves.
An instrument file specifies a generator DSP tree for each note of the scale.
It may also specify modifier DSP trees to apply to note ranges and to all notes.
A typical instrument will use a volume envelope at the very least.
In a mapping file, one sequence is designated the root. This is the one that
will be played. Each MIDI instrument is assigned an XML instrument, or another
mapping to use as a sub-sequence; either of these may be a separate file or a nested
XML block. Sub-mappings will inherit their parents’ instrument mappings, but
these may be overridden.
A mapping file may assign a modifier DSP tree to each track or each MIDI
channel (but not both since these are two different ways of subdividing the same
set), and to the whole.
Since samples, instruments, sequences and mappings may be reused, they will
be loaded once and reference-counted.
It is worth reiterating that all the components that have been described will
be treated as DSP modules.
There will be a simple command-line playback tool that writes raw PCM data
to stdout; this can be piped into ALSA’s aplay command.
I will use C++ for this project.
Extensions
Perhaps the two most important extensions are support for other sample formats,
particularly those using lossy compression such as Ogg Vorbis, and support for
a ready-to-play archive format to make the music easier to distribute. The .xma
(XMM Archive) format will allow for this; note that it is not XML since it needs to
store binary data compactly. It will be able to store any combination of samples,
sequences and the two XML formats; typically it will be used for a whole piece
of music, a shared sample database and files that refer to this database, or a
whole album. There will be support for author information and human-readable
notes, typically to be used to describe the collection as a whole since there are
already human-readable notes for individual pieces and instruments. Lossless
compression will be used, ideally with algorithms optimised for the various types
of data.
The name of the project comes from the ultimate ideal of being able to dis-
tribute music as .xma files (a nice take on .wma files). ‘XMAS’ is short for ‘XMA
System’.
Other possible extensions include extra DSP modules (such as filters, dis-
tortion and echo), click removal for when samples start, stop and loop (not for
clicks in the actual sample data), support for surround sound, a GUI for editing
and testing instrument and mapping files, a stand-alone player, and XMMS and
Winamp plug-ins.
Finally, the product could be developed into a complete music authoring
environment, with facilities for recording and editing sample and MIDI data in
the GUI, but this could become a whole project in itself.
Note that I have already written plenty of pieces of music that can be exported
to .mid, so I will work with these. Creating example music will not use up a
disproportionate amount of time.
Each of these work packages takes the project to a new level of complexity.
First it will be able to play .wav files. Then it will support instruments, then,
two work packages later, whole pieces of music. I will be able to perform tests at
the end of each work package and thus fix most of the bugs as I go along.
Success Criteria
By the end of the project I will have a piece of music in .xmm (or .xma) format that
takes advantage of multi-sample instruments and volume envelopes. The software
will be able to play this music reliably and accurately. The specifications for the
.xmm and .xmi file formats will indicate what constitutes accurate playback.
If the above paragraph is true, the project will be considered a success.
At the time of the progress report, I expect to have a program that plays a
simple hard-coded tune using a .xmi file loaded from disk, in addition to textual
output to verify the integrity of loaded .xmm and .mid files.
Difficulties to Overcome
The following main tasks will have to be undertaken before the project can be
started:
• Learn about XML, select a library for XML parsing, and familiarise myself
with the library. The library must be able to read XML from an arbitrary
stream.
• Secure documentation on the .wav and .mid formats, and on the MIDI
protocol itself.
Starting Point
I have worked with MIDI before, and am reasonably familiar with its main fea-
tures. I have written a player for Amiga module-based formats, which incorpo-
rates a sample player with cubic interpolation; its code will serve as a reference.
The player, including source code, is available at http://dumb.sf.net/.
Resources
All development work will be carried out on a Linux PC equipped with a standard
PCM sound interface. I will be using my machine primarily, but if it breaks down,
I will bring headphones and use the machines in the William Gates Building.
I will be using CVS to manage my source code, and the repository will be
archived and uploaded nightly to Pelican (one old copy will be kept each time).
Work Plan
All dates listed here are Fridays.