Virtual Musicians and Machine Learning
Nick Collins
Keywords: machine learning, musical artificial intelligence, interactive music systems, machine musician
In an age of robotics and artificial intelligence, the music stars of tomorrow may not be human. We already see precedents for this in anime virtual pop stars from Japan like the Vocaloid icon Hatsune Miku, or cartoon bands from Alvin and the Chipmunks to Gorillaz. These are all audiovisual fronts for human musicians, however, and a deeper involvement of artificial musical intelligence in such projects is anticipated. Could our concert halls, clubs, bars, and homes all play host to virtual musicians, working touring circuits independent of any human manager? The applications of such radical music technology extend from new art music concert works to mass music entertainment in games and education.
There is already a long and fascinating history of machine interaction in concert performance, from such 1960s and 1970s precedents as the analog machine-listening pieces of Sonic Arts Union composers Gordon Mumma and David Behrman (Chadabe 1997) to the computerized online structure formation of OMax (Assayag et al. 2006), and from George Lewis's decades-long development of the computer improvisational system Voyager (Lewis 1999) to advances in musical robotics (Kapur 2005). Lessons from the creation of virtual musicians have an essential role to play in our understanding of interactive music settings in general, for such systems test the limits of engineering research and compositional ingenuity.
In order to work within human musical society, the machines need to be wise to human musical preferences, from the latest musical stylistic twists across human cultures to more deep-rooted attributes of human auditory physiology. Creating truly adaptable virtual musicians is a grand challenge, essentially equivalent to the full artificial-intelligence problem, requiring enhanced modeling of social interaction and other worldly knowledge as much as specific musical learning (we will not attempt all of that in this chapter!). The payoff may be the creation of new generations of musically competent machines, equal participants in human musical discourse, wonderful partners in music making, and of redoubtable impact on music education and mass enjoyment. One vision of the future of musical interaction may be that of a "musical familiar" that adapts with a musician from childhood lessons into adult performance, developing as they grow.
Although such portrayals can be a great motivator of the overall research, we can also drift into more unrealistic dreams; the projects of virtual musicianship are bound up inextricably with the future of artificial intelligence (AI) research. Previously (Collins 2011a), I let speculation go unhindered. Herein, I shall keep things more tightly connected to the current state of the art and outline the challenges to come from technical and musical perspectives.
Key to the creation of enhanced autonomy in musical intelligences for live music is the incorporation of facilities for learning. We know that expert human musicians go through many years of intensive training (ten years or 10,000 hours is one estimate of the time commitment already made in their lives by expert conservatoire students; see Ericsson and Lehmann 1996; Deliège and Sloboda 1996). A similar commitment to longer-term development can underwrite powerful new interactive systems. To go beyond over-fitting a single concert, and to move toward a longer lifetime for musical AIs, rests in practice upon incorporating machine-learning techniques as a matter of course for such systems. There is an interesting parallel with tendencies in gaming toward larger game worlds, enhanced game character AI, and the necessity of being able to save and load state between gaming sessions. Interactive music systems need larger stylistic bases, enhanced AI, and longer-term existence. Where the current generation of musical rhythm games foregrounds motor skills over expressive creation, more flexible interaction systems may provide a future crossover of academic computer music to mass consumption.
We shall proceed by reviewing the various ways in which machine learning has been introduced in computer music, and especially to the situation of virtual musicians for live performance. We treat machine learning here above parallel engineering challenges in machine listening (the hearing and music-discerning capabilities of machines). For reviews of machine listening, the reader is pointed to Rowe (2001) and Collins (2007, 2011b).
…music. Whether denoted as time-series analysis (in the mold of statistics) or signal processing (in engineering), musical data forms streams of time-varying data. With respect to the time base, we tend to see a progression in preprocessing from evenly sampled signals to discretized events; AI's signal-to-symbol problem (Matarić 2007, 73) recognizes the difficulty of moving in perception from more continuous flows of input to detected events. Though signals and sequences may be clocked at an even rate, events occur non-isochronously in general. Where the timing is implicit in the signal case, events may be tagged with specific time stamps. In both situations, a window of the last N events can be examined to go beyond the immediate present, acknowledging the wider size of the perceptual present and the role of different memory mechanisms. For evenly sampled signals, the window size in time is a simple function of the number of past samples to involve; for discrete events, the number of events taken may be a function of the window size's duration (based on what fits in), or the window size in time may be a function of the number of events examined (in the latter case there would typically be a guarantee on the average number of events sampled per second, to avoid creating nonsensically massive windows, or checks in the code to avoid any aberrant scenario). Having gathered a window of data, in some applications the exact time ordering is then dropped (the "bag of features" approach, where the order of things in the bag is jumbled; see Casey et al. 2008), and in others it remains a critical consideration of the algorithm; some procedures may also concern themselves only with further derived properties of a window of data, such as statistical features across all the events.
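As a concrete illustration of these windowing choices, the following sketch (hypothetical helper names, not the chapter's own code) keeps a window of timestamped events bounded both by a duration and by a hard cap on the event count, and offers a "bag of features" summary that discards ordering.

```python
from collections import deque

class EventWindow:
    """Sliding window over timestamped events: all events within the last
    `duration` seconds, capped at `max_events` to guard against runaway rates."""

    def __init__(self, duration=2.0, max_events=200):
        self.duration = duration        # window length in seconds
        self.max_events = max_events    # hard cap on the event count
        self.events = deque()           # (timestamp, value) pairs

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        # Drop events that fall outside the time window ...
        while self.events and timestamp - self.events[0][0] > self.duration:
            self.events.popleft()
        # ... and enforce the cap to avoid nonsensically massive windows.
        while len(self.events) > self.max_events:
            self.events.popleft()

    def values(self):
        """Events currently in the window, oldest first (ordering preserved)."""
        return [v for _, v in self.events]

    def bag(self):
        """'Bag of features' view: ordering discarded, only summary statistics."""
        vals = self.values()
        if not vals:
            return {"count": 0, "mean": 0.0}
        return {"count": len(vals), "mean": sum(vals) / len(vals)}
```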
Once a representation is achieved which is musically relevant and yet compatible with an off-the-shelf machine-learning algorithm, a process of learning can take place over multiple examples of data following that representation. We should distinguish two sorts of learning task here. In supervised learning, the inputs always have directly associated outputs, and the mapping that is learnt must respect this function space, while generalizing to cope robustly with new input situations unseen in training. In unsupervised learning, the learning algorithm attempts to impose some order on the data, finding structure for itself from what was otherwise previously implicit. Learning algorithms can require a large amount of example data to train, and musical situations can sometimes not supply many examples on a given day. It will not always be practical to train on the fly in the moment of performance; instead, preparation steps may be required. Many machine-learning algorithms deployed in concert are not conducting the learning stage itself live,
but were trained beforehand, and are now just being deployed. This mirrors the way human beings develop over a long haul of practice, rather than always being blank slates in the moment of need.
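The train-beforehand, deploy-live pattern described above might look like the following sketch, assuming scikit-learn and joblib; the file name, feature dimensions, and choice of a k-nearest-neighbour classifier are illustrative assumptions rather than anything prescribed in the text.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
import joblib

# --- Rehearsal time (offline): learn from labelled feature vectors ---
X_rehearsal = np.random.rand(200, 12)          # e.g. 12 audio features per frame
y_rehearsal = np.random.randint(0, 3, 200)     # e.g. 3 hand-labelled playing modes
model = KNeighborsClassifier(n_neighbors=5).fit(X_rehearsal, y_rehearsal)
joblib.dump(model, "rehearsal_model.joblib")   # save state between sessions

# --- Concert time (online): no further learning, just fast deployment ---
model = joblib.load("rehearsal_model.joblib")
incoming_frame = np.random.rand(1, 12)         # features from live machine listening
print(model.predict(incoming_frame))           # immediate decision, no training cost
```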
We cannot review all machine-learning theory and algorithms in this chapter. Good general reviews of machine learning as a discipline include textbooks by Mitchell (1997) and Alpaydin (2010), and the data mining book accompanying the open-source Weka software, by Witten and Frank (2005). Stanford professor Andrew Ng has also created an open machine-learning course available online, including video lectures and exercises (http://www.ml-class.org/course/video/preview_list). We will mention many kinds of machine-learning algorithm in the following sections without the space to treat their properties formally.
We also won't be able to review every musical application of every type of machine-learning algorithm herein, but will hopefully inspire the reader to pursue further examples through the references and further searches. As a rule of thumb, if an interesting learning technique arises, someone will attempt to apply it in computer music. Applications often follow trends in general engineering and computer science, for example, the boom in connectionist methods like neural nets in the 1990s, genetic algorithms over the same period, or the growth of data mining and Bayesian statistical approaches into the 2000s. Three recurring applications provide a useful orientation:
• Learning from a corpus of musical examples, to train a composing mechanism for the generation of new musical materials.
• Learning from examples of musical pieces across a set of particular genres, to classify new examples within those genres.
• Creating a mapping from high-dimensional input sensor data to a few musical control parameters or states, allowing an engaging control space for a new digital musical instrument.
Although only the last is explicitly cast as for live music, all three could be applicable in a
concert context; stylistically appropriate generative mechanisms are an essential part of a
live musician’s toolbox, and a live system might need to recognize the stylistic basis of the
music being played before it dares to jump in to contribute! We review some associated
projects around these three themes, knowing that the survey cannot be exhaustive.
Machine learning is intimately coupled to modeling of musical data, and many predictive and generative models of music that rest on initialization over a corpus of data have appeared in decades of research on algorithmic composition and computational musicology. The venerable Markov model, first posited by John Pierce in 1950 as applicable in music (Pierce 1968), is the premier example. Markov systems model the current state as dependent on previous states, with an "order" of the number of previous states taken into
consideration (Ames 1989). To create the model, data sequences are analyzed for their transitions, and probability distributions are created from counts of the transitions observed; the model is then usable for generation of novel material (new sequences) in keeping with those distributions. The popularity of Markov models and of information-theoretic variants has continued in the literature on symbolic music modeling and pattern analysis in music (Conklin and Witten 1995; Wiggins, Pearce, and Müllensiefen 2009; Thornton 2011), as well as underlying the well-known (if not always clearly defined) work in automated composition of David Cope (2001). One famous interactive music system, the Continuator of François Pachet (Pachet 2003), is based on a variable-order Markov model. In its typical call-and-response mode of performance, the Continuator can build up its model on the fly, using human inputs to derive an internal tree of musical structure in what Pachet calls "reflexive" music making, because it borrows so closely from the human interlocutor. Begleiter, El-Yaniv, and Yona (2004) compare various variable-order Markov models, assessing them on text, music MIDI files, and bioinformatic data. Prediction by partial match is one such algorithm that has proved successful (the second best, after the rather more difficult to implement context tree weighting, in Begleiter, El-Yaniv, and Yona's study), and it has been extended to musical settings (Pearce and Wiggins 2004; see also Foster, Klapuri, and Plumbley 2011 for an application to audio feature vector prediction comparing various algorithms) (see also Chapters 22 and 25 in this volume).
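A toy version of the Markov modelling described above, counting transitions over a small corpus and then sampling new material from the resulting distributions, might look as follows; the corpus of MIDI note numbers and the first-order setting are illustrative assumptions, not material from the chapter.

```python
import random
from collections import defaultdict

def train_markov(sequences, order=1):
    """Count transitions from each context of `order` past symbols to the next symbol."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(len(seq) - order):
            context = tuple(seq[i:i + order])
            counts[context][seq[i + order]] += 1
    return counts

def generate(counts, start, length):
    """Random walk through the transition distributions to produce new material."""
    out = list(start)
    for _ in range(length):
        context = tuple(out[-len(start):])
        options = counts.get(context)
        if not options:                      # unseen context: stop early
            break
        symbols, weights = zip(*options.items())
        out.append(random.choices(symbols, weights=weights)[0])
    return out

# Toy corpus of MIDI note numbers (illustrative only)
corpus = [[60, 62, 64, 65, 67, 65, 64, 62, 60],
          [60, 64, 67, 72, 67, 64, 60]]
model = train_markov(corpus, order=1)
print(generate(model, start=[60], length=12))
```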
Begleiter, El-Yaniv, and Yona (2004) note that any of the predictive algorithms from the literature on data compression can be adapted to sequence prediction. Further, any algorithms developed for the analysis of strings in computer science can be readily applied to musical strings (whether of notes or of feature values). The Factor Oracle is one such mechanism, an automaton for finding common substring paths through an input string, as applied in the OMax interactive music system at IRCAM (Assayag et al. 2006). OMax can collect data live, forming a forwards and backwards set of paths through the data as it identifies recurrent substrings and, like a Markov model, it is able to use this graph representation for generating new strings "in the style of" the source. One drawback of this application of a string-matching algorithm is that its approach to pattern discovery is not necessarily very musically motivated; the space of all possible substrings is not the space of all musically useful ideas! As Schankler and colleagues (2011) note, the Factor Oracle tends to promote musical forms based on recurring musical cells, particularly favoring material presented to it earliest in training (rondo-like forms); a human participant can cover up for some of the algorithm's deficiencies.
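For readers who want to see the mechanism, here is a bare-bones sketch of the standard online factor oracle construction, together with a naive random walk that recombines substrings of the input; this is a simplified illustration under my own assumptions, not the OMax implementation.

```python
import random

def build_factor_oracle(sequence):
    """Online factor oracle construction: forward transitions plus suffix links."""
    m = len(sequence)
    trans = [dict() for _ in range(m + 1)]   # forward transitions per state
    sfx = [-1] * (m + 1)                     # suffix links
    for i in range(1, m + 1):
        sigma = sequence[i - 1]
        trans[i - 1][sigma] = i              # direct transition along the input
        k = sfx[i - 1]
        while k > -1 and sigma not in trans[k]:
            trans[k][sigma] = i              # add shortcut transitions
            k = sfx[k]
        sfx[i] = 0 if k == -1 else trans[k][sigma]
    return trans, sfx

def improvise(sequence, sfx, length, jump_prob=0.3):
    """Mostly continue linearly, sometimes jump back via a suffix link,
    recombining substrings 'in the style of' the source."""
    state, output = 0, []
    while len(output) < length:
        if state < len(sequence) and (state == 0 or random.random() > jump_prob):
            output.append(sequence[state])   # continue: emit next original symbol
            state += 1
        else:
            state = max(sfx[state], 0)       # jump back along a suffix link
    return output

notes = [60, 62, 64, 60, 62, 65, 64, 62, 60]  # toy input (illustrative)
trans, sfx = build_factor_oracle(notes)
print(improvise(notes, sfx, length=16))
```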
With the rise of data-mining approaches, an excellent example where the mass use of machine-learning algorithms occurs in computer music is the developing field of music information retrieval (MIR) (Downie 2003; Casey et al. 2008). Most of these algorithms operate offline, though there are circumstances, for example live radio broadcast, where classifications have to take place on the fly. There are certainly situations, for instance the audio fingerprinting of the Shazam mobile service used to identify music in the wild, where as-fast-as-possible calculation is preferable. As for many interactive systems, MIR systems may have their most intensive model parameter construction precalculated in intensive computation, and they can then deploy the model on novel data much more easily.
Nonetheless, newly arriving data may need to be incorporated into a revised model, leading to intensive parameter and structure revision cycles (e.g., as occurs if rebuilding a kD-tree). The gathering volume of MIR work is a beneficial resource of ideas to adapt for live performance systems.
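The precompute-then-query pattern, and the cost of revising the structure when new data arrives, can be illustrated with a kD-tree sketch using scipy; collection sizes and feature dimensions are made up for the example.

```python
import numpy as np
from scipy.spatial import cKDTree

# Offline: intensive model construction over a reference collection
reference_features = np.random.rand(10000, 20)   # e.g. 20-dim feature per track
tree = cKDTree(reference_features)

# Online: cheap nearest-neighbour lookups against the prebuilt structure
query = np.random.rand(20)
distance, index = tree.query(query, k=1)
print("closest reference item:", index, "at distance", distance)

# When newly arriving data must be incorporated, the structure is rebuilt,
# which is the expensive revision cycle noted in the text.
new_items = np.random.rand(50, 20)
reference_features = np.vstack([reference_features, new_items])
tree = cKDTree(reference_features)                # full rebuild
```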
Machine learning has also found its way into new musical controllers, particularly to create effective mappings between sense inputs and the sound engine (Hunt and Wanderley 2002). Applications may involve changes in the dimensionality of data, as in many-to-one or one-to-many mappings. For example, Chris Kiefer uses echo state networks (a form of connectionist learning algorithm) to manage the data from EchoFoam, a squeezable interface built from conductive foam, reducing from multiple sensors embedded in a 3D object to a lower number of synthesis parameters (Kiefer 2010). The MnM library for the graphical audio programming environment Max/MSP provides a range of statistical mapping techniques to support mapping work (Bevilacqua, Müller, and Schnell 2005); Rebecca Fiebrink has released the Wekinator software, which packages the Weka machine-learning library into a system usable for real-time training and deployment (Fiebrink 2011). With increasingly complicated instruments, machine learning can help with everything from calibration and fine-tuning of the control mechanism to making the sheer volume of data tractable for human use.
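In the spirit of tools such as the Wekinator, though not using their actual APIs, a learned many-to-one mapping can be sketched as a small supervised regression from recorded sensor/parameter example pairs to live control values; all dimensions and parameter names here are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Training examples: 8 sensor channels in, 2 synthesis parameters out
sensor_examples = np.random.rand(300, 8)          # e.g. conductive-foam readings
param_examples = np.random.rand(300, 2)           # e.g. filter cutoff, grain rate
mapper = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
mapper.fit(sensor_examples, param_examples)

# Live: each new sensor frame becomes a small set of control parameters
live_frame = np.random.rand(1, 8)
cutoff, grain_rate = mapper.predict(live_frame)[0]
print(cutoff, grain_rate)
```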
…data like MIDI files can provide the extensive bootstrapping a given model may require. Rehearsal recordings can be taken, and passed over by a learning algorithm in multiple training runs (for example, as required in some reinforcement-learning approaches) or applied selectively in training an onset detector that has more negative examples than positive to learn from. Alternatively, algorithms may be preferred that need less data, or simply less data may be used in training at the cost of reduced performance effectiveness, as with some demonstrations of the Wekinator (Fiebrink 2011); the added noise of such a system can (charitably) be musically productive, as for example in the inaccuracies of a pitch tracker leading to a more unpredictable (and thus stimulating) response (Lewis 1999).
Machine learning in real applications forces various pragmatic decisions to be made. Musical parameter spaces show combinatorial explosions (for example, in considering increasingly long subsegments of melodies as the units of learning); keeping the dimension of the state space low requires compromises on the accuracy of the approximating representation. Without some simplification, the learning process may not be tractable at all, or may require too much training data to be practicable! A regression problem with continuous-valued data may be reduced to discrete data by a preprocessing clustering or vector quantization step, at the cost of losing fine detail and imposing boundaries (this tension between continuous and discrete is familiar whenever we use convenient categories in natural language, which can distort the true distribution). Even when a musical representation is eminently sensible, the machine-learning algorithms themselves have differing inductive biases, with different performances in generalizing to unseen cases. It may be useful to train multiple models in parallel and select the best performing (there are technicalities here in holding back certain test data to measure this). Yet what works well as a small-scale solution to a particular concert task may prove less equipped for the vagaries of a whole tour!
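The clustering or vector quantization preprocessing mentioned above, together with a held-out check on generalization, might be sketched as follows; the cluster count, feature dimensions, and train/test split are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

features = np.random.rand(500, 6)                       # continuous feature frames
train, test = train_test_split(features, test_size=0.2)

quantizer = KMeans(n_clusters=8, n_init=10).fit(train)  # learn the codebook
symbols = quantizer.predict(test)                       # discrete symbol stream
print(symbols[:20])

# Held-out data gives a crude check on how well the codebook generalizes
# (lower quantization error on unseen frames is better).
distances = np.min(quantizer.transform(test), axis=1)
print("mean quantization error on held-out frames:", distances.mean())
```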
A further issue for those researching the incorporation of learning agents in live music is evaluation of the effectiveness of these agents as musical participants, especially where we consider the longer-term existence of these systems. Even after building such systems, evaluating them through longitudinal studies is not easy. The attribution problem in
machine learning notes the difficulty of assigning credit to guide the learning of a complex system, particularly when praise or negative feedback is itself scarce (Collins 2007). As well as confounding the application of algorithms such as reinforcement learning and the fitness functions of genetic algorithms, the lack of quality feedback undermines evaluation of system effectiveness. Human–computer interaction (HCI) methodologies for feedback from rehearsal or concerts are currently based around more qualitative methods of review, such as postperformance interviews (Hsu and Sosnick 2009). In-the-moment quantitative evaluation methods in HCI (such as physiological measures from galvanic skin response or EEG) are at only a tentative stage (Kiefer, Collins, and Fitzpatrick 2008).
Figure 21.1 gives an overview of the whole system of the original LL. Ten parallel agents are associated with ten different musical states; the switching of state, and thus which agent is active, depends on machine learning from the human musician's inputs to the system. We avoid too much discussion of the machine-listening components and the output synthesis models herein, instead concentrating on the learning aspects. The primary sites of machine learning in the system are described in turn below.
The first three processes are unsupervised and automatic; the last involves training.
The first process is a special low-level normalization step. Features provided in machine listening may have different parameter ranges, and some sort of max–min or statistical (mean and standard deviation) normalization is required for their ranges to be comparable. Histogram equalization is a further technique, lifted from computer vision (Bradski and Kaehler 2008, 188), where the area assigned between 0 and 1 in the normalized feature output is proportional to the actual feature distribution observed in the training data, through a histogramming estimation and linear segment model. This step then tries to make the different features maximally comparable in combined feature vectors of normalized values. The histogram equalization can be learned online (as values arrive), which can be especially useful where the distribution of data is not well known in advance (this may be an attribute of many musical situations, for example, microphones in unfamiliar acoustic environments or a system working across many types of musical input encountering bagpipes for the first time!).
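A sketch of histogram-equalizing a single feature stream is given below; it uses exact empirical ranks rather than the histogram-and-linear-segment model of the actual system, so it illustrates the idea rather than reproducing the ll~ code.

```python
import bisect

class HistogramEqualizer:
    def __init__(self):
        self.seen = []                      # sorted record of observed values

    def observe(self, x):
        bisect.insort(self.seen, x)         # online update as values arrive

    def normalize(self, x):
        if not self.seen:
            return 0.5                      # no information yet
        rank = bisect.bisect_left(self.seen, x)
        return rank / len(self.seen)        # proportion of observed data below x

eq = HistogramEqualizer()
for value in [0.1, 0.12, 0.11, 0.5, 0.13, 0.09]:   # e.g. a skewed spectral feature
    eq.observe(value)
print(eq.normalize(0.115), eq.normalize(0.4))       # comparable 0..1 outputs
```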
In the second learning process, clustering operates on aggregated timbre points, constructed from an average of timbral feature vectors over a window of around 600 ms. In actual fact, the clustering is achieved by running multiple randomly initialized k-means (where k = 10 in LL) clustering algorithms, and taking the "best" with respect to an error condition (least total distance of training data to the cluster centers). Postprocessing is used on the clusterer output for stability; the majority state over the last ten checks (where checks occur around ten times per second as feature data updates) is taken as the output state. The best-matching cluster is thus a result of feature data collected in the last 1.5 seconds, a reasonable figure for working memory and a good turnaround time for reaction to a shift in musical behavior from the musician being tracked. In application, multiple clustering units can be used, based on different combinations of source features as the data source; this keeps the dimensionality of input lower for an individual clusterer than using all features at once, making machine learning more effective with a smaller amount of input data (recall the discussion of tradeoffs above).
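The restart-and-vote scheme just described might be sketched as follows, assuming scikit-learn's k-means; the data, the number of restarts, and variable names are stand-ins, while k = 10 and the ten-check majority vote follow the text.

```python
import numpy as np
from collections import deque, Counter
from sklearn.cluster import KMeans

timbre_points = np.random.rand(400, 5)        # averaged timbral feature vectors

best_model, best_error = None, np.inf
for seed in range(10):                        # multiple random restarts
    model = KMeans(n_clusters=10, n_init=1, random_state=seed).fit(timbre_points)
    if model.inertia_ < best_error:           # least total distance to centers wins
        best_model, best_error = model, model.inertia_

recent = deque(maxlen=10)                     # roughly the last second of checks

def current_state(feature_vector):
    recent.append(int(best_model.predict(feature_vector.reshape(1, -1))[0]))
    return Counter(recent).most_common(1)[0][0]   # majority state over last checks

print(current_state(np.random.rand(5)))
```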
The third and fourth processes depend on event-timing data lifted from the human musician through onset detection, and on machine-listening processes that assess the current metrical structure of the performance (beat tracking). The classifier was constructed by observation as much as by a fully supervised algorithm; indeed, when collecting materials in rehearsal, Eddie Prévost, when asked to provide examples of his freest playing, tended still to mix in short flashes of more beat-based material; given a smaller amount of data, human inspection of this was the most pragmatic solution (a future study may collect much more data across multiple drummers and investigate machine classification more rigorously). The classifier differentiates performance in a loose, highly improvisatory mode from more highly beat-driven material. The response model for the musical agents' own timing then follows a Markov model of observed event timings collected during free playing, or works with respect to beat boundaries discovered in the metrical analysis. The Markov model was constantly active in collecting data, and could develop from rehearsal through into the actual concert.
Shorn of the rhythmic analysis parts of LL, processes 1 and 2 were packaged into the more reusable ll~ external for Max/MSP. Figure 21.2 shows a screenshot of ll~ in action, illustrating feature collection and the classification of timbral states. The external's three outputs are the clusterer's current observed cluster number (the number of states, the "k" in k-means, can be chosen as an input argument), a measure of how full the memory is with collected feature data, and a list of histogram-equalized and normalized feature values (which can be useful in further feature-adaptive sound synthesis and processing).
In practice, not all learning has to be online, adapting during a concert. For clustering, although online learning algorithms (such as agglomerative clustering) were implemented, the most pragmatic technique was to run the k-means clustering at a predetermined point (after enough data is collected) or at a user-selected point (this is the control mode of the ll~ external). This avoids transitory behavior of the clusterer, particularly in the early stages of receiving data. While data collection is continuous, ll~ is quite flexible to being trained at
chosen points, and particular clustering solutions can be frozen by saving (and later recalling) files. In practice, even if a system is learning from session to session, the hard work of reflection and learning may take place between rather than during sessions. The final system that performed with Eddie Prévost in the evening concert had been trained in rehearsal sessions earlier in the day and the day before. The time constraints on available rehearsal had led me to train baseline systems on drum samples and human beat-boxing; we experimented in performing with systems trained on musical input other than Eddie's drum kit, perhaps justifiable as an attempt to give the system a divergent personality, and while some in-concert adaptation took place in the rhythmic domain, the feature adaptation and clusterers were fixed in advance, as was the actual classification measure for free time versus highly beat-based playing.
Reaction to LL's premiere was positive from performer and audience, though in discussion after the event (a recording had been made), Eddie was more guarded in any praise. He was enthusiastic about the ideas of longer-term learning, though we both agreed that this system did not yet instantiate those dreams. Sam Hayden also kindly sent feedback on his own investigations of the ll~ object, noting: "I've been experimenting with using pre-trained ll~ objects and mapping the output values onto fx synthesis parameters then feeding the resultant audio back into the ll~ objects. Though the ll~ system is working as it should the musical results seem a little unpredictable … Perhaps the mappings are too arbitrary and the overall system too chaotic. I suppose the issue is of perception: as a listener I think you can hear that the system has some kind of autonomy. It is a question of how much you need to be able to follow what the system is doing for the musical interactions to be meaningful." In his Adaptations, Sam even feeds the final output audio of the system back, mixing it into the input of the earliest ll~ object. Successive ll~ objects are introduced as the piece progresses over time, gradually increasing complexity; he writes, "As a listener, you are aware of some kind of underlying controlling system, even if you're not quite sure what it's doing. It is this ambiguity that interests me."
These comments highlight the independent views of listener, critic, and composer, and a musician interacting with the system, and the need for further evaluation of such systems as new learning facilities are explored. The reader is invited to try the ll~ object, and to consider the roles machine learning could play in their own work. Much remains to explore, as ever!
Though our focus has been virtual musicians in concerts, developments in this technology interact with other media. Videogames include increasing amounts of AI, and where the 2000s craze for rhythm games has waned (perhaps as people have realized they are at heart rather linear piano-roll challenges, like specific musical technical exercises), future music games may embrace rather more open-ended worlds, where dynamic difficulty adjustment works over the lifetime of a player. Beyond touring AIs, it is hard to resist the possibility of musical familiars, virtual-musician programs that act as lifelong musical companions, from tutors to partners in music making. Where fixed recording may falter after a busy twentieth century, the rise of gaming points to a return of adaptable music making for all.
Acknowledgments
With thanks to the editors, and Chris Thornton, for review feedback on the chapter, and
Eddie Prévost and Sam Hayden for their highly musical input and careful reflection on
the systems.
References
Alpaydin, Ethem. 2010. Introduction to Machine Learning. Cambridge, MA: MIT Press.
Ames, Charles. 1989. The Markov Process as a Compositional Model: A Survey and a Tutorial. Leonardo 22 (2): 175–187.
Assayag, Gérard, Georges Bloch, Marc Chemillier, Arshia Cont, and Shlomo Dubnov. 2006. OMax Brothers: A Dynamic Topology of Agents for Improvisation Learning. In AMCMM '06: Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia. New York: ACM.
Begleiter, Ron, Ran El-Yaniv, and Golan Yona. 2004. On Prediction Using Variable Order
Markov Models. Journal of Artificial Intelligence Research 22: 385–421.
Bevilacqua, Frédéric, Rémy Müller, and Norbert Schnell. 2005. MnM: A Max/MSP Mapping Toolbox. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME05), Vancouver, BC.
Bradski, Gary, and Adrian Kaehler. 2008. Learning OpenCV: Computer Vision with the
OpenCV Library. Sebastopol, CA: O’Reilly Media.
Casey, Michael A., Remco Veltkamp, Masataka Goto, Marc Leman, Christophe Rhodes, and Malcolm Slaney. 2008. Content-based Music Information Retrieval: Current Directions and Future Challenges. Proceedings of the IEEE 96 (4): 668–696.
Chadabe, Joel. 1997. Electric Sound: The Past and Promise of Electronic Music. Englewood Cliffs, NJ: Prentice Hall.
Collins, Nick. 2007. Musical Robots and Listening Machines. In The Cambridge Companion to Electronic Music, ed. Nick Collins and Julio d'Escrivan, 171–184. Cambridge, UK: Cambridge University Press.
——. 2011a. Trading Faures: Virtual Musicians and Machine Ethics. Leonardo Music Journal 21.
——. 2011b. Machine Listening in SuperCollider. In The SuperCollider Book, ed. Scott Wilson, David Cottle, and Nick Collins, 439–460. Cambridge, MA: MIT Press.
Conklin, Darrell, and Ian H. Witten. 1995. Multiple Viewpoint Systems for Music Prediction. Journal of New Music Research 24 (1): 51–73.
Cope, David, ed. 2001. Virtual Music: Computer Synthesis of Musical Style. Cambridge,
MA: MIT Press.
Deliège, Irène, and John A. Sloboda, eds. 1996. Musical Beginnings: Origins and Development of Musical Competence. New York: Oxford University Press.
Downie, J. Stephen. 2003. Music Information Retrieval. Annual Review of Information Science and Technology 37: 295–340.
Ericsson, K. Anders, and A. C. Lehmann. 1996. Expert and Exceptional Performance: Evidence of Maximal Adaptation to Task. Annual Review of Psychology 47: 273–305.
Fiebrink, Rebecca. 2011. Real-time Human Interaction with Supervised Learning Algorithms for Music Composition and Performance. PhD diss., Princeton University. http://www.cs.princeton.edu/~fiebrink/Rebecca_Fiebrink/thesis.html.
Foster, Peter, Anssi Klapuri, and Mark D. Plumbley. 2011. Causal Prediction of Continuous-valued Music Features. In Proceedings of the International Society for Music Information Retrieval Conference, 501–506.
Hamanaka, Masatoshi, Masataka Goto, Hideki Asoh, and Nobuyuki Otsu. 2003. A Learning-based Jam Session System that Imitates a Player's Personality Model. In IJCAI: International Joint Conference on Artificial Intelligence, 51–58.
Hayden, Sam, and Mieko Kanno. 2011. Towards Musical Interaction: Sam Hayden's Schismatics for E-violin and Computer. In Proceedings of the International Computer Music Conference, 486–490.
Hsu, William, and Marc Sosnick. 2009. Evaluating Interactive Music Systems: An HCI Approach. In Proceedings of the International Conference on New Interfaces for Musical Expression, 25–28.
Hunt, Andy, and Marcelo M. Wanderley. 2002. Mapping Performer Parameters to Synthesis Engines. Organised Sound 7 (2): 97–108.
Kapur, Ajay. 2005. A History of Robotic Musical Instruments. In Proceedings of the International Computer Music Conference, 1–8.
Kiefer, Chris. 2010. A Malleable Interface for Sonic Exploration. In Proceedings of the International Conference on New Interfaces for Musical Expression, 291–296. Sydney, Australia. http://www.nime.org/proceedings/2010/nime2010_291.pdf.
Kiefer, Chris, Nick Collins, and Geraldine Fitzpatrick. 2008. HCI Methodology for Evaluating Musical Controllers: A Case Study. In Proceedings of the International Conference on New Interfaces for Musical Expression, 87–90. Genova, Italy. http://www.nime.org/proceedings/2008/nime2008_087.pdf.
Le Groux, Sylvain, and Paul F. M. J. Verschure. 2010. Towards Adaptive Music Generation by Reinforcement Learning of Musical Tension. In Proceedings of Sound and Music Computing. http://smcnetwork.org/files/proceedings/2010/24.pdf.
Lewis, George E. 1999. Interacting with Latter-day Musical Automata. Contemporary Music Review 18 (3): 99–112.
Martin, Aengus, Craig T. Jin, and O. R. Bown. 2011. A Toolkit for Designing Interactive Musical Agents. In Proceedings of the 23rd Australian Computer-Human Interaction Conference, 194–197. New York: ACM.
Mantaras, Ramon Lopez de, and Josep Lluis Arcos. 2002. AI and Music: From Composition to Expressive Performance. AI Magazine 23 (3): 43–57.
Matarić, Maja J. 2007. The Robotics Primer. Cambridge, MA: MIT Press.
Miranda, Eduardo Reck, and John A. Biles, eds. 2007. Evolutionary Computer Music. London: Springer-Verlag.
Mitchell, Tom M. 1997. Machine Learning. New York: McGraw-Hill.
Pachet, François. 2003. The Continuator: Musical Interaction with Style. Journal of New
Music Research 32 (3): 333–341.
Pearce, Marcus T., and Geraint A. Wiggins. 2004. Improved Methods for Statistical Modelling of Monophonic Music. Journal of New Music Research 33 (4): 367–385.
Pierce, John Robinson. 1968. Science, Art, and Communication. New York: Clarkson N.
Potter.
Rowe, Robert. 1993. Interactive Music Systems. Cambridge, MA: MIT Press.
——. 2001. Machine Musicianship. Cambridge, MA: MIT Press.
Schankler, Isaac, Jordan B. L. Smith, Alexandre François, and Elaine Chew. 2011. Emergent Formal Structures of Factor Oracle-driven Musical Improvisations. In Mathematics
and Computation in Music, ed. Carlos Agon, Moreno Andreatta, Gérard Assayag, Emmanuel Amiot, Jean Bresson, and John Mandereau, 241–254. Paris: IRCAM, CNRS, UPMC.
Smith, Benjamin D., and Guy E. Garnett. 2011. The Self-Supervising Machine. In Proceedings of the International Conference on New Interfaces for Musical Expression, 30 May–1 June 2011, Oslo, Norway. http://www.nime2011.org/proceedings/papers/B21-Smith.pdf.
Temperley, David. 2007. Music and Probability. Cambridge, MA: MIT Press.
Thornton, Chris J. 2011. Generation of Folk Song Melodies Using Bayes Transforms. Journal of New Music Research 40 (4): 293–312.
Wiggins, Geraint A., Marcus T. Pearce, and Daniel Müllensiefen. 2009. Computational Modeling of Music Cognition and Musical Creativity. In The Oxford Handbook of Computer Music, ed. Roger T. Dean, 383–420. New York: Oxford University Press.
Witten, Ian H., and Eibe Frank. 2005. Data Mining: Practical Machine Learning Tools and
Techniques. San Francisco: Morgan Kaufmann.
Nick Collins
Nick Collins is a composer, performer, and researcher who lectures at the University of Sussex. His research interests include machine listening, interactive and generative music, and musical creativity. He co-edited the Cambridge Companion to Electronic Music (Cambridge University Press, 2007) and The SuperCollider Book (MIT Press, 2011) and wrote the Introduction to Computer Music (Wiley, 2009). Sometimes, he writes in the third person about himself, but is trying to give it up. Further details, including publications, music, code, and more, are available from http://www.sussex.ac.uk/Users/nc81/index.html.