A PROJECT REPORT
Submitted by
VISWA S (721419104054)
HARIHARAN S (721419104701)
in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
MAY-2023
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
ACKNOWLEDGEMENT
We are thankful for the support and guidance of all the faculty of the Computer
Science and Engineering Department, and we extend our sincere thanks to the
supporting staff for their timely assistance. Finally, we would like to express our
gratitude to our parents, whose motivation and support helped us complete this
project.
ABSTRACT
TABLE OF CONTENTS
ABSTRACT
LIST OF ABBREVIATIONS
1 INTRODUCTION
2 RELATED WORKS
2.1 OBJECTIVES
2.2 IDENTIFIED PROBLEMS
3 EXISTING SYSTEM
4 PROPOSED SYSTEM
4.3 ARCHITECTURE
6 CONCLUSION
APPENDIX I - SCREENSHOTS
APPENDIX II - CODING
REFERENCES
LIST OF FIGURES

LIST OF ABBREVIATIONS
AI: Artificial Intelligence
ML: Machine Learning
SD: Secure Digital
CHAPTER 1
INTRODUCTION
Music composition has long been a popular human pursuit, and music
generation has been studied in considerable detail by the research community.
Many researchers have attempted to generate music with different approaches.
Several of these approaches aim to generate a suite of music, and their
combination can be used to design a new and efficient model. We divide these
approaches into two broad categories: traditional algorithms that operate on
predefined functions to produce music, and autonomous models that learn from
the past structure of musical notes and then produce new music. Garvit et al.
presented a paper comparing these algorithms on two different hardware
platforms. Drewes et al. proposed a method for using algebra to generate music
grammatically in a tree-based fashion. Markov chains and hidden Markov
models can be used to design a mathematical model for generating music. As
recorded in the book “Ancient Greek Music: A Survey,” the oldest surviving
complete musical composition is the Seikilos Epitaph from the 1st or 2nd
century AD. In algorithmic composition, that is, composing by means of
formalizable methods, Guido of Arezzo was an early pioneer: around 1025 he
developed the music theory now known as “solfeggio” and invented a method to
automate the conversion of text into melodic phrases. Music generation using
artificial intelligence with artistic style imitation has been broadly explored
over the past three decades.
Nowadays, electronic instruments communicate with computers through a
protocol named the Musical Instrument Digital Interface (MIDI), which carries
information such as pitch and duration. MIDI allows a composer to write for an
orchestra without having the orchestra itself: every part of the orchestra can be
rendered from MIDI with a synthesizer. The digital audio workstation (DAW),
capable of recording, editing, mixing, and mastering, is the standard in
contemporary audio production; a DAW can route each MIDI channel separately
to connected electronic devices or synthesizers. There exist many open-source
systems for generating music, but an in-depth discussion of all such research
works and methods is beyond the scope of this document. Algorithmic
composition has various approaches, such as Markov models, generative
grammars, transition networks, and artificial neural networks.
After the breakthrough in AI, many new models and methods were
proposed in the field of music generation. Descriptions of various AI-enabled
techniques can be found in the literature, including a probabilistic model using
RNNs, Anticipation-RNN, and recursive artificial neural networks (RANN), an
evolved version of artificial neural networks, for generating the subsequent
note, its duration, and the rhythm. Generative adversarial networks (GANs) are
also actively used to generate musical notes. A GAN contains two neural
networks: a generator network that generates candidate data, and a
discriminator network that evaluates the generated data for authenticity against
the original dataset. MuseGAN is a generative adversarial network that
generates symbolic multi-track music.
CHAPTER 2
RELATED WORKS
2.1 OBJECTIVES
● To produce high-quality music for various purposes, such as video editing
● To meet the demand for good music with the help of the RNN algorithm
CHAPTER 3
EXISTING SYSTEM
In a Markov chain model for music generation, each note or chord is treated
as a "state," and the transitions between the states are determined by the
probabilities of certain notes or chords following others. For example, if in the
original music, a C major chord is often followed by an F major chord, the
Markov chain model would assign a high probability to the transition from C to
F.
To generate new music using a Markov chain, the model is given a starting
note or chord, and then it randomly selects the next note or chord based on the
probabilities of the possible transitions from the current state. This process is
repeated to generate a sequence of notes or chords that form a new piece of
music.
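To make the process concrete, the following minimal Python sketch (illustrative only, not the project's code; the toy chord progressions are invented for the example) builds a first-order Markov chain from a small corpus and samples a new sequence:

import random
from collections import defaultdict

# Toy training corpus: each inner list is a chord progression.
corpus = [
    ["C", "F", "G", "C"],
    ["C", "Am", "F", "G"],
    ["Am", "F", "C", "G"],
]

# Record which chord followed which; duplicates preserve frequency,
# so sampling uniformly from the list matches the transition probabilities.
transitions = defaultdict(list)
for progression in corpus:
    for current, following in zip(progression, progression[1:]):
        transitions[current].append(following)

def generate(start, length):
    sequence = [start]
    for _ in range(length - 1):
        successors = transitions.get(sequence[-1])
        if not successors:   # dead end: no observed successor state
            break
        sequence.append(random.choice(successors))
    return sequence

print(generate("C", 8))  # e.g. ['C', 'F', 'G', 'C', 'Am', 'F', ...]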
There are various ways to improve the quality of the music generated by a
Markov chain model, such as adding rules for rhythm and melody, or using
multiple Markov models for different aspects of the music, such as harmony and
melody. Additionally, the quality of the generated music can depend on the size
and diversity of the original dataset used to train the model. While Markov
chains are very useful in music generation, there are challenges involved in
using them effectively; most notably, a Markov chain conditions only on the
current state, so it struggles to capture the long-term structure of a piece.
CHAPTER 4
PROPOSED SYSTEM
There are several advantages to using LSTM for music generation, including
the ability to capture complex patterns in the music and to generate music that
is more structured and coherent than that of other generative models such as
Markov chains. However, as with other machine learning approaches, there are
still challenges in training an LSTM model for music generation, such as finding
the right balance between model complexity and generalization, and avoiding
overfitting to the training dataset. The advantages of Long Short-Term Memory
(LSTM) models in music generation include:
1. Capturing long-term structure: LSTMs retain information across long
sequences, so the generated music can remain coherent over many bars rather
than only from note to note.
2. Flexibility and creativity: LSTMs are highly flexible and can generate a
wide range of musical styles and patterns. By adjusting the architecture and
hyperparameters of the model, it is possible to generate music with different
rhythms, melodies, harmonies, and styles.
4.1 LIST OF MODULES
1. Data Collection
2. Data Preprocessing
3. Build the Model
4. Train the Model
5. Music Generation
1. Data Collection
2. Data Preprocessing
3. Build the model
Building an LSTM model for music generation involves several steps. The
input sequence for the LSTM model is a sequence of musical symbols. The
length of the sequence is determined by the number of time steps that the LSTM
model will process. The LSTM model consists of one or more LSTM layers.
Each LSTM layer includes a set of memory cells that allow the model to learn
long-term dependencies in the music data. The output of the LSTM layer is a
sequence of hidden states, which are used to predict the next symbol in the
sequence. The output layer of the LSTM model maps the hidden states to a
probability distribution over the set of possible musical symbols. This allows the
model to generate diverse and creative music.
Once the architecture is defined, the model must be compiled. This involves
specifying the loss function, optimizer, and metrics used to evaluate the
performance of the model during training. The LSTM model is trained on a
dataset of musical compositions using the compiled loss function and optimizer.
The training process involves iteratively updating the weights of the LSTM
model to minimize the loss function and improve the model's performance. This
is typically done using stochastic gradient descent (SGD) or a variant thereof.
During training, it is important to monitor the performance of the LSTM model
on a validation set of musical compositions. This allows you to detect overfitting
and adjust the model's hyperparameters, such as the learning rate, batch size, and
number of epochs.
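As a minimal Keras sketch of these steps (mirroring the model in Appendix II; the vocabulary size, layer width, and hyperparameters here are illustrative):

import tensorflow.keras as keras

VOCAB_SIZE = 38   # number of distinct musical symbols (illustrative)

# One LSTM layer over one-hot encoded symbol sequences, with a softmax
# distribution over the next symbol.
inputs = keras.layers.Input(shape=(None, VOCAB_SIZE))
x = keras.layers.LSTM(256)(inputs)
x = keras.layers.Dropout(0.2)(x)
outputs = keras.layers.Dense(VOCAB_SIZE, activation="softmax")(x)
model = keras.Model(inputs, outputs)

# Compile with a loss, optimizer, and metrics, then train on
# (sequence, next-symbol) pairs produced during preprocessing.
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.Adam(learning_rate=0.001),
              metrics=["accuracy"])
# model.fit(inputs_train, targets_train, epochs=90, batch_size=64,
#           validation_split=0.1)   # hold out data to watch for overfitting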
5. Music Generation
The music generation in LSTM involves using the trained model to generate
new musical compositions. The music generation process starts with defining a
seed sequence, which is a short sequence of musical symbols that serves as the
initial input to the LSTM model. This seed sequence can be either chosen
manually or generated randomly. Given the seed sequence, the LSTM model is
used to predict the next symbol in the sequence. The predicted symbol is then
added to the seed sequence to form a new, longer sequence. This process is
repeated to generate a new sequence of musical symbols. Once the new sequence
of musical symbols is generated, it can be converted into an audio file by using a
MIDI synthesizer or other audio rendering software. The generated audio file can
then be played back to listen to the new musical composition. Music generation
using LSTM is an iterative process. The generated composition can be evaluated
for its musical quality and adjusted based on the user's preferences. The seed
sequence can also be modified to generate different musical styles or genres.
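A sketch of this loop, assuming a trained model, a symbol-to-integer mappings dictionary, and a sequence length from the training step (all names here are illustrative):

import numpy as np

def generate(model, mappings, seed_symbols, num_steps, sequence_length):
    inverse = {v: k for k, v in mappings.items()}
    seed = [mappings[s] for s in seed_symbols]
    melody = list(seed_symbols)
    for _ in range(num_steps):
        window = seed[-sequence_length:]         # most recent context
        onehot = np.eye(len(mappings))[window]   # one-hot encode the window
        probabilities = model.predict(onehot[np.newaxis, ...])[0]
        next_int = int(np.random.choice(len(probabilities), p=probabilities))
        seed.append(next_int)                    # extend the context
        melody.append(inverse[next_int])         # record the new symbol
    return melody

The resulting symbol list can then be written out as a MIDI file (as save_melody does in Appendix II) and rendered with a synthesizer.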
4.3 ARCHITECTURE
4.4 SOFTWARE USED
⮚ Language : Python
⮚ Technology : Keras
⮚ Developing Tool : PyCharm Community Edition 2023.1
⮚ Web Server : Flask
HARDWARE USED
4.5 SOFTWARE DESCRIPTION
Philosophy of AI
While exploiting the power of computer systems, human curiosity led us to
wonder, “Can a machine think and behave as humans do?”
Goals of AI
To Create Expert Systems − Systems that exhibit intelligent behavior and
learn, demonstrate, explain, and advise their users.
What Contributes to AI?
Artificial intelligence is a science and technology based on disciplines such
as Computer Science, Biology, Psychology, Linguistics, Mathematics, and
Engineering. A major thrust of AI is in the development of computer functions
associated with human intelligence, such as reasoning, learning, and problem
solving.
Out of the following areas, one or multiple areas can contribute to building
an intelligent system.
4.5.2 Overview of Machine Learning
4.5.3 Overview of Deep Learning
Deep learning models are built from artificial neural networks, which learn
from data and improve their accuracy over time through a process called
backpropagation.
4.5.4 Relation between AI, ML and Deep Learning
Deep Learning (DL) is a specific type of ML that uses deep neural networks
with multiple layers to learn complex patterns in data. DL algorithms are
inspired by the structure and function of the human brain and are capable of
processing large amounts of data with high accuracy. DL has enabled
breakthroughs in fields such as computer vision, speech recognition, and natural
language processing.
4.5.5 Overview of HTML
● <h1>, <h2>, <h3>, etc. for headings
● <p> for paragraphs
● <img> for images
● <a> for links
● <ul> and <ol> for unordered and ordered lists
● <table> for tables
● <form> for input forms
HTML is an essential part of the web development stack, along with CSS
for styling and JavaScript for interactivity. With HTML, developers can create
static web pages, dynamic web applications, and everything in between.
4.5.6 Overview of Flask
Flask provides a simple yet powerful way to create web applications using
Python. It follows the Model-View-Controller (MVC) pattern, where the
application logic is divided into three components: the model, which represents
the data and business logic, the view, which handles user interaction and
presentation, and the controller, which handles requests and responses between
the model and view.
Flask provides several tools and libraries for handling common web
development tasks, such as routing, templating, and form validation. It also
integrates easily with popular Python libraries for data analysis and machine
learning, making it a great choice for developing data-driven web applications.
4.6 SOFTWARE KEY TERMINOLOGIES
RNNs are a powerful and robust type of neural network, and they belong to
the most promising algorithms in use because they maintain an internal
memory.
Like many other deep learning algorithms, recurrent neural networks are
relatively old. They were initially created in the 1980s, but only in recent years
have we seen their true potential. An increase in computational power, the
massive amounts of data that we now have to work with, and the invention of
long short-term memory (LSTM) in the 1990s have really brought RNNs to the
foreground.
Because of their internal memory, RNNs can remember important things
about the input they received, which allows them to be very precise in predicting
what’s coming next. This is why they’re the preferred algorithm for sequential
data like time series, speech, text, financial data, audio, video, weather and much
more. Recurrent neural networks can form a much deeper understanding of a
sequence and its context compared to other algorithms.
LSTMs use a set of specialized memory cells, which can store information
for long periods of time and selectively forget or update that information based
on input from the current time step. This allows LSTMs to effectively capture
long-term dependencies in sequential data, such as sentences or music, and to
generate more accurate and complex predictions.
Because of these properties, LSTMs are well suited to generating sequential
data and are widely used in many different fields of research and application.
A single MIDI cable can carry up to sixteen channels of MIDI data, each of
which can be routed to a separate device. Each interaction with a key, button,
knob or slider is converted into a MIDI event, which specifies musical
instructions, such as a note's pitch, timing and loudness. One common MIDI
application is to play a MIDI keyboard or other controller and use it to trigger a
digital sound module (which contains synthesized musical sounds) to generate
sounds, which the audience hears produced by a keyboard amplifier. MIDI data
can be transferred via MIDI or USB cable, or recorded to a sequencer or digital
audio workstation to be edited or played back.
A file format that stores and exchanges the data is also defined. Advantages
of MIDI include small file size, ease of modification and manipulation and a
wide choice of electronic instruments and synthesizer or digitally sampled
sounds. A MIDI recording of a performance on a keyboard could sound like a
piano or other keyboard instrument; however, since MIDI records the messages
and information about their notes and not the specific sounds, this recording
could be changed to many other sounds, ranging from synthesized or sampled
guitar or flute to full orchestra.
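As a small illustration of what these messages look like in code, the following sketch uses the mido library (cited in the references) to write a two-note MIDI file; the notes, velocities, and timings are arbitrary:

import mido

mid = mido.MidiFile()
track = mido.MidiTrack()
mid.tracks.append(track)

# Each message carries instructions (pitch, loudness, delta time in
# ticks), not audio samples.
track.append(mido.Message('program_change', program=0, time=0))         # piano
track.append(mido.Message('note_on', note=60, velocity=64, time=0))     # C4 down
track.append(mido.Message('note_off', note=60, velocity=64, time=480))  # C4 up
track.append(mido.Message('note_on', note=64, velocity=64, time=0))     # E4 down
track.append(mido.Message('note_off', note=64, velocity=64, time=480))  # E4 up

mid.save('example.mid')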
4.6.4 MUSIC21
One of the key features of Music21 is its integration with other Python
libraries and tools. It can be used in conjunction with other data analysis and
visualization libraries such as pandas, numpy, and matplotlib, allowing users to
apply the same data manipulation techniques to both musical and non-musical
data. Additionally, Music21 can be used in conjunction with deep learning
frameworks like TensorFlow and PyTorch, making it a powerful tool for
generating new music using machine learning techniques.
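For instance, a few lines of Music21 suffice to parse a file and walk through its notes (the file name here is illustrative):

from music21 import converter, note, chord

score = converter.parse("melody.mid")    # also handles kern, MusicXML, ...
for element in score.flatten().notes:    # notes and chords in score order
    if isinstance(element, note.Note):
        print(element.pitch, element.duration.quarterLength)
    elif isinstance(element, chord.Chord):
        print(element.pitches, element.duration.quarterLength)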
4.6.5 TENSORFLOW
TensorFlow is an open-source machine learning library developed by Google
and widely used in industry and academia. TensorFlow provides a highly
efficient and scalable platform for developing and deploying machine learning
models across a wide range of platforms, including desktop computers, mobile
devices, and large-scale distributed systems.
At its core, TensorFlow is based on a flexible and powerful data flow graph
that enables users to build and train a wide range of machine learning models,
including neural networks, decision trees, and other types of classifiers and
regressors. The library provides a rich set of tools for working with large
datasets, including powerful data transformation and preprocessing functions, as
well as support for a wide range of file formats and data sources.
TensorFlow also provides a range of high-level APIs that make it easy for
users to build complex machine learning models with just a few lines of code.
These include the Keras API, which provides a simple and intuitive interface for
building neural networks, and the Estimator API, which provides a high-level
interface for building complex models with custom training loops and other
advanced features.
4.6.6 KERAS
One of the main benefits of Keras is its simplicity. It abstracts away many
of the low-level details of working with neural networks, allowing users to focus
on building and training models without getting bogged down in implementation
details. Keras provides a number of pre-built layers and models that can be used
out of the box or customized to meet specific requirements.
Keras also provides a number of tools for evaluating and visualizing models.
For example, it includes built-in support for training callbacks, which can be
used to monitor training progress and adjust hyperparameters in real time. Keras
also includes tools for visualizing model performance metrics, such as accuracy
and loss, as well as tools for visualizing the structure of models, such as layer
diagrams and model summaries.
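A small sketch of how such callbacks might be attached during training (the file name and patience values are illustrative):

import tensorflow.keras as keras

callbacks = [
    # keep the best weights seen so far, judged by validation loss
    keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                    save_best_only=True),
    # stop if validation loss has not improved for 5 epochs
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5),
    # halve the learning rate when progress stalls
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                      patience=2),
]
# model.fit(inputs, targets, epochs=90, batch_size=64,
#           validation_split=0.1, callbacks=callbacks)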
A MIDI player is a software or hardware device that plays MIDI files. MIDI
stands for Musical Instrument Digital Interface and is a protocol used for
communicating musical information between electronic devices, such as
synthesizers, computers, and MIDI controllers.
MIDI files are not audio files, but rather a set of instructions that describe
how to play music, including information such as which notes to play, the timing
of the notes, and the instrument to use. MIDI players can interpret these
instructions and play back the music, often using virtual instruments or
soundfonts to produce the sounds.
Software MIDI players are programs that can play MIDI files directly on a
computer, while others are plugins that can be used within a digital audio
workstation (DAW) or music production software.
Hardware MIDI players are standalone devices that can play MIDI files
directly from a USB drive, SD card, or other storage device. These devices often
have built-in speakers or can be connected to a larger sound system for
playback.
There are many different types of MIDI visualizers, each with its own unique
set of features and capabilities. Some common features found in MIDI
visualizers include:
1. Piano roll view: A piano roll view displays MIDI data in a graphical
representation of a piano keyboard, allowing users to see which notes are
being played and when.
2. 3D visualization: Some MIDI visualizers use 3D graphics to display MIDI
data, creating a more immersive and interactive visual experience.
3. Real-time visualization: Real-time visualization allows users to see
visualizations of MIDI data as it is being played, making it easier to
understand how different elements of the music are interacting with each
other.
4.7 MUSICAL CONCEPTS
4.7.1 MELODY
Melodies can be sung or played on any instrument, and they are often the
most memorable and recognizable part of a song or piece of music. A melody
can be simple or complex, and can vary in length and structure. Melodies are
often built around a particular scale or key, and they may feature repetition,
variation, or development of musical ideas.
In popular music, melodies are often used to provide a memorable hook or
chorus in a song. Melodies can also be used as the basis for improvisation or
for developing harmonies, counterpoint, and other musical textures.
4.7.2 HARMONY
Harmony refers to the vertical aspect of music, or the way that two or more
notes or pitches sound when played simultaneously. Harmony is an essential
element of Western music, and is used to create chords, progressions, and other
harmonic structures.
4.7.3 RHYTHM
Rhythm refers to the arrangement of sounds and silences over time, creating
a sense of movement, pulse, and groove. Rhythm is one of the fundamental
elements of music and is essential to creating a sense of forward motion and
energy in a piece.
There are a variety of means to create rhythm including the use of regular,
repeating patterns of beats, as well as irregular or syncopated patterns that create
a sense of tension and release. Different musical genres and styles use rhythm in
different ways, from the driving backbeat of rock and roll to the intricate
polyrhythms of African music.
4.7.4 NOTES AND RESTS
There are many different types of notes in music, each with a specific
duration and value. The most common are the whole note, half note, quarter
note, eighth note, and sixteenth note, each lasting half as long as the one before
it.
In addition to these basic note values, there are also dotted notes, which add
half the value of the note to its duration, and tied notes, which combine the
duration of two or more notes into a single sound.
There are likewise many different types of rests, each with a specific
duration and value; the most common are the whole, half, quarter, eighth, and
sixteenth rests, mirroring the note values above.
In addition to these basic rest values, there are also dotted rests, which add
half the value of the rest to its duration.
Rests are usually written on the same staff as notes, and are placed in the
same position as the note they correspond to in terms of duration. Understanding
rests is important for musicians when reading and writing music, as well as
when performing and improvising music. By using rests effectively, musicians
can create a sense of space, tension, and release within a piece, and can help to
create a sense of musical structure and form.
4.7.5 PITCH
Pitch is measured in Hertz (Hz), which represents the number of sound waves
that occur in one second. The higher the frequency of the sound wave, the higher
the pitch of the note, and vice versa.
In Western music, there are 12 distinct pitches in each octave. Seven of them
are named after the first seven letters of the alphabet (A, B, C, D, E, F, and G),
and the remaining five are indicated with a sharp (#) or flat (b) symbol. The
distance between two adjacent pitches is called a half-step, and there are 12
half-steps in an octave.
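This relationship can be written down directly: with the standard tuning of A4 = 440 Hz, each half-step multiplies the frequency by the twelfth root of two. A short sketch in Python:

def midi_to_hz(midi_note):
    # A4 is MIDI note 69; each half-step scales frequency by 2 ** (1/12)
    return 440.0 * 2 ** ((midi_note - 69) / 12)

print(midi_to_hz(69))  # 440.0 (A4)
print(midi_to_hz(60))  # ~261.63 (middle C)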
Pitch is a crucial component of melody, harmony, and rhythm in music, and
is often used to create tension, release, and emotional expression.
4.7.6 SCALE
Western music typically uses 7-note scales, which are made up of a specific
pattern of half-steps and whole-steps. The most commonly used scale in Western
music is the major scale, which has a specific pattern of whole-steps and
half-steps: W-W-H-W-W-W-H (where W represents a whole-step and H
represents a half-step). Another commonly used scale is the minor scale, which
also has a specific pattern of whole-steps and half-steps: W-H-W-W-H-W-W.
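These step patterns translate directly into semitone offsets, as the following sketch shows (pitch spellings are simplified to sharps):

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # W-W-H-W-W-W-H in semitones
MINOR_STEPS = [2, 1, 2, 2, 1, 2, 2]  # W-H-W-W-H-W-W

def build_scale(root, steps):
    index = NOTE_NAMES.index(root)
    scale = [root]
    for step in steps[:-1]:            # the final step returns to the octave
        index = (index + step) % 12
        scale.append(NOTE_NAMES[index])
    return scale

print(build_scale("C", MAJOR_STEPS))  # ['C', 'D', 'E', 'F', 'G', 'A', 'B']
print(build_scale("A", MINOR_STEPS))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']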
Scales can also be used to create modes, which are variations of a given
scale that start and end on different notes. For example, the notes of the major
scale can be used to create the Dorian mode by starting on the second note of
the scale (D in the key of C major) and ending on that same note.
In addition to Western scales, there are many other scales used in different
musical traditions around the world. For example, Indian classical music uses a
system of scales called ragas, which are based on specific melodic patterns and
are associated with specific moods or emotions. Similarly, Arabic music uses a
system of scales called maqamat, which are based on specific intervals and
melodic patterns.
4.7.7 CHORD
A chord is a group of three or more notes sounded together as a single
harmonic unit.
Chords can be major, minor, augmented, diminished, or suspended,
depending on the intervals and relationships between the notes. Major chords
have a bright and uplifting sound, while minor chords have a darker and more
melancholy sound.
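These qualities correspond to fixed semitone intervals above the root, as in this small sketch (MIDI note 60 is middle C):

# Semitone offsets above the root for common chord qualities.
CHORD_INTERVALS = {
    "major":      [0, 4, 7],
    "minor":      [0, 3, 7],
    "diminished": [0, 3, 6],
    "augmented":  [0, 4, 8],
    "suspended":  [0, 5, 7],   # sus4
}

def build_chord(root_midi, quality):
    return [root_midi + offset for offset in CHORD_INTERVALS[quality]]

print(build_chord(60, "major"))  # [60, 64, 67] -> C major triad
print(build_chord(60, "minor"))  # [60, 63, 67] -> C minor triad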
4.7.8 KEY
Key is a set of related pitches or notes that form the basis for a piece of
music. The key determines the tonal center or "home" note of the music, as well
as the set of notes or scales that are used to create the melody and harmony.
In Western music, the most common keys are major and minor keys, which
are defined by the relationships between the notes in a scale. Major keys are
generally considered to have a bright and uplifting sound, while minor keys have
a darker and more melancholy sound.
4.7.9 TEMPO
Different tempos are used for different musical styles and genres. For
example, a slow tempo might be used for a ballad, while a fast tempo might be
used for a dance or rock song. Tempo can also be used to create tension and
excitement within a piece of music. A sudden change in tempo, known as a
tempo shift, can create a dramatic effect and add interest to a piece.
Musicians often use metronomes, which are devices that produce a steady
clicking sound, to help them keep a consistent tempo. In classical music, the
tempo is often indicated by Italian terms, such as allegro (fast), adagio (slow), or
andante (moderate). In popular music, the tempo is often given in beats per
minute (BPM), which can be found on sheet music or stored in a digital audio
file.
4.7.10 DYNAMICS
In written music, dynamics are often indicated by symbols and Italian terms
that instruct the performer on how to play or sing the music. For example, the
symbol "p" stands for piano, which means to play softly, while the symbol "f"
stands for forte, which means to play loudly. Other symbols and terms that
indicate dynamics include mezzo piano (moderately soft), mezzo forte
(moderately loud), crescendo (gradually getting louder), and diminuendo
(gradually getting softer).
4.7.11 TIMBRE
4.7.12 FORM
There are many different types of musical forms, each with its own unique
characteristics and conventions. Some common forms in Western classical music
include:
● Ternary form: A form consisting of three sections, with the first and third
sections being identical or closely related, and the second section contrasting
with them.
● Sonata form: A form commonly used in the first movement of a sonata,
symphony, or concerto, consisting of an exposition, development, and
recapitulation section.
● Rondo form: A form in which a recurring theme alternates with contrasting
sections.
● Theme and variations: A form in which a theme is presented and then varied
in a series of subsequent sections.
● Through-composed form: A form in which each section of the music is unique
and not repeated.
CHAPTER 5
CHAPTER 6
CONCLUSION
This work achieves the goal of designing a model that can generate music
and melodies automatically, without any human intervention. The model is
capable of recalling previous details of the dataset and generating polyphonic
music using a single-layered LSTM model, proficient enough to learn harmonic
and melodic note sequences from MIDI files of melodies. The model design is
described with a view to functionality and adaptability. The induction and
training method for the music-generation dataset is realised through this work.
Moreover, an analysis of the model is presented for better insight and
understanding, and enhancements to the model's feasibility, along with past and
present possibilities, are also discussed. Future work will aim to test how well
this model scales to a much larger dataset. We would like to observe the effect
of adding more LSTM units and of trying different combinations of
hyperparameters to see how well the model performs. We believe follow-up
research can optimize this model further with more computation.
APPENDIX I - SCREENSHOTS
DATA PREPROCESSING
PREPROCESSED DATA
MAPPED DATA
MODEL DEVELOPMENT AND TRAINING
GENERATED MELODY AS SEQUENCE
RUNNING SERVER
APPENDIX II - CODING
preprocess.py
import json
import os
import music21 as m21
import numpy as np
import tensorflow.keras as keras

KERN_DATASET_PATH = "deutschl/test"
SAVE_DIR = "dataset"
SINGLE_FILE_DATASET = "file_dataset"
MAPPING_PATH = "mapping.json"
SEQUENCE_LENGTH = 64

# durations are expressed in quarter-note lengths
ACCEPTABLE_DURATIONS = [
    0.25,  # sixteenth note
    0.5,   # eighth note
    0.75,
    1.0,   # quarter note
    1.5,
    2,     # half note
    3,
    4      # whole note
]
def load_songs_in_kern(dataset_path):
    songs = []
    # go through all the files in dataset and load them with music21
    for path, subdirs, files in os.walk(dataset_path):
        for file in files:
            # consider only kern files
            if file[-3:] == "krn":
                song = m21.converter.parse(os.path.join(path, file))
                songs.append(song)
    return songs


def has_acceptable_durations(song, acceptable_durations):
    """Return True only if every note/rest has an acceptable duration."""
    for note in song.flat.notesAndRests:
        if note.duration.quarterLength not in acceptable_durations:
            return False
    return True
def transpose(song):
    """Transpose a song to C major / A minor."""
    # get the key from the score, if it is noted there
    parts = song.getElementsByClass(m21.stream.Part)
    measures_part0 = parts[0].getElementsByClass(m21.stream.Measure)
    key = measures_part0[0][4]
    # otherwise, estimate the key with music21
    if not isinstance(key, m21.key.Key):
        key = song.analyze("key")
    # get the interval for the transposition, e.g. Bmaj -> Cmaj
    if key.mode == "major":
        interval = m21.interval.Interval(key.tonic, m21.pitch.Pitch("C"))
    elif key.mode == "minor":
        interval = m21.interval.Interval(key.tonic, m21.pitch.Pitch("A"))
    # transpose the song by the calculated interval
    transposed_song = song.transpose(interval)
    return transposed_song
def encode_song(song, time_step=0.25):
    """Encode a song as a time-series string, e.g. "60 _ _ _ r _ 62 _"."""
    encoded_song = []
    for event in song.flat.notesAndRests:
        # handle notes
        if isinstance(event, m21.note.Note):
            symbol = event.pitch.midi  # e.g. 60
        # handle rests
        elif isinstance(event, m21.note.Rest):
            symbol = "r"
        # convert the event's duration into time-step columns
        steps = int(event.duration.quarterLength / time_step)
        for step in range(steps):
            # if it's the first time we see a note/rest, encode it;
            # otherwise, we're carrying the same symbol, marked "_"
            if step == 0:
                encoded_song.append(symbol)
            else:
                encoded_song.append("_")
    # cast the encoded song to a string
    encoded_song = " ".join(map(str, encoded_song))
    return encoded_song
def preprocess(dataset_path):
    print("Loading songs...")
    songs = load_songs_in_kern(dataset_path)
    print(f"Loaded {len(songs)} songs.")
    for i, song in enumerate(songs):
        # filter out songs with non-acceptable durations
        if not has_acceptable_durations(song, ACCEPTABLE_DURATIONS):
            continue
        # transpose to Cmaj/Amin and encode as a time series
        song = transpose(song)
        encoded_song = encode_song(song)
        # save the encoded song to a text file
        save_path = os.path.join(SAVE_DIR, str(i))
        with open(save_path, "w") as fp:
            fp.write(encoded_song)
def load(file_path):
    with open(file_path, "r") as fp:
        song = fp.read()
    return song


def create_single_file_dataset(dataset_path, file_dataset_path, sequence_length):
    """Collate all encoded songs into one string, separated by "/" delimiters."""
    new_song_delimiter = "/ " * sequence_length
    songs = ""
    # load encoded songs and add delimiters
    for path, _, files in os.walk(dataset_path):
        for file in files:
            file_path = os.path.join(path, file)
            song = load(file_path)
            songs = songs + song + " " + new_song_delimiter
    # drop the trailing space
    songs = songs[:-1]
    with open(file_dataset_path, "w") as fp:
        fp.write(songs)
    return songs
def create_mapping(songs, mapping_path):
    """Map each symbol in the vocabulary to an integer and save it as JSON."""
    mappings = {}
    songs = songs.split()
    vocabulary = list(set(songs))
    # create mappings
    for i, symbol in enumerate(vocabulary):
        mappings[symbol] = i
    # save the vocabulary to a json file
    with open(mapping_path, "w") as fp:
        json.dump(mappings, fp, indent=4)
def convert_songs_to_int(songs):
    int_songs = []
    # load mappings
    with open(MAPPING_PATH, "r") as fp:
        mappings = json.load(fp)
    # map each symbol in the dataset to its integer id
    songs = songs.split()
    for symbol in songs:
        int_songs.append(mappings[symbol])
    return int_songs
def generate_training_sequences(sequence_length):
    """Slide a window over the dataset to create (input, target) pairs."""
    songs = load(SINGLE_FILE_DATASET)
    int_songs = convert_songs_to_int(songs)
    inputs = []
    targets = []
    num_sequences = len(int_songs) - sequence_length
    for i in range(num_sequences):
        inputs.append(int_songs[i:i+sequence_length])
        targets.append(int_songs[i+sequence_length])
    # one-hot encode the input sequences
    vocabulary_size = len(set(int_songs))
    inputs = keras.utils.to_categorical(inputs, num_classes=vocabulary_size)
    targets = np.array(targets)
    return inputs, targets
def main():
    preprocess(KERN_DATASET_PATH)
    songs = create_single_file_dataset(SAVE_DIR, SINGLE_FILE_DATASET, SEQUENCE_LENGTH)
    create_mapping(songs, MAPPING_PATH)


if __name__ == "__main__":
    main()
train.py
import tensorflow.keras as keras

from preprocess import generate_training_sequences, SEQUENCE_LENGTH

OUTPUT_UNITS = 38
NUM_UNITS = [256]
LOSS = "sparse_categorical_crossentropy"
LEARNING_RATE = 0.001
EPOCHS = 90
BATCH_SIZE = 64
SAVE_MODEL_PATH = "model.h5"


def build_model(output_units, num_units, loss, learning_rate):
    # create the model architecture
    input = keras.layers.Input(shape=(None, output_units))
    x = keras.layers.LSTM(num_units[0])(input)
    x = keras.layers.Dropout(0.2)(x)
    output = keras.layers.Dense(output_units, activation="softmax")(x)
    model = keras.Model(input, output)
    # compile the model
    model.compile(loss=loss,
                  optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  metrics=["accuracy"])
    model.summary()
    return model


def train(output_units=OUTPUT_UNITS, num_units=NUM_UNITS, loss=LOSS,
          learning_rate=LEARNING_RATE):
    inputs, targets = generate_training_sequences(SEQUENCE_LENGTH)
    model = build_model(output_units, num_units, loss, learning_rate)
    model.fit(inputs, targets, epochs=EPOCHS, batch_size=BATCH_SIZE)
    model.save(SAVE_MODEL_PATH)


if __name__ == "__main__":
    train()
melodygenerator.py
import json

import numpy as np
import music21 as m21
import tensorflow.keras as keras

from preprocess import SEQUENCE_LENGTH, MAPPING_PATH


class MelodyGenerator:

    def __init__(self, model_path="model.h5"):
        self.model_path = model_path
        self.model = keras.models.load_model(model_path)
        with open(MAPPING_PATH, "r") as fp:
            self._mappings = json.load(fp)
        self._start_symbols = ["/"] * SEQUENCE_LENGTH

    def generate_melody(self, seed, num_steps, max_sequence_length, temperature):
        # create the seed with start symbols and map it to integers
        seed = seed.split()
        melody = seed
        seed = self._start_symbols + seed
        seed = [self._mappings[symbol] for symbol in seed]
        for _ in range(num_steps):
            # limit the seed to max_sequence_length and one-hot encode it
            seed = seed[-max_sequence_length:]
            onehot_seed = keras.utils.to_categorical(seed, num_classes=len(self._mappings))
            onehot_seed = onehot_seed[np.newaxis, ...]
            # make a prediction and sample the next symbol
            probabilities = self.model.predict(onehot_seed)[0]
            output_int = self._sample_with_temperature(probabilities, temperature)
            # update seed
            seed.append(output_int)
            output_symbol = [k for k, v in self._mappings.items() if v == output_int][0]
            # stop at the end-of-melody delimiter
            if output_symbol == "/":
                break
            # update melody
            melody.append(output_symbol)
        return melody

    def _sample_with_temperature(self, probabilities, temperature):
        # low temperature picks the most likely symbol; higher values
        # flatten the distribution and make sampling more adventurous
        predictions = np.log(probabilities) / temperature
        probabilities = np.exp(predictions) / np.sum(np.exp(predictions))
        index = np.random.choice(range(len(probabilities)), p=probabilities)
        return index

    def save_melody(self, melody, step_duration=0.25, format="midi", file_name="mel.mid"):
        stream = m21.stream.Stream()
        start_symbol = None
        step_counter = 1
        # parse all the symbols in the melody and create note/rest objects
        for i, symbol in enumerate(melody):
            if symbol != "_" or i + 1 == len(melody):
                # we hit a new note/rest: flush the previous event, if any
                if start_symbol is not None:
                    quarter_length_duration = step_duration * step_counter
                    # handle rest
                    if start_symbol == "r":
                        m21_event = m21.note.Rest(quarterLength=quarter_length_duration)
                    # handle note
                    else:
                        m21_event = m21.note.Note(int(start_symbol),
                                                  quarterLength=quarter_length_duration)
                    stream.append(m21_event)
                    step_counter = 1
                start_symbol = symbol
            # "_" prolongs the current event by one time step
            else:
                step_counter += 1
        stream.write(format, file_name)


if __name__ == "__main__":
    mg = MelodyGenerator()
    seed = "67 _ 67 _ 67 _ 65 64 _ 64 _ 64 _"  # example seed melody
    melody = mg.generate_melody(seed, 500, SEQUENCE_LENGTH, 0.3)
    print(melody)
    mg.save_melody(melody)
app.py
from flask import Flask, render_template, send_file
app = Flask(__name__)
@app.route('/')
def index():
return render_template('index.html')
@app.route('/play_music')
def play_music():
return send_file('mel.mid', mimetype='audio/midi')
if __name__ == '__main__':
app.run(debug=True)
index.html
<html>
<head>
</head>
<body>
  <h1>GENERATED MUSIC</h1>

  <midi-player
    src="/play_music"
    sound-font visualizer="#myPianoRollVisualizer">
  </midi-player>
  <midi-visualizer type="piano-roll" id="myPianoRollVisualizer"
    src="/play_music">
  </midi-visualizer>

  <midi-player
    src="/play_music"
    sound-font visualizer="#myStaffVisualizer">
  </midi-player>
  <midi-visualizer type="staff" id="myStaffVisualizer"
    src="/play_music">
  </midi-visualizer>

  <script
    src="https://cdn.jsdelivr.net/combine/npm/tone@14.7.58,npm/@magenta/music@1.23.1/es6/core.js,npm/focus-visible@5,npm/html-midi-player@1.5.0"></script>
</body>
</html>
REFERENCES
19. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R.
Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks
from Overfitting," Journal of Machine Learning Research, vol. 15,
pp. 1929-1958, 2014.
20. G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely
Connected Convolutional Networks," CVPR, vol. 1, p. 3, 2017.
21. O. M. Bjørndalen, "Mido," 21 August 2011. [Online]. Available:
https://github.com/olemb/mido. [Accessed 3 November 2018].
22. T. Moon, H. Choi, H. Lee, and I. Song, "RnnDrop: A Novel Dropout
for RNNs in ASR," Automatic Speech Recognition and Understanding
(ASRU), pp. 65-70, 2015.
23. T. Tieleman and G. Hinton, "Lecture 6.5-rmsprop: Divide the gradient
by a running average of its recent magnitude," COURSERA: Neural
Networks for Machine Learning, vol. 4, no. 2, pp. 26-31, 2012.
24. Z. Zeng, T. Huang, and W. X. Zheng, "Multistability of recurrent
neural networks with time-varying delays and the piecewise linear
activation function," IEEE Transactions on Neural Networks, vol. 21,
no. 8, pp. 1371-1377, 2010.
25. Google Developers, "Descending into ML: Training and Loss,"
[Online]. Available:
https://developers.google.com/machine-learning/crash-course/descending-into-ml/training-and-loss.
26. S. Mangal, "Music Research," 13 November 2018. [Online].
Available: https://gitlab.com/sanidhyamangal/music_research.
27. "Musical_Matrices," 10 July 2016. [Online]. Available:
https://github.com/dshieble/Musical_Matrices/tree/master/Pop_Music_Midi.
[Accessed 2 November 2018].