MusicGen Reimagined
Under-the-Radar Advances in AI Music
Exploring the overlooked but remarkable progress of MusicGen
Max Hilsdorf
An image symbolizing how Music AI products can elevate music-making for everyone. Image
generated through a conversation with ChatGPT and DALL-E-3.
How it started…
In February 2023, Google made waves with their generative music AI MusicLM. At that point, two things became clear: many anticipated that the next breakthrough model would be ten times the size of MusicLM in terms of model parameters and training data, and that it would raise the same ethical issues, including restricted access to the source code and the use of copyrighted training material.
Then, only a few months later, Meta defied these expectations with MusicGen, a model that produced results on par with MusicLM, all while using less training data, open-sourcing the code and model weights, and using only commercially licensed training material.
Six months later, the hype has slowly subsided. However, Meta’s
research team FAIR has continued publishing papers and updating the
code to incrementally improve MusicGen.
While these may sound like two small improvements, they make a big difference. Listen for yourself! Here is a 10-second piece generated with the original MusicGen model (3.3B parameters):
Generated track taken from the official MusicGen demo page.
Figure 2 — MusicGen: A user prompt (text) is converted to an encoded audio signal which is
then decoded to produce the final result. Image by author.
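For readers who want to reproduce this, here is a minimal sketch of that text-to-audio pipeline using Meta's open-source audiocraft package. The checkpoint name ("facebook/musicgen-large", the 3.3B model), the prompt, and the output handling are my own assumptions based on the public release, not details taken from this article.

```python
# Minimal sketch (assumes `pip install audiocraft` and a working PyTorch install).
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load the largest public checkpoint (~3.3B parameters); smaller ones exist too.
model = MusicGen.get_pretrained("facebook/musicgen-large")
model.set_generation_params(duration=10)  # seconds of audio to generate

# The text prompt is turned into EnCodec tokens autoregressively,
# which are then decoded back into a waveform.
prompts = ["an upbeat electronic track with a driving bassline"]  # hypothetical prompt
wavs = model.generate(prompts)  # tensor of shape [batch, channels, samples]

for i, wav in enumerate(wavs):
    # audio_write adds the file extension and applies loudness normalization.
    audio_write(f"musicgen_sample_{i}", wav.cpu(), model.sample_rate, strategy="loudness")
```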
Original Audio
EnCodec music example taken from the official EnCodec demo page.
Reconstructed Audio
EnCodec music example taken from the official EnCodec demo page.
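If you want to reproduce this kind of comparison yourself, the sketch below round-trips a file through EnCodec: it encodes the waveform into discrete tokens and immediately decodes them again. The choice of the 48 kHz music model, the bandwidth setting, and the file names are assumptions on my part, not settings quoted from the demo page.

```python
# Round-trip sketch (assumes `pip install encodec` plus torchaudio).
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# 48 kHz stereo model trained on music; a 24 kHz mono model also exists.
model = EncodecModel.encodec_model_48khz()
model.set_target_bandwidth(6.0)  # kbps; lower bandwidth = more compression artifacts

wav, sr = torchaudio.load("original.wav")  # hypothetical input file
wav = convert_audio(wav, sr, model.sample_rate, model.channels).unsqueeze(0)

with torch.no_grad():
    encoded_frames = model.encode(wav)            # list of (codes, scale) per frame
    reconstructed = model.decode(encoded_frames)  # waveform rebuilt from the codes

# Trim potential padding so the lengths match, then save the reconstruction.
reconstructed = reconstructed[..., : wav.shape[-1]]
torchaudio.save("reconstructed.wav", reconstructed.squeeze(0).cpu(), model.sample_rate)
```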
Because MusicGen fully relies on EnCodec, the codec is a major bottleneck for the quality of the generated music. That is why Meta decided to improve EnCodec's decoder. In August 2023, they released an updated decoder for EnCodec that leverages multi-band diffusion [3].
One problem Meta saw with EnCodec's original decoder was that it tended to generate the low frequencies first and the higher frequencies afterwards. Unfortunately, this meant that any errors or artifacts in the low frequencies would distort the high frequencies as well, drastically decreasing the output quality.
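The multi-band diffusion decoder addresses this by decoding the frequency bands independently with diffusion models, so low-frequency errors no longer propagate upwards. If you want to compare the two decoders yourself, audiocraft exposes the new one as a MultiBandDiffusion model that consumes the same EnCodec tokens. The sketch below follows the package's published usage pattern; the prompt and output names are my own assumptions.

```python
# Sketch comparing the standard EnCodec decoder with the multi-band diffusion
# decoder (assumes audiocraft is installed; MBD downloads extra weights).
from audiocraft.models import MusicGen, MultiBandDiffusion
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-medium")
model.set_generation_params(duration=10)
mbd = MultiBandDiffusion.get_mbd_musicgen()

# return_tokens=True yields both the default decoding and the raw EnCodec
# tokens, so the diffusion decoder can be applied to the exact same tokens.
wav_encodec, tokens = model.generate(
    ["a calm lo-fi beat with soft piano"], return_tokens=True  # hypothetical prompt
)
wav_diffusion = mbd.tokens_to_wav(tokens)

audio_write("decoded_encodec", wav_encodec[0].cpu(), model.sample_rate, strategy="loudness")
audio_write("decoded_mbd", wav_diffusion[0].cpu(), model.sample_rate, strategy="loudness")
```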
Original Decoder
Generated track taken from the Multi-Band Diffusion demo page.
Figure 3 — MusicGen stereo update. Note that the process was not sufficiently documented in
the paper for me to be 100% sure about this. Take it as an educated guess. Image by author.
Mono
Stereo
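Meta also published stereo variants of the MusicGen checkpoints on Hugging Face. The sketch below assumes one of those model names ("facebook/musicgen-stereo-medium"); apart from swapping the checkpoint, generation works exactly as before.

```python
# Sketch for generating stereo output (assumes the stereo checkpoints
# released by Meta, e.g. "facebook/musicgen-stereo-medium", are available).
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-stereo-medium")
model.set_generation_params(duration=10)

wav = model.generate(["a wide, atmospheric synthwave track"])  # hypothetical prompt
print(wav.shape)  # expected: [1, 2, samples] -> two channels instead of one

audio_write("stereo_sample", wav[0].cpu(), model.sample_rate, strategy="loudness")
```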
Conclusion
MusicGen was impressive from the day it was released. However, since then, Meta's FAIR team has continually improved the model, enabling higher-quality results that sound more authentic. When it comes to text-to-music models that generate audio signals directly (rather than MIDI or other symbolic formats), MusicGen is ahead of its competitors from my perspective (as of November 2023).
References