Riffusion

Music-generating machine learning model
Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music from images of sound rather than directly from audio.[1] It was created by fine-tuning Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms.[1] The resulting model takes a text prompt and generates a spectrogram image file, which can then be converted into an audio file through an inverse Fourier transform.[2] While these clips are only several seconds long, the model can also interpolate between outputs in latent space to blend different clips together.[1][3] This is accomplished using a functionality of the Stable Diffusion model known as img2img.[4]
Developer(s) | Seth Forsgren, Hayk Martiros
---|---
Initial release | December 15, 2022
Repository | github
Written in | Python
Type | Text-to-image model
License | MIT License
Website | riffusion
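As a sketch of the image-to-audio step described above: a spectrogram image encodes only the magnitude of the signal, so the phase it discards must be estimated before inversion, commonly with the Griffin-Lim algorithm. The snippet below assumes the image stores a mel spectrogram in decibels; the pixel-to-dB mapping, sample rate, and STFT parameters are illustrative assumptions, not Riffusion's published values.

```python
# Sketch: recover audio from a spectrogram image using librosa's
# Griffin-Lim-based mel inversion. All numeric parameters below are
# illustrative assumptions, not Riffusion's actual settings.
import numpy as np
import librosa
import soundfile as sf
from PIL import Image

def spectrogram_image_to_audio(path, sr=44100, n_fft=2048, hop_length=512):
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
    img = np.flipud(img)                 # row 0 of the image is the highest frequency
    db = img / 255.0 * 80.0 - 80.0       # assumed mapping: pixel value -> dB in [-80, 0]
    power = librosa.db_to_power(db)      # dB -> mel power spectrogram
    # Griffin-Lim iteratively estimates the phase the image does not store
    return librosa.feature.inverse.mel_to_audio(
        power, sr=sr, n_fft=n_fft, hop_length=hop_length, n_iter=32)

audio = spectrogram_image_to_audio("riff.png")   # hypothetical model output
sf.write("riff.wav", audio, 44100)
```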
The resulting music has been described as "otherworldly" ("de otro mundo"),[5] although unlikely to replace human-made music.[5] The model was made available on December 15, 2022, with the code also freely available on GitHub.[2] It is one of many models derived from Stable Diffusion.[4]
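The latent-space blending mentioned in the lead can be sketched with the Hugging Face diffusers library. One common way to implement such interpolation (not necessarily Riffusion's exact img2img approach) is spherical interpolation ("slerp") of the initial latent noise, so successive frames morph smoothly from one clip toward another. The checkpoint name, slerp helper, and parameter values here are illustrative assumptions.

```python
# Sketch: blending two generated clips by interpolating the initial
# latent noise given to Stable Diffusion. This illustrates latent-space
# interpolation in general, not Riffusion's exact pipeline; the
# checkpoint name and parameters are assumptions.
import torch
from diffusers import StableDiffusionPipeline

def slerp(t, v0, v1):
    """Spherical linear interpolation between two noise tensors."""
    omega = torch.acos(((v0 / v0.norm()) * (v1 / v1.norm())).sum().clamp(-1.0, 1.0))
    return (torch.sin((1.0 - t) * omega) * v0 + torch.sin(t * omega) * v1) / torch.sin(omega)

pipe = StableDiffusionPipeline.from_pretrained("riffusion/riffusion-model-v1")
shape = (1, pipe.unet.config.in_channels, 64, 64)  # latent shape for a 512x512 image
noise_a, noise_b = torch.randn(shape), torch.randn(shape)

# Each frame is one spectrogram image; adjacent frames share most of
# their latent noise, so the decoded audio changes gradually.
for i, t in enumerate(torch.linspace(0.0, 1.0, steps=5)):
    latents = slerp(float(t), noise_a, noise_b)
    image = pipe("funk bassline with a jazzy saxophone", latents=latents,
                 num_inference_steps=30).images[0]
    image.save(f"frame_{i:02d}.png")
```

Each frame could then be converted to audio with a routine like the inversion sketch above and crossfaded into a continuous track.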
Riffusion is one of several AI text-to-music generators. In December 2022, Mubert[6] similarly used Stable Diffusion to turn descriptive text into music loops. In January 2023, Google published a paper on MusicLM, its own text-to-music generator.[7][8]