Google Introduces MusicLM: AI Model Generates High-Fidelity Music from Text

Google has developed MusicLM, an AI model that generates music from text descriptions. It was trained on 280,000 hours of music and produces audio at 24 kHz.

The model creates 5-minute pieces from simple text or 30-second pieces from more detailed descriptions, and can even create a musical story based on existing melodies.

AI has a long history in music, from writing hit songs to enhancing live performances.

Text-to-image machine learning has benefited from large datasets, which are claimed to have contributed significantly to recent advancements. Stable Diffusion and OpenAI's DALL-E, for instance, have both sparked a surge in interest from the general public.

AI music, by contrast, faces hurdles related to the scarcity of paired audio and text data.

Music is also structured along a temporal dimension, which presents another difficulty for AI music generation.

Consequently, it is far more difficult to convey the intention of a music track in simple text than to describe a still image.

Google is being cautious with MusicLM, as with past AI endeavors, and has no plans to release the model.
