Stability AI’s new text-to-audio tool is like a Midjourney for music samples

Stability AI is taking its generative AI tech into the world of music with the launch of a new text-to-audio engine called Stable Audio.

Similar to the Stable Diffusion model, Stable Audio can create short sound bites based on a simple text prompt. The company explains in its announcement post that the AI was trained on content from the online music library AudioSparx. It even claims the model is capable of creating “high-quality, 44.1 kHz music for commercial use”. To put that number into perspective, 44.1 kHz is the sample rate used for CD-quality audio. So it’s pretty good but not the greatest.

(Image credit: Stability AI)

A free version of Stable Audio is currently available to the public, and it lets you generate and download 20 individual tracks a month. Each sound bite has a 45-second runtime, so they won’t be very long.

Prompting music

The text prompts you enter can be simple inputs. Among the samples provided by Stability AI, “Car Passing By” sounds exactly as the title suggests: a car driving by in the distance, although it is a little muffled. Conversely, you can also stack on details. One particular sample has a prompt involving Ambient Techno, an 808 drum machine, claps, a synthesizer, the word “ethereal”, 122 BPM, and a “Scandinavian Forest” (whatever that means). The result of this word combination is an ambient lo-fi hip-hop beat.

We took Stable Audio out for a quick spin. We were able to enter one prompt asking the AI to create a fast-paced garage rock song from the early 2000s and it sort of accomplished the goal. The generated track matched the style although it sounded really messy.

(Image credit: Future)

Unfortunately, we couldn’t get any further than that single prompt. At the time of this writing, Stable Audio is seeing a huge influx of traffic from people rushing in to try out the model. The developer recommends trying again later or the next day if you’re met with nothing but a blank screen.

There is a catch with the free version – it’s for non-commercial use only. If you want to use the content commercially, then you’ll have to purchase the $12 Stable Audio Professional monthly plan. It also offers 500 track generations a month, each with a duration of up to 90 seconds. There’s an Enterprise plan too for custom audio duration and monthly generations. You will, however, have to contact Stability AI first to set up a plan.
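
To put the two tiers side by side, here is a minimal Python sketch of the arithmetic. The plan figures come from the paragraphs above; the dictionary layout and function names are our own illustration rather than anything published by Stability AI, and we assume the free tier's 45 seconds is a per-track ceiling.

```python
# Plan limits as reported in this article. The structure and names below
# are illustrative assumptions, not part of any Stability AI API.
PLANS = {
    "Free": {"price_usd": 0, "tracks_per_month": 20, "max_seconds": 45},
    "Professional": {"price_usd": 12, "tracks_per_month": 500, "max_seconds": 90},
}


def max_minutes_per_month(plan):
    """Upper bound on generated audio per month, in minutes."""
    return plan["tracks_per_month"] * plan["max_seconds"] / 60


def price_per_track(plan):
    """Monthly price divided by the monthly track allowance."""
    return plan["price_usd"] / plan["tracks_per_month"]


for name, plan in PLANS.items():
    print(
        f"{name}: up to {max_minutes_per_month(plan):.0f} min/month, "
        f"${price_per_track(plan):.3f} per track"
    )
```

By this rough count, the free tier tops out at 15 minutes of audio a month, while the Professional plan allows up to 750 minutes at a little over two cents per track.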

Imperfect tool

Do be aware the technology isn’t perfect. The content sounds fine for the most part, but certain aspects will seem off. The mix in that Ambient Techno song mentioned earlier isn’t very good in our opinion. It sounds like the bass and synthesizer are fighting over which will be the dominant sound, resulting in just noise. Additionally, it doesn’t appear the AI can do vocals. It only does instrumentals.

Stable Audio is interesting for sure, but not something that should be totally relied on. We should note the company is asking for feedback from users on how to improve the AI. A contact email can be found on the official announcement page.

If you plan on using this tech for your own purposes, we recommend checking TechRadar’s list of the best audio editors for 2023 to fix any flaws you might come across.
