Voicebox is Meta’s new speech-related AI model

Generative AI has been the talk of the town lately and every tech firm is trying to do something around it. Meta has now announced a new generative AI model called ‘Voicebox’ for doing several speech-related tasks.

It’s not your regular text-to-speech tool, but rather an AI tool that’s claimed to help content creators, visually impaired people, and people who want to converse in foreign languages.

Mark Zuckerberg said on the Meta Channel that it is an AI model ‘that can do tasks it wasn’t specifically trained on’. The company believes this AI tool to be a breakthrough in generative AI for speech.

“Today, we’re announcing a breakthrough in generative AI for speech. We’ve developed Voicebox, a state of the art AI model that can perform speech generation tasks — like editing, sampling and stylizing — that it wasn’t specifically trained to do through in-context learning,” noted the blog post.

As the blog post notes, the AI model can handle an array of tasks, from editing audio to sampling and stylizing. Here are some of the things it can do:

– Diverse text-to-speech

– Style transfer

– Content correction

– In-context text-to-speech

– Noise removal

Meta revealed that Voicebox can produce high-quality audio clips and edit pre-recorded audio. This includes removing background noise while preserving the style of the audio.

It is worth noting that the AI model is still in testing, and Meta expects it to do a lot more in the future.

“In the future, multipurpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player characters in the metaverse. They could allow visually impaired people to hear written messages from friends read by AI in their voices, give creators new tools to easily create and edit audio tracks for videos, and much more.”

In other news about Meta, the company announced an AI-powered music generator model earlier this week. Named MusicGen, it can generate music using text and melody.

“We present MusicGen: A simple and controllable music generation model. MusicGen can be prompted by both text and melody. We release code (MIT) and models (CC-BY NC) for open research, reproducibility, and for the music community,” tweeted Felix Kreuk, Research Engineer at Meta AI research.

MusicGen is said to be trained on 20,000 hours of music, including 10,000 high-quality licensed music tracks and 390,000 instrument-only tracks from the Shutterstock and Pond5 stock media libraries.

The post Voicebox is Meta’s new speech-related AI model appeared first on Techlusive.