Nvidia unveils AI model for audio modification and generation

Nvidia stated that it does not have immediate plans to publicly release Fugatto. Credit: Below the Sky / Shutterstock.

Nvidia has unveiled Fugatto, an AI model designed to modify voices and generate new sounds, aimed at music, film, and video game producers, Reuters reported.

The model, whose name is short for Foundational Generative Audio Transformer Opus 1, can create sound effects and music from text descriptions.

Based in California, US, Nvidia has stated that it does not have immediate plans to publicly release Fugatto.

The technology joins similar advancements from startups such as Runway and larger companies such as Meta Platforms, which generate audio or video from text prompts.

The ability to modify existing audio is said to set Fugatto apart. It can transform a piano line into a human voice or change the accent and mood of spoken words, the news publication stated.

This capability is said to distinguish the new model from other AI technologies currently available.

Nvidia’s model is said to be trained on open-source data, and the company is still considering how to release it publicly.

Nvidia applied deep learning research vice-president Bryan Catanzaro said: “If we think about synthetic audio over the past 50 years, music sounds different now because of computers, because of synthesisers.

“I think that generative AI is going to bring new capabilities to music, to video games and to ordinary folks that want to create things.”

Generative AI creators face challenges in preventing misuse such as generating misinformation or infringing on copyrights.

OpenAI and Meta have also not announced public release dates for their audio or video-generating models.

Catanzaro added: “Any generative technology always carries some risks because people might use that to generate things that we would prefer they don’t. We need to be careful about that, which is why we don’t have immediate plans to release this.”

Last week, Nvidia collaborated with protein sequencing technology provider Quantum-Si to develop its proteomics platform, Proteus, with AI and accelerated computing.