Nvidia, best known for the GPUs that power much of the current AI boom, has released an AI model of its own, Fugatto. The company has mainly supplied the hardware behind AI advances; with Fugatto it moves further into model development, offering a generative model that creates and transforms audio.
Fugatto is a 2.5-billion-parameter model trained on more than 50,000 hours of annotated audio. It uses a technique called ComposableART (Audio Representation Transformation), which lets it combine and control sound properties that appeared only separately in its training data. Given text or audio prompts, Fugatto can generate sound combinations that are not present in that data.
For example, Fugatto can generate a violin that mimics the sound of a laughing child, or a factory machine that screeches with metallic anguish. Users can also fine-tune specific aspects of a sound, such as changing the emotional tone of a voice or dialing the level of sadness in a recording up or down.
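The article does not spell out how sound properties are combined and weighted, so the following is only a toy sketch of one plausible scheme, not Nvidia's implementation. It assumes a classifier-free-guidance-style blend, in which each attribute's prediction is compared against an unconditional prediction and the differences are added with user-chosen weights. The function name `compose_guidance`, the attribute names, and the three-element vectors are all hypothetical stand-ins for a real model's outputs.

```python
from typing import Dict, List

Vector = List[float]


def compose_guidance(
    unconditional: Vector,
    conditional: Dict[str, Vector],
    weights: Dict[str, float],
) -> Vector:
    """Blend per-attribute predictions as u + sum_k w_k * (c_k - u).

    Hypothetical helper: `unconditional` stands in for a model's output with
    no prompt, and each entry of `conditional` for its output under a single
    attribute prompt. Attributes without a weight contribute nothing.
    """
    combined = list(unconditional)
    for name, prediction in conditional.items():
        weight = weights.get(name, 0.0)
        for i, (c, u) in enumerate(zip(prediction, unconditional)):
            combined[i] += weight * (c - u)
    return combined


# Toy stand-ins for model outputs (assumed values, not real audio features).
unconditional = [0.0, 0.0, 0.0]
conditional = {
    "violin": [1.0, 0.0, 0.0],
    "laughing_child": [0.0, 1.0, 0.0],
    "sad": [0.0, 0.0, 1.0],
}
# Raising or lowering a weight strengthens or weakens that attribute.
weights = {"violin": 1.0, "laughing_child": 0.5, "sad": 0.25}

blended = compose_guidance(unconditional, conditional, weights)
print(blended)  # [1.0, 0.5, 0.25]
```

In this sketch, changing the `"sad"` weight plays the role of adjusting the level of sadness, while the other attributes stay fixed.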
Beyond these experimental transformations, Fugatto can also handle common AI audio tasks such as isolating vocals from music, modifying the emotional content of speech, and adapting musical instruments to new sound sources.
For more detail, Nvidia has published a white paper on Fugatto, and the Fugatto page hosts sound examples and demonstrations of emergent tasks. The release marks a notable step forward for AI-powered audio.