Audio generation: Nvidia has developed a new tool
Advances in artificial intelligence have taken audio generation to a new level. Nvidia has introduced an AI model that can create a wide variety of sounds and alter voices.
Fugatto technology and its capabilities
Fugatto technology is suitable for creating a wide range of audio content, including music and sound effects for films and social media videos. The neural network follows clearly worded instructions. For example, it can create the sound of concrete piles being driven into the ocean.
Nvidia’s Bryan Catanzaro says the tool has great potential for the music industry. Modern music is created with computers: layered effects, synthesisers, and electronic basses are all typical tools for composers. However, quality instruments are only available to professionals. With Fugatto, ordinary users can also express their creative ideas. The AI draws on an extensive open-source database to generate audio content, which allows the neural network to satisfy a wide variety of requests.
But using this new technology involves certain risks inherent in AI models. Above all, the way content is created needs to be regulated. Generated audio can be used for illegal purposes, especially when it comes to changing a person’s voice. Nvidia is aware of this issue and is working to minimise the risks. For this reason, the technology is not yet available to the general public.

Other developments
Nvidia is not the only developer of audio generation technology. Google’s DeepMind division previously announced an AI that can generate soundtracks. The rapid development of video creation tools is fuelling interest in audio tools. Video tools are evolving quickly, with new technologies for creating video content appearing on a regular basis, but they cannot yet provide high-quality audio for video clips.
DeepMind representatives believe that their product, V2A (video-to-audio), will eventually be able to provide full movie soundtracks. The technology has a number of features:
- Creating audio in conjunction with video. The AI generates audio tracks designed to match the video content, creating a harmonious sound atmosphere.
- The tool is based on a diffusion model. To train it, the developers used video clips as well as human dialogue.
- The AI is able to generate content from the most concise queries. A description of a few keywords is enough for the programme to create a sound effect.
The DeepMind team has not yet presented the final result of its work. According to the developers, V2A is not yet perfected and still requires improvement. In the future, however, they expect the AI to become a fully fledged tool for creating soundtracks.