DeepMind’s ‘V2A’ AI technology generates video soundtracks from pixels and text prompts

Jun 24, 2024

Google DeepMind has developed video-to-audio (V2A) technology that generates soundtracks for videos. It can create music, sound effects, and speech from text prompts, from the video’s pixels, or from both together. The advance opens up new possibilities for soundtrack creation and can be applied both to automatic video generation services and to existing footage such as archive material and silent films.

One interesting aspect of the technology is that users can supply ‘positive prompts’ to steer the audio toward desired elements and ‘negative prompts’ to steer it away from unwanted ones, making it possible to generate a wide variety of soundtracks for the same video clip. The system can also work from video pixels alone, with no text prompt at all.
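V2A has no public API, so the following Python sketch is purely illustrative: the class and field names are invented to show how the conditioning signals described above (video pixels, an optional positive prompt, an optional negative prompt) might be structured in a request.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical sketch only: V2A is not publicly available, and these
# names are assumptions made for illustration, not DeepMind's API.

@dataclass
class V2ARequest:
    video_path: str                     # source pixels the model conditions on
    positive_prompt: str | None = None  # steer audio toward these elements
    negative_prompt: str | None = None  # steer audio away from these elements

    def describe(self) -> str:
        """Summarize the conditioning signals for this request."""
        parts = [f"video={self.video_path}"]
        if self.positive_prompt:
            parts.append(f"toward: {self.positive_prompt}")
        if self.negative_prompt:
            parts.append(f"away from: {self.negative_prompt}")
        if not self.positive_prompt and not self.negative_prompt:
            parts.append("pixels only (no text prompt)")
        return "; ".join(parts)

# The same clip can yield very different soundtracks:
cinematic = V2ARequest("clip.mp4",
                       positive_prompt="tense orchestral score",
                       negative_prompt="dialogue")
pixels_only = V2ARequest("clip.mp4")  # no text prompt at all
```

The point of the sketch is the asymmetry the article describes: text prompts are optional steering signals, while the video itself is always the primary input.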

V2A still has limitations: audio quality depends on the quality of the input video, and lip synchronization is imperfect when generating speech. Google DeepMind says it is conducting further research to address these issues. To learn more and see additional examples of V2A in action, visit the Google DeepMind website.
