VALL-E X: The Most Dangerous Scammy AI Voice Cloning Tool Now Open Source

An open-source implementation of Microsoft’s VALL-E X zero-shot TTS model has been released, giving users hands-on access to advanced text-to-speech synthesis and voice cloning. The release builds on Microsoft’s original research paper, which was published without the code or pre-trained models needed to try the model in practice.

VALL-E X is a multilingual text-to-speech model introduced by Microsoft. Because the original research paper came without code or pre-trained models, the team behind the open-source project reproduced the results and trained its own VALL-E X model. That model is now publicly available, so a broader audience can try the technology for themselves.

VALL-E X offers several notable capabilities. In addition to English, it supports Chinese and Japanese, and it is reported to perform well across all three languages.

VALL-E X clones a voice from a short voice prompt, whether the voice belongs to another person, a character, or the user. A speech sample of 3 to 10 seconds, together with its transcript, is all that is needed to create a voice prompt. A graphical interface makes voice cloning and multilingual speech synthesis easier to use.
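The prompt requirements above (a 3 to 10 second sample plus its transcript) can be checked before any cloning is attempted. The following is a minimal sketch using only Python's standard library. It assumes the sample is an uncompressed WAV file, and the function names `wav_duration_seconds` and `check_prompt` are hypothetical helpers written for this illustration, not part of VALL-E X.

```python
import wave

# Bounds taken from the article: a voice prompt needs 3 to 10 seconds of speech.
MIN_PROMPT_SECONDS = 3.0
MAX_PROMPT_SECONDS = 10.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_prompt(path: str, transcript: str) -> list[str]:
    """Hypothetical helper: list the problems found with a prompt sample.

    An empty list means the sample length and transcript look usable.
    """
    problems = []
    duration = wav_duration_seconds(path)
    if not MIN_PROMPT_SECONDS <= duration <= MAX_PROMPT_SECONDS:
        problems.append(f"duration {duration:.1f}s is outside the 3-10s range")
    if not transcript.strip():
        problems.append("transcript is empty")
    return problems
```

A call such as `check_prompt("sample.wav", "Hello there.")` returns an empty list when the file is within the stated range and a transcript is supplied.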

VALL-E X runs on both CPU and GPU (PyTorch 2.0+, with CUDA 11.7 or CUDA 12.0). A GPU with 6 GB of VRAM is sufficient to run the model without offloading.
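As an illustration of the CPU/GPU choice described above, the sketch below picks a device based on whether a CUDA GPU with enough memory is present. It is an assumption about how one might make that decision, not the project's own logic. `pick_device` is a hypothetical helper, and the 6 GB default simply mirrors the figure quoted in the article. Some cards marketed as 6 GB may report slightly less total memory, so the threshold may need lowering in practice.

```python
def pick_device(min_vram_gb: float = 6.0) -> str:
    """Hypothetical helper: return "cuda" if a large enough GPU exists, else "cpu"."""
    try:
        import torch  # optional dependency; fall back to CPU if it is missing
    except ImportError:
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"
    # total_memory is reported in bytes for the first visible CUDA device.
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    total_gb = total_bytes / 1024**3
    return "cuda" if total_gb >= min_vram_gb else "cpu"
```

The returned string can be passed to `torch.device(...)` when loading a model.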

Compared with the Bark model, VALL-E X is said to offer several advantages.

As for VRAM, a GPU with 6 GB is enough to run VALL-E X. When generating longer texts, however, the combined length of the audio prompt and the generated audio must stay below 22 seconds.
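The 22-second limit means the prompt length directly reduces how much audio can be generated in one pass. The sketch below shows the arithmetic and one possible workaround: splitting a long text into sentence groups that each fit the remaining budget. The helper names and the `chars_per_second` speaking-rate estimate are assumptions made for illustration. Real durations depend on the language and the voice.

```python
import re

# Limit quoted in the article: prompt audio plus generated audio, in seconds.
MAX_TOTAL_SECONDS = 22.0


def generation_budget(prompt_seconds: float, limit: float = MAX_TOTAL_SECONDS) -> float:
    """Seconds left for generated audio once the prompt is accounted for."""
    if prompt_seconds < 0:
        raise ValueError("prompt_seconds must be non-negative")
    return max(0.0, limit - prompt_seconds)


def split_for_budget(text: str, budget_seconds: float,
                     chars_per_second: float = 15.0) -> list[str]:
    """Greedily pack sentences into chunks with an estimated duration under budget.

    chars_per_second is a rough, hypothetical speaking-rate estimate.
    A single sentence longer than the budget is kept as its own chunk.
    """
    max_chars = int(budget_seconds * chars_per_second)
    if max_chars <= 0:
        raise ValueError("budget is too small to fit any text")
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

With a 10-second prompt, `generation_budget(10.0)` leaves 12 seconds for generated audio, and each chunk from `split_for_budget` could then be synthesized separately.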

The open-source release of VALL-E X is distributed under the MIT License, a permissive license that opens multilingual text-to-speech synthesis and voice cloning to wider experimentation.

Source: mPost
