
Technic News

The Latest in Technology


Microsoft’s VALL-E AI can mimic any voice from a short audio sample

Microsoft has shown off its latest research in text-to-speech AI with a model called VALL-E that can simulate someone’s voice from just a three-second audio sample, Ars Technica has reported. The speech can not only match the timbre but also the emotional tone of the speaker, and even the acoustics of a room. It could one day be used for customized or high-end text-to-speech applications, though like deepfakes, it carries risks of misuse. 

VALL-E is what Microsoft calls a “neural codec language model.” It’s derived from EnCodec, Meta’s AI-powered neural audio compression network, and generates audio from a text input plus a short sample from the target speaker.
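To make the pipeline described above concrete, here is a minimal Python sketch of how a neural codec language model fits together: a codec encoder turns the three-second speaker prompt into discrete codes, an autoregressive model continues those codes conditioned on the text, and a decoder (omitted here) would turn the generated codes back into audio. All function names, the toy quantizer, and the token sizes are illustrative stand-ins, not Microsoft's or Meta's actual code.

```python
# Hypothetical sketch of the "neural codec language model" idea, assuming an
# EnCodec-style discrete codec. Everything here is a toy stand-in.
import numpy as np

CODEBOOK_SIZE = 1024  # codec quantizes audio frames into this many discrete codes

def encode_to_codes(waveform: np.ndarray, frame: int = 320) -> np.ndarray:
    """Toy stand-in for a neural codec encoder: map fixed-size audio frames
    to discrete code indices (real codecs use learned vector quantizers)."""
    n = len(waveform) // frame
    frames = waveform[: n * frame].reshape(n, frame)
    scaled = (frames.mean(axis=1) + 1.0) / 2.0  # assume samples in [-1, 1]
    return np.clip((scaled * CODEBOOK_SIZE).astype(int), 0, CODEBOOK_SIZE - 1)

def language_model_continue(prompt_codes: np.ndarray, text: str) -> np.ndarray:
    """Toy stand-in for the autoregressive LM: given the 3-second speaker
    prompt's codes plus the text, emit new codes near the prompt's code
    distribution (a real model would predict them token by token)."""
    rng = np.random.default_rng(0)
    n_out = max(1, len(text)) * 4  # pretend ~4 codec frames per character
    jitter = rng.integers(-8, 9, size=n_out)
    base = rng.choice(prompt_codes, size=n_out)
    return np.clip(base + jitter, 0, CODEBOOK_SIZE - 1)

# Usage: a 3-second "speaker sample" at a toy 1.6 kHz rate, then synthesis.
prompt_audio = np.sin(np.linspace(0, 40 * np.pi, 3 * 1600))
prompt_codes = encode_to_codes(prompt_audio)
new_codes = language_model_continue(prompt_codes, "Hello from VALL-E")
```

The key design point the sketch illustrates is that speech generation becomes a language-modeling problem over discrete codec tokens, which is why a short prompt is enough: the model conditions on the prompt's tokens the same way a text LM conditions on a prefix.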

In a paper, the researchers describe how they trained VALL-E on 60,000 hours of English-language speech from more than 7,000 speakers, drawn from Meta’s LibriLight audio library. The voice it attempts to mimic must closely match a voice in the training data; if it does, VALL-E uses that data to infer what the target speaker would sound like saying the desired text input.


The team shows exactly how well this works on the VALL-E GitHub page. For each phrase they want the AI to “speak,” they provide a three-second prompt from the speaker to be imitated, a “ground truth” recording of the same speaker saying another phrase for comparison, a “baseline” sample from conventional text-to-speech synthesis, and finally the VALL-E output.

The results are mixed, with some sounding machine-like and others being surprisingly realistic. The fact that it retains the emotional tone of the original samples is what sells the ones that work. It also faithfully matches the acoustic environment, so if the speaker recorded their voice in an echo-y hall, the VALL-E output also sounds like it came from the same place. 

To improve the model, Microsoft plans to scale up its training data “to improve the model performance across prosody, speaking style, and speaker similarity perspectives.” It’s also exploring ways to reduce words that are unclear or missed.

Microsoft elected not to make the code open source, possibly due to the risks inherent in AI that can put words in someone’s mouth. It added that it would follow its “Microsoft AI Principles” in any further development. “Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating,” the company wrote in the “Broader impacts” section of its conclusion.

Brought to you by USA Today.




