cross pond high tech
light views on high tech in both Europe and US
Scooped by Philippe J DEWOST

Microsoft acquires Nuance Communications

Microsoft acquired the AI speech technology company Nuance for $19.7B, its second-largest purchase after it bought LinkedIn for $26B in 2016. Microsoft reportedly wants to use Nuance's tech — which includes the transcription tool Dragon — in its health-care cloud products.

More:

  • The all-cash deal is expected to boost Microsoft's voice recognition and medical computing capabilities and offerings.
  • Dragon uses deep learning to transcribe a person's speech and improves its accuracy by adapting to their voice. It can transcribe doctors' visits, customer service calls, and voicemails.
  • Nuance has been licensing the technology to companies for years. The tech formed part of the basis for Apple's Siri, which could pose a conflict of interest between the companies if Nuance's technology is still involved in Siri's operation.
  • In 2019, Microsoft and Nuance announced a partnership to incorporate AI assistants into doctors' visits. They later integrated Nuance's tech into Microsoft’s Teams.
  • The tech giant plans to integrate Nuance's technology into the cloud-based health-tech products it launched in 2020, which cover patient monitoring, electronic health records, and care coordination.
  • The acquisition could also allow Microsoft to integrate advanced voice recognition into services including Teams and Bing and generate transcripts, according to Bloomberg analysts.
  • Microsoft will purchase Nuance for $56 per share, a 23% premium over its closing price Friday.
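
The last bullet implies a pre-announcement share price. As a rough back-of-the-envelope sketch, assuming the quoted 23% is a rounded figure and using variable names that are purely illustrative, it can be recovered like this:

```python
# Figures quoted in the article above.
offer_per_share = 56.00   # USD, all-cash offer
premium = 0.23            # 23% over Friday's closing price (rounded in the source)

# offer = close * (1 + premium)  =>  close = offer / (1 + premium)
implied_close = offer_per_share / (1 + premium)
premium_per_share = offer_per_share - implied_close

print(f"Implied Friday close: ${implied_close:.2f}")
print(f"Premium per share:    ${premium_per_share:.2f}")
```

Because the premium is rounded, the result is only an approximation of the actual closing price.
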
Philippe J DEWOST's insight:

Microsoft raises its voice and swallows Nuance.

Facebook's AI can convert one singer's voice into another

AI can generate storyboard animations from scripts, spot potholes and cracks in roads, and teach four-legged robots to recover when they fall. But what about adapting one person’s singing style to that of another? Yep, it’s got that down pat, too.

In a paper published on the preprint server arXiv.org (“Unsupervised Singing Voice Conversion”), scientists at Facebook AI Research and Tel Aviv University describe a system that directly converts audio of one singer to the voice of another. All the more impressive, it’s unsupervised, meaning it’s able to perform the conversion from unclassified, unannotated data it hasn’t previously encountered.

The team claims that their model was able to learn to convert between singers from just 5-30 minutes of their singing voices, thanks in part to an innovative training scheme and data augmentation technique.

“[Our approach] could lead, for example, to the ability to free oneself from some of the limitations of one’s own voice,” the paper’s authors wrote. “The proposed network is not conditioned on the text or on the notes [and doesn’t] require parallel training data between the various singers, nor [does it] employ a transcript of the audio to either text … or to musical notes … While existing pitch correction methods … correct local pitch shifts, our work offers flexibility along the other voice

Philippe J DEWOST's insight:

The End of The Voice ?

Google’s AI can now translate your speech while keeping your voice

Researchers trained a neural network to map audio “voiceprints” from one language to another. The results aren’t perfect, but you can sort of hear how Google’s translator was able to retain the voice and tone of the original speaker. It can do this because it converts audio input directly to audio output without any intermediary steps. In contrast, traditional translation systems convert audio into text, translate the text, and then resynthesize the audio, losing the characteristics of the original voice along the way.

The new system, dubbed the Translatotron, has three components, all of which look at the speaker’s audio spectrogram, a visual snapshot of the frequencies used when the sound is playing, often called a voiceprint. The first component uses a neural network trained to map the audio spectrogram in the input language to the audio spectrogram in the output language. The second converts the spectrogram into an audio wave that can be played. The third component can then layer the original speaker’s vocal characteristics back into the final audio output.

Not only does this approach produce more nuanced translations by retaining important nonverbal cues, but in theory it should also minimize translation error, because it reduces the task to fewer steps.

Translatotron is currently a proof of concept. During testing, the researchers trialed the system only with Spanish-to-English translation, which already took a lot of carefully curated training data. But its audio outputs demonstrate the potential for a commercial system later down the line.
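
For readers unfamiliar with spectrograms, here is a minimal sketch of the idea. It assumes a synthetic 1 kHz tone instead of real speech and uses a naive DFT with no window function. It is not Translatotron's actual front end, and every function and variable name is hypothetical.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of each frequency bin (0..N/2) for one frame, via a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def spectrogram(signal, frame_size, hop):
    """Slice the signal into overlapping frames and take each frame's spectrum."""
    return [dft_magnitudes(signal[start:start + frame_size])
            for start in range(0, len(signal) - frame_size + 1, hop)]

# Assumed toy input: a pure 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

# Rows are time frames, columns are frequency bins.
frame_size = 64
spec = spectrogram(signal, frame_size=frame_size, hop=32)

# Each bin spans sample_rate / frame_size Hz; find the loudest bin per frame.
bin_hz = sample_rate / frame_size
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
peak_hz = [b * bin_hz for b in peak_bins]

print(f"{len(spec)} frames, {len(spec[0])} bins each")
print(f"Loudest frequency in first frame: {peak_hz[0]:.0f} Hz")
```

Each row of `spec` is one time slice and each column one frequency band. A system like the one described would learn to map such a grid in one language to a grid in another, whereas this sketch only builds the grid.
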
Philippe J DEWOST's insight:
This is the Voice