Google Develops Human-Like Text-to-Speech AI
In a major step towards its “AI first” vision, Google has developed a text-to-speech artificial intelligence (AI) system that may confuse you with its human-like articulation. The tech giant’s text-to-speech system, called “Tacotron 2”, produces AI-generated computer speech that almost matches the voice of a human, technology news site Inc.com reported this week.
At the Google I/O 2017 developers conference, Sundar Pichai, the company’s Indian-origin CEO, announced that the Internet major was shifting its focus from “mobile first” to “AI first” and launched several products and features, including Google Lens, Smart Reply for Gmail and Google Assistant for iPhone. According to a paper published on arXiv.org, the system first creates a spectrogram of the text, a visual representation of how the speech should sound.
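For readers unfamiliar with the term, a spectrogram records how much energy a sound carries at each frequency over successive slices of time. The sketch below is only an illustration of that idea, not Tacotron 2 itself: Tacotron 2 predicts a spectrogram from text, whereas this example computes one from a synthetic test tone. The function name, frame size, hop length and tone frequency are all assumptions chosen for the example.

```python
import cmath
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per time frame.

    Illustrative only: uses a naive DFT, which is slow but dependency-free.
    """
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window tapers the frame edges to reduce spectral leakage.
        windowed = [
            s * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        # Keep only the non-negative frequency bins (0 .. frame_size / 2).
        bins = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            bins.append(abs(acc))
        frames.append(bins)
    return frames


# Hypothetical test signal: a pure 1000 Hz tone sampled at 8000 Hz.
SAMPLE_RATE = 8000
TONE_HZ = 1000
FRAME_SIZE = 64
signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(512)]

spec = magnitude_spectrogram(signal, frame_size=FRAME_SIZE)

# The strongest bin in the first frame should correspond to the tone.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
print(f"{len(spec)} frames, peak near {peak_hz:.0f} Hz")
```

Each inner list is one column of the "picture": plotting the frames side by side, with brightness standing for magnitude, gives the familiar spectrogram image.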
That image is then put through Google’s existing WaveNet algorithm, which uses it to generate audio and brings AI closer than ever to indiscernibly mimicking human speech. The algorithm can easily learn different voices and even generates artificial breaths. “Our model achieves a mean opinion score (MOS) of 4.53 comparable to a MOS of 4.58 for professionally recorded speech,” the researchers told the media this week.
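A mean opinion score is simply the average of listener ratings, typically given on a scale from 1 (bad) to 5 (excellent). The following minimal sketch shows the arithmetic; the ratings are invented for illustration and are not Google's data, and the confidence interval uses a plain normal approximation rather than whatever procedure the researchers followed.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% confidence half-width) for ratings on a 1-5 scale."""
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating out of range: {r}")
    mos = statistics.mean(ratings)
    # Normal-approximation interval; an assumption made for this sketch.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from eight listeners for one audio sample.
ratings = [5, 4, 5, 4, 5, 5, 4, 5]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.3f} +/- {ci:.3f}")
```

Seen this way, the gap between 4.53 and 4.58 means listeners rated the synthetic speech, on average, only about 0.05 points below the professional recordings.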
Based on its audio samples, Google said that “Tacotron 2” can detect from context the difference between the noun “desert” and the verb “desert,” as well as the noun “present” and the verb “present,” and alter its pronunciation accordingly. It can place emphasis on capitalized words and apply the proper inflection when asking a question rather than making a statement, the company said.
Meanwhile, Google’s engineers did not reveal much information, but they left a big clue for developers to figure out how far they have come in developing this system.