Microsoft opens limited access to its neural text-to-speech AI

Kris Holt
Contributing Writer

Microsoft is opening up limited access to a text-to-speech AI called Custom Neural Voice, which allows developers to create custom synthetic voices. The tech is part of an Azure AI service called Speech. Companies can use it for voice-powered smart assistants and devices, chatbots, online learning, and narration of audiobooks or news. They’ll have to apply for access and gain approval from Microsoft before they can use Custom Neural Voice.

The tech can deliver more natural-sounding voices than many other text-to-speech services, according to Microsoft. Custom voices use a bank of sounds, or phonemes, to create voice fonts. Custom Neural Voice uses multiple neural networks in an attempt to make sure the prosody (the tone and duration of each phoneme) and pronunciation are accurate. That helps the AI mimic an actor's voice correctly or produce a realistic-sounding synthetic voice.

Several companies are already using the tech, including AT&T and Warner Bros. The two recently installed a system at the AT&T Experience Store in Dallas, where people can interact with Bugs Bunny. Using a combination of Custom Neural Voice, augmented reality and 5G, Bugs can chat with customers in real time and move around the store to help them find a hidden golden carrot.

Eric Bauza, the actor who currently voices Bugs, recorded more than 2,000 lines and phrases with the help of Microsoft to create a voice font. Warner Bros. and Microsoft worked together to create a custom voice that taps into the character's personality and inflections. Duolingo has also used Custom Neural Voice to create quirky characters to help people learn new languages, while Progressive has adopted the tech for its Flo chatbot.