Alexa, do I need a coronavirus test?
That’s a query that almost certainly was not in the repertoire for Amazon’s voice assistant six months ago. But the ins and outs of the coronavirus outbreak are changing Alexa’s work habits, said Manoj Sindhwani, Amazon’s vice president of Alexa Speech.
“We’re certainly seeing certain shifts,” Sindhwani told GeekWire this week. “You see a lot of people asking about COVID-19. People are not asking about ‘How long will it take for me to get to work.’ ”
Usage patterns are shifting as well, because more people are working from home. The typical before-work and after-work peaks are being stretched out into the rest of the day. “It follows more of a weekend pattern in some ways,” Sindhwani said.
Some of the features that have been added to Alexa over the past few months are now helping users cope.
Sindhwani pointed to a twist that takes advantage of deep neural networks to make Alexa’s speech sound more natural when it’s reading the news, a Wikipedia article or other long stretches of text-to-speech, also known as TTS. That comes in handy when users are catching up on developments in the coronavirus crisis, or when they’re having Alexa read their kids a story.
“When we think of long-form content, and how we make that content more natural-sounding, for us it is about speaking style,” Sindhwani said. “There are separate teams that are working on what content is most relevant. That’s something that my team doesn’t focus on… But what I can tell you is, a lot of our focus in TTS has been the improvement of naturalness of long-form content.”
The ears and voice of Alexa
Sindhwani says his team is all about the “ears of Alexa and the voice of Alexa” — that is, how Alexa-enabled devices make out what users are trying to say, and how those devices deliver Alexa’s cloud-based content more clearly and naturally.
Such issues were the focus of last week’s International Conference on Acoustics, Speech and Signal Processing, which had initially been planned as an in-person conference in Barcelona but was turned into a virtual conference due to the pandemic.
“One of the [lines of] research that we’re very proud of is data-efficient learning,” Sindhwani said. “Data-efficient learning is really more about how you create a lot of data, but do not start with a lot of data on day one.”
For example, take the issue of how Alexa recognizes the wake word to start listening to your queries. Under ideal circumstances, your device would hear a crystal-clear “Alexa” (or “Computer,” or “Echo”) from close up in a quiet room. But circumstances are not always ideal, as anyone who’s shouted at their Echo knows.
Sindhwani’s team has been training Alexa’s speech-recognition model to make out the wake word under more challenging conditions by augmenting the training data — for example, by introducing background noise or simulating a voice as heard from far away.
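The idea behind that kind of augmentation can be sketched in a few lines: take one clean recording and synthetically degrade it in several ways to generate harder training examples. This is a toy illustration, not Amazon’s actual pipeline — the noise-mixing math is standard, but the far-field simulation here is a crude decaying-exponential stand-in for a measured room impulse response.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix background noise into a speech clip at a target signal-to-noise ratio."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so the mixture hits the requested SNR in decibels.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise[: len(speech)]

def simulate_far_field(speech, distance_m, rt60_s=0.3, sample_rate=16000):
    """Crudely simulate a distant talker: attenuate by distance, then convolve
    with a decaying-exponential 'reverb tail' standing in for a real room response."""
    attenuated = speech / max(distance_m, 1.0)
    tail_len = int(rt60_s * sample_rate)
    t = np.arange(tail_len)
    impulse = np.exp(-6.9 * t / tail_len)  # roughly 60 dB of decay over rt60_s
    impulse /= impulse.sum()
    return np.convolve(attenuated, impulse)[: len(speech)]

# One clean utterance becomes several harder training examples.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # stand-in for a 1-second, 16 kHz recording
noise = rng.standard_normal(16000)
augmented = [
    mix_at_snr(clean, noise, snr_db=5),
    simulate_far_field(clean, distance_m=4.0),
    mix_at_snr(simulate_far_field(clean, distance_m=4.0), noise, snr_db=10),
]
```

Each variant keeps the same label (“contains the wake word”), so the model learns to find the word through noise and distance without anyone recording thousands of extra utterances.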
Another trick takes advantage of what’s called semi-supervised learning, which helps Alexa distinguish between the actual wake word and words that can sound similar. (For example, “Lexus” as opposed to “Alexa.”)
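One common semi-supervised recipe is pseudo-labeling: train on a small labeled set, then let the model label the unlabeled examples it is confident about and retrain on the combined data. The sketch below shows that loop with a deliberately simple nearest-centroid classifier on made-up “acoustic embeddings” — the feature values and the confidence threshold are illustrative assumptions, not Amazon’s method.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy acoustic embeddings: wake-word clips cluster near +1, sound-alikes
# ("Lexus") near -1. Purely hypothetical stand-ins for real audio features.
def sample(center, n):
    return center + 0.5 * rng.standard_normal((n, 8))

X_labeled = np.vstack([sample(+1.0, 20), sample(-1.0, 20)])
y_labeled = np.array([1] * 20 + [0] * 20)      # 1 = wake word, 0 = sound-alike
X_unlabeled = np.vstack([sample(+1.0, 500), sample(-1.0, 500)])

def train_centroids(X, y):
    """A deliberately simple classifier: one centroid per class."""
    return X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)

def margin(X, pos, neg):
    """Distance margin between the two centroids; larger magnitude = more confident."""
    return np.linalg.norm(X - neg, axis=1) - np.linalg.norm(X - pos, axis=1)

# Round 1: train on the small labeled set only.
pos, neg = train_centroids(X_labeled, y_labeled)

# Pseudo-label only the unlabeled clips the model is confident about,
# then retrain on labeled + pseudo-labeled data.
m = margin(X_unlabeled, pos, neg)
confident = np.abs(m) > 1.0
X_aug = np.vstack([X_labeled, X_unlabeled[confident]])
y_aug = np.concatenate([y_labeled, (m[confident] > 0).astype(int)])
pos, neg = train_centroids(X_aug, y_aug)
```

The payoff is that a few hours of hand-labeled audio can bootstrap a model over a much larger pool of unlabeled recordings — consistent with the data-efficiency numbers Sindhwani cites below.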
Thanks to data-efficient learning, it can take only 10 hours’ worth of training data to get the sort of results that would otherwise require 200 to 500 hours of training, Sindhwani said.
In addition to recognizing voices, Alexa can recognize noises such as the breaking of a window, the squeal of a smoke alarm or the sounds associated with human activity. That’s all built into Alexa Guard, which can be programmed to notify you (or the authorities) if something suspicious is heard while you’re out of the house.
The flip side of a user’s interactions with Alexa has to do with how the voice assistant speaks. Sindhwani’s team has rolled out a whole repertoire of speaking styles, ranging from celebrity voices (starting with Samuel L. Jackson), to a quiet whisper, to voices with an Australian lilt or an emotional edge.
The context of the conversation
Perhaps Alexa’s biggest challenge is figuring out what you didn’t say explicitly, but can be inferred from the context of the conversation. For example, suppose you say, “Alexa, what’s the weather like in Boston today?” Now suppose that after Alexa answers that question, you say, “How about tomorrow?”
The voice assistant has to have the smarts to know that you’re still talking to the device and asking about the weather in Boston. “We started thinking, how do we leverage this information to make an even more accurate determination that you’re speaking to the device?” Sindhwani said. “That was one of the papers we published.”
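The Boston-weather exchange is an instance of what dialogue researchers call slot carryover: an elliptical follow-up like “How about tomorrow?” inherits the intent and location from the previous turn. A minimal sketch of that bookkeeping, with hypothetical slot names and no claim to match Alexa’s internals:

```python
class DialogueContext:
    """Carry slots (intent, city, date) from one turn to the next so an
    elliptical follow-up inherits whatever it didn't state explicitly."""

    def __init__(self):
        self.slots = {}

    def resolve(self, utterance_slots):
        # Start from the previous turn's slots, then overwrite with whatever
        # the new utterance made explicit.
        merged = {**self.slots, **utterance_slots}
        self.slots = merged
        return merged

ctx = DialogueContext()
# "Alexa, what's the weather like in Boston today?"
first = ctx.resolve({"intent": "weather", "city": "Boston", "date": "today"})
# "How about tomorrow?" -- only the date is explicit this time.
followup = ctx.resolve({"date": "tomorrow"})
# followup now carries the weather intent and Boston from the prior turn.
```

The hard part in practice — and the subject of the research Sindhwani alludes to — is deciding *when* to carry context over versus treating an utterance as a fresh request or as speech not directed at the device at all.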
Smoothing out the flow of conversation, and keeping track of the context, will become increasingly important as Alexa’s skills get more complex. For example, last month the Mayo Clinic rolled out a skill that provides information about COVID-19 and takes users through a set of yes-or-no questions to determine whether they need a coronavirus test.
At the end of the questionnaire, Alexa tells you how urgent your case sounds and whether or not you should contact a health care provider. So far, the skill has gotten 4.9 out of 5 stars on Amazon.
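Structurally, a skill like that is a guided yes/no questionnaire that maps answers to a recommendation. The toy sketch below shows the pattern only — the questions, scoring, and thresholds are invented for illustration and are emphatically not the Mayo Clinic skill’s actual medical logic.

```python
# Hypothetical yes/no triage flow -- NOT the Mayo Clinic skill's real
# questions, scoring, or thresholds.
QUESTIONS = [
    ("fever", "Do you have a fever?"),
    ("cough", "Do you have a new or worsening cough?"),
    ("exposure", "Have you been in close contact with a confirmed case?"),
]

def triage(answers):
    """Map a dict of yes/no answers to a recommendation string."""
    score = sum(1 for key, _prompt in QUESTIONS if answers.get(key))
    if score >= 2:
        return "Contact a health care provider about testing."
    if score == 1:
        return "Monitor your symptoms and stay home."
    return "No test appears to be needed right now."

print(triage({"fever": True, "cough": True, "exposure": False}))
```

In a real voice skill, each prompt would be a separate dialogue turn, which is exactly why the context-tracking work described above matters: the skill has to remember which question it asked when a bare “yes” comes back.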
Like Alexa’s skills, the marketplace for voice services is getting more complex: According to Voicebot Research, Amazon’s share of the smart-speaker market has declined from 61% in 2019 to 53% as of this January, with Google’s second-place share rising to 30.9%.
But Sindhwani said he doesn’t obsess over the competition. Instead, he thinks about “what compelling, magical experiences we can enable for customers, and what kinds of innovations we need to make that work.”
“Think about speaking styles: Nobody else was working on speaking styles,” Sindhwani said. “We just thought that our voice is very, very natural, but as you start thinking of long-form content, when you’re telling customers news, it will sound even more natural if it’s spoken like a newscast — because that’s how people consume news. So, we tend to work in that mode.”