Looking for a Choice of Voices in A.I. Technology by Quentin Hardy

Do we, for example, associate the stereotypical voice of an English butler — think of Jarvis the computer in “Iron Man” — with a helpful and intelligent person? And why do so many people want to hear a voice that sounds like it came from a younger woman with no discernible accent?

Choosing a voice has implications for design, branding or interacting with machines. A voice can change or harden how we see each other. Where commerce is concerned, that creates a problem: Is it better to succeed by complying with a stereotype, or risk failure in the market by going against type?

For many, the answer is initially clear. Microsoft’s artificially intelligent voice system is Cortana, for example, and it was originally the voice of a female character in the video game “Halo.”

“In our research for Cortana, both men and women prefer a woman, younger, for their personal assistant, by a country mile,” said Derek Connell, senior vice president for search at Microsoft. In other words, a secretary — a job that is traditionally seen as female.

Last week, Google introduced a number of voice-based products, including Google Home, its version of Echo. All of them use Google Assistant, which also speaks in tones associated with a young, educated woman.

Google Assistant “is a millennial librarian who understands cultural cues, and can wink at things,” said Ryan Germick, who leads the personality efforts in building Google Assistant. “Products aren’t about rational design decisions. They are about psychology and how people feel.”

Microsoft has had internal debates about whether the computer should respond differently to questions about suicide, Mr. Connell said. “We’ve leaned to providing information about suicide prevention everywhere,” he said, as opposed to offering no advice at all.

But sometimes, if you want people to figure out quickly that they are talking to a machine, it can be better to have a man’s voice. For example, IBM’s Watson, when it talks to Bob Dylan in television commercials, has a male voice. When Ashok Goel, a professor at the Georgia Institute of Technology, adapted Watson to have a female voice as an informal experiment in how people relate to conversational machines, his students couldn’t tell it was a computer.

But Watson’s maleness is the exception. Amazon’s A.I. technology is another in the comforting female voice camp.

“Alexa was always an assistant, and female,” said Peng Shao, who worked at Amazon on the Echo and is now at a Seattle start-up, building another speech-based A.I. system. Amazon would not comment on its product.

Gender is just the starting point. Can your A.I. technology understand accents? And can it respond in a way that feels less robotic and at least mimics some sort of human empathy?

“You need a persona,” Mr. Shao said. “It’s a very emotional thing — people would get red, even get violent, if it didn’t understand them. When it did understand them, it felt like magic. They sleep next to them. This is heading for hospitals, senior care, a lot of sensitive places.”

Capital One developed a banking app on Alexa, and found it had to dial down the computer’s formality to make people comfortable talking about their finances with a computer.

“Money is inextricably linked to emotion, enabling and preventing things in your life,” said Stephanie Hay, the head of content strategy, culture and A.I. design at Capital One. At first the app said, “Hello,” but that seemed too tense. “‘Hi, there’ worked better,” she said. “She’s my friend, hanging out with me in the kitchen. I need her to be reliable and approachable, but not invasive.”

We don’t just need that computerized voice to meet our expectations, said Justine Cassell, a professor at Carnegie Mellon’s Human-Computer Interaction Institute. We need computers to relate to us and put us at ease when performing a task. “We have to know that the other is enough like us that it will run our program correctly,” she said.

That need seems to start young. Ms. Cassell has designed an avatar of indeterminate race and gender for 5-year-olds. “The girls think it’s a girl, and the boys think it’s a boy,” she said. “Children of color think it’s of color, Caucasians think it’s Caucasian.”

Another system she built spoke in what she termed “vernacular” to African-American children, achieving better results in teaching scientific concepts than when the computer spoke in standard English.

When tutoring the children in a class presentation, however, “we wanted it to practice with them in ‘proper English.’ Standard American English is still the code of power, so we needed to develop an agent that would train them in code switching,” she said.

And, of course, there are regional issues to consider when creating a robotic voice. For Cortana, Microsoft has had to tweak things like accents, as well as languages, and the jokes Cortana tells for different countries.

If a French driver goes into Germany using driving directions voiced by Nuance Communications, the computer will mispronounce the name of a German town with a French accent. The idea is to keep the driver confident by sustaining the illusion that the computer is French.

Local accents can be found in various versions of Apple’s Siri. It’s possible to localize the accent on an iPhone for the United States (“Samantha,” in the phone’s settings), Australia (“Karen”), Ireland (“Moira”), South Africa (“Tessa”), and Britain (“Daniel”). Apple could not say whether the English tradition of male butlers influenced its British choice.

Mr. Mars’s company, called Clinc, makes personal financial smartphone software that answers questions like “how much can I spend on a computer?” It relies on a similar Google-created female voice.

He is hoping for enough success that he can eventually test and counter stereotypes with unexpected A.I. voices. “You need to be at a certain size before you can address these questions,” said Mr. Mars, who teaches at the University of Michigan.

But maybe not too big. “I think consumers will eventually be open to exploring different voices and types,” he said. “Companies, they’ll probably stay conservative about it.”

