Smart Speakers, Speech Recognition, and Accessibility

Since the HomePod started shipping last week, I’ve taken to Twitter on multiple occasions to (rightfully) rant about the inability of Siri—and its competitors—to parse non-fluent speech. By “non-fluent speech,” I’m mostly referring to stutterers because I am one, but it equally applies to others, such as deaf speakers.

This is a topic I’ve covered before. There has been much talk about Apple’s prospects in the smart speaker market; the consensus seems to be that the company lags behind Amazon and Google because Alexa and Google Assistant are smarter than Siri. What is missing from these discussions, and from reviews of these products, is any consideration of the accessibility of a HomePod or an Echo or a Sonos speaker.

As I see it, this lack of consideration, whether intentional or not, overlooks a crucial part of a smart speaker’s story. Smart speakers are a unique product, accessibility-wise, insofar as the voice-first interaction model presents its own set of constraints. You can accommodate blindness and low vision with adjustable font sizes and screen readers. You can accommodate physical motor delays with switches. You can accommodate deaf and hard-of-hearing users with closed captioning and by using the camera’s flash for alerts.

But how do you accommodate a speech impairment?

This is a difficult, esoteric issue. It’s hard enough to teach a machine to fluently understand typical speech patterns; teaching machines to understand a stutterer (or even a heavy accent) is a nigh-impossible task. Yet it must be done: speech delays are disabilities too, and to not acknowledge the issue is to do those of us with such a disability a gross disservice. Smart speakers are effectively inaccessible otherwise, because the AI just isn’t good enough at deciphering your speech. You become so frustrated that you don’t want to use the product. The value proposition is diminished because, well, why bother? If you have to repeat yourself over and over, all the skills or SiriKit domains in the world don’t mean shit if you can’t communicate.

To be clear, speech recognition is an industry-wide issue. I focus on Apple because I’m entrenched in the company’s ecosystem, but this isn’t Apple’s burden to bear alone. I bought an Echo Dot on a lark in late 2016 to try it out, and Alexa isn’t markedly better in this regard either. It would behoove Apple and its competitors to hire speech-language pathologists for their respective Siri, Alexa, and Google Assistant teams, if they haven’t already. Such professionals would provide valuable insight into different types of speech and how best to work with them. That’s their job.

The reason I am pushing so hard on this topic is not only that I have a personal stake in the matter. The truth is voice has incredible potential for accessibility, as I reported for TechCrunch last year. For someone with physical motor disabilities, being able to control HomeKit devices by voice, for instance, makes the smart home infinitely more accessible and enjoyable. That’s why this perspective matters so much.

Personally, I can’t wait to try HomePod. Of course SiriKit needs to be improved and expanded into more domains. But for me, added capabilities mean little if I can’t get Siri to understand my commands in the first place. Apple’s track record on accessibility gives me hope the company will fare better at solving this problem. For this reason alone, I don’t believe Apple is as far behind in the game as conventional wisdom says. I want to enjoy talking to my computers as much as anyone else does, but Siri needs to understand me first.