More Thoughts on Artificial Intelligence and Accessibility

Last week, MacRumors reported Apple will soon announce an Echo-like speaker product with Siri built in. The story cites analyst Ming-Chi Kuo of KGI Securities, who wrote in a note that “we believe there is an over 50% chance that Apple will announce its first home AI product at WWDC in June.” Kuo goes on to say Apple’s “Siri Speaker” features “excellent acoustics performance” and computing power comparable to the iPhone 6s.

Like others, I’m happy to learn about this rumor; it’s a product I’m certainly interested in using. As someone who’s long been knee-deep in Apple’s ecosystem, being able to listen to Apple Music or podcasts in the kitchen through this speaker is something I’d really enjoy. (I listen to The Daily podcast through my phone’s speaker every morning while making breakfast. It’s perfectly fine, but a Siri-in-a-box would be even better.) Given the popularity of Amazon’s Echo products with several of my colleagues and other members of the Apple community, there’s clearly appeal in something like this. If rumors are indeed true and a Siri box is forthcoming, I like the idea of Apple entering the space currently occupied by the Echo, Google Home, and now Microsoft.

During Amazon’s Black Friday sale last year, I bought an Echo Dot because of the aforementioned enthusiasm for the device from friends and colleagues, and because I was curious. My logic for getting the Dot was simple: I didn’t want to pay $149 to try the product. At $49, the Dot is a small, inexpensive way to introduce someone to the platform. An introduction was exactly what I wanted, so it was the perfect choice for my needs.

Almost six months later, I have to admit the Dot hasn’t even been plugged in for a while. I like the thing conceptually, but so much of its appeal and utility is tied (rightfully so) to Amazon’s ecosystem—Prime Music, audiobooks, buying, etc. Aside from my Prime membership, which I’ve had since 2011 and love, I’m just not into Amazon’s content offerings. (Prime Video is an exception.) What’s more, my house doesn’t have any smart appliances, so asking Alexa to switch on/off the lights, for example, is out of the question. In short, because I’m not a heavy Amazon user, the Echo’s appeal is limited. Conversely, an Apple-branded device that integrates with Apple Music and Overcast and HomeKit (and whatever else) has immense appeal because, again, I’m wading knee-deep in Apple’s waters.

The accessibility of these “voice-first interfaces,” as Ben Bajarin describes them, is somewhat of a Jekyll and Hyde situation. I’ve written about this phenomenon before, always lamenting how frustrating it is communicating with Siri when you have a speech impediment. While Siri has gotten much better over time in this regard, there is still work to be done. The accessibility of voice-based assistants like Siri and Alexa is a vastly overlooked aspect of these devices. As accessibility coverage creeps more into the mainstream, reviewers (and companies!) must be cognizant of the effects these assistants have on people with speech delays. They’re disabilities too.

This matters because, despite the frustrations I’ve felt trying to talk to Siri over the years, voice has an incredible amount of upside as an assistive technology. The usefulness is already apparent to me. I like to cook, and I invoke Siri all the time to set timers. Likewise, I like asking Siri for quick updates of sports scores, which is easier than looking in MLB At Bat, for instance. In broader terms, voice is an obvious use case for, say, accessing HomeKit devices. For people (like me) who have physical motor impairments, manipulating door locks or light switches can be tricky or a literal pain. Thus, asking Siri to perform these tasks is an ideal solution because all you need to do is ask.

The Echo Dot, in particular, has some accessibility benefits of its own. For one thing, its voice parser seems to be pretty good at deciphering my stutter. I’ve never had to repeat myself fifty times because the device couldn’t understand me. For another, I appreciate the blue ring on the top of the device when you say the wake word (“Alexa” for me). It’s a nice visual cue that Alexa is listening to you. I’d like Apple’s speaker to have such a cue—it’s a small detail that makes the user experience better.

Overall, though, an accessible Siri Speaker (or Echo or whatever) needs to be at least adequately competent at parsing your speech. The voice is the whole ballgame for this class of devices—if stutterers like me have difficulty, it’s game over.

Here’s Sonia Paul, in a piece for Backchannel on voice-driven UIs and accents (emphasis mine):

To train a machine to recognize speech, you need a lot of audio samples. First, researchers have to collect thousands of voices, speaking on a range of topics. They then manually transcribe the audio clips. This combination of data — audio clips and written transcriptions — allows machines to make associations between sound and words. The phrases that occur most frequently become a pattern for an algorithm to learn how a human speaks.

But an AI can only recognize what it’s been trained to hear. Its flexibility depends on the diversity of the accents to which it’s been introduced.

I hope Apple and Amazon and other companies are investing in training Siri and her ilk to learn speech impediments. If voice is the future, as many in the commentariat believe it to be, then accessibility must be looked at differently. In the same way the iPhone made computing more accessible to me, I hope someday to have a voice assistant that I can talk to naturally.

The Echo Dot isn’t for me, but Apple’s competitor may very well be. Its accessibility story will be interesting, and if Ming-Chi Kuo is right, then I can’t wait for WWDC to hear it.