ChatGPT has been in the news for the past few months, touted as a revolutionary step forward for AI in general and search in particular. ZDNet.com has a good overview of what it is:

ChatGPT is a natural language processing tool driven by AI technology that allows you to have human-like conversations and much more with a chatbot. The language model can answer questions, and assist you with tasks such as composing emails, essays, and code.

A more advanced version of ChatGPT powers Microsoft’s new Bing search features, and Google is prototyping its own AI with Bard. Both can appear to be true AI, with answers that are surprisingly complete and reliable, or serve as prime examples of the technology’s limitations. The latter is what I want to focus on.

Personal Assistants

The era of AI personal assistants was ushered in 12 years ago with Siri on the iPhone. Alexa and Google Assistant followed, but after all this time they remain a promise unfulfilled. Why?

Language Barrier

Personal assistants require that we learn how to talk to them, and there is no Rosetta Stone. Supposedly, the more you use an assistant, the better it gets at understanding you, but you still have to figure out how to properly word your requests, and there are functions that are simply not supported. I can ask Siri to “remind me to complete my taxes” and it will create a task in my default list, but if I ask it to “delete the task complete my taxes” or “complete the task complete my taxes,” I’m told that’s not possible.

They also do not do well with multiple requests. You could tell a person to turn off the lights and lower the thermostat before coming to bed, but a personal assistant would need those requests split into at least two separate commands.

The most maddening aspect of this is that even when you know a personal assistant can do something, you may not be able to phrase your request in a way that gets the results you want.

Lack of Contextual Awareness

Your phone can provide general location information, and smart speakers can be assigned to rooms in your home, but assistants still struggle to know where you are. If I’m in the living room and ask Siri to turn on the lights, the living room lights will come on as long as the HomePod in that room is the one that hears me. However, if the HomePod in my bedroom hears me first, it will turn on the bedroom lights instead. I can get around that by specifying the room (e.g. “turn on the living room lights”), but that isn’t needed when people talk to each other because we have an awareness beyond the request alone.

I think this example illustrates the issue quite well. One day I was looking at a laundry basket with clean socks that needed to be put away. It wasn’t something I was likely to forget, but I asked Siri to create a reminder anyway, and what I saw in my list later that day was “put away sucks.” The sentiment may be accurate, but obviously Siri didn’t know I was looking at a basket of socks. All it could do was transcribe what it thought were my words without any other context.

The lack of contextual awareness has been infuriating when trying to control music in the car (and there may be a language barrier here too). I have cellular service turned off for music so that it will only play what’s already on my phone. If I ask Siri to play music from an artist, it will always try to stream from Apple Music rather than play what’s downloaded on my device and available offline, which causes the request to fail. No matter what I try, I cannot get Siri to play only the music on my phone without using a playlist. The assistant should know that cellular data is off and that it can only play what’s on my phone.

Lack of Understanding

Personal assistants don’t actually understand our words, or at least not in the way people do. Autocorrect is a great example of this, and an article by Jason Cross for Macworld describes the situation perfectly:

Whether you’re talking about voice assistants like Siri or Alexa, voice dictation, or autocorrect, tech companies like to say they’re employing “natural language processing.”

But true natural language processing remains beyond the reach of any of these consumer systems. What we’re left with is a machine-learning-powered statistical analysis of the parts of speech that is almost entirely devoid of semantic meaning.

Consider the following: “Go down to the corner store and get me a stick of butter. Make sure it’s unsalted.”

If I were to ask someone what “it” refers to, anyone would immediately know I’m referring to the butter, even though, grammatically, “it” could just as well refer to the store. But who ever heard of an unsalted store? If we change that second sentence to “Check that it’s open today,” we know “it” refers to the store.

This is pretty trivial stuff for humans, but computers are terrible at it, because language systems are built without an understanding of what words actually mean, only which types of words they are and how they are spelled.
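
To make the quoted point concrete, here is a rough sketch of the kind of analysis these systems produce. (It uses the spaCy library and its small English model purely as an illustration; Cross’s article doesn’t mention any specific tool.) A part-of-speech tagger can label every word in the butter example, but nothing in its output connects “it” to the butter rather than the store:

    import spacy

    # Assumes spaCy and its small English model are installed:
    #   pip install spacy && python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Go down to the corner store and get me a stick of butter. "
              "Make sure it's unsalted.")

    # Print every noun and pronoun with its part-of-speech tag.
    for token in doc:
        if token.pos_ in ("NOUN", "PRON"):
            print(token.text, token.pos_)

    # Typical output: corner NOUN, store NOUN, me PRON, stick NOUN,
    # butter NOUN, it PRON. The tagger knows "it" is a pronoun, but
    # nothing here says whether it refers to the butter or the store;
    # that judgment requires knowing that stores aren't salted.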

I feel like I’m frequently fighting against the AI, and it changes a correct word to something semantically incorrect more often than I’d like. (“Well” and “we’ll” are prime examples of autocorrect battles I fight regularly.)

An Unknown Future

The language barrier, lack of contextual awareness, and lack of understanding come together to create a lack of reliability. If something isn’t reliable, it can’t be trusted, and that’s the greatest hurdle to clear. Right now, I feel like the bar for what we can confidently expect personal assistants to do remains pretty low, and it’s been there for a while. Getting a weather report or driving directions, adding a task, making a call, finding out a sports score—they’re all helpful things but no longer revolutionary. There are plenty of areas for incremental improvement that Apple, Google, and Amazon should be pursuing today, but I’m not sure they are priorities.

There will be another big leap forward, but after 12 years of modest gains I couldn’t guess when that will occur. I think it will require something beyond what we currently have, and I wouldn’t be surprised if it involves AR glasses. Many doors could be opened if an assistant could “see” what we see.