Human language isn’t the best way to chat with Siri or Alexa, probably

The year is 2019. Voice-controlled digital assistants are great at simple commands such as “set a timer…” and “what’s the weather?”, but frustratingly little else.

Human language seems to be an ideal interface for computer systems; it is infinitely flexible and the user already knows how to use it! But there are drawbacks. Computer systems that aim to understand arbitrary language are really hard to build, and they also create unrealistic expectations of what the system can do, resulting in user confusion and disappointment.

The next frontier for voice assistants is complex dialogue in challenging domains such as managing schedules, analysing data, and controlling robots. The next generation of systems must learn to map ambiguous human language to precise computer instructions. The mismatch between user expectations and system capabilities is only worsened in these scenarios.

What if we could preserve the familiarity of natural language, while better managing user expectations and simplifying the system to boot? That’s exactly what my student Jesse Mu set out to study. The idea was to use what we called a restricted language interface, one that is a well-chosen subset of full natural language.

Jesse designed an experiment where participants played an interactive computer game called SHRDLURN. In this game, the player is given a set of blocks of different colours, and a “goal”, which is the winning arrangement of blocks. The player types instructions to the computer such as “remove the red blocks” and the computer tries to execute the instruction. The interesting bit is that the computer doesn’t understand language to begin with. In response to a player instruction, it presents the player with a list of block arrangements, and the player picks the arrangement that fits their instructions. Over time, the computer learns to associate instructions with the correct moves, and the correct configuration starts appearing higher up in the list. The system is perfectly trained when the first guess on its list is always the one the player intended.
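To make the learning loop concrete, here is a minimal sketch in Python. It is not the actual SHRDLURN system (which learns a proper semantic parser); the class, the move names, and the scoring rule are simplified and hypothetical, but the shape of the interaction is the same: rank candidates, take the player's choice as feedback, rank better next time.

```python
from collections import defaultdict

class ToyBlockLearner:
    """Illustrative learner: associates the words of an instruction with
    candidate moves, and re-ranks the candidates as feedback accumulates."""

    def __init__(self, moves):
        self.moves = moves                 # hypothetical finite set of candidate moves
        self.weights = defaultdict(float)  # (word, move) -> association strength

    def rank(self, instruction):
        """Return the candidate moves best-first, scored by how strongly the
        instruction's words are associated with each move."""
        words = instruction.lower().split()
        return sorted(self.moves,
                      key=lambda move: sum(self.weights[(w, move)] for w in words),
                      reverse=True)

    def feedback(self, instruction, chosen_move):
        """The player scrolled to the configuration they actually meant:
        strengthen the association between the instruction's words and that move."""
        for w in instruction.lower().split():
            self.weights[(w, chosen_move)] += 1.0

# Before any feedback the ranking is arbitrary; after one round of feedback,
# "remove_red" rises to the top for similar instructions.
learner = ToyBlockLearner(["remove_red", "add_orange", "remove_all"])
learner.feedback("remove the red blocks", "remove_red")
print(learner.rank("remove the red blocks")[0])  # -> remove_red
```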

[Figure: example levels from the game]
The figure above shows some example levels from the game. How would you instruct a computer to go from the start to the goal?

Sixteen participants took part in our experiment. Half of them played the game with no restriction; the other half were only allowed to use the following 11 words: all, cyan, red, brown, orange, except, leftmost, rightmost, add, remove, to.
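Checking such a restriction is trivial. The snippet below is not from our study interface (which simply showed restricted-condition players a reminder of the allowed words); it is only meant to make the vocabulary concrete:

```python
# The 11 words available to participants in the restricted condition.
ALLOWED_WORDS = {
    "all", "cyan", "red", "brown", "orange", "except",
    "leftmost", "rightmost", "add", "remove", "to",
}

def uses_only_allowed_words(utterance: str) -> bool:
    """True if every word of the utterance comes from the restricted vocabulary."""
    return all(word in ALLOWED_WORDS for word in utterance.lower().split())

print(uses_only_allowed_words("remove all red except leftmost"))  # True
print(uses_only_allowed_words("please delete the blue ones"))     # False
```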

We measured the quality of the final system (i.e., how successfully the computer learnt to map language to instructions) as well as the cognitive load on participants. We found, unsurprisingly, that in the non-restricted setting people used a much wider variety of words and much longer sentences. However, the restricted-language participants seemed to train their systems more effectively. They also reported needing less effort, and perceived their performance to be higher.
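To give a sense of how the quality of the final system can be quantified from gameplay logs: one natural measure is the position at which the intended configuration appears in the system's ranked list on each turn (a constant rank of 1 means perfectly trained). The numbers below are invented purely to show the shape of the computation; they are not data from our study, and the paper reports the metrics we actually used.

```python
def mean_rank(ranks):
    """Average position (1 = top of the list) of the intended configuration;
    lower is better, and 1.0 means the system is perfectly trained."""
    return sum(ranks) / len(ranks)

# Hypothetical per-turn log (NOT data from our study): the intended
# configuration climbs the candidate list as the system learns.
example_ranks = [7, 5, 4, 2, 2, 1, 1]
print(mean_rank(example_ranks))  # ~3.14
```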

[Figure: example of gameplay]
The figure above illustrates gameplay. A: Game with start and goal states and 2 intermediate states. B: The player issues a language command. “use only…” message appears only for players in restricted condition. C: The player scrolls through candidate configurations until she finds the one matching the meaning of the command. The correct interpretation (bottom) solves the puzzle.

By imposing restrictions, we achieved the same or better system performance without detriment to the user experience – indeed, participants reported lower effort and perceived higher performance. We think that a guided, consistent language helps users understand the limitations of a system. That’s not to say we’ll never desire a system that understands arbitrary human language. But given the current capabilities of AI systems, attempting to accommodate arbitrary natural language input yields diminishing returns in user experience and performance. Rather than choosing between two extremes – a specialised graphical user interface or a completely natural language interface – designers should consider restricted language interfaces, which trade off full expressiveness for simplicity, learnability and consistency.

Here’s a summary in the form of a poem:

It was not meant to be this way
You cannot understand

This human dance of veiled intent
The spoken word and written hand

But let us meet at halfway point
And share our thoughts with less

To know each other’s will and wish
— not guess

Want to learn more about our study? Read it here (click to download PDF) or see the publication details below:

Mu, Jesse, and Advait Sarkar. “Do We Need Natural Language?: Exploring Restricted Language Interfaces for Complex Domains.” In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, p. LBW2822. ACM, 2019. https://dl.acm.org/citation.cfm?doid=3290607.3312975

Talking to a bot might help with depression, but you won’t enjoy the conversation

Mental illness is a significant contributor to the global health burden. Cognitive Behavioural Therapy (CBT) provided by a trained therapist is effective. But CBT is not an option for many people who cannot travel long distances, or take the time away from work, or simply cannot afford to visit a therapist.

To provide more scalable and accessible treatment, we could use AI-driven chatbots to deliver therapy sessions. A chatbot might not (currently) be as effective as a human therapist, but it is likely to be better than no treatment at all. At least one study of a chatbot therapist has shown limited but positive clinical outcomes.

My student Samuel Bell and I were interested in finding out whether chatbot-based therapy could be effective not just clinically, but also in terms of how patients felt during the sessions. Clinical efficacy is only one marker of a good therapy session. Others include sharing ease (i.e., does the patient feel able to confide in the therapist), smoothness of conversation, perceived usefulness, and enjoyment.

To find out, we conducted a study. Ten participants with sub-clinical stress symptoms took part in two 30-minute therapy sessions. Five participants had their sessions with a human therapist, conducted via chat through an internet-based CBT interface. The other five had therapy sessions with a simulated chatbot, through the same interface. At the end of the study, all participants completed a questionnaire about their experience.

We found that in terms of sharing ease and perceived usefulness, neither the human nor the simulated chatbot emerged as the clear winner, although participants’ remarks suggested that they found the chatbot less useful. In terms of smoothness of conversation and enjoyment, the chatbot was clearly worse.

Participants felt that the chatbot had a poor ability to “read between the lines”, and they felt that their comments were often ignored. One participant explained their dissatisfaction:

“It was a repetition of what I said, not an expansion of what I said.”

Another participant commented on the lack of shared experience:

“When you tell something to someone, it’s better, because they might have gone through something similar… there’s no sense that the robot cares or understands or empathises.”

Our study has a small sample size, but nonetheless points to clear deficiencies in chatbot-based therapy. We suggest that future research into chatbot CBT acknowledge and explore these areas – conversational recall, empathy, and the challenge of shared experience – in the hope that we may benefit from scalable, accessible therapy where it is needed.

Want to learn more about our study? Read it here (PDF) or see the publication details below:

Bell, Samuel, Clara Wood, and Advait Sarkar. “Perceptions of Chatbots in Therapy.” In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, p. LBW1712. ACM, 2019. https://dl.acm.org/citation.cfm?id=3313072