The year is 2019. Voice-controlled digital assistants are great at simple commands such as “set a timer…” and “what’s the weather?”, but frustratingly little else.
Human language seems to be an ideal interface for computer systems; it is infinitely flexible and the user already knows how to use it! But there are drawbacks. Computer systems that aim to understand arbitrary language are really hard to build, and they also create unrealistic expectations of what the system can do, resulting in user confusion and disappointment.
The next frontier for voice assistants is complex dialogue in challenging domains such as managing schedules, analysing data, and controlling robots. The next generation of systems must learn to map ambiguous human language to precise computer instructions. The mismatch between user expectations and system capabilities is only worsened in these scenarios.
What if we could preserve the familiarity of natural language, while better managing user expectations and simplifying the system to boot? That’s exactly what my student Jesse Mu set out to study. The idea was to use what we called a restricted language interface, one that is a well-chosen subset of full natural language.
Jesse designed an experiment where participants played an interactive computer game called SHRDLURN. In this game, the player is given a set of blocks of different colours, and a “goal”, which is the winning arrangement of blocks. The player types instructions to the computer such as “remove the red blocks” and the computer tries to execute the instruction. The interesting bit is that the computer doesn’t understand language to begin with. In response to a player instruction, it presents the player with a list of block arrangements, and the player picks the arrangement that fits their instructions. Over time, the computer learns to associate instructions with the correct moves, and the correct configuration starts appearing higher up in the list. The system is perfectly trained when the first guess on its list is always the one the player intended.
Sixteen participants took part in our experiment. Half of them played the game with no restriction, but the other half were given specific instructions: they were only allowed to use the following 11 words: all, cyan, red, brown, orange, except, leftmost, rightmost, add, remove, to.
We measured the quality of the final system (i.e., how successfully the computer learnt to map language to instructions) as well as the cognitive load on participants. We found, unsurprisingly, that in the non-restricted setting people used a much wider variety of words, and much longer sentences. However, the restricted language participants seemed to be able to train their systems more effectively. Participants in the restricted language setting also reported needing less effort, and perceived their performance to be higher.
By imposing restrictions, we achieved the same or better system performance, without detriment to the user experience – indeed, participants reported lower effort and higher performance. We think that a guided, consistent language helps users understand the limitations of a system. That’s not to say we’ll never desire a system that understands arbitrary human language. But given the current capabilities of AI systems, we will see diminishing returns in user experience and performance by attempting to accommodate arbitrary natural language input. Rather than considering one of two extremes – a specialised graphical user interface vs a completely natural language interface, designers should consider restricted language interfaces which trade-off full expressiveness for simplicity, learnability and consistency.
Here’s a summary in the form of a poem:
It was not meant to be this way
You cannot understand
This human dance of veiled intent
The spoken word and written hand
But let us meet at halfway point
And share our thoughts with less
To know each other’s will and wish
— not guess
Want to learn more about our study? Read it here (click to download PDF) or see the publication details below:
Mu, Jesse, and Advait Sarkar. “Do We Need Natural Language?: Exploring Restricted Language Interfaces for Complex Domains.” In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, p. LBW2822. ACM, 2019. https://dl.acm.org/citation.cfm?doid=3290607.3312975
One thought on “Human language isn’t the best way to chat with Siri or Alexa, probably”