People reluctant to use self-driving cars, survey shows

Autonomous vehicles are going to save us from traffic, emissions, and inefficient models of car ownership. But while songs of praise for self-driving cars are regularly sung in Silicon Valley, does the public really want them?

That’s what my student Charlie Hewitt and collaborators Ioannis Politis and Theocharis Amanatidis set out to study. We decided to conduct a public opinion survey to find out.

However, we first had to solve two problems.

  1. When Charlie started his work, there were no existing surveys designed specifically around autonomous vehicles. There were surveys for technology acceptance in general, and some for cars, which were a good start. So we combined and extended these to create a new survey designed specifically for autonomous vehicles: the Autonomous Vehicle Acceptance Model, or AVAM for short.
  2. When people think of self-driving cars, they generally picture a futuristic pod with no steering wheel or controls, which they step into and are magically transported to their destination. However, the auto industry distinguishes six levels of autonomy. Previous studies had attempted to gauge people’s attitudes to each of these levels, but it turns out that people can’t picture the different levels of autonomy very well, and don’t understand how they differ. So Charlie created short descriptions to explain the differences between them. These vignettes are a key part of the AVAM, because they help the general public understand the implications of different levels of autonomy.

Here are the six levels of autonomous vehicles as described in our survey:

  • Level 0: No Driving Automation. Your car requires you to fully control steering, acceleration/deceleration and gear changes at all times while driving. No autonomous functionality is present.
  • Level 1: Driver Assistance. Your car requires you to control steering and acceleration/deceleration on most roads. On large, multi-lane highways the vehicle is equipped with cruise-control which can maintain your desired speed, or match the speed of the vehicle to that of the vehicle in front, autonomously. You are required to maintain control of the steering at all times.
  • Level 2: Partial Driving Automation. Your car requires you to control steering and acceleration/deceleration on most roads. On large, multi-lane highways the vehicle is equipped with cruise-control which can maintain your desired speed, or match the speed of the vehicle to that of the vehicle in front, autonomously. The car can also follow the highway’s lane markings and change between lanes autonomously, but may require you to retake control with little or no warning in emergency situations.
  • Level 3: Conditional Driving Automation. Your car can drive partially autonomously on large, multi-lane highways. You must manually steer and accelerate/decelerate when on minor roads, but upon entering a highway the car can take control and steer, accelerate/decelerate and switch lanes as appropriate. The car is aware of potential emergency situations, but if it encounters a confusing situation which it cannot handle autonomously then you will be alerted and must retake control within a few seconds. Upon reaching the exit of the highway the car indicates that you must retake control of the steering and speed control.
  • Level 4: High Driving Automation. Your car can drive fully autonomously only on large, multi-lane highways. You must manually steer and accelerate/decelerate when on minor roads, but upon entering a highway the car can take full control and can steer, accelerate/decelerate and switch lanes as appropriate. The car does not rely on your input at all while on the highway. Upon reaching the exit of the highway the car indicates that you must retake control of the steering and speed control.
  • Level 5: Full Driving Automation. Your car is fully autonomous. You are able to get into the car and instruct it where you would like to travel to, the car then carries out your desired route with no further interaction required from you. There are no steering or speed controls as driving occurs without any interaction from you.
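
For reference, here are those six levels in code form – a minimal Python sketch, where the enum and its one-line summaries are ours, paraphrasing the vignettes above:

```python
from enum import IntEnum

class AutonomyLevel(IntEnum):
    """The six levels of driving automation, as summarised in the AVAM vignettes."""
    NO_AUTOMATION = 0           # driver controls everything at all times
    DRIVER_ASSISTANCE = 1       # cruise control on highways; driver always steers
    PARTIAL_AUTOMATION = 2      # lane keeping/changing; driver may retake control at any moment
    CONDITIONAL_AUTOMATION = 3  # autonomous on highways; driver retakes control within seconds when alerted
    HIGH_AUTOMATION = 4         # fully autonomous on highways; no driver input needed there
    FULL_AUTOMATION = 5         # no steering or speed controls at all
```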

Before you read on, think about each of those levels. What do you think are the advantages and disadvantages of each? Which would you be comfortable with and why?

We sent our survey to 187 drivers recruited from across the USA, and here’s what we found:

Result 1: our respondents were not ready to accept autonomous vehicles.

We found that, on many measures, people reported lower acceptance of higher automation levels: they perceived higher-autonomy vehicles as less safe, reported lower intent to use them, and reported greater anxiety about using them.

We compared some of the results with those from an earlier study, conducted in 2014. We had to make some simplifying assumptions, as the 2014 study wasn’t conducted with the AVAM. Even so, our results were mostly similar: both studies found that people (unsurprisingly) expected to have to do less as the level of autonomy increased. Both studies also found that people showed lower intent to use higher-autonomy vehicles, and a poorer general attitude towards higher autonomy. Self-driving cars seem to be suffering in public opinion!

Result 2: the biggest leap in user perception comes with full autonomy.

We asked people how much they would expect to have to use their hands, feet and eyes while using a vehicle at each level of autonomy. Even though vehicles at the intermediate levels of autonomy (3 and 4) can do significantly more than those at levels 1 and 2, people did not perceive them as requiring significantly less engagement. At level 5 (full autonomy), however, there was a dramatic drop in expected engagement. This was a new finding, albeit not an entirely surprising one. One explanation is that people really perceive only two levels of autonomy – partial and full – and don’t much care about the differences in experience between levels of partial autonomy.

All in all, we were fascinated to learn about people’s attitudes to self-driving cars. Despite the enthusiasm of the tech media, there seems to be a consistent concern about their safety, and a reluctance to adopt them, amongst the general public. Even if self-driving cars really do turn out to be safer and in many other ways better than regular cars, automakers will still face this challenge of public perception.

And now, a summary poem:

The iron beast has come alive,
We do not want it, do not want it
Its promises we do not prize
It does not do as we see fit

Only when we can rely
On iron beast with its own eye
Only then will we concede
And disaffection yield to need

If you’re interested in using our questionnaire or our data, please reach out! I’d love to help you build on our research.

Want to learn more about our study? Read it here (click to download PDF) or see the publication details below:

Charlie Hewitt, Ioannis Politis, Theocharis Amanatidis, and Advait Sarkar. 2019. Assessing public perception of self-driving cars: the autonomous vehicle acceptance model. In Proceedings of the 24th International Conference on Intelligent User Interfaces (IUI ’19). ACM, New York, NY, USA, 518-527. DOI: https://doi.org/10.1145/3301275.3302268

Human language isn’t the best way to chat with Siri or Alexa, probably

The year is 2019. Voice-controlled digital assistants are great at simple commands such as “set a timer…” and “what’s the weather?”, but frustratingly little else.

Human language seems to be an ideal interface for computer systems; it is infinitely flexible and the user already knows how to use it! But there are drawbacks. Computer systems that aim to understand arbitrary language are really hard to build, and they also create unrealistic expectations of what the system can do, resulting in user confusion and disappointment.

The next frontier for voice assistants is complex dialogue in challenging domains such as managing schedules, analysing data, and controlling robots. The next generation of systems must learn to map ambiguous human language to precise computer instructions, and the mismatch between user expectations and system capabilities only worsens in these scenarios.

What if we could preserve the familiarity of natural language, while better managing user expectations and simplifying the system to boot? That’s exactly what my student Jesse Mu set out to study. The idea was to use what we called a restricted language interface, one that is a well-chosen subset of full natural language.

Jesse designed an experiment where participants played an interactive computer game called SHRDLURN. In this game, the player is given a set of blocks of different colours, and a “goal”, which is the winning arrangement of blocks. The player types instructions to the computer such as “remove the red blocks” and the computer tries to execute the instruction. The interesting bit is that the computer doesn’t understand language to begin with. In response to a player instruction, it presents the player with a list of block arrangements, and the player picks the arrangement that fits their instructions. Over time, the computer learns to associate instructions with the correct moves, and the correct configuration starts appearing higher up in the list. The system is perfectly trained when the first guess on its list is always the one the player intended.
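
To make the learning loop concrete, here is a toy sketch in Python. It is not the actual SHRDLURN system (which uses a log-linear semantic parser over a much richer action space); it just illustrates the feedback cycle described above, with invented names and a deliberately tiny action space: candidate moves are scored by learned word–action weights, and a perceptron-style update rewards the arrangement the player picked.

```python
from collections import defaultdict

COLORS = ["cyan", "red", "brown", "orange"]

def candidate_moves(state):
    """Enumerate candidate moves: remove every block of a colour, or append
    one block of a colour. (A toy action space, far smaller than SHRDLURN's.)"""
    moves = []
    for c in COLORS:
        moves.append(("remove", c, tuple(b for b in state if b != c)))
        moves.append(("add", c, state + (c,)))
    return moves

class ToyLearner:
    """Scores candidate moves by word-action weights; a crude stand-in for
    the log-linear semantic parser of the real system."""
    def __init__(self):
        self.w = defaultdict(float)

    def features(self, utterance, move):
        kind, colour, _ = move
        words = utterance.split()
        return [(wd, kind) for wd in words] + [(wd, colour) for wd in words]

    def score(self, utterance, move):
        return sum(self.w[f] for f in self.features(utterance, move))

    def ranked(self, utterance, state):
        """The list shown to the player, best guess first."""
        return sorted(candidate_moves(state),
                      key=lambda m: -self.score(utterance, m))

    def update(self, utterance, state, chosen):
        """The player picked `chosen`: reward its features and penalise the
        top guess if it was wrong (a simple perceptron update)."""
        top = self.ranked(utterance, state)[0]
        if top != chosen:
            for f in self.features(utterance, chosen):
                self.w[f] += 1.0
            for f in self.features(utterance, top):
                self.w[f] -= 1.0

learner = ToyLearner()
state = ("red", "cyan", "red")
utterance = "remove the red blocks"
chosen = ("remove", "red", ("cyan",))  # the arrangement the player picks

learner.update(utterance, state, chosen)
# After feedback, the intended move now ranks first:
assert learner.ranked(utterance, state)[0] == chosen
```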

[Figure: example levels from the SHRDLURN game]
The figure above shows some example levels from the game. How would you instruct a computer to go from the start to the goal?

Sixteen participants took part in our experiment. Half of them played the game with no restrictions on their language; the other half were only allowed to use the following 11 words: all, cyan, red, brown, orange, except, leftmost, rightmost, add, remove, to.
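
Enforcing the restriction is straightforward; here is a sketch of the kind of vocabulary check involved (the function name is ours):

```python
ALLOWED = {"all", "cyan", "red", "brown", "orange", "except",
           "leftmost", "rightmost", "add", "remove", "to"}

def disallowed_words(utterance):
    """Return the words in an utterance that fall outside the restricted language."""
    return [w for w in utterance.lower().split() if w not in ALLOWED]

print(disallowed_words("remove the red blocks"))  # ['the', 'blocks']
print(disallowed_words("remove red"))             # []
```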

We measured the quality of the final system (i.e., how successfully the computer learnt to map language to instructions) as well as the cognitive load on participants. We found, unsurprisingly, that in the non-restricted setting people used a much wider variety of words, and much longer sentences. However, the restricted language participants seemed to be able to train their systems more effectively. Participants in the restricted language setting also reported needing less effort, and perceived their performance to be higher.

[Figure: annotated example of gameplay]
The figure above illustrates gameplay. A: Game with start and goal states and 2 intermediate states. B: The player issues a language command. “use only…” message appears only for players in restricted condition. C: The player scrolls through candidate configurations until she finds the one matching the meaning of the command. The correct interpretation (bottom) solves the puzzle.

By imposing restrictions, we achieved the same or better system performance, without detriment to the user experience – indeed, participants reported lower effort and higher performance. We think that a guided, consistent language helps users understand the limitations of a system. That’s not to say we’ll never want a system that understands arbitrary human language. But given the current capabilities of AI systems, attempting to accommodate arbitrary natural language input yields diminishing returns in user experience and performance. Rather than considering only the two extremes – a specialised graphical user interface versus a completely natural language interface – designers should consider restricted language interfaces, which trade off full expressiveness for simplicity, learnability and consistency.

Here’s a summary in the form of a poem:

It was not meant to be this way
You cannot understand

This human dance of veiled intent
The spoken word and written hand

But let us meet at halfway point
And share our thoughts with less

To know each other’s will and wish
— not guess

Want to learn more about our study? Read it here (click to download PDF) or see the publication details below:

Jesse Mu and Advait Sarkar. 2019. Do we need natural language? Exploring restricted language interfaces for complex domains. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (CHI EA ’19). ACM, New York, NY, USA, Paper LBW2822. https://dl.acm.org/citation.cfm?doid=3290607.3312975

Setwise Comparison: a faster, more consistent way to make judgements

I originally wrote this post in 2016 for the Sparrho blog.

Have you ever wondered whether doctors are consistent in their judgements? In some cases, they really aren’t. When asked to rate videos of patients with multiple sclerosis (a disease that causes impaired movement) on a numeric scale from 0 (completely healthy) to 4 (severely impaired), clinicians struggled to be consistent, often giving the same patient different scores at different times, and disagreeing amongst themselves. This difficulty is quite common, and not unique to doctors — people often have to assign scores to difficult, abstract concepts, such as “How good was a musical performance?” or “How much do you agree or disagree with this statement?” Research has shown, time and time again, that people are fundamentally inconsistent at this type of activity, no matter the setting or their level of expertise.

The field of ‘machine learning’, which can help to automate such scoring (e.g. automatically rating patients according to their disability), is based on the idea that we can give the computer a set of examples for which the score is known, in the hope that the computer can use these to ‘learn’ how to assign scores to new, unseen examples. But if the computer is taught from examples where scores were assigned inconsistently, it learns to assign inconsistent, unusable scores to new, unseen examples.
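
This “inconsistent in, inconsistent out” effect is easy to demonstrate for yourself. Here is a small sketch using scikit-learn (purely illustrative, not from our study): we train the same classifier on clean labels and on labels where a third have been randomly flipped, and compare held-out accuracy, which will typically be noticeably worse in the noisy case.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A synthetic binary scoring task.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Simulate inconsistent labellers by flipping a third of the training labels.
rng = np.random.default_rng(0)
flip = rng.random(len(y_tr)) < 1 / 3
y_noisy = np.where(flip, 1 - y_tr, y_tr)

for name, labels in [("clean labels", y_tr), ("noisy labels", y_noisy)]:
    model = LogisticRegression(max_iter=1000).fit(X_tr, labels)
    print(f"{name}: held-out accuracy = {model.score(X_te, y_te):.3f}")
```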

To solve this problem, we brought together an understanding of how humans work with some mathematical tricks. The fundamental insight is that it is easier and more consistent for humans to provide preference judgements (e.g. “is this higher/lower/equal to that?”) than absolute value judgements (e.g. “is this a 4 or a 5?”). The problem is, with as few as 50 items to score, you already have 50 x 49 = 2450 ways of pairing them together. This balloons to nearly 10,000 comparisons when you have 100 items. Clearly, this doesn’t scale.

So we scale it using a mathematical insight: if you’ve compared A to B, and B to C, you can guess with reasonably high accuracy what the relationship is between A and C. This ‘guessing’ is done with a computer algorithm called TrueSkill, which was originally invented to rank people playing multiplayer games by their skill, so that they could be better matched to online opponents. Using TrueSkill, we can greatly reduce the number of comparisons required, so that increasing the number of items no longer results in a huge increase in comparisons. This study has advanced our understanding of how people quantify difficult concepts, and presents a new method which balances the strengths of people and computers to help people efficiently and consistently score many items.
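
Conveniently, there is an open-source Python implementation of TrueSkill (the `trueskill` package) that handles exactly this kind of inference. Here is a minimal sketch of the ranking step – the item names and judgements are invented, and this isn’t our study code. Each item is treated as a “player”, each sorted set as a free-for-all match whose finishing order is the labeller’s ranking, and items never directly compared still end up on a common scale.

```python
# pip install trueskill
import trueskill

env = trueskill.TrueSkill(draw_probability=0.05)  # allow occasional "equal" judgements

# One rating per item to be scored (here: five hypothetical patient videos).
items = {name: env.create_rating() for name in "ABCDE"}

# Each setwise judgement sorts a small set of items on the continuum.
# ranks: 0 = placed highest on the scale; ties would share a rank.
judgements = [
    (["A", "B", "C"], [0, 1, 2]),  # labeller sorted A above B above C
    (["C", "D", "E"], [1, 0, 2]),  # labeller sorted D above C above E
]

for names, ranks in judgements:
    groups = [(items[n],) for n in names]    # one-player "teams"
    updated = env.rate(groups, ranks=ranks)  # Bayesian update of all ratings
    for n, (rating,) in zip(names, updated):
        items[n] = rating

# A global ranking falls out, including pairs never compared directly (e.g. A vs E),
# ordered by a conservative estimate of each item's latent score.
for name, r in sorted(items.items(), key=lambda kv: env.expose(kv[1]), reverse=True):
    print(f"{name}: mu={r.mu:.2f}, sigma={r.sigma:.2f}")
```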

Why is this important for researchers in fields other than computer vision?

This study shows a new way to quickly and consistently have humans rate items on a continuous scale (e.g. “rate the happiness of the individual in this picture on a scale of 1 to 5”). It works through the use of preference judgements (e.g. “is this higher/lower/equal to that?”) as opposed to absolute value judgements (e.g. “is this a 4 or a 5?”), combined with an algorithmic ranking system which can reduce the need to compare every item with every other item. This was initially motivated by the need to have higher-quality labels for machine learning systems, but can be applied in any domain where humans have difficulty placing items along a scale. In our study we showed that clinicians can use our method to achieve far higher consistency than was previously thought possible in their assessment of motor illness.

We built a nifty tool to help clinicians perform Setwise Comparison, which you can see in the video below: https://www.youtube.com/watch?v=Q1hW-UXU3YE

Why is this important for researchers in the same field?

This study describes a novel method for efficiently eliciting high-consistency continuous labels, which can be used as training data for machine learning systems when the concept being labelled has unclear boundaries — a common scenario in machine learning domains such as affect recognition, automated sports coaching, and automated disease assessment. Label consistency is improved through the use of preference judgements: labellers sort training data on a continuum, rather than providing absolute value judgements. Efficiency is improved by comparing items in sets (as opposed to pairwise), and by leveraging probabilistic inference through the TrueSkill algorithm to infer the relationships between data which have not explicitly been compared. The system was evaluated on the real-world case study of clinicians assessing motor degeneration in multiple sclerosis (MS) patients, and was shown to achieve an unprecedented level of consistency, exceeding widely accepted clinical ‘gold standards’.

To learn more

If you’re interested in learning more, we reported this research in detail in the following publications:

Setwise Comparison: Consistent, Scalable, Continuum Labels for Machine Learning
Advait Sarkar, Cecily Morrison, Jonas F. Dorn, Rishi Bedi, Saskia Steinheimer, Jacques Boisvert, Jessica Burggraaff, Marcus D’Souza, Peter Kontschieder, Samuel Rota Bulò, Lorcan Walsh, Christian P. Kamm, Yordan Zaykov, Abigail Sellen, Siân E. Lindley
Proceedings of the 34th Annual ACM Conference on Human Factors in Computing Systems (CHI 2016) (pp. 261–271)

Setwise comparison: efficient fine-grained rating of movement videos using algorithmic support – a proof of concept study
Saskia Steinheimer, Jonas F. Dorn, Cecily Morrison, Advait Sarkar, Marcus D’Souza, Jacques Boisvert, Rishi Bedi, Jessica Burggraaff, Peter Kontschieder, Frank Dahlke, Abigail Sellen, Bernard M. J. Uitdehaag, Ludwig Kappos, Christian P. Kamm
Disability and Rehabilitation, 2019
(This was a writeup of our 2016 CHI paper for a medical audience)