Should we teach good behaviour to Artificial Intelligence (AI) through our feedback, or should we try to tell it a set of rules explaining what good behaviour is? Both approaches have advantages and limitations, but when we tested them in a complex scenario, one of them emerged as the winner.
If AI is the future, how will we tell it what we want it to do?
Artificial intelligence is capable of crunching through enormous datasets and assisting us in many facets of our lives. Indeed, it seems this is our future. An AI assistant may help you decide what gifts to buy for a friend, or what books to read, or who to meet, or what to do on the weekend. In the worst case, of course, this could be dystopian: the AI controls us, and not the other way around (we’ve all heard that story). But in the best case, it could be incredibly stimulating, deeply satisfying, and profoundly liberating.
But an important and unsolved problem is that of specifying our intent, our goals, and our desires, for the AI system. Assuming we know what we want from the AI system (this is not always the case, as we’ll see later), how do we teach the system? How do we help the system learn what gifts might be good for a friend, what books we might like to read, the people we might like to meet, and the weekend activities we care about?
There are many parts to this problem, and many solutions. The solution ultimately depends on the context in which we’re teaching the AI, and the task we’re recruiting it to do for us. So in order to study this, we need a concrete problem. Luckily for me, Ruixue Liu decided to join us at Microsoft for an internship in which she explored a unique and interesting problem indeed. The problem we studied was how to teach an AI system to give us information about a meeting, where for some reason, we can’t see the meeting room.
Our problem: eyes-free meeting participation
When people enter a meeting room, they can typically pick up several cues: who is in the meeting? Where in the room are they? Are they seated or standing? Who is speaking? What are they doing? Research shows that not having this information can be very detrimental to meeting participation.
Unfortunately, in many modern meeting scenarios, this is exactly the situation we find ourselves in. People often join online meetings remotely without access to video, due to device limitations, poor Internet connections, or because they are engaged in parallel “eyes-busy” tasks such as driving, cooking, or going to the gym. People who are blind or have low vision also describe this lack of information as a major hurdle in meetings, whether in-person or online.
We think an AI system could use cameras in meeting rooms to present this information to people who, for whatever reason, cannot see the meeting room. This information could be relayed via computer-generated speech, or special sound signals, or even through haptics. Given that the participant only has a few moments to understand this information as they join a meeting, it’s important that only the most useful information is given to the user. Does the user want to know about people’s locations? Their pose? Their clothes? What information would be useful and helpful for meeting participation?
However, what counts as ‘most useful’ varies from user to user, and context to context. One goal of the AI system is to learn this, but it can’t do so without help from the user. Here is the problem: should the user tell the system what information is most useful, by specifying a set of rules about what information they want in each scenario, or should the user give feedback to the system, saying whether or not it did a good job over the course of many meetings, with the aim of teaching it correct behaviour in the long term?
Our study, in which we made people attend over 100 meetings
Don’t worry – luckily for the sanity of our participants, these weren’t real meetings. We created a meeting simulator which could randomly generate meeting scenarios. Each simulated meeting had a set of people – we generated names, locations (within the room), poses, whether they were speaking or not, and several other pieces of information. Because we were testing eyes-free meeting participation, we didn’t visualise this information – the objective was for the user to train the system to present a useful summary of this information in audio form.
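To make the idea of a meeting simulator concrete, here is a minimal sketch in Python. It is not the simulator we built for the study: the name list, the locations, the poses, the cap of eight people, and the "at most one speaker" rule are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical vocabularies; the study's simulator may have used different values.
NAMES = ["Ana", "Ben", "Chen", "Dita", "Emeka", "Farah", "Goran", "Hana"]
LOCATIONS = ["by the door", "at the table", "near the window", "at the whiteboard"]
POSES = ["seated", "standing"]


@dataclass
class Attendee:
    name: str
    location: str
    pose: str
    speaking: bool


def generate_meeting(rng: random.Random, max_people: int = 8) -> List[Attendee]:
    """Randomly generate one simulated meeting."""
    count = rng.randint(1, min(max_people, len(NAMES)))
    names = rng.sample(NAMES, count)  # no duplicate names within a meeting
    # Assumption: at most one person speaks at a time (None means silence).
    speaker = rng.choice(names + [None])
    return [
        Attendee(
            name=name,
            location=rng.choice(LOCATIONS),
            pose=rng.choice(POSES),
            speaking=(name == speaker),
        )
        for name in names
    ]


meeting = generate_meeting(random.Random(1))
for attendee in meeting:
    print(attendee)
```

Passing in a seeded `random.Random` makes a given scenario reproducible, which is handy when comparing how two differently trained systems summarise the same meeting.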
We conducted a study in which 15 participants used two approaches to ‘train’ the system to relay the information they wanted. One approach was a rule-based programming system, where the participant could specify “if this, then that”-style rules. For example, “if the number of people in the meeting is less than 5, then tell me the names of the people in the meeting”.
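As a rough illustration of what “if this, then that” rules look like in code, here is a small Python sketch. The `Person`, `Rule`, and `summarise` names are hypothetical and this is not the rule language participants used in the study. Every rule whose condition holds contributes to the summary, with no order of precedence between rules.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Person:
    name: str
    location: str
    pose: str
    speaking: bool


@dataclass
class Rule:
    condition: Callable[[List[Person]], bool]  # the "if this" part
    action: Callable[[List[Person]], str]      # the "then that" part


def summarise(meeting: List[Person], rules: List[Rule]) -> List[str]:
    """Apply every rule whose condition is true for this meeting."""
    return [rule.action(meeting) for rule in rules if rule.condition(meeting)]


# "If the number of people in the meeting is less than 5,
#  then tell me the names of the people in the meeting."
names_if_small = Rule(
    condition=lambda m: len(m) < 5,
    action=lambda m: "In the room: " + ", ".join(p.name for p in m),
)

# A second, made-up rule: announce whoever is speaking.
speaker_rule = Rule(
    condition=lambda m: any(p.speaking for p in m),
    action=lambda m: "Speaking: " + ", ".join(p.name for p in m if p.speaking),
)

meeting = [
    Person("Ana", "by the door", "standing", False),
    Person("Ben", "at the table", "seated", True),
]
summary = summarise(meeting, [names_if_small, speaker_rule])
print(summary)  # ['In the room: Ana, Ben', 'Speaking: Ben']
```

In a meeting of five or more people where nobody is speaking, neither rule fires and the summary is empty, which hints at why formulating a good set of rules from scratch takes some thought.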
The other approach was a feedback-based training system (our technical approach was to use a kind of machine learning called deep reinforcement learning). In the feedback-based training system, the user couldn’t say what they wanted directly, but instead, as they went to various (simulated) meetings, the system would do its best to summarise the information. After each summary, the user provided simple positive/negative feedback, answering “yes” or “no” to the question of whether they were satisfied with the summary.
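The sketch below shows the shape of this yes/no feedback loop in Python. To keep it short, it swaps deep reinforcement learning for a far simpler stand-in, an epsilon-greedy bandit that chooses one kind of information to present. The field names, the `FeedbackLearner` class, and the simulated user are all assumptions for illustration, not our study system.

```python
import random

# Hypothetical kinds of information the system could read out.
FIELDS = ["names", "locations", "poses", "speaker"]


class FeedbackLearner:
    """Epsilon-greedy learner that only ever sees yes/no feedback."""

    def __init__(self, fields, epsilon=0.1, seed=0):
        self.fields = list(fields)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.yes = {f: 0 for f in self.fields}    # count of "yes" answers
        self.shown = {f: 0 for f in self.fields}  # count of times presented

    def value(self, field):
        # Smoothed satisfaction rate; fields never shown start at 0.5.
        return (self.yes[field] + 1) / (self.shown[field] + 2)

    def choose(self):
        # Occasionally explore; otherwise present the best-rated field.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.fields)
        return max(self.fields, key=self.value)

    def feedback(self, field, satisfied):
        self.shown[field] += 1
        if satisfied:
            self.yes[field] += 1


learner = FeedbackLearner(FIELDS)

# Simulated user (an assumption): satisfied only when told who is speaking.
for _ in range(200):
    field = learner.choose()
    learner.feedback(field, satisfied=(field == "speaker"))

best = max(FIELDS, key=learner.value)
print(best, round(learner.value(best), 2))
```

Notice that the learner never hears why a summary was unsatisfactory, only that it was. A summary that is almost right earns the same “no” as one that is completely wrong.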
Each participant tried both systems, one after the other in randomised order. We let participants play around, test and tweak and teach the AI as much as they liked, and try out the system’s behaviour on as many simulated meetings as they liked. Many participants “attended” well over 100 meetings, with two participants choosing to attend nearly 160 meetings over the course of the experiment! Who knew meetings could be such fun!
We asked participants to fill out a few questionnaires about their experience of interacting with both systems, and we conducted follow-up interviews to talk about their experience, too.
Participants reported significantly lower cognitive load and higher satisfaction when giving the system rules than when giving it feedback. In other words, telling the AI how to behave was less mentally demanding and more satisfying than showing it how to behave through feedback.
Rule-based programming gave participants a greater feeling of control and flexibility, but some participants found it hard at the beginning of the experiment to formulate rules from scratch. Participants also found it hard to understand how different rules worked together, and whether conflicting rules had an order of precedence (they did not).
Feedback-based teaching was seen by participants as easier, but much more imprecise. There were instances where the system did something almost correct, but because the user could only say whether the behaviour was good or bad, they did not have the tools to give more nuanced feedback to the system. Moreover, people don’t just know their preferences, they figure them out over time. With feedback-based teaching, participants worried that they were ‘misleading’ the system with poor feedback at the early stages of training, while they were still figuring out what their preferences were.
Based on our results, we would recommend a rule-based programming interface. But as explained, we found several advantages and limitations to both approaches. In both cases, we found that the first step was for the human to figure out what they wanted from the system! This is hard if the user doesn’t have a clear idea of what the system can and can’t do; our first recommendation is for system designers to make this clear.
Our participants also had a hard time in both cases expressing their preferences exactly: with rules, it was because the rule-based programming language was complex, and with feedback-based teaching, it was because yes/no feedback isn’t precise enough. Our second recommendation is to make clear to users what actions they need to take to specify certain preferences.
Finally, it was difficult for participants to understand the system they ended up with: with rules, it was hard to know which rules would apply in a given scenario, and the feedback-trained system came across as unpredictable. Our third recommendation is to provide more information as to why the system does what it does in certain scenarios.
In the future, we should consider blending the two approaches, to get the best of both worlds. For example, the feedback-based system could be used to generate candidate rules, to help users form a better idea of their preferences, or detect hard-to-specify contexts. Rule-based systems could help define context, explain behaviour learnt by the system, and provide a way for specifying and editing information not captured by the feedback-trained system. We aren’t sure what this might look like, but we’re working on it. Until then, let’s aim to tell, and not show, what we want our AI to do.
Here’s a summary poem:
Yes, no, a little more
What do you want?
I can do this, this, and this
But that I can’t
Tell me and I’ll show you
What you can’t see
I’ll do my best to learn from
What you tell me
Want to learn more? Read our study; the publication details are below:
Liu, Ruixue, Advait Sarkar, Erin Solovey, and Sebastian Tschiatschek. “Evaluating Rule-based Programming and Reinforcement Learning for Personalising an Intelligent System.” In IUI Workshops. 2019. http://ceur-ws.org/Vol-2327/#ExSS