This article is based on the following publication [PDF]:
Sruti Srinivasa Ragavan, Advait Sarkar, and Andrew D Gordon. 2021. Spreadsheet Comprehension: Guesswork, Giving Up and Going Back to the Author. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 181, 1–21. DOI: https://doi.org/10.1145/3411764.3445634
What’s the problem?
Not a year goes by without a spreadsheet horror story making headlines – British readers might most recently recall the error in Public Health England’s test-and-trace pipeline that led to the delayed contact tracing of 16,000 positive COVID-19 cases. Others might recall the infamous Reinhart-Rogoff paper “Growth in a Time of Debt”, which led several governments to favour severe austerity measures in an attempt to reduce their debt-to-GDP ratios, but which was later discovered to be based on a flawed analysis, owing in part to a spreadsheet error. While these sensational stories make for great headlines, it is important to remember that for every high-profile spreadsheet error, there are millions of people who are quietly empowered by spreadsheets every day to understand their data, make better decisions, and improve their lives and businesses.
Nonetheless, helping users prevent, detect, and fix errors in spreadsheets remains a central and enduring research problem.
To date, almost all design research towards mitigating spreadsheet errors has been aimed at the author – the person who writes the spreadsheet, and in particular the person who writes spreadsheet formulas. Spreadsheet formulas are, of course, what make spreadsheets programs, and since many researchers in this area have a computer science background, viewing a spreadsheet as a bunch of code can be reassuringly familiar.
However, this neglects two important facts. First, spreadsheets are not just code! In fact, the majority of spreadsheets contain few or no formulae at all! Spreadsheets also contain data, labels, colouring, data validations, charts, notes and comments. Second, most of the time people spend with spreadsheets is spent reading, not writing. Most spreadsheets are used and read by multiple people, including the author, and many spreadsheets are collaboratively authored. Spreadsheet authors may themselves spend more time reading and reviewing their work than writing it.
When we came across a study which found that comprehension errors are among the top five most common sources of spreadsheet errors, it became clear that we needed to study the process of spreadsheet comprehension in much greater detail, to identify the moments where errors could occur and, potentially, to design ways to prevent them.
Spreadsheets are not just code. In fact, the majority of spreadsheets contain few or no formulae at all.
How did we study it?
To understand spreadsheet comprehension, we designed a study in which we observed people who had been sent an unfamiliar spreadsheet by their colleagues, while they tried to understand it. These were real spreadsheets that participants actually needed to understand as part of their day-to-day work. We asked participants to think aloud as they read the spreadsheet: what were they looking at, and why? When were they confused? What strategies were they trying to apply?
We made detailed screen and audio recordings as they read their spreadsheets and thought aloud. Then, we chopped each recording into 20-second segments. For each segment, my colleague and I noted down the following. In that moment:
- What types of information was the participant seeking?
- What strategies was the participant employing to get that information?
- What barriers and pain points were they encountering?
This, as you might imagine, was not an easy task. We didn’t have predefined lists of information types, strategies, or barriers, so this had to be developed iteratively. Moreover, it was not always clear how to classify a certain segment. Often we would find ourselves re-watching and discussing a 20-second segment at half speed for several minutes before we reached a consensus regarding what the participant was doing!
What did we find?
The analysis took weeks, but it was well worth it. The fine-grained classification of user activities allowed us to paint the most detailed picture of spreadsheet comprehension to date. Here is what we found.
Our first finding was that spreadsheet comprehension is a lot more than just formula comprehension. Prior work had largely taken the view of ‘spreadsheets as code’ and therefore focused on difficulties in understanding formulas. Our participants did face difficulties understanding formulas, but we also saw spreadsheets that contained no formulas at all, yet were replete with comprehension challenges! Participants had difficulties understanding data, interpreting and comparing charts, and making sense of data validation rules, conditional formatting rules, and even just the formatting applied to a cell (e.g., why is the text in this cell red?). So when we think about spreadsheet comprehension, we need to think about all these activities, not just about formulas.
Our second finding was that participants spent a whopping 40% of their time on what we call ‘information seeking detours’. Often, participants needed to navigate away from the specific part of the spreadsheet they were trying to understand to a different part, or even away from the spreadsheet application entirely to the web, documents, or emails, to gather the information they needed to continue. For example, while trying to understand a formula, it was very common for participants to visit each of the cell references mentioned in the formula and scan that area of the spreadsheet for labels and other documentation. Such context switches are not just a productivity loss: we also found that they could themselves introduce errors, such as when one participant went to another sheet to look up a cell reference, but subsequently misremembered it when typing it in.
Participants spent a whopping 40% of their time on ‘information seeking detours’, navigating away from the part of the spreadsheet they were trying to understand.
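To make the detour problem concrete, here is a minimal sketch (our illustration, not part of the study) that uses Python and openpyxl’s formula tokenizer to list the references a reader would have to chase down to understand a single formula. The formula and sheet names are invented for the example.

```python
# A minimal sketch (not from the paper): list the cell references a reader
# would have to visit to understand one formula. The formula is invented.
# Requires openpyxl (pip install openpyxl).
from openpyxl.formula import Tokenizer

formula = "=SUMIF(Orders!A:A, B2, Orders!C:C) * 'FX Rates'!$B$3"

tokens = Tokenizer(formula).items
refs = [t.value for t in tokens if t.type == "OPERAND" and t.subtype == "RANGE"]

# References on other sheets force the reader to navigate away entirely.
detours = [ref for ref in refs if "!" in ref]

print("References to check:", refs)
print("Of which require a detour to another sheet:", detours)
```

Even this small formula involves three cross-sheet look-ups. A tool could use the same information to bring the referenced regions, and their surrounding labels, into view next to the formula rather than requiring the reader to navigate away.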
Our third finding was an astonishing reliance on guesswork. Often, information seeking detours led to a dead end, and the participant was unable to find what they needed. So, they guessed. Sometimes they were able to frame their guesses explicitly and test them in some form: for example, they might guess that a number was the sum of certain other numbers, and verify this by adding up those numbers themselves. But often these guesses couldn’t be easily tested, and so participants just continued with unverified assumptions. Such guesses are clearly a problem, as an incorrect assumption could cascade into a host of incorrect understandings all over the spreadsheet. In some cases it was impossible even to make a plausible guess, and the participant needed to consult the spreadsheet’s author; since the author was not always readily available, this was yet another source of productivity loss. To help users comprehend spreadsheets, then, we also need to tackle the problem of missing information.
Finally, while it’s tempting to say that these comprehension difficulties arose because the spreadsheets were written poorly, with a lack of explanations, this was not the case at all. All the spreadsheets we saw in our study contained some form of good layout, colours, and documentation – all the best practices recommended for building good spreadsheets. So the comprehension difficulties arose not because of a lack of authorial effort, but despite everything the authors did to make their spreadsheets more comprehensible. In the paper, we make some arguments for why this might be the case, and what we could do about it.
What can we do about it?
There are two sides to the comprehension equation: the reader and the author. From the reader’s perspective, we can improve comprehension by making hidden information more visible (such as conditional formatting, number formatting, and data validation). We can also reduce the number and the cost of information seeking detours, for example by allowing users to view other portions of the spreadsheet without navigating away from the portion they are trying to understand.
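As a rough illustration of what ‘making hidden information visible’ could look like, here is a minimal sketch in Python using openpyxl – our own example, not a tool from the paper – that dumps a worksheet’s data validation and conditional formatting rules, which a reader otherwise only discovers by opening the right dialog for the right cell. The file name is hypothetical.

```python
# A rough sketch of "making hidden information visible": print the data
# validation and conditional formatting rules attached to a worksheet.
# Requires openpyxl; "budget.xlsx" is a hypothetical file name.
from openpyxl import load_workbook

wb = load_workbook("budget.xlsx")
ws = wb.active

print("Data validation rules:")
for dv in ws.data_validations.dataValidation:
    print(f"  {dv.sqref}: type={dv.type}, formula1={dv.formula1}")

print("Conditional formatting rules:")
for cf in ws.conditional_formatting:
    for rule in cf.rules:
        print(f"  {cf.sqref}: type={rule.type}, formula={rule.formula}")
```

A reading-oriented interface could surface exactly this kind of metadata alongside the cells it applies to, so the reader does not have to guess why a cell is red or why a value is rejected.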
From the author’s perspective, we need to design for more fluid and relevant annotation: enabling authors to write explanations and documentation inline with the data, and detecting, potentially using machine learning, the areas of a spreadsheet that are likely to benefit from additional explanation. A sketch of one such detector follows.
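The sketch below is a toy heuristic in Python, entirely our own illustration and a crude stand-in for the machine learning mentioned above. It flags formula cells that might benefit from an explanatory note, such as long formulas or formulas that reference other sheets.

```python
# A toy heuristic (not the paper's method) for flagging formula cells that
# might benefit from an explanatory note: long formulas, and formulas that
# reference other sheets, are plausible candidates. Requires openpyxl.
from openpyxl import load_workbook

def cells_needing_explanation(path, min_length=40):
    """Return (sheet, cell, formula) triples that look like they need a note."""
    wb = load_workbook(path)
    flagged = []
    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                value = cell.value
                if isinstance(value, str) and value.startswith("="):
                    if "!" in value or len(value) > min_length:
                        flagged.append((ws.title, cell.coordinate, value))
    return flagged

# "budget.xlsx" is a hypothetical file name.
for sheet, coord, formula in cells_needing_explanation("budget.xlsx"):
    print(f"{sheet}!{coord} may deserve a comment: {formula}")
```

An authoring tool could prompt the author to add a comment or label next to the flagged cells while the spreadsheet is still fresh in their mind.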
Conclusion
Our study is the first fine-grained analysis of spreadsheet comprehension. It revealed a number of previously unknown issues and new opportunities for designing tools and interfaces that will make visible what is hidden in spreadsheets, in ways that benefit the millions of users who depend on spreadsheets every day.
Want to learn more? Read our paper here [PDF], and see the publication details below:
Sruti Srinivasa Ragavan, Advait Sarkar, and Andrew D Gordon. 2021. Spreadsheet Comprehension: Guesswork, Giving Up and Going Back to the Author. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 181, 1–21. DOI: https://doi.org/10.1145/3411764.3445634
Acknowledgements
This article reports joint work by Sruti Srinivasa Ragavan, Advait Sarkar, and Andy Gordon.