Getting the Complete Picture with Usability Testing
by Dr. Bob Bailey
Good usability testing provides an opportunity for clear-cut improvements in the usability of Web sites. In a previous article, I discussed the value of iterative design. A second major consideration for success with usability testing is to measure effectiveness, efficiency, and satisfaction, because each one captures a different aspect of the usability of a Web site. Using only one or two of these measures gives an incomplete picture of the likely human performance and user satisfaction results.
Five recent tests that I have conducted will be used to illustrate the different types of information that come from the three usability measures.
Percent Correct (Effectiveness)
Effectiveness indicates how successfully users will be able to use a site. It is usually measured by counting how many scenarios, test questions, tasks, or other types of activities participants are able to complete correctly.
In Test 1, participants correctly completed 65% of their tasks in the baseline test, and in Test 2 on the same Web site, they got 73% correct, an overall increase of 8 percentage points. On closer inspection, it was clear that most of the improvement came on tasks where users initially had the most trouble. In Test 3, participants initially got 60% correct, and in Test 4 on the same Web site, after designers made many changes, participants got 76% correct, a 16 percentage point improvement.
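The percent-correct arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names and the per-attempt counts (100 task attempts in each test) are assumptions chosen to reproduce the 65% and 73% figures, not the actual test data.

```python
def percent_correct(outcomes):
    """Return the percentage of task attempts completed correctly.

    `outcomes` holds one boolean per task attempt (True = correct).
    """
    if not outcomes:
        raise ValueError("no task attempts recorded")
    return 100.0 * sum(outcomes) / len(outcomes)


def point_gain(before, after):
    """Difference between two percent-correct values, in percentage points."""
    return percent_correct(after) - percent_correct(before)


# Hypothetical attempt counts chosen to match the reported percentages.
baseline = [True] * 65 + [False] * 35   # Test 1: 65% correct
retest = [True] * 73 + [False] * 27     # Test 2: 73% correct

print(f"Baseline: {percent_correct(baseline):.0f}% correct")
print(f"Retest:   {percent_correct(retest):.0f}% correct")
print(f"Gain:     {point_gain(baseline, retest):.0f} percentage points")
```

Reporting the gain in percentage points keeps it distinct from a relative change, which would divide the gain by the baseline value.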
Time to Complete Each Scenario (Efficiency)
Efficiency is usually measured by determining how quickly users are able to complete scenarios, test questions, or tasks. It is possible that users are able to successfully complete an activity, but they take "forever" to do it. In fact, they may take so long to complete a web-based task that they find it easier to call someone on the telephone or walk to someone else’s desk and simply ask them.
In Tests 1 and 2, referred to previously, participants took less time to complete 8 of the 10 test scenarios in the retest. In Tests 3 and 4, the participants took less time to complete 7 of the 10 scenarios. In both of these test-retest situations, the time needed to correctly complete the activities dropped substantially because of the decisions made by designers.
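A scenario-by-scenario comparison like the one above could be tallied as follows. This is a minimal sketch: the mean completion times (in seconds) are invented for illustration and are not the measurements from these tests.

```python
def scenarios_faster(baseline_means, retest_means):
    """Count scenarios whose mean completion time dropped in the retest."""
    if len(baseline_means) != len(retest_means):
        raise ValueError("both tests must cover the same scenarios")
    return sum(
        after < before
        for before, after in zip(baseline_means, retest_means)
    )


# Hypothetical mean time per scenario, in seconds, for 10 scenarios.
baseline_means = [120, 95, 210, 60, 180, 75, 140, 99, 160, 130]
retest_means = [100, 80, 150, 65, 140, 70, 120, 105, 110, 90]

faster = scenarios_faster(baseline_means, retest_means)
print(f"Faster in retest: {faster} of {len(baseline_means)} scenarios")
```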
In Test 3, I divided the participants into three groups. Each represented a different user type or a different segment of the user population. Each user group took about the same amount of time in responding to the scenarios; however, the time needed to make a correct response was substantially shorter than the time needed to make a wrong response.
Satisfaction
Satisfaction is a set of subjective responses a user may have when using a system. Over the years, I have watched many test sessions in which the only thing usability professionals measured was the participants’ like or dislike of the Web site’s interface. They would have people try to complete certain tasks while continually asking them how they felt about the site’s components. This is actually an indirect, and rather weak, way of measuring a user’s satisfaction with the site’s design. Some usability professionals follow up these “think aloud” sessions by having the participants complete a questionnaire or survey about their satisfaction with the site.
I use the System Usability Scale (SUS) to measure user satisfaction with Web sites. This satisfaction scale has been around for several years, and is used by many usability testers. SUS scores can range from 0 (very little satisfaction) to 100 (very high satisfaction). Average satisfaction scores are usually between 65 and 70. In Test 1, for example, the SUS score was 63, and this increased to 73 in Test 2, a 10-point gain (about 16% relative to the baseline). The SUS uses 10 items to evaluate satisfaction. In Tests 1 and 2, satisfaction increased on 9 of the 10 items. In Tests 3 and 4, the SUS satisfaction scores increased from 40 to 56, a 16-point gain (40% relative to the baseline).
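For readers unfamiliar with how a 0 to 100 SUS score is produced, the standard scoring rule can be sketched as below. Each of the 10 items is rated from 1 to 5; odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is multiplied by 2.5. The function names and the participant ratings are hypothetical and are not data from these tests.

```python
def sus_score(ratings):
    """Convert ten 1-5 ratings (item 1 first) into a 0-100 SUS score."""
    if len(ratings) != 10:
        raise ValueError("SUS needs exactly 10 ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("each rating must be an integer from 1 to 5")
    total = 0
    for item_number, rating in enumerate(ratings, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # odd items are positively worded
        else:
            total += 5 - rating   # even items are negatively worded
    return total * 2.5


def mean_sus(all_ratings):
    """Average the SUS scores of several participants."""
    scores = [sus_score(r) for r in all_ratings]
    return sum(scores) / len(scores)


# Hypothetical ratings from three participants.
participants = [
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
]

print(f"Mean SUS score: {mean_sus(participants):.1f}")
```

A site-level SUS score such as the ones reported above is normally the mean of the individual participants' scores.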
Assessing Correlations Among the Three Usability Measures
In the testing, the only correlation that consistently shows a reliable relationship is the one between “the time to perform a task” and “the number of pages viewed.” These correlations tend to be fairly high, and are usually reliable (statistically significant). In other words, there is a high likelihood that they will show up in most usability tests. These correlations have ranged from .71 to .88. We usually do not find other reliable correlations, which suggests that the measurements taken for number correct, completion time, and satisfaction are capturing different, somewhat independent, aspects of the user experience.
Test 5 provided some correlations that were typical of those found in most previous usability tests. There were 21 participants in this test. The results are shown in the following table. Note that there is only one reliable correlation, and six that are not reliable. Again, the fact that only one was reliable suggests that only that correlation has a high probability of being repeated in other tests. The others are too low to be counted on in other tests.
Number correct and time to complete
Number correct and pages visited
Number correct and years worked
Time to complete and pages visited
Satisfaction and number correct
Satisfaction and time to complete
Satisfaction and years worked
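To make the idea of a "reliable" correlation concrete, the sketch below computes a Pearson correlation and checks it against a significance threshold for a test with 21 participants. It assumes that "reliable" means significant at the .05 level in a two-tailed test, which the article does not state; the critical t value of 2.093 for 19 degrees of freedom comes from a standard t table, and the .30 example is hypothetical.

```python
import math

# Assumption: two-tailed test at alpha = .05 with df = n - 2 = 19.
T_CRIT_DF19 = 2.093


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length (3 or more)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def critical_r(t_crit, n):
    """Smallest |r| that reaches significance for the given critical t."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))


def is_reliable_n21(r):
    """True if r is significant for 21 participants under the assumption above."""
    return abs(t_statistic(r, 21)) > T_CRIT_DF19


print(f"Critical r for n = 21: {critical_r(T_CRIT_DF19, 21):.2f}")
for r in (0.71, 0.88, 0.30):
    print(f"r = {r:.2f}: reliable = {is_reliable_n21(r)}")
```

Under these assumptions, correlations in the reported .71 to .88 range clear the threshold easily, while a correlation of .30 would not.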
The lack of reliable correlations among the three measures in testing argues for collecting data on the user experience that covers more than the participants' satisfaction level, or even how many tasks they get correct (accuracy). Getting the best possible picture of Web site usability requires collecting information about users' effectiveness (percent correct), their efficiency (time to complete), and their satisfaction level.