Determining the Correct Number of Usability Test Participants
by Dr. Bob Bailey
Often we are faced with determining the appropriate number of participants for our usability tests. Too many participants increase the cost and development time. If the number of participants is too small, the testing process may fail to detect important problems and result in reduced site usability.
There are several ways to determine the correct number of participants, but this article will discuss a formula to estimate the number of participants needed. This formula is called cumulative binomial probability.
When to Use Cumulative Binomial Probability
The binomial probability formula can be used to help estimate the appropriate number of test participants when something is already known about the usability of the site. We use the formula to estimate the likelihood that a problem with a known probability of occurrence will be observed at least once during a testing session.
The simple formula is 1 - (1 - p)^n, where:
p = the probability that the usability problem occurs for any one participant during testing
n = the number of test participants
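The formula can be sketched as a small Python function. This is a minimal illustration, not part of the original article: the function name `detection_probability` is hypothetical, and the sketch assumes each participant encounters the problem independently with the same probability p.

```python
def detection_probability(p: float, n: int) -> float:
    """Probability that a problem occurs at least once among n participants.

    p is the chance the problem occurs for any one participant, and
    participants are assumed to be independent of one another.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must not be negative")
    # (1 - p) ** n is the chance that all n participants miss the problem,
    # so its complement is the chance that at least one encounters it.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    print(detection_probability(0.5, 2))  # two participants, p = .50
```

With no participants the function returns 0, and with p = 1 it returns 1 for any n of at least one.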
An Example, "Confusing Link Names"
Assume that the link name you developed for your new site confuses 50% of potential users. You know this because 50% of your users reported, via email and help desk calls, that the link name confused them.
Considering the following table, how many users should be tested to reliably determine whether the link name will confuse at least one participant?
| Number of Participants | Probability of Users Confused by Link Name | 1 - (1 - p)^n | Probability That at Least One Participant Is Confused |
|---|---|---|---|
| 1 | 50% (.50) | 1 - (.50)^1 = .50 | 50% (.50) |
| 2 | 50% (.50) | 1 - (.50)^2 = .75 | 75% (.75) |
| 3 | 50% (.50) | 1 - (.50)^3 = .87 | 87% (.87) |
| 5 | 50% (.50) | 1 - (.50)^5 = .97 | 97% (.97) |
| 6 | 50% (.50) | 1 - (.50)^6 = .98 | 98% (.98) |
| 7 | 50% (.50) | 1 - (.50)^7 = .99 | 99% (.99) |
| 8 | 50% (.50) | 1 - (.50)^8 = .99 | 99% (.99) |
In this example, the likelihood that the link name will confuse any one user is 0.50 (or 50%). So in a well-designed usability test, the likelihood that the link name will confuse any one test participant is also 0.50. This means that we have a 50-50 chance of identifying the potentially confusing link name if we use one participant. If two participants are used, the probability of identifying the confusing link name increases to 75%. If three test participants are used, the likelihood that at least one of them will have difficulty with the confusing link name is 87%. As the number of test participants increases, so does the probability of finding the confusing link name. In this example, if five users are tested, the likelihood that at least one of them will be confused by the link name is 97%.
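The arithmetic in this example can be checked with a short Python sketch. The names `P_CONFUSED`, `at_least_one`, and `build_rows` are hypothetical, and the loop also includes n = 4, which the table does not list. The values are printed to four decimal places, so they may differ in the last digit from the two-decimal figures in the table (for example, the exact value for three participants is 0.875).

```python
P_CONFUSED = 0.50  # share of users confused by the link name in the example


def at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent participants is confused."""
    return 1.0 - (1.0 - p) ** n


def build_rows(p, participant_counts):
    """Return (participants, probability) pairs for each requested count."""
    return [(n, at_least_one(p, n)) for n in participant_counts]


# Each added participant halves the remaining chance of missing the problem
# when p = .50, so the gains shrink as the panel grows.
for n, prob in build_rows(P_CONFUSED, range(1, 9)):
    print(f"{n} participant(s): {prob:.4f}")
```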
For reliability, it would be best to use probabilities of either .97 or .99. So in our example, you would need five participants for a probability of .97, and seven or eight participants for a probability of .99.
Using five test participants makes the chance that at least one participant detects the confusing link very high (about 97%). With eight participants, the chance rises above 99%. The probability approaches 100% as participants are added but never quite reaches it.
Dealing with Rare Problems
A lower incidence of a specific problem may mean that there is less need to identify and fix that problem. But a lower incidence does not necessarily mean the problem should be ignored. Some problems can only be discovered by users with special skill sets, limited education or experience, or certain disabilities. These problems, although infrequent, may have serious consequences for that group of users.
We frequently ask, "How many users do we really need to detect certain reported problems in an interface?" The cumulative binomial probability formula helps us estimate the number of participants when we have existing information that a problem exists. The fewer the people experiencing the problem, the more test participants we will need to ensure that an identified issue is experienced by at least one of the participants. You will need fewer participants if an issue is experienced by many users. You will need more participants to be confident that at least one participant will experience a usability issue that is known and serious, but rarely impacts actual users.
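The relationship between incidence and panel size can be sketched in Python by searching for the smallest n at which 1 - (1 - p)^n reaches a chosen target. The function name `participants_needed` is hypothetical, and the incidence rates of .25, .10, and .05 and the .95 target are illustrative assumptions, not figures from this article.

```python
def participants_needed(p: float, target: float) -> int:
    """Smallest n for which 1 - (1 - p) ** n is at least target.

    p is the chance the problem occurs for any one participant, and
    participants are assumed to be independent of one another.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be greater than 0 and at most 1")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be at least 0 and less than 1")
    n = 0
    miss = 1.0  # chance that none of the first n participants hit the problem
    while 1.0 - miss < target:
        n += 1
        miss *= 1.0 - p
    return n


# Illustrative incidence rates (assumed): the rarer the problem,
# the more participants are needed to reach the same target.
for p in (0.50, 0.25, 0.10, 0.05):
    print(f"p = {p:.2f}: {participants_needed(p, 0.95)} participants for .95")
```

A strict threshold can give a slightly larger answer than a rounded table value suggests. With p = .50, five participants give an exact probability of 0.96875, which rounds to .97 but falls just short of a strict .97 target, so the search returns six for that target.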