Canonical on 2 March 2012
Every three months, I conduct benchmark usability testing. I call these sessions ‘benchmark testing’ because their aim is to measure our progress towards achieving a great user experience with Ubuntu. The last round of testing took place in October 2011. I am now preparing to test 12.04 a couple of weeks from now.
When I publish the results of usability testing, I get many questions about my process. So I thought that the best way to explain how I approach usability would be to take you along through the preparation and execution of my benchmark testing. Over the next month, I will take you step by step through my process, from recruiting participants and writing a test protocol to conducting and analysing usability sessions and writing up the results. This will give you the chance to ‘accompany me’, so to speak, and to conduct usability testing in parallel, if you are so inclined.
For this post, I walk through the first stage of any testing: recruiting participants.
Recruiting
This is a crucial part of any successful and meaningful testing. Some argue that just anyone you can get hold of will do. This attitude, in my view, puts the software before the people who will use it, and carries the implicit assumption that software, by its very nature, is usable. But the simple fact, which we actually all realise, is that it isn’t. Take music players, for instance. The challenge for this type of software is to fit into the lives of people who want to listen to music. It doesn’t have to work well for those who don’t listen to music but who are, for instance, heavily into photo editing. In a word, testing your software with your grandmother or your partner might not provide all the feedback you need to create a user-friendly product if they are not engaged in the activities your software is meant to facilitate.
So, the basic idea is: in preparing the testing, recruit the right people. The type of participants you work with will determine the quality and reliability of the results you get.
There are some basic rules for writing a screener questionnaire.
Rule 1: Recruit according to your testing goals
Is your goal to test, for instance, adoption: that is, are you going to assess how new users respond to your software the first time they encounter it and how delighted they are by it? Alternatively, is your goal to test learning: do you want to assess how easily a novice can figure out how to use your software and how they progress over time? Or are you really interested in expert usage: do you want to assess how well your software performs in a specific context of use involving expert tasks? There are, of course, other scenarios as well. The point here is that you need to be clear about your goal before you begin.
With Unity, we have 2 basic goals: 1) adoption: we want to know how easy to use and attractive Unity is to someone who has not encountered it before; and 2) expert usage: we want to know how well Unity performs for highly competent users who are fairly familiar with it.
Given these very different goals, I will need to conduct 2 different user testing sessions with different recruiting screeners or questionnaires, and different protocols.
In this post, I concentrate on my first project, which is to test for adoption.
Rule 2: Know your software
You need to review your software carefully: you need to (1) identify the main purpose of the software and the activities or tasks that it is meant to facilitate; and (2) identify where you think potential usability weaknesses are.
When I prepare a usability test, and before I even think about recruiting participants, I spend a significant amount of time trying out the software, and even more time discussing with the designers and developers their own concerns. From this evaluation of the usefulness and usability of the software, I’m able to sketch a profile of participants. Bear in mind that, given my goals as set out above, the participants will need to be able to use the software right away even if they’ve never used Ubuntu, since I am not testing for learning.
Given what Unity aims to allow users to do, we need to confirm (or not) in the testing that Unity users can easily get set up for and can conduct at least the following activities:
- writing, saving, printing documents
- finding, opening applications
- listening to music
- watching a movie
- managing and editing photos
- customising their computer: organising icons and short-cuts and changing settings
- browsing the internet
- communicating
Additionally, the OS should make it easy for users to:
- multi task
- navigate and use special features like alt-tab
- be aware of what’s going on with their computer
- create short-cuts
- understand icons, notifications and generally the visual language
In this instance, I also want to test the new features we have designed since 11.10.
Given my goals, my recruitment screener should be written in a way that will provide me with participants who engage in these activities on a regular basis.
Rule 3: Make sure you have an appropriate number of participants, with an appropriate range of expertise, with appropriately different experiences
I’ve often heard it said that all you need is a handful of participants – for example, 5 will do. While this may be true for very specific testing, when your participants come from a homogeneous group (for example, cardiologists, for testing a piece of cardiology software), it is not true generally. Much more often, software is meant to be used by a variety of people who have differing goals, and differing relevant experience and contexts of use.
You need to take these into account for 2 purposes: 1) to be able to test the usefulness and appropriateness of the software for different users; and 2) to be able to assess the reasons and origins of any usability problem that you find, which can be explained by comparing differences between users. A usability problem will have a different design solution if it is created by a user’s lack of expertise than if it is created by a shortcoming of the software that stumped all user groups. Comparing user groups will also help you rate the severity of the problems you discover.
Some of the factors that competent recruiting will take into account are:
Different levels of expertise: for example, in the case of software for photo-editing, you probably need to assess the ease of use for people who have been editing their photos for more than 5 years, and for those who have been editing for less than 1 year. Expertise can be reflected in the length of time they have been engaged in the activity and also in the complexity of their activities. You may want to recruit people who do basic editing, like eliminating red-eye, and then compare their use of your software with that of people who do special effects, montages, presentations and the like. This way, you get feedback on a wide range of the software’s features and functionalities.
Different kinds of uses: potential users will have different needs and different potential uses for the software. For example, if the software is healthcare related, it may well be used by doctors, nurses, radiologists – and sometimes even patients. It is useful, when considering recruiting, to include participants from these various professions and other walks of life, so that you will be able to determine how well your software serves the range of needs, processes and work conditions represented by the likely (range of) users.
Different operating systems: you may want to select participants who use, at least, Windows, Mac and Ubuntu. Users who are new to Ubuntu have acquired habits and expectations from using another OS. Over time, and simply because they are familiar, these habits and expectations come to be equated with ease of use. Recruiting participants with different habits and expectations will help you to understand the impact of these expectations as well as receptivity to innovation.
Recruiting your participants with precision will allow you to understand the usability of your software in a complex and holistic way and will dictate more innovative and effective design solutions.
Keep in mind, however, that the more diverse the people you envisage as primary users of the software, the larger the number of participants you will need. You should recruit at the very least 5 similar participants per group: for instance, in the healthcare example, at least 5 doctors, 5 nurses, and 5 patients.
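The arithmetic behind that minimum is simple enough to sketch. The snippet below is only an illustration: the function name `minimum_participants` and the constant `MIN_PER_GROUP` are made up for this example, and the groups are the ones from the healthcare case above.

```python
# Illustrative only: these names are hypothetical, not part of any real tool.
MIN_PER_GROUP = 5  # the floor suggested above: at least 5 similar participants per group


def minimum_participants(groups, per_group=MIN_PER_GROUP):
    """Return (quota per group, total head count) for the given user groups."""
    if per_group < MIN_PER_GROUP:
        raise ValueError("recruit at least %d participants per group" % MIN_PER_GROUP)
    quota = {group: per_group for group in groups}
    return quota, sum(quota.values())


quota, total = minimum_participants(["doctors", "nurses", "patients"])
print(quota)
print(total)
```

With three groups, the minimum comes to 15 participants, and every extra group you decide to distinguish adds at least 5 more.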
A few more things to consider explicitly putting into your questionnaire/screener, particularly if you are writing it for a recruiting firm:
- It is advisable to have a mix of male and female participants;
- Participants from different age groups often have different experiences with technologies, and so you should include a good mix of ages;
- The perceived level of comfort with a computer can also help the moderator understand the participant’s context of use. A question about how participants assess themselves as computer users can very often be helpful;
- You should always add a general open question to your screener to judge the degree of facility with which the potential participant expresses ideas and points of view. The moderator is dependent on the participant to express, in a quite short amount of time, the immediate experience of using the software. Consequently, being able to understand the participant quickly and precisely is vital to obtaining rich and reliable data. The individual who makes the recruitment needs to be able to evaluate the communication proficiency of the potential participant.
Rule 4: Observe the basics of writing the recruitment screener
The most reliable way to obtain the desired participants is to get them to describe their behaviours rather than relying on their judgment when they respond to the screening questionnaire. For example, if you want a participant who has good experience of photography, instead of formulating your question as:
Question: Do you have extensive experience in photography?
Choice of answers:
Yes
No
You should formulate your question in a way that makes sure the person has some level of familiarity with photography:
Question: During the last 6 months I have taken:
Choice of answers:
Between 20 and 50 photos a month [Recruit]
Less than 20 photos a month [Reject]
By matching potential participants to actual behaviours, you can make a reasonable guess, for example, here, that someone who has been taking between 20 and 50 photos every month over the last 6 months is indeed competent in photography, whereas when you rely on the person’s own assessment that they have extensive experience, you can’t know for sure that they are using the same criteria as you do to evaluate themselves.
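If you keep your screener in a structured form, the recruit/reject decision attached to each answer can be written down explicitly. What follows is a minimal sketch in Python, assuming a made-up `ScreenerQuestion` class; it is not a tool I or a recruiting firm actually use, just one way of making the rule visible. Answers the screener does not list are deliberately left without a decision rather than guessed.

```python
from dataclasses import dataclass

RECRUIT = "Recruit"
REJECT = "Reject"


@dataclass
class ScreenerQuestion:
    """One behaviour-based question with a decision attached to each answer (hypothetical)."""
    text: str
    outcomes: dict  # maps each answer offered to the candidate to RECRUIT or REJECT

    def screen(self, answer):
        """Return the decision for an answer; unlisted answers raise instead of being guessed."""
        if answer not in self.outcomes:
            raise ValueError("no decision defined for answer: %r" % answer)
        return self.outcomes[answer]


# The photography question from the example above.
photo_question = ScreenerQuestion(
    text="During the last 6 months I have taken:",
    outcomes={
        "between 20 and 50 photos a month": RECRUIT,
        "less than 20 photos a month": REJECT,
    },
)

decision = photo_question.screen("between 20 and 50 photos a month")
print(decision)
```

The point of the structure is that every answer describes a behaviour, and the decision lives with the screener rather than with the candidate’s own judgment of their experience.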
Your screener should be created from a succession of questions representing a reasonable measure of familiarity and competence with the tasks you will test in your software.
That said, your screener should not be too long, as the recruitment agency personnel will probably spend no more than 10 minutes qualifying each candidate they speak with on the phone. At the same time, though, you need to ensure that you cover questions about all the key tasks that you will ask participants to perform during the test.
Summing up
Let me sum up the basics I’ve just covered by showing you the requirements I have in my screener for testing the ease of use of Unity by members of the general public who are not necessarily familiar with Ubuntu. They include that:
- there should be a mix of males and females;
- there should be a variety of ages;
- participants should not have participated in more than 5 market research efforts (because people who regularly participate in market research might not be as candid as others would be);
- there should be a mix of Windows, Mac and Ubuntu users;
- participants should:
  - have broadband at home (an indicator of interest in and use of a computer during personal time);
  - spend 10 hours or more per week on a computer for personal reasons (which shows engagement with activities on a computer);
  - be comfortable with the computer, or be a techy user;
  - use 2 monitors on a daily basis (I want to test our new multi-monitor design) to carry out a variety of activities online (part of the designs I want to test relate to managing documents, photos, music, and so forth and I want my participants to be familiar with these activities already);
  - use alt-tab to navigate between applications and documents (another feature I intend to test for usability);
  - have a general interest in technologies (I want to make sure that their attitude towards new technologies is positive, so they are naturally open to our design);
  - express ideas and thoughts clearly.
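For anyone who wants to track candidates in a script rather than on paper, the per-participant requirements above could be checked along these lines. This is a rough sketch under my own assumptions: the `Candidate` fields and the `failed_requirements` function are hypothetical names invented for the illustration. The requirements about the mix of genders, ages and operating systems apply to the panel as a whole, so they are left out of this per-candidate check.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Answers collected from one potential participant (field names are hypothetical)."""
    market_research_efforts: int
    has_home_broadband: bool
    personal_hours_per_week: float
    comfortable_or_techy: bool
    uses_two_monitors_daily: bool
    uses_alt_tab: bool
    interested_in_technology: bool
    expresses_ideas_clearly: bool


def failed_requirements(c):
    """Return the individual screener requirements that the candidate does not meet."""
    checks = {
        "no more than 5 market research efforts": c.market_research_efforts <= 5,
        "broadband at home": c.has_home_broadband,
        "10 hours or more per week of personal computer use": c.personal_hours_per_week >= 10,
        "comfortable with the computer, or techy": c.comfortable_or_techy,
        "uses 2 monitors on a daily basis": c.uses_two_monitors_daily,
        "uses alt-tab to navigate": c.uses_alt_tab,
        "general interest in technologies": c.interested_in_technology,
        "expresses ideas and thoughts clearly": c.expresses_ideas_clearly,
    }
    return [name for name, passed in checks.items() if not passed]


candidate = Candidate(
    market_research_efforts=2,
    has_home_broadband=True,
    personal_hours_per_week=8,
    comfortable_or_techy=True,
    uses_two_monitors_daily=True,
    uses_alt_tab=True,
    interested_in_technology=True,
    expresses_ideas_clearly=True,
)
failed = failed_requirements(candidate)
print(failed)
```

Here the example candidate fails only the requirement of 10 hours or more of personal computer use per week. Returning the list of unmet requirements, rather than a bare yes or no, makes it easier to see why someone was screened out.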
In closing, let me add that testing with friends and relatives is very difficult on many levels. First, you can’t ask all the questions you need to: there are many ‘common understandings’ that prevent the moderator from asking ‘basic/evident/challenging’ questions that might need to be asked to participants. Second, participants might not be sincere or candid about their experience: someone who knows you and understands your commitment to the software might not express what they think, and they may not identify problems they are experiencing, and thus they might minimise the impact of a usability issue or even take the blame for it. Third, of course, they might not fit the recruitment screener as precisely as they should.
Feel free to use this screener to recruit participants if you would like to conduct testing sessions along with the ones I will be doing at Canonical.
In a couple of days, I will write a blog post about writing the protocol for this round of testing – which is the next step you’ll need to take while you’re waiting for participants to be recruited.


