If you are reading this book, then you probably already know what usability testing is, but we don't want to make any assumptions about that, which takes us right to the essence of this book: usability testing is about not making assumptions. In fact, it takes the opposite approach. Usability studies are executed in order to gain concrete, actionable insights. Instead of assuming that users will use an interface in a certain manner, usability tests are run to actually monitor their use of the interface, identify where they stumble and what they appreciate, hear their thoughts, understand their decisions and ultimately use this information to improve the product.
Let's take a step back and talk about usability. Usability refers to how easy an interface is to use. It is a quality that every interface inherently possesses. Issues arise when this quality is not very pronounced. ISO 9241-11 defines usability as the "extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use".
Effectiveness (how well the user achieves their goal) and efficiency (how much effort is required from the user to achieve their goal) are unsurprising in the ISO description. Satisfaction, however, is interesting: even if an interface is effective (meaning that it does what it needs to do) and efficient (meaning that it does not require a lot of effort to do it), how the user feels about using the interface is also relevant. For every app someone uses, there is probably a plethora of competing apps that do exactly the same thing. So how does the user choose? Oftentimes, they go by the very subjective feeling of being satisfied or not. The evaluation of the usability of an interface is therefore based both on objective measures, such as the task completion rate or the time spent on a task, and on the subjective perception of the individual user. Did the user like the design? Are the colors pleasing? Does the interface make the user feel good about themselves? Understanding what users like or dislike, where they struggle, and which tasks come easily to them helps to do the following:
- Evaluate the usability of the interface
- Identify areas to fix and patterns to avoid
- Determine a usability baseline across product iterations
- Compare an interface with the competition
A more formal approach to evaluating the usability of a product is to run a usability test. Usability testing refers to "evaluating a product or service by testing it with representative users" (https://www.usability.gov/how-to-and-tools/methods/usability-testing.html). Usability testing is a subdiscipline of User Experience (UX). Its goal is to ensure that a given product is easy to use and that the user's experience with the product is intuitive, useful, and satisfying. Essentially, participants who are representative of the target users are monitored while using an interface to complete tasks that are relevant to their needs. A usability test can be run at any stage of the product development lifecycle: to validate a navigation concept using wireframes in the early design stages, for example, to test the usability of new features using early prototypes, or to benchmark the ease of use of the final product. The feedback is then used to refine and improve the user interface, ideally in an iterative process.
We want to point out that usability testing is not market research. While usability testing is mostly focused on the interaction with a particular product, market research is usually less specific; usability testing is about ease-of-use, whereas market research is mostly about user opinions or past experience. Usability testing does not require a large number of participants in order to generate valuable results, whereas most market research tools depend on large, statistically relevant sample sets.
Usability testing is also not Quality Assurance (QA). Quality assurance is performed by qualified testers, whereas usability testing is preferably run with non-QA testers. Quality assurance is aimed at finding bugs, whereas usability testing is preferably run on bug-free implementations. Usability testing usually involves a user completing a task the way they normally would, while quality assurance testers will repeat that same task multiple times in order to mimic every possible permutation a real user might encounter.
Usability tests can be run in different ways and for various types of studies. Let's first look into the different ways (or methodologies) of executing a usability test. These are mainly differentiated by the following characteristics:
- In-person versus remote
- Moderated versus unmoderated
In-person usability testing
This is the traditional way to run a usability test: users who meet the user profile requirements are invited to participate in a usability study at a lab. This methodology is almost always moderated, meaning that a UX researcher is present to moderate the user sessions.
Remote usability testing
Connectivity and the internet have made it possible for us to run usability studies remotely using software that allows the moderator to view and capture the participant’s screen, hear their audio input, and see their face. Enhanced distribution and recording tools have also made it possible to target participants offline and receive high-fidelity user videos and feedback.
Remote usability testing allows companies to gain insights into user behavior in their natural environment and on their own schedule. It involves fewer logistics, allows participation regardless of location, and is quicker and cheaper to execute compared to in-person studies, while still delivering valuable insights and feedback. In today's globally connected world, quickly getting local insights is imperative for providing the same standard of user experience while remaining culturally specific.
Remote studies make it easy to include participants from different geographic locations in the same study. Essentially, anyone with a smart device and an internet connection can participate. Imagine that the product under testing is a social app aimed at allowing people to connect with each other when in proximity of another person using the same app. The typical users are city dwellers in densely populated capitals around the globe. Setting up in-person studies in each of the major cities would be logistically complex and expensive. This is where remote studies provide the biggest benefit.
In a remote study, study participants use their own devices. That variety of devices, combined with operating system versions (and, optionally, browser versions) and connectivity speeds, is nearly impossible to match in a lab setting. Imagine a wayfinding app, meant to be rolled out globally, that provides directions to a selected target. A remote, globally distributed study will allow the study sponsor to gather realistic feedback with regard to how the app deals with the connectivity available locally on typical devices.
It is easier to recruit study participants when there is no travel required as it lowers the hurdle for participants to take part in the study. A remote usability study can be completed within as little as half an hour. Imagine a study for a restaurant supply ordering site where the target user group is restaurant chefs. These folks work so many hours that it is very hard to convince them to travel to a lab. Or imagine a company that develops products that are mostly sold in rural areas, such as farming or harvesting equipment. It will be challenging enough to find participants for this study, let alone asking these participants to travel to an in-person study location.
Requiring participants to travel to a lab not only substantially increases the time required for the participants, but also their compensation. As remote studies eliminate the need to travel to a lab location for both the UX researcher and the participants, the costs associated with travel do not apply.
As no lab space is required, it follows that there are no costs for it. In a remote study, the study participants are either in an environment that they select or one mandated by the study (for example, if the study requires that the participants complete the study at their place of work), but they are never required to come into a lab.
A remote study allows the participant to remain in their own environment using their own devices. Any UX study is by definition an artificial event since participants knowingly sign up to provide feedback. We can reduce the level of artificiality around a study in a number of ways. One significant factor is the participant's surroundings and the devices they are using. The participant will feel more comfortable in their own environment than in a lab environment. Imagine that the website being tested is a banking website and the target users are existing customers. If the study requires that real tasks are completed on the website, participants will probably feel more comfortable displaying their banking details on their screens while in their homes than in a lab.
In a remote study, the study participants are required to use their own devices to complete the tasks. This is another significant advantage of the remote methodology because it removes the hurdle of participants having to familiarize themselves with new hardware and/or software. Imagine the usability test of some gaming software. Gamers can be very particular about the equipment they use to play. The remote study allows them to play using the devices they are familiar with, and thus eliminates the hurdle of first having to get used to devices provided to them in a lab. Another relevant scenario is users who require assistive technology, such as a screen reader; using their own devices will ensure that they do not first have to configure settings to suit their individual needs.
Capturing the participant's own environment can also help discover important aspects of how users might interact with the product. Imagine testing a TV-streaming device. The study sponsor will be interested in understanding the individual setup of the participants' environment, such as where they sit when using the device, what the distance to the TV is, or what lighting is available. These settings are difficult to replicate to everyone's individual taste in a lab.
Interruptions can also provide valuable insight into a product's usability with respect to how easy it is to pick back up after an interruption. This is relevant for any interface that is frequently or typically used in an environment or at a time when interruptions might occur, such as a colleague coming over to the user's desk or the doorbell ringing, and so on.
In a remote study, it is possible to send participants out into “the wild” and ask them to test an interface in a real situation. Imagine an app for browsing a conference program and navigating within a conference center. Testing the app in the participant’s home will probably not yield the same insights as testing it while they are actually in the conference halls, trying to find the lecture they are interested in via the built-in GPS function. Another example is a scenario where a customer is ordering ahead and picking up their order in a store or restaurant. Study participants are first asked to use an app or a website to order something and are then asked to go and pick it up. These types of studies provide insight into the broader customer experience, such as how easy it was to match up the order with the orderer, whether the order was ready at the time promised, and so on.
When the product to be tested is not yet available to the participants (that is, it is not an app that can be downloaded from an app store, a website that is publicly accessible, or a product that the user already owns), the participants will have to be adequately equipped. The study sponsor will have to provide early versions of their designs, their software, or their product to each participant for testing.
This can usually be managed when planning the study, and we will talk about this in more detail in the following chapter. It only proves to be insurmountable when the study sponsor cannot or does not want to share these early versions outside of their lab or company for security or logistical reasons. In this case, only in-person testing is possible.
Some people will say that not being able to run the study in a controlled environment, such as a lab, is a disadvantage of remote methodologies, but we would argue that it provides a more natural context to the study. How likely is it that the user of an app will always have a completely quiet environment, with perfect lighting, free of any interruptions when they are using it? No one lives in a vacuum: the phone will ring, someone will be at the door, the participant’s child will want attention, and so on. If you remember the definition of usability testing, it states that the interface is tested by “representative users”. You can only really judge whether an interface is easy to use when it is being used by the people the product targets, to achieve a goal they would actually have, in the environment in which they would naturally use the interface.
Another disadvantage of the remote methodologies that we try to turn into a benefit is that users might run into issues when installing or setting up the interface to be tested. When this setup is identical to how the actual product would be set up, this is valuable feedback regarding how easy it is to get started with the product under testing. However, when the study is based on early prototypes, their installation might require additional time and effort in order to get the participant set up correctly, and this is admittedly more difficult to do remotely than it would be if both parties were present in the same lab together.
In a remote moderated usability study, the UX researcher and the study participant are both present and connected, but not physically in the same room. The UX researcher serves as a moderator and observer during the session. Moderated studies use the internet and collaborative software so that the UX researcher can moderate the session and observe the participant's view of the product being tested and, if a webcam is used, their facial expressions.
All of the advantages of the remote methodologies listed previously apply to the moderated variation as well. In addition, the following are advantages of the remote moderated study when compared to the unmoderated variation.
If the remote moderated study uses software that allows the UX researcher to see the participant while they are completing the study, the facial expressions and potentially the body language of the participant may provide additional insight and context. This is often considered an advantage of the remote moderated study over the unmoderated variation.
In a remote unmoderated usability study, the UX researcher sets up the UX study and analyzes the results, but contrary to the moderated version, the study participants participate when it is convenient for them to do so, and without the moderation of the UX researcher. This allows the study participants to complete the study tasks where and when it is most natural to them.
All of the advantages of the remote methodologies listed previously apply to the unmoderated variation as well. In addition, the following are advantages of remote unmoderated studies when compared to the moderated variation.
Not being in a lab will already allow the participants to feel less inhibited and less observed, but remote unmoderated studies allow the participants to act completely naturally. Imagine a music-streaming app. Ideally, the participants should use the app where and when they would naturally use it. This may be while working out or riding a bike, or even doing household chores. Participants might feel too inhibited to work out or dance while being observed by a UX researcher. In an unmoderated study, the participants can complete the study at their own convenience, using the product as naturally as possible, while the UX researcher can still gather valuable feedback regarding the use of the app under real conditions. This is the biggest benefit of the remote unmoderated usability study.
Remote unmoderated studies are independent of time zones. This is especially relevant for globally distributed studies where the study participants are located in varying time zones. In an unmoderated study, there is no interaction required between the UX researcher and the participants, thereby allowing them to complete the study at a time that suits them.
The effort for the UX researcher conducting a remote unmoderated study is significantly less than it would be if they were conducting a remote moderated study. This is because, while the effort for preparing and reporting on the study is comparable, no dedicated sessions between the UX researcher and each participant are required.
The absence of a joint session eliminates any inadvertent influence on the study participant by the UX researcher. The validity of the data could be compromised by as subtle an influence as a sigh, or obviously taking notes, for example. Moderating a usability study requires a skilled moderator who is experienced enough to know when to jump in and when to hold back. Remote unmoderated studies can be run by UX researchers who do not necessarily have this experience.
In an unmoderated study, no effort is required for scheduling the sessions between the moderator and the participants. In order to handle no-shows, moderated studies very often use "floaters". These floaters are on standby in case someone arrives late or doesn’t show up. Managing these schedules and the floaters generates a lot of overhead, not to mention the cost of having additional participants on standby. This effort and cost are absent in remote unmoderated studies.
Testing with children is easier in remote unmoderated studies because parents can observe their kids and respond to their questions. Imagine a usability study where the product under testing is an educational app targeted at three- to six-year-olds. In an unmoderated study, the parents can be more flexible in interrupting the study to attend to their children if they have lost patience or interest in the app, and can resume the study at a later point in time.
Participants in an unmoderated study require guidance in the form of a script to let them know what they are required to do. This script requires a lot more attention to get it right than a discussion guide for a moderated session. This is obvious when you look at the advantages of this method: The fact that the participants are geographically dispersed and complete the tasks on their own schedule makes it impossible for the UX researcher to be available to guide them through any bit of the study that may be unclear or a dead end that they may inadvertently encounter. In our experience, this is the biggest stumbling point for UX researchers who are comfortable with in-person usability testing and wish to try out the remote unmoderated variety.
The scripts need to cater to any and every outcome, given that there is no possibility to help the participants during the study. You cannot always be sure what the participants are seeing and whether they are exactly where you expect them to be, so you have to anticipate each possible path and cater for it. This is not an insurmountable disadvantage; it just requires more effort and skill on the part of the UX researcher.
Another disadvantage of remote unmoderated studies as compared to moderated testing is the lack of immediate observation. Body posture and other physical expressions that could provide the moderator with additional insight during a study are not available. We don't view this as an absolute argument against remote unmoderated testing because you could also argue that, especially in globally distributed tests, the UX researcher may not be familiar with local gestures, and thus potentially interpret them wrongly. In India, for example, a side-to-side movement of the head does not necessarily mean "no"; it often signals "ok" or "I understand". Thus, asking participants to explicitly express their opinions in writing in an unmoderated study is less subject to bias than interpreting body language.
The lack of personal contact could also be viewed as a disadvantage because the unmoderated study does not allow for building a strong personal rapport between the UX researcher and the study participants. To some extent, it is possible to build a rapport in a remote unmoderated study just through how the participants are addressed in the script. As we saw with the previously mentioned disadvantage, though, what might help to build rapport with members of one culture might not work with others, or might even have the opposite effect.
One last disadvantage of the remote unmoderated methodology is that it is not possible to get reliable time-on-task values. Some unmoderated testing platforms may record the time taken on a task, but there is no guarantee that this will be valid because it is not possible to determine whether the participant continuously worked on the task or was distracted by a phone call during the execution of the task, for example.
Usability studies can also be classified by type. The main distinctions are the following:
- Formative (exploratory) versus summative (assessment)
- Qualitative versus quantitative
- Longitudinal studies versus single session
- Comparative studies versus single product under testing
These types are not mutually exclusive: For example, a comparative study can also be a formative study and a longitudinal study can also be a qualitative study.
The difference between formative and summative studies is determined by the development stage of the interface to be tested and the goal of the test. Formative, or exploratory, studies are run on early versions of the product when it is still being designed, with the aim of understanding user behavior, needs, and wants, and how people expect to interact with the product. Summative, or assessment, studies are run on half-completed to completed products, with the aim of validating the improvements made as a result of previous formative tests, validating individual features, or determining a baseline usability. In short, formative studies are run to shape the product, while summative studies are run to assess its usability. Jeff Sauro describes this as follows:
"Summative tells you how usable an interface is and formative tells you what isn't usable."
– Jeff Sauro (https://measuringu.com/formative-summative/)
Both types are compatible with remote methodologies.
Qualitative and quantitative types of study differ in the type of data that they gather. Qualitative usability studies are focused on gaining in-depth understanding based on narrative data, while quantitative studies collect numerical data in order to produce statistically relevant metrics. A qualitative study will uncover usability issues and identify why users stumble there, whereas quantitative studies will determine the task completion rate, the time-on-task, the users' satisfaction, or other relevant metrics.
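As a rough illustration of the quantitative metrics mentioned above, the following sketch computes a task completion rate, a median time-on-task, and a mean satisfaction score. The record layout, the participant data, and the 1 to 5 satisfaction scale are all hypothetical; a real study would define these in its own plan.

```python
from statistics import mean, median

# Hypothetical results for one task, one record per participant.
# "satisfaction" is an assumed 1-5 rating collected after the task.
sessions = [
    {"participant": "P1", "completed": True,  "seconds": 74,  "satisfaction": 4},
    {"participant": "P2", "completed": True,  "seconds": 102, "satisfaction": 3},
    {"participant": "P3", "completed": False, "seconds": 180, "satisfaction": 2},
    {"participant": "P4", "completed": True,  "seconds": 66,  "satisfaction": 5},
]

def summarize(records):
    """Summarize one task across all participants."""
    successful = [r for r in records if r["completed"]]
    return {
        # Share of participants who completed the task.
        "completion_rate": len(successful) / len(records),
        # Time-on-task is taken from successful attempts only (an assumption).
        "median_seconds": median(r["seconds"] for r in successful) if successful else None,
        # Satisfaction is averaged over everyone, successful or not.
        "mean_satisfaction": mean(r["satisfaction"] for r in records),
    }

summary = summarize(sessions)
print(summary)
```

With only four participants these numbers are descriptive at best; a quantitative study would need a much larger sample before such metrics become statistically relevant.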
A small number of participants is sufficient to provide valuable results in qualitative studies, whereas quantitative studies rely on large numbers of participants in order to provide statistically relevant metrics. How many participants are actually required for a quantitative study is determined by the study parameters, whereas qualitative studies can be run with as few as five participants, according to Nielsen (https://www.nngroup.com/articles/how-many-test-users/).
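The reasoning behind the figure of five participants can be sketched with the model described in the Nielsen article cited above, in which the share of usability problems found by n participants is 1 - (1 - L)^n. The default of L = 0.31 is the average detection rate reported in that article; the function name is our own, and the actual rate will vary from project to project.

```python
def share_of_problems_found(participants, detection_rate=0.31):
    """Estimate the share of usability problems found by a number of participants.

    Uses the model 1 - (1 - L)^n, where L is the share of problems a single
    participant uncovers. The default of 0.31 is an average, not a constant.
    """
    if participants < 0:
        raise ValueError("participants must not be negative")
    return 1 - (1 - detection_rate) ** participants

for n in (1, 3, 5, 10, 15):
    print(f"{n:>2} participants: {share_of_problems_found(n):.0%}")
```

Under these assumptions, five participants uncover roughly 84% of the problems, and each additional participant contributes less than the one before.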
Both study types are compatible with remote methodologies.
In a single-session study, the participant is only required to participate once in order to complete the study, while a longitudinal study is run over a longer period of time and includes repeated contact with the participants during this time. This time period can be anything from a couple of days to weeks. The study participants are asked to use a product over that period of time. If the aim of the longitudinal study is that the participants record their behavior around and with the interface, this type of study is called a diary study.
Both longitudinal and single-session studies are compatible with remote methodologies.
These studies differ in the number of interfaces that are being tested. Single test object studies will cover only a single product, while comparative studies involve two or more products in order to compare them. Comparative studies may be run in order to determine which alternative of a product’s early design better reflects the user's expectations. The aim of this type of study is to either find a preferred candidate or identify the best aspects of either alternative which can then be used to drive the further design of the product. Comparative studies can also be used to determine how competing products measure up against each other.
Another aspect of comparative studies is whether all study participants will test all test objects (within-subjects) or whether each study participant only tests one of the test objects (between-subjects). Both have their advantages and disadvantages: fewer participants are required for a within-subjects study, but the study duration is necessarily longer, while a between-subjects study eliminates anchoring bias (judging subsequent test objects by comparing them to the subjective baseline formed after using the first one).
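The difference between the two designs can be sketched as a simple assignment plan. All names and data below are hypothetical. Rotating the starting product in the within-subjects plan is an assumption added for illustration: it spreads the anchoring effect evenly across the products but does not remove it.

```python
from itertools import cycle

def between_subjects(participants, products):
    """Each participant tests exactly one product, assigned round-robin."""
    return {p: [product] for p, product in zip(participants, cycle(products))}

def within_subjects(participants, products):
    """Each participant tests every product; the starting product rotates."""
    plan = {}
    for index, participant in enumerate(participants):
        shift = index % len(products)
        plan[participant] = products[shift:] + products[:shift]
    return plan

def participants_needed(observations_per_product, product_count, design):
    """Number of participants needed to observe each product a given number of times."""
    if design == "within":
        return observations_per_product
    if design == "between":
        return observations_per_product * product_count
    raise ValueError(f"unknown design: {design!r}")

participants = ["P1", "P2", "P3", "P4"]
products = ["Design A", "Design B"]

print(between_subjects(participants, products))
print(within_subjects(participants, products))
print(participants_needed(5, len(products), "within"))   # 5
print(participants_needed(5, len(products), "between"))  # 10
```

To observe each of two designs five times, the within-subjects plan needs five participants who each complete two rounds of tasks, whereas the between-subjects plan needs ten participants who each complete one.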
Comparative studies and studies of a single test object are compatible with remote methodologies.
There are other, more specific, study types. One such example is the end-to-end, multichannel customer experience study. A remote unmoderated survey approach works well in gathering data on an end-to-end customer experience that includes a digital and a physical aspect; one example of an activity that includes such aspects could be ordering pizza from a fast food chain on a mobile app and then picking it up at the restaurant. The study can evaluate the ease of installing and registering on the app; ordering food, paying, and selecting the pick-up location; the quality of the messages received regarding the pickup time; and the order completion progress, if available. For the actual physical pickup, the study can gather feedback on the punctuality of the order, whether it was clear where to go to pick it up in the restaurant, what information was needed to verify the identity of the orderer, and whether the order was fulfilled correctly.
The advantages of remote usability studies significantly outweigh the disadvantages. A strength of this methodology is the ease of testing across geographical areas. There are only very few situations where a remote usability study would be ill-advised.
In the next chapter, we will look into how to go about planning a remote usability study in order to prepare the foundations for a successful study.