Usability Testing: An Overview
Chapter 1: What Makes Something Usable?
Chapter 2: What Is Usability Testing?
Chapter 3: When Should You Test?
Chapter 4: Skills for Test Moderators
What Makes Something Usable?
What makes a product or service usable?
Usability is a quality that many products possess, but many, many more lack. There are historical, cultural, organizational, monetary, and other reasons for this, which are beyond the scope of this book. Fortunately, however, there are customary and reliable methods for assessing where design contributes to usability and where it does not, and for judging what changes to make to designs so a product can be usable enough to survive or even thrive in the marketplace.
It can seem hard to know what makes something usable because unless you have a breakthrough usability paradigm that actually drives sales (Apple's iPod comes to mind), usability is only an issue when it is lacking or absent. Imagine a customer trying to buy something from your company's e-commerce web site. The inner dialogue they may be having with the site might sound like this: I can't find what I'm looking for. Okay, I have found what I'm looking for, but I can't tell how much it costs. Is it in stock? Can it be shipped to where I need it to go? Is shipping free if I spend this much? Nearly everyone who has ever tried to purchase something on a web site has encountered issues like these.
It is easy to pick on web sites (after all there are so very many of them), but there are myriad other situations where people encounter products and services that are difficult to use every day. Do you know how to use all of the features on your alarm clock, phone, or DVR? When you contact a vendor, how easy is it to know what to choose in their voice-based menu of options?
In large part, what makes something usable is the absence of frustration in using it. As we lay out the process and method for conducting usability testing in this book, we will rely on this definition of “usability”: when a product or service is truly usable, the user can do what he or she wants to do, the way he or she expects to be able to do it, without hindrance, hesitation, or questions.
But before we get into defining and exploring usability testing, let's talk a bit more about the concept of usability and its attributes. To be usable, a product or service should be useful, efficient, effective, satisfying, learnable, and accessible.
Usefulness concerns the degree to which a product enables a user to achieve his or her goals, and is an assessment of the user's willingness to use the product at all. Without that motivation, other measures make no sense, because the product will just sit on the shelf. If a system is easy to use, easy to learn, and even satisfying to use, but does not achieve the specific goals of a specific user, it will not be used even if it is given away for free. Interestingly enough, usefulness is probably the element that is most often overlooked during experiments and studies in the lab.
In the early stages of product development, it is up to the marketing team to ascertain what product or system features are desirable and necessary before other elements of usability are even considered. Lacking that, the development team is hard-pressed to take the user's point of view and will simply guess or, even worse, use themselves as the user model. This is very often where a system-oriented design takes hold.
Efficiency is the quickness with which the user's goal can be accomplished accurately and completely and is usually a measure of time. For example, you might set a usability testing benchmark that says “95 percent of all users will be able to load the software within 10 minutes.”
Effectiveness refers to the extent to which the product behaves in the way that users expect it to and the ease with which users can use it to do what they intend. This is usually measured quantitatively with error rate. Your usability testing measure for effectiveness, like that for efficiency, should be tied to some percentage of total users. Extending the example from efficiency, the benchmark might be expressed as “95 percent of all users will be able to load the software correctly on the first attempt.”
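The efficiency and effectiveness benchmarks above can be checked directly against test data. The sketch below uses invented results from a hypothetical 20-participant test; the numbers and the 95 percent thresholds are illustrative, not from any real study.

```python
# Hypothetical results from a 20-participant usability test of a
# software installation task. Times are minutes to completion;
# first_attempt_success marks whether the participant loaded the
# software correctly on the first attempt.
times = [4.2, 6.1, 3.5, 8.0, 5.4, 7.2, 9.5, 4.8, 6.6, 5.1,
         3.9, 11.3, 5.7, 6.0, 4.4, 7.8, 5.2, 6.9, 8.4, 5.5]
first_attempt_success = [True] * 18 + [False] * 2

# Efficiency benchmark: "95 percent of all users will be able to
# load the software within 10 minutes."
within_10 = sum(t <= 10 for t in times) / len(times)

# Effectiveness benchmark: "95 percent of all users will be able to
# load the software correctly on the first attempt."
success_rate = sum(first_attempt_success) / len(first_attempt_success)

print(f"Loaded within 10 minutes: {within_10:.0%} (benchmark: 95%)")
print(f"Correct on first attempt: {success_rate:.0%} (benchmark: 95%)")
```

With this sample data, the efficiency benchmark is met (19 of 20 participants finished within 10 minutes) but the effectiveness benchmark is not (18 of 20 succeeded on the first attempt), which is exactly the kind of comparison a benchmark statement makes possible.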
Learnability is a part of effectiveness and has to do with the user's ability to operate the system to some defined level of competence after some predetermined amount and period of training (which may be no time at all). It can also refer to the ability of infrequent users to relearn the system after periods of inactivity.
Satisfaction refers to the user's perceptions, feelings, and opinions of the product, usually captured through both written and oral questioning. Users are more likely to perform well on a product that meets their needs and provides satisfaction than one that does not. Typically, users are asked to rate and rank products that they try, and this can often reveal causes and reasons for problems that occur.
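This book does not prescribe a particular satisfaction questionnaire, but the System Usability Scale (SUS) is one widely used, standardized instrument and makes the rate-and-rank idea concrete. Each participant rates ten alternating positively and negatively worded statements from 1 (strongly disagree) to 5 (strongly agree), and the responses map onto a 0-100 score; the participant responses below are invented for illustration.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 ratings."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses):
        # Items 1, 3, 5, 7, 9 (index 0, 2, ...) are positively worded:
        # score is (rating - 1). Items 2, 4, 6, 8, 10 are negatively
        # worded and reverse-scored: score is (5 - rating).
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5  # scale the 0-40 raw sum to 0-100

# One hypothetical participant's responses:
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))
```

Scores like these quantify the satisfaction attribute, but as the next section cautions, the numbers alone do not explain why participants feel the way they do.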
Usability goals and objectives are typically defined in measurable terms of one or more of these attributes. However, let us caution that making a product usable is never simply the ability to generate numbers about usage and satisfaction. While the numbers can tell us whether a product “works” or not, there is a distinctive qualitative element to how usable something is as well, one that is hard to capture with numbers and difficult to pin down. It has to do with how one interprets the data in order to know how to fix a problem, because the behavioral data tells you that there is a problem, not why. Any doctor can measure a patient's vital signs, such as blood pressure and pulse rate. But interpreting those numbers and recommending the appropriate course of action for a specific patient is the true value of the physician. Judging the several possible alternative causes of a design problem, and knowing which are especially likely in a particular case, often means looking beyond individual data points in order to design effective treatment. These are subtleties that evade the untrained eye.
Accessibility and usability are siblings. In the broadest sense, accessibility is about having access to the products needed to accomplish a goal. But in this book when we talk about accessibility, we are looking at what makes products usable by people who have disabilities. Making a product usable for people with disabilities, or who are in special contexts, or both, almost always benefits people who do not have disabilities. Considering accessibility for people with disabilities can clarify and simplify design for people who face temporary limitations (for example, injury) or situational ones (such as divided attention or bad environmental conditions, such as bright light or not enough light). There are many tools and sets of guidelines available to assist you in making accessible designs. (We include pointers to accessibility resources on the web site that accompanies this book; see www.wiley.com/go/usabilitytesting.com for more information.) You should acquaint yourself with accessibility best practices so that you can implement them in your organization's user-centered design process along with usability testing and other methods.
Making things more usable and accessible is part of the larger discipline of user-centered design (UCD), which encompasses a number of methods and techniques that we will talk about later in this chapter. In turn, user-centered design rolls up into an even larger, more holistic concept called experience design. Customers may be able to complete the purchase process on your web site, but how does that mesh with what happens when the product is delivered, maintained, serviced, and possibly returned? What does your organization do to support the research and decision-making process leading up to the purchase? All of these figure into experience design.
Which brings us back to usability.
True usability is invisible. If something is going well, you don't notice it. If the temperature in a room is comfortable, no one complains. But usability in products happens along a continuum. How usable is your product? Could it be more usable even though users can accomplish their goals? Is it worth improving?
Most usability professionals spend most of their time working to eliminate design problems, trying to minimize frustration for users. This is a laudable goal, but it is a difficult one to attain for every user of your product, and it affects only a small part of the user's experience of accomplishing a goal. And, though there are quantitative approaches to testing the usability of products, it is impossible to measure the usability of something directly. You can only measure how unusable it is: how many problems people have using something, what the problems are, and why.
By incorporating evaluation methods such as usability testing throughout an iterative design process, it is possible to make products and services that are useful and usable, and possibly even delightful.
Why are so many high-tech products so hard to use?
In this section, we explore this question, discuss why the situation exists, and examine the overall antidote to this problem. Many of the examples in this book involve not only consumer hardware, software, and web sites but also documentation such as user's guides and embedded assistance such as on-screen instructions and error messages. The methods in this book also work for appliances such as music players, cell phones, and game consoles. Even products, such as the control panel for an ultrasound machine or the user manual for a digital camera, fall within the scope of this book.
Five Reasons Why Products Are Hard to Use
If you currently work in the product development arena, as an engineer, user-interface designer, technical communicator, training specialist, or manager in one of these disciplines, several of the reasons for the development of hard-to-use products and systems will likely sound painfully familiar.
- Development focuses on the machine or system.
- Target audiences expand and adapt.
- Designing usable products is difficult.
- Team specialists don't always work in integrated ways.
- Design and implementation don't always match.
Reason 1: Development Focuses on the Machine or System
During design and development of the product, the emphasis and focus may have been on the machine or system, not on the person who is the ultimate end user. The general model of human performance shown in Figure 1.1 helps to clarify this point.
There are three major components to consider in any type of human performance situation, as shown in Bailey's human performance model:
- The human
- The context
- The activity
Because the development of a system or product is an attempt to improve human performance in some area, designers should consider these three components during the design process. All three affect the final outcome of how well humans ultimately perform. Unfortunately, of these three components, designers, engineers, and programmers have traditionally placed the greatest emphasis on the activity component, and much less emphasis on the human and the context components. The relationship of the three components to each other has also been neglected. There are several explanations for this unbalanced approach:
- There has been an underlying assumption that because humans are so inherently flexible and adaptable, it is easier to let them adapt themselves to the machine, rather than vice versa.
- Developers traditionally have been more comfortable working with the seemingly “black and white,” scientific, concrete issues associated with systems, than with the more gray, muddled, ambiguous issues associated with human beings.
- Developers have historically been hired and rewarded not for their interpersonal, “people” skills but for their ability to solve technical problems.
- The most important factor leading to the neglect of human needs has been that in the past, designers were developing products for end users who were much like themselves. There was simply no reason to study such a familiar colleague. That leads us to the next point.
Reason 2: Target Audiences Expand and Adapt
As technology has penetrated the mainstream consumer market, the target audience has expanded and continues to change dramatically. Development organizations have been slow to react to this evolution.
The original users of computer-based products were enthusiasts (also known as early adopters) possessing expert knowledge of computers and mechanical devices, a love of technology, the desire to tinker, and pride in their ability to troubleshoot and repair any problem. Developers of these products shared similar characteristics. In essence, users and developers of these systems were one and the same. Because of this similarity, the developers practiced “next-bench” design, a method of designing for the user who is literally sitting one bench away in the development lab. Not surprisingly, this approach met with relative success, and users rarely if ever complained about difficulties.
Why would they complain? Much of their joy in using the product was the amount of tinkering and fiddling required to make it work, and enthusiast users took immense pride in their abilities to make these complicated products function. Consequently, a “machine-oriented” or “system-oriented” approach met with little resistance and became the development norm.
Today, however, all that has changed dramatically. Users are apt to have little technical knowledge of computers and mechanical devices, little patience for tinkering with the product just purchased, and completely different expectations from those of the designer. More important, today's user is not even remotely comparable to the designer in skill set, aptitude, expectation, or almost any attribute that is relevant to the design process. Where in the past, companies might have found Ph.D. chemists using their products, today they will find high-school graduates performing similar functions. Obviously, “next-bench” design simply falls apart as a workable design strategy when there is a great discrepancy between user and designer, and companies employing such a strategy, even inadvertently, will continue to produce hard-to-use products.
Designers aren't hobbyist enthusiasts (necessarily) anymore; most are trained professionals educated in human-computer interaction, industrial design, human factors engineering, or computer science, or a combination of these. Whereas before it was unusual for a nontechnical person to use electronic or computer-based equipment, today it is almost impossible for the average person not to use such a product in either the workplace or private life. The overwhelming majority of products, whether in the workplace or the home, be they cell phones, DVRs, web sites, or sophisticated testing equipment, are intended for this less technical user. Today's user wants a tool, not another hobby.
Reason 3: Designing Usable Products Is Difficult
The design of usable systems is a difficult, unpredictable endeavor, yet many organizations treat it as if it were just “common sense.”
While much has been written about what makes something usable, the concept remains maddeningly elusive, especially for those without a background in either the behavioral or social sciences. Part art, part science, it seems that everyone has an opinion about usability, and how to achieve it—that is, until it is time to evaluate the usability of a product (which requires an operational definition and precise measurement).
This trivializing of usability creates a more dangerous situation than if product designers freely admitted that designing for usability was not their area of expertise and began to look for alternative ways of developing products. Or, as Will Rogers so aptly stated, “It's not the things that we don't know that gets us into trouble; it's the things we do know that ain't so.”
When this book was first published in 1994, few systems designers and developers had knowledge of the basic principles of user-centered design. Today, most designers have some knowledge of—or at least exposure to—user-centered design practices, whether they are aware of them or not. However, there are still gaps between awareness and execution. Usability principles are still not obvious, and there is still a great need for education, assistance, and a systematic approach in applying so-called “common sense” to the design process.
Reason 4: Team Specialists Don't Always Work in Integrated Ways
Organizations employ very specialized teams and approaches to product and system development, yet fail to integrate them with each other.
To improve efficiency, many organizations have broken down the product development process into separate system components developed independently. For example, components of a software product include the user interface, the help system, and the written materials. Typically, these components are developed by separate individuals or teams. Now, there is nothing inherently wrong with specialization. The difficulty arises when there is little integration of these separate components and poor communication among the different development teams.
Often the product development proceeds in separate, compartmentalized sections. To an outsider looking on, the development would be seen as depicted in Figure 1.2.
Each development group functions independently, almost as a silo, and the final product often reflects this approach. The help system will not adequately support the user interface, or it will be organized very differently from the interface. Or user documentation and help will be redundant with little cross-referencing. Or the documentation will not reflect the latest version of the user interface. You get the picture.
The problem occurs when the product is released. The end user, upon receiving this new product, views it and expects it to work as a single, integrated product, as shown in Figure 1.3. He or she makes no particular distinction among the three components, and each one is expected to support and work seamlessly with the others. When the product does not work in this way, it clashes with the user's expectations, and whatever advantages accrue through specialization are lost.
Even more interesting is how often organizations unknowingly exacerbate this lack of integration by usability testing each of the components separately. Documentation is tested separately from the interface, and the interface separately from the help. Ultimately, this approach is futile, because it matters little if each component is usable within itself. Only if the components work well together will the product be viewed as usable and meeting the user's needs.
Fortunately, there have been advances in application development methodologies in recent years that emphasize iterative design and interdisciplinary teams. And there are great examples of cutting-edge products and services built around usability advantages that are dominating their markets, such as Netflix, eBay, Yahoo!, and the iPod and iPhone, as well as Whirlpool's latest line of home appliances. Their integration of components is a key contributor to their success.
Reason 5: Design and Implementation Don't Always Match
The design of the user interface and the technical implementation of the user interface are different activities, requiring very different skills. Today, the emphasis and need are on design skills, while many engineers possess the mind-set and skill set for technical implementation.
Design, in this case, relates to how the product communicates, whereas implementation refers to how it works. Previously, this dichotomy between design and implementation was rarely even acknowledged. Engineers and designers were hired for their technical expertise (e.g., programming and machine-oriented analysis) rather than for their design expertise (e.g., communication and human-oriented analysis). This is understandable, because with early generation computer languages the great challenge lay in simply getting the product to work. If it communicated elegantly as well, so much the better, but that was not the prime directive.
With the advent of new-generation programming languages and tools to automatically develop program code, the challenge of technical implementation has diminished. The challenge of design, however, has increased dramatically due to the need to reach a broader, less sophisticated user population and the rising expectations for ease of use. To use a computer analogy, the focus has moved from the inside of the machine (how it works) to the outside where the end user resides (how it communicates).
This change in focus has altered the skills required of designers. This evolution toward design and away from implementation will continue. Someday, perhaps skills such as programming will be completely unnecessary when designing a user interface.
These five reasons merely brush the surface of how and why unusable products and systems continue to flourish. More important is the common theme among these problems and misperceptions; namely that too much emphasis has been placed on the product itself and too little on the desired effects the product needs to achieve. Especially in the heat of a development process that grows shorter and more frenetic all the time, it is not surprising that the user continues to receive too little attention and consideration.
It is easy for designers to lose touch with the fact that they are not designing products per se, but rather they are designing the relationship of product and human. Furthermore, in designing this relationship, designers must allow the human to focus on the task at hand—help the human attain a goal—not on the means with which to do that task. They are also designing the relationship of the various product components to each other. This implies excellent communication among the different entities designing the total product and those involved in the larger experience of using the product in a life or work context. What has been done in the past simply will not work for today's user and today's technologies.
What is needed are methods and techniques to help designers change the way they view and design products: methods that work from the outside in, from the end user's needs and abilities to the eventual implementation of the product. That approach is user-centered design (UCD). Because it is only within the context of UCD that usability testing makes sense and thrives, let's explore this notion of user-centered design in more detail.
User-centered design (UCD) describes an approach that has been around for decades under different names, such as human factors engineering, ergonomics, and usability engineering. (The terms human factors engineering and ergonomics are almost interchangeable, the major difference between the two having more to do with geography than with real differences in approach and implementation. In the United States, human factors engineering is the more widely used term, and in other countries, most notably in Europe, ergonomics is more widely used.) UCD represents the techniques, processes, methods, and procedures for designing usable products and systems, but just as important, it is the philosophy that places the user at the center of the process.
Although the design team must think about the technology of the product first (can we build what we have in mind?), and then what the features will be (will it do what we want it to do?), they must also think about what the user's experience will be like when he or she uses the product. In user-centered design, development starts with the user as the focus, taking into account the abilities and limitations of the underlying technology and the features the company has in mind to offer.
As a design process, UCD seeks to support how target users actually work, rather than forcing users to change what they do to use something. The International Organization for Standardization (ISO) in standard 13407 says that UCD is “characterized by: the active involvement of users and a clear understanding of user and task requirements; an appropriate allocation of function between users and technology; the iteration of design solutions; multidisciplinary design.”
Going beyond user-centered design of a product, we should be paying attention to the whole user experience in the entire cycle of user ownership of a product. Ideally, the entire process of interacting with potential customers, from the initial sales and marketing contact through the entire duration of ownership through the point at which another product is purchased or the current one upgraded, should also be included in a user-centered approach. In such a scenario, companies would extend their concern to include all prepurchase and postpurchase contacts and interactions. However, let's take one step at a time, and stick to the design process.
Numerous articles and books have been written on the subject of user-centered design (UCD) (for a list of our favorites, see the web site that accompanies this book, www.wiley.com/go/usabilitytesting.com). However, it is important for the reader to understand the basic principles of UCD in order to understand the context for performing usability testing. Usability testing is not UCD itself; it is merely one of several techniques for helping ensure a good, user-centered design.
We want to emphasize these basic principles of user-centered design:
- Early focus on users and their tasks
- Evaluation and measurement of product usage
- Iterative design and testing
An Early Focus on Users and Tasks
More than just simply identifying and categorizing users, we recommend direct contact between users and the design team throughout the development lifecycle. Of course, your team needs training and coaching in how to manage these interactions. This is a responsibility that you can take on as you become more educated and practiced yourself.
Though a goal should be to institutionalize customer contact, be wary of doing it merely to complete a check-off box on one's performance appraisal form. What is required is a systematic, structured approach to the collection of information from and about users. Designers require training from expert interviewers before conducting a data collection session. Otherwise, the results can be very misleading.
Evaluation and Measurement of Product Usage
Here, emphasis is placed on behavioral measurements of ease of learning and ease of use very early in the design process, through the development and testing of prototypes with actual users.
Iterative Design and Testing
Much has been made about the importance of design iteration. However, this is not just fine-tuning late in the development cycle. Rather, true iterative design allows for the complete overhaul and rethinking of a design, through early testing of conceptual models and design ideas. If designers are not prepared for such a major step, then the influence of iterative design becomes minimal and cosmetic. In essence, true iterative design allows one to “shape the product” through a process of design, test, redesign, and retest activities.
Attributes of Organizations That Practice UCD
User-centered design demands a rethinking of the way in which most companies do business, develop products, and think about their customers. While currently there exists no cookie-cutter formula for success, there are common attributes that companies practicing UCD share. For example:
- Phases that include user input
- Multidisciplinary teams
- Concerned, enlightened management
- A “learn as you go” perspective
- Defined usability goals and objectives
Phases That Include User Input
Unlike the typical phases we have all seen in traditional development methodologies, a user-centered approach is based on receiving user feedback or input during each phase, prior to moving to the next phase. This can involve a variety of techniques, usability testing being only one of these.
Today, most major companies that develop technology-based products or systems have product lifecycles that include some type of usability engineering/human factors process. In that process, questions arise. These questions and some suggested methods for answering them appear in Figure 1.4.
Within each phase, there will be a variety of usability engineering activities. Note that, although this particular lifecycle is written from the viewpoint of the human factors specialist's activities, there are multiple places where collaboration is required among various team members. This leads to our next attribute of organizations practicing UCD.
A Multidisciplinary Team Approach
No longer can design be the province of one person or even of one specialty. While one designer may take ultimate responsibility for a product's design, he or she is not all-knowing about how to proceed. There are simply too many factors to consider when designing very complex products for less technical end users. User-centered design requires a variety of skills, knowledge, and, most importantly, information about the intended user and usage. Today, teams composed of specialists from many fields, such as engineering, marketing, training, user-interface design, human factors, and multimedia, are becoming the norm. In turn, many of these specialists have training in complementary areas, so cross-discipline work is easier and more dynamic than ever before.
Concerned, Enlightened Management
Typically, the degree to which usability is a true corporate concern is the degree to which a company's management is committed to following its own lifecycle and giving its guidelines teeth by holding the design team accountable. Management understands that there are financial benefits to usability and market share to be won.
A “Learn as You Go” Perspective
UCD is an evolutionary process whereby the final product is shaped over time. It requires designers to take the attitude that the optimum design is acquired through a process of trial and error, discovery, and refinement. Assumptions about how to proceed remain assumptions and are not cast in concrete until evaluated with the end user. The end user's performance and preferences are the final arbiters of design decisions.
Defined Usability Goals and Objectives
Designing a product to be useful must be a structured and systematic process, beginning with high-level goals and moving to specific objectives. You cannot achieve a goal—usability or otherwise—if it remains nebulous and ill-conceived. Even the term usability itself must be defined within your organization. An operational definition of what makes your product usable (tied to successful completion criteria, as we will talk about in Chapter 5) may include measurable targets for the attributes described earlier: usefulness, efficiency, effectiveness, learnability, satisfaction, and accessibility.
Thus bringing us full circle to our original description of what makes a product usable. Now let's review some of the major techniques and methods a usability specialist uses to ensure a user-centered design.
UCD comprises a variety of techniques, methods, and practices, each applied at different points in the product development lifecycle. Reviewing the major methods will help to provide some context for usability testing, which itself is one of these techniques. Please note that the order in which the techniques are described is more or less the order in which they would be employed during a product's development lifecycle.
Ethnographic Research
Ethnographic research borrows techniques from anthropology. It involves observing users in the place where they would normally use the product (e.g., work, home, coffee bar, etc.) to gather data about who your target users are, what tasks and goals they have related to your planned product (or enhancements), and the context in which they work to accomplish their goals. From this qualitative research, you can develop user profiles, personas (archetype users), scenarios, and task descriptions on which you and the design team can base design decisions throughout the development lifecycle.
Participatory Design
Less a technique and more an embodiment of UCD, participatory design employs one or more representative users on the design team itself. Often used for the development of in-house systems, this approach thrusts the end user into the heart of the design process from the very commencement of the project by tapping the user's knowledge, skill set, and even emotional reactions to the design. The potential danger is that the representative users can become too close to the design team. They begin to react and think like the others, or by virtue of their desire to avoid criticizing their colleagues, withhold important concerns or criticism.
A variation on this technique is to arrange short, individual workshops where users, designers, and developers work together on an aspect of design. For example, users, designers, and engineers might work together, using workable models, to determine the best size and shape for the product.
Focus Group Research
Use focus group research at the very early stages of a project to evaluate preliminary concepts with representative users. It can be considered part of “proof of concept” review. In some cases it is used to identify and confirm the characteristics of the representative user in the first place. All focus group research employs the simultaneous involvement of more than one participant, a key factor in differentiating this approach from many other techniques.
The concepts that participants evaluate in these group sessions can be presented in the most preliminary form, such as paper-and-pencil drawings, storyboards, and/or more elaborate screen-based prototypes or plastic models. The objective is to identify how acceptable the concepts are, in what ways they are unacceptable or unsatisfactory, and how they might be made more acceptable and useful. The beauty of the focus group is its ability to explore a few people's judgments and feelings in great depth, and in so doing learn something about how end users think and feel. In this way, focus groups are very different from—and no substitute for—usability tests. A focus group is good for general, qualitative information but not for learning about performance issues and real behaviors. Remember, people in focus groups are reporting what they feel like telling you, which is almost always different from what they actually do. Usability tests are best for observing behaviors and measuring performance issues, while perhaps gathering some qualitative information along the way.
Surveys
By administering surveys you can begin to understand the preferences of a broad base of users about an existing or potential product. While the survey cannot match the focus group in its ability to plumb for in-depth responses and rationale, it can use larger samples to generalize to an entire population. For example, the Nielsen ratings, one of the most famous ongoing surveys, are used to make multimillion-dollar business decisions for a national population based on the preferences of about 1500 people. Surveys can be used at any time in the lifecycle but are most often used in the early stages to better understand the potential user. An important aspect of surveys is that their language must be crystal clear and understood in the same way by all readers, a task impossible to perform without multiple tested iterations and adequate preparation time. Again, asking people about what they do or have done is no substitute for observing them do it in a usability test.
Walk-Throughs
Once you have a good idea who your target users are and the task goals they have, walk-throughs are used to explore how a user might fare with a product by envisioning the user's route through an early concept or prototype of the product. Usually the designer responsible for the work guides his or her colleagues through actual user tasks (sometimes even playing the role of the user), while another team member records difficulties encountered or concerns of the team. In a structured walk-through, as first developed by IBM to perform code reviews, the participants assume specific roles (e.g., moderator, recorder) and follow explicit guidelines (e.g., no walk-through longer than two hours) to ensure the effectiveness of the effort. Rather than the designer taking on the role of the user, you may want to bring in a real user, perhaps someone from a favored client.
Open and Closed Card Sorting
Use card sorting to design in “findability” of content or functionality. This is a very inexpensive method for getting user input on content organization, vocabulary, and labeling in the user interface. You can either give participants cards showing content without titles or categories and have the users do the naming (an open card sort), or give participants preliminary or preexisting categories and ask participants to sort content or functions into those (a closed sort).
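For readers who tally sort results with a script rather than by hand, the analysis step can be made concrete: count how often participants place the same two cards in one group. The Python sketch below is a minimal illustration with invented cards, group names, and participants; a real study would involve far more of each.

```python
from collections import Counter
from itertools import combinations

# Hypothetical open-sort results: each participant grouped the same
# five content cards under labels of his or her own choosing.
sorts = [
    {"Account": ["login", "password"], "Shopping": ["cart", "checkout", "shipping"]},
    {"My stuff": ["login", "password", "cart"], "Buying": ["checkout", "shipping"]},
    {"Sign in": ["login", "password"], "Orders": ["cart", "checkout", "shipping"]},
]

# Count how often each pair of cards lands in the same group.
pair_counts = Counter()
for groups in sorts:
    for cards in groups.values():
        for pair in combinations(sorted(cards), 2):
            pair_counts[pair] += 1

# Pairs grouped together by every participant are strong candidates
# for living under the same category in the user interface.
strong = [pair for pair, n in pair_counts.items() if n == len(sorts)]
```

Pairs that every participant grouped together are the safest bets to place under a single heading; pairs that rarely co-occur probably belong in different areas of the navigation.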
Paper Prototypes and Paper-and-Pencil Evaluations
In this technique users are shown an aspect of a product on paper and asked questions about it, or asked to respond in other ways. To learn whether the flow of screens or pages that you have planned supports users' expectations, you may mock up pages with paper and pencil on graph paper, or create line drawings or wireframe drawings of screens, pages, or panels, with a version of the page for each state. For example, if the prototype is for a shopping cart for an e-commerce web site, you can show the cart with items, as items are being changed, and then with shipping and taxes added. (Or, you may simply decide to have the participant or the “computer” fill these items in as the session progresses.)
To learn whether the labels help users know what to expect next, and if the categories you have planned reflect how users think and talk about tasks, you can show the top-level navigation. As the participant indicates the top-level choice, you then show the next level of navigation for that choice. The process continues until the user has gone as deeply into the navigation as you have designed and prepared for the sessions.
Or, you may simply ask participants about the prototype you have created. The questions can range from particular attributes, such as organization and layout, to where one might find certain options or types of information.
The value of the paper prototype or paper-and-pencil evaluation is that critical information can be collected quickly and inexpensively. One can ascertain those functions and features that are intuitive and those that are not, before one line of code has been written. In addition, technical writers might use the technique to evaluate the intuitiveness of their table of contents before writing one word of text. The technique can be employed again and again with minimal drain on resources.
Expert or Heuristic Evaluations
Expert evaluations involve a review of a product or system, usually by a usability specialist or human factors specialist who has little or no involvement in the project. The specialist performs his or her review according to accepted usability principles (heuristics) from the body of research, human factors literature, and previous professional experience. The viewpoint is that of the specific target population that will use the product.
A “double” specialist, that is, someone who is an expert in usability principles or human factors as well as an expert in the domain area (such as healthcare, financial services, and so on, depending on the application), or in the particular technology employed by the product, can be more effective than one without such knowledge.
Usability Testing
Usability testing, the focus of this book, employs techniques to collect empirical data while observing representative end users using the product to perform realistic tasks. Testing is roughly divided into two main approaches. The first approach involves formal tests conducted as true experiments, in order to confirm or refute specific hypotheses. The second approach, a less formal but still rigorous one (and the one we emphasize in this book), employs an iterative cycle of tests intended to expose usability deficiencies and gradually shape or mold the product in question.
Follow-Up Studies
A follow-up study occurs after formal release of the product. The idea is to collect data for the next release, using surveys, interviews, and observations. Structured follow-up studies are probably the truest and most accurate appraisals of usability, because the actual user, product, and environment are all in place and interacting with each other. That follow-up studies are so rare is unfortunate because designers would benefit immensely from learning what happened to the product that they spent two years of their lives perfecting. Sales figures, while helpful, add nothing to one's knowledge of the product's strengths and weaknesses.
This is not a definitive list of methods by any means, and it is meant merely to provide the reader with an appreciation for the wealth of techniques available and the complexity involved in implementing a UCD approach. It is a rare organization that performs all of these techniques, and just as few conduct them in their pure form. Typically, they are used in altered and combined form, as the specific needs and constraints of a project dictate. For more about these techniques, check out our list of resources on the web site that accompanies this book at www.wiley.com/go/usabilitytesting.
Now let's take a closer look at the most renowned of the techniques just discussed, and the focus of this book, usability testing, in Chapter 2.
What Is Usability Testing?
The term usability testing is often used rather indiscriminately to refer to any technique used to evaluate a product or system. Many times it is obvious that the speaker is referring to one of the other techniques discussed in Chapter 1. Throughout this book we use the term usability testing to refer to a process that employs people as testing participants who are representative of the target audience to evaluate the degree to which a product meets specific usability criteria. By this definition, techniques such as expert evaluations and walk-throughs, which do not require representative users as part of the process, do not qualify as usability testing.
Usability testing is a research tool, with its roots in classical experimental methodology. The range of tests one can conduct is considerable, from true classical experiments with large sample sizes and complex test designs to very informal qualitative studies with only a single participant. Each testing approach has different objectives, as well as different time and resource requirements. The emphasis of this book is on more informal, less complex tests designed for quick turnaround of results in industrial product development environments.
From the point of view of some companies, usability testing is part of a larger effort to improve the profitability of products. There are many aspects to doing so, which in the end also benefits users greatly: design decisions are informed by data gathered from representative users to expose design issues so they can be remedied, thus minimizing or eliminating frustration for users.
The overall goal of usability testing is to inform design by gathering data from which to identify and rectify usability deficiencies existing in products and their accompanying support materials prior to release. The intent is to ensure the creation of products that:
- Are useful to and valued by the target audience
- Are easy to learn
- Help people be effective and efficient at what they want to do
- Are satisfying (and possibly even delightful) to use
Eliminating Design Problems and Frustration
One side of the profitability coin is the ease with which customers can use the product. When you minimize the frustration of using a product for your target audience by remedying flaws in the design ahead of product release, you also accomplish these goals:
- Set the stage for a positive relationship between your organization and your customers.
- Establish the expectation that the products your organization sells are high quality and easy to use.
- Demonstrate that the organization considers the goals and priorities of its customers to be important.
- Release a product that customers find useful, effective, efficient, and satisfying.
Goals or benefits of testing for your organization are:
- Creating a historical record of usability benchmarks for future releases. By keeping track of test results, a company can ensure that future products either improve on or at least maintain current usability standards.
- Minimizing the cost of service and support calls. A more usable product will require fewer service calls and less support from the company.
- Increasing sales and the probability of repeat sales. Usable products create happy customers who talk to other potential buyers or users. Happy customers also tend to stick with future releases of the product, rather than purchase a competitor's product.
- Acquiring a competitive edge because usability has become a market separator for products. Usability has become one of the main ways to separate one's product from a competitor's product in the customer's mind. One need only scan the latest advertising to see products described using phrases such as “simple” and “easy” among others. Unfortunately, this information is rarely truthful when put to the test.
- Minimizing risk. Actually, all companies and organizations have conducted usability testing for years. Unfortunately, the true name for this type of testing has been “product release,” and the “testing” involved trying the product in the marketplace. Obviously, this is a very risky strategy, and usability testing conducted prior to release can minimize the considerable risk of releasing a product with serious usability problems.
The basic methodology for conducting a usability test has its origin in the classical approach for conducting a controlled experiment. With this formal approach, often employed to conduct basic research, a specific hypothesis is formulated and then tested by isolating and manipulating variables under controlled conditions. Cause-and-effect relationships are then carefully examined, often through the use of the appropriate inferential statistical technique(s), and the hypothesis is either confirmed or rejected. Employing a true experimental design, these studies require that:
- A hypothesis must be formulated. A hypothesis states what you expect to occur when testing. For example, “Help as designed in format A will improve the speed and error rate of experienced users more than help as designed in format B.” It is essential that the hypothesis be as specific as possible.
- Randomly chosen (using a very systematic method) participants must be assigned to experimental conditions. One needs to understand the characteristics of the target population, and from that larger population select a representative random sample. Random sampling is often difficult, especially when choosing from a population of existing customers.
- Tight controls must be employed. Experimental controls are crucial or else the validity of the results can be called into question, regardless of whether statistical significance is the goal. All participants should have nearly the identical experience as each other prior to and during the test. In addition, the amount of interaction with the test moderator must be controlled.
- Control groups must be employed. In order to validate results, a control group must be employed; its treatment should vary only on the single variable being tested.
- The sample (of users) must be of sufficient size to measure statistically significant differences between groups. In order to measure differences between groups statistically, a large enough sample size must be used. Too small a sample can lead to erroneous conclusions.
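To give a flavor of the statistical machinery the formal approach entails, the Python sketch below computes the pooled two-sample t statistic for the help-format hypothesis above. The task times are invented for illustration; a real experiment would also compare the statistic against the t distribution to obtain a p-value before confirming or rejecting the hypothesis.

```python
import math
from statistics import mean, variance

# Hypothetical time-on-task data (seconds) for experienced users:
# help format A vs. help format B.
format_a = [30, 32, 31, 29, 33]
format_b = [40, 42, 41, 39, 43]

na, nb = len(format_a), len(format_b)

# Pooled (equal-variance) two-sample t statistic.
pooled_var = ((na - 1) * variance(format_a)
              + (nb - 1) * variance(format_b)) / (na + nb - 2)
t_stat = (mean(format_a) - mean(format_b)) / math.sqrt(
    pooled_var * (1 / na + 1 / nb))
df = na + nb - 2  # degrees of freedom
```

A strongly negative t statistic here would favor format A (lower task times); whether it is “significant” depends on df and the chosen significance level.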
The preceding approach is the basis for conducting classical experiments, and when conducting basic research, it is the method of choice. However, it is not the method expounded in this book for the following reasons.
- It is often impossible or inappropriate to use such a methodology to conduct usability tests in the fast-paced, highly pressurized development environment in which most readers will find themselves. It is impossible because of the many organizational constraints, political and otherwise. It is inappropriate because the purpose of usability testing is not necessarily to formulate and test specific hypotheses, that is, conduct research, but rather to make informed decisions about design to improve products.
- The amount of prerequisite knowledge of experimental method and statistics required in order to perform these kinds of studies properly is considerable and better left to an experienced usability or human factors specialist. Should one attempt to conduct this type of tight research without the appropriate background and training, the results can often be very misleading, and lead to a worse situation than if no research had been conducted.
- In the environment in which testing most often takes place, it is often very difficult to apply the principle of randomly assigning participants because one often has little control over this factor. This is especially true as it concerns the use of existing customers as participants.
- Still another reason for a less formal approach concerns sample size. To achieve generalizable results for a given target population, one's sample size is dependent on knowledge of certain information about that population, which is often lacking (and sometimes the precise reason for the test). Lacking such information, one may need to test 10 to 12 participants per condition to be on the safe side, a factor that might require one to test 40 or more participants to ensure statistically significant results.
- Last, and probably most important, the classical methodology is designed to obtain quantitative proof of research hypotheses that one design is better than another, for example. It is not designed to obtain qualitative information on how to fix problems and redesign products. We assume that most readers will be more concerned with the latter than the former.
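The sample-size point can be made concrete with a rough rule of thumb. The sketch below uses the common normal-approximation formula for a two-group comparison at a 5 percent significance level with 80 percent power; the constants and effect sizes are standard textbook values, not figures from this book.

```python
import math

def n_per_group(effect_size, z_alpha=1.96, z_beta=0.84):
    """Approximate participants needed per condition for a two-group
    comparison at a 5% significance level with 80% power, using the
    normal approximation. effect_size is the expected difference in
    standard-deviation units (Cohen's d)."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# Only a very large expected effect (d = 1.2) gets you down to about
# 11 participants per condition, roughly the 10-to-12 figure above;
# four such conditions would then require around 44 participants.
per_condition = n_per_group(1.2)
```

With a more typical medium effect (d = 0.5), the same formula calls for over 60 participants per condition, which is exactly why formal designs are so costly in a product development setting.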
The approach we advocate is a more informal, iterative approach to testing, albeit with experimental rigor at its core. As the reader will see in later chapters of this book, experimental rigor is essential for any study that one conducts.
Much can be achieved by conducting a series of quick, pointed studies, beginning early in the development cycle. It is the intent of this book to present the basics of conducting this type of less formal, yet well-designed test that will identify the specific usability deficiencies of a product, their cause, and the means to overcome them. The basics of this approach are described in the sections that follow.
Basic Elements of Usability Testing
- Development of research questions or test objectives rather than hypotheses.
- Use of a representative sample of end users who may or may not be randomly chosen.
- Representation of the actual work environment.
- Observation of end users who either use or review a representation of the product.
- Controlled and sometimes extensive interviewing and probing of the participants by the test moderator.
- Collection of quantitative and qualitative performance and preference measures.
- Recommendation of improvements to the design of the product.
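As a small illustration of how the quantitative measures in this list might be tallied, the Python sketch below computes a completion rate, median time on task, and mean error count from hypothetical session records; the field names and values are invented for the example.

```python
from statistics import mean, median

# Hypothetical results for one task: completion flag, time on task
# (seconds), and error count for each participant.
sessions = [
    {"completed": True,  "seconds": 95,  "errors": 1},
    {"completed": True,  "seconds": 120, "errors": 0},
    {"completed": False, "seconds": 300, "errors": 4},
    {"completed": True,  "seconds": 80,  "errors": 2},
]

completion_rate = sum(s["completed"] for s in sessions) / len(sessions)

# Report time on task for successful attempts only, since failed
# attempts are typically cut off at an arbitrary time limit.
success_times = [s["seconds"] for s in sessions if s["completed"]]
median_time = median(success_times)

mean_errors = mean(s["errors"] for s in sessions)
```

Medians are often preferred to means for time on task because one struggling participant can skew a small sample badly.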
We detail the “how-to” of this approach in the chapters that follow.
Limitations of Testing
Now, having painted a rather glorified picture of what usability testing is intended to accomplish, let's splash a bit of cold water on the situation. Testing is neither the be-all nor the end-all for usability and product success, and it is important to understand its limitations. Testing does not guarantee success or even prove that a product will be usable. Even the most rigorously conducted formal test cannot, with 100 percent certainty, ensure that a product will be usable when released. Here are some reasons why:
- Testing is always an artificial situation. Testing in the lab, or even testing in the field, still represents a depiction of the actual situation of usage and not the situation itself. The very act of conducting a study can itself affect the results.
- Test results do not prove that a product works. Even if one conducts the type of test that acquires statistically significant results, this still does not prove that a product works. Statistical significance is simply a measure of the probability that one's results were not due to chance. It is not a guarantee, and it is very dependent upon the way in which the test was conducted.
- Participants are rarely fully representative of the target population. Participants are only as representative as your ability to understand and classify your target audience. Market research is not an infallible science, and the actual end user is often hard to identify and describe.
- Testing is not always the best technique to use. There are many techniques intended to evaluate and improve products, as discussed in Chapter 1 and Chapter 13. For example, in some cases it is more effective in terms of cost, time, and accuracy to conduct an expert or heuristic evaluation of a product rather than test it. This is especially true in the early stages of a product when gross violations of usability principles abound. It is simply unnecessary to bring in many participants to reveal the obvious.
However, in spite of these limitations, usability testing, when conducted with care and precision, for the appropriate reasons, at the appropriate time in the product development lifecycle, and as part of an overall user-centered design approach, is an almost infallible indicator of potential problems and the means to resolve them. It considerably reduces the risk of releasing an unstable or unlearnable product. In almost every case (and this is an underlying theme of this book), it is better to test than not to test.
The next chapter covers the basics for conducting four types of specific tests and then provides a hypothetical case study employing all four tests in the course of a development cycle.
When Should You Test?
Some type of usability testing fits into every phase of a development lifecycle. The type of testing is distinguished by the research questions asked, the state of the completeness of the product, and the time available for implementing solutions to problems revealed in testing. This chapter outlines four types of tests that fit into the general phases that any product development cycle goes through (see Figure 3.1).
The literature is filled with a variety of testing methodologies, each with a slightly different purpose. Often, different terms are used to describe identical testing techniques. Needless to say, this can be extremely confusing. In deciding which tests to discuss and emphasize, the most beneficial approach might be to use the product development lifecycle as a reference point for describing several different types of tests. Associating a test with a particular phase in the lifecycle should help you understand the test's purpose and benefits.
We discuss three tests—exploratory (or formative), assessment (or summative), and validation (or verification) tests—at a high level, according to the approximate point in the product development lifecycle at which each would be administered. The fourth type of test, the comparison test, can be used as an integral part of any of the other three tests and is not associated with any specific lifecycle phase.
The basic methodology for conducting each test is roughly the same and is described in detail in Chapter 5. However, each test will vary in its emphasis on qualitative vs. quantitative measures, and by the amount of interaction between test moderator and participant. Also, the tests expounded here are definitely biased toward an environment of tight deadlines and limited resources, and chosen with a keen eye on the bottom line.
Our other purpose for presenting the test types in terms of the product development lifecycle has to do with the power of iterative design. Usability testing is most powerful and most effective when implemented as part of an iterative product development process. That is, a cycle of design, test and measure, and redesign throughout the product development lifecycle has the greatest probability of concluding with a usable product. Even if important product flaws or deficiencies are missed during one test, another testing cycle offers the opportunity to identify these problems or issues.
An iterative design and testing approach also allows one to make steady and rapid progress on a project, to learn through empirical evidence, and to “shape” the product to fit the end users' abilities, expectations, and aptitude. We feel very strongly that such an approach provides the greatest value when resources are limited, and that one will obtain the best results by conducting a series of short, precise tests that build one upon the other.
However, while the tests we are about to describe lend themselves to an iterative design process, one need not be concerned about applying the tests at exactly the correct moment. Rather, consider what it is that you need to understand about your product, and let that drive your test objectives and the appropriate application of a particular test method. Also, do not be put off if you are unable to conduct multiple tests. One test is almost always better than none, and it is better to focus on what you can do than on what you cannot do.
The first three tests, exploratory (or formative), assessment (or summative), and validation (or verification), are shown in Figure 3.1 next to the approximate points in the lifecycle at which they are most effectively conducted. Now let's review each in turn.
Exploratory Study
The exploratory study is conducted quite early in the development cycle, when a product is still in the preliminary stages of being defined and designed (hence the reason it is sometimes called “formative”). By this point in the development cycle, the user profile and usage model (or task analysis) of the product will have (or should have) been defined. The project team is probably wrestling with the functional specification and early models of the product. Or perhaps the requirements and specifications phase is completed, and the design phase is just about to begin.
The main objective of the exploratory study is to examine the effectiveness of preliminary design concepts. If one thinks of a user interface or a document as being divided into a high-level aspect and a more detailed aspect, the exploratory study is concerned with the former.
For example, designers of a web application interface would benefit greatly from knowing early on whether the user intuitively grasps the fundamental and distinguishing elements of the interface. In particular, designers might want to know how well the interface:
- Supports users' tasks within a goal.
- Communicates the intended workflow.
- Allows the user to navigate from screen to screen and within a screen.
Or, using the task-oriented user guide of a software product as an example, technical writers typically might want to explore the following high-level issues:
- Overall organization of subject matter
- Whether to use a graphic or verbal approach
- How well the proposed format supports findability
- Anticipated points of assistance and messaging
- How to address reference information
The implications of these high-level issues go beyond the product, because you are also interested in verifying your assumptions about the users. Understanding one is necessary to define the other. Some typical user-oriented questions that an exploratory study would attempt to answer might include the following:
- How do users conceive of and think about using the product?
- Does the product's basic functionality have value to the user?
- How easily and successfully can users navigate?
- How easily do users make inferences about how to use this user interface, based on their previous experience?
- What type of prerequisite information does a person need to use the product?
- Which functions of the product are “walk up and use” and which will probably require either help or written documentation?
- How should the table of contents be organized to accommodate both novice and experienced users?
The importance of this type of early analysis and research cannot be overemphasized, for this is the point in time when critical design decisions set the stage for all that will follow. If the project begins with wrong assumptions and faulty premises about the user, the product is almost guaranteed to have usability problems later. As with building a house, once you lay the foundation for one type of model, you cannot simply build a totally different model without first ripping out the existing framework. The underlying structure determines all that will follow.
Overview of the Methodology
Exploratory tests usually dictate extensive interaction between the participant and test moderator to establish the efficacy of preliminary design concepts. One way to answer very fundamental questions, similar to those listed previously, is to develop preliminary versions of the product's interface and/or its support materials for evaluation by representative users. For software, this would typically involve a prototype simulation or mockup of the product that represents its basic layout, organization of functions, and high-level operations. Even prior to a working prototype, one might use static screen representations or even paper drafts of screens. For hardware representations, one might use two-dimensional or three-dimensional foamcore, clay, or plastic models. For user support materials, one might provide very rough layouts of manuals, training materials, or help screens.
When developing a prototype, one need not represent the entire functionality of the product. Rather, one need only show enough functionality to address the particular test objective. For example, if you want to see how the user responds to the organization of your pull-down menus, you need only show the menus and one layer of options below. If the user proceeds deeper than the first layer, you might show a screen that reads, “Not yet implemented,” or something similar and ask what the participant was looking for or expecting next.
This type of prototype is referred to as a “horizontal representation,” since the user can move left or right but is limited in moving deeper. However, if your test objective requires seeing how well a user can move down several menu layers, you will need to prototype several functions “vertically,” so users can proceed deeper. You might achieve both objectives with a horizontal representation of all major functions, and a vertical representation of two of the functions.
During the test of such a prototype, the user would attempt to perform representative tasks. Or if it is too early to perform tasks, then the user can simply “walk through” or review the product and answer questions under the guidance of a test moderator. Or, in some cases, the user can even do both. The technique depends on the point in the development cycle and the sophistication of the mockups.
The testing process for an exploratory test is usually quite informal and almost a collaboration between participant and test moderator, with much interaction between the two. Because so much of what you need to know is cognitive in nature, an exploration of the user's thought process is vital. The test moderator and participant might explore the product together, with the test moderator conducting an almost ongoing interview or encouraging the participant to “think aloud” about his or her thought process as much as possible. Unlike later tests where there is much less interaction, the test moderator and participant can sit side by side as shown in Figure 3.2.
You might also ask participants for their ideas about how to improve confusing areas. Unlike later tests, where there is more emphasis on measuring how well the user is able to perform by collecting quantitative data, here you strive to understand why the user performs as he or she does by collecting qualitative data. Regardless of whether you use a working prototype, static screens, or early manuals, and regardless of whether the user performs tasks or simply "walks through" a product with the test moderator, the distinguishing feature of the exploratory test is its emphasis on discussion and examination of high-level concepts and thought processes, thus helping to form the final design.
Example of Exploratory Study
Because the nature of the exploratory test is often somewhat abstract, let's review how a typical exploration might proceed for a product, such as a web site. Assume that you are exploring the home page of a web site, which employs options in the left navigation, each revealing further choices when the user mouses over it. Assume also that this is a very early stage of development, so the user interface simply consists of a single screen without any underlying structure or connections. However, the navigation menus function, so the user can view the menu options underneath each menu heading, as shown in Figure 3.3.
Now let's look at Figure 3.4, which contains an excerpt of a test script for conducting an exploratory test, to see how the test might proceed. You might continue in this vein, having the user attempt to accomplish realistic tasks with much discussion about assumptions and thought process. Alternatively, though, if the web page is in such a preliminary stage that the navigation does not work, and you wanted to evaluate the effectiveness of the organization of the navigation, you might ask the user to simply point to the navigation label under which he or she would expect to accomplish a particular task, similarly to a paper-and-pencil evaluation. This approach would establish which tasks were harder to initiate and less intuitive.
Exploratory tests are often conducted as comparison tests, with different prototypes matched against each other. This prevents the project team from committing too early to one design, only to find out later that the design has serious flaws and liabilities. An example of this type of test is shown later in this chapter.
The important point of exploratory tests is that you can be extremely creative in simulating early versions of the product. Paper screens, prototypes with limited functionality, and so on all help to acquire important high-level information before the design is cast in concrete. It is never too early to learn how the user perceives the product and its fundamental presentation.
The benefits of using exploratory research to establish the soundness of high-level design prior to fleshing out all the details are innumerable. The time saved alone makes early research well worth doing. Explore very basic ideas and concepts as soon as you are able to simulate how they will work to users. Do not wait to take action until a very well thought-out, full-blown design takes shape.
Assessment Test

The assessment test is probably the most typical type of usability test conducted. Of all the tests, it is probably the simplest and most straightforward for the novice usability professional to design and conduct. Assessment tests are conducted either early or midway into the product development cycle, usually after the fundamental or high-level design or organization of the product has been established.
The purpose of the assessment test is to expand the findings of the exploratory test by evaluating the usability of lower-level operations and aspects of the product. If the intent of the exploratory test is to work on the skeleton of the product, the assessment test begins to work on the meat and the flesh. Assuming that the basic conceptual model of the product is sound, this test seeks to examine and evaluate how effectively the concept has been implemented. Rather than just exploring the intuitiveness of a product, you are interested in seeing how well a user can actually perform full-blown realistic tasks and in identifying specific usability deficiencies in the product.
Overview of the Methodology
Often referred to as an information-gathering or evidence-gathering test, the methodology for an assessment test is a cross between the informal exploration of the exploratory test and the more tightly controlled measurement of the validation test. Unlike the exploratory test:
- The user will always perform tasks rather than simply walking through and commenting upon screens, pages, and so on.
- The test moderator will lessen his or her interaction with the participant because there is less emphasis on thought processes and more on actual behaviors.
- Quantitative measures will be collected.
Validation Test

The validation test, also referred to as the verification test, is usually conducted late in the development cycle and, as the name suggests, is intended to measure usability of a product against established benchmarks or, in the case of a verification test, to confirm that problems discovered earlier have been remedied and that new ones have not been introduced. Unlike the first two tests, which take place in the middle of a very active and ongoing design cycle, the validation test typically takes place much closer to the release of the product.
The objective of the validation test is to evaluate how the product compares to some predetermined usability standard or benchmark, either a project-related performance standard, an internal company or historical standard, or even a competitor's standard of performance. The intent is to establish that the product meets such a standard prior to release, and if it does not, to establish the reason(s) why. The standards usually originate from the usability objectives developed early in the project. These in turn come from previous usability tests, marketing surveys, interviews with users, or simply educated guesses by the development team.
Usability objectives are typically stated in terms of performance criteria, such as efficiency and effectiveness, or how well and how fast the user can perform various tasks and operations. Or the objectives can be stated in terms of preference criteria, such as achieving a particular ranking or rating from users. A verification test has a slightly different flavor. The objective here is to ensure that usability issues identified in earlier tests have been addressed and corrected appropriately.
It only makes sense then that the validation test itself can be used to initiate standards within the company for future products. Verification can accomplish the same thing. For example, if one establishes that a setup procedure for a software package works well and can be conducted within 5 minutes with no more than one error, it is important that future releases of the product perform to that standard or better. Products can then be designed with this benchmark as a target, so that usability does not degrade as more functions are added to future releases.
Another major objective of the validation test is to evaluate, sometimes for the first time, how all the components of a product work together in an end-to-end study. For example, how documentation, help, and software/hardware are integrated with each other, or all the steps in a longer process or workflow. The importance of an integrated validation test cannot be overstated. Because components are often developed in relative isolation from each other, it is not unusual that they do not work well together. It behooves an organization to discover this prior to release because, from the user's viewpoint, it is all one product and it is expected to perform that way.
Still another objective of the validation test, or really any test conducted very late in the development cycle, has become known in the trade as "disaster or catastrophe insurance." At this late stage, management is most concerned with the risk of placing into the marketplace a new product that contains major flaws or that might require recall. If such a flaw is discovered, slipping the schedule may be preferable to recalling the product or having to send out "fixes" to every user. Even if there is no time to make changes before release, you are always at an advantage if you can anticipate a major deficiency in the product. There will be time to prepare a solution, train the support team, and even prepare public-relations responses. Even with all these advantages, however, there are companies that would rather not know about problems that exist in a product.
Overview of the Methodology
The validation test is conducted in similar fashion to the assessment test with three major exceptions.
- Prior to the test, benchmarks or standards for the tasks of the test are either developed or identified. These can be specific error or time measures, or as simple as eliminating the problems identified in earlier exploratory tests.
- Participants are given tasks to perform with either very little or no interaction with a test moderator. (And they are probably not asked to “think aloud.”)
- The collection of quantitative data is the central focus, although reasons for substandard performance are identified.
Because you are measuring user performance against a standard, you also need to determine beforehand how adherence to the standard will be measured, and what actions will be taken if the product does not meet its standards. For example, if the standard for a task addresses “time to complete,” must 70 percent of participants meet the standard, or will you simply compare the standard to the average score of all participants? Under what conditions will the product's schedule be postponed? Will there be time to retest those tasks that did not meet the standard? These are all questions that should be addressed and resolved prior to the test.
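The two ways of measuring adherence mentioned above can give different answers, so it is worth deciding between them explicitly before the test. The following sketch contrasts them for a hypothetical "time to complete" standard; the task times, the 10-minute standard, and the 70 percent pass rate are all invented for illustration.

```python
# Sketch: two ways to score "time to complete" against a standard.
# All numbers here are hypothetical examples, not real test data.

STANDARD_MINUTES = 10        # benchmark time for the task
REQUIRED_PASS_RATE = 0.70    # 70 percent of participants must meet it

completion_times = [6.5, 8.0, 9.5, 12.0, 7.25, 11.5, 9.0, 8.75]  # minutes

# Criterion 1: did at least 70 percent of participants meet the standard?
passes = [t <= STANDARD_MINUTES for t in completion_times]
pass_rate = sum(passes) / len(passes)
meets_pass_rate = pass_rate >= REQUIRED_PASS_RATE

# Criterion 2: does the average time across all participants meet the standard?
average_time = sum(completion_times) / len(completion_times)
meets_average = average_time <= STANDARD_MINUTES

print(f"pass rate: {pass_rate:.0%}, meets 70% criterion: {meets_pass_rate}")
print(f"average: {average_time:.2f} min, meets standard: {meets_average}")
```

Note that a task can meet the average criterion while failing the pass-rate criterion (a few very slow participants can hide behind several fast ones), which is why the pass-rate form is usually the more demanding choice.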
Compared to an assessment test, a validation test requires more emphasis on experimental rigor and consistency, because you are making important quantitative judgments about the product. Make sure that members of the design team have input and buy-in into developing the standards used during the test. That way they will not feel as if the standards were overly difficult or unattainable.
Comparison Test

The comparison test is the fourth type of test and can be used in conjunction with any of the other three. It is used to compare two or more designs, such as two different interface styles, the current design of a manual against a proposed new design, or your product against a competitor's. The comparison test is typically used to establish which design is easier to use or learn, or to better understand the advantages and disadvantages of different designs.

The comparison test is not associated with any specific point in the product development lifecycle. In the early stages, it can be used to compare several radically different interface styles via an exploratory test, to see which has the greatest potential with the proposed target population. Toward the middle of the lifecycle, a comparison test can be used to measure the effectiveness of a single element, such as whether users prefer pictorial or textual buttons. Toward the end of the lifecycle, a comparison test can be used to see how the released product stacks up against a competitor's product.
Overview of the Methodology
The basic methodology involves the side-by-side comparison of two or more clearly different designs. Performance data and preference data are collected for each alternative, and the results are compared. The comparison test can be conducted informally as an exploratory test, or it can be conducted as a tightly controlled classical experiment, with one group of participants serving as a control group and the other as the experimental group. The form used is dependent on your goals in testing. If conducted as a true experiment designed to acquire statistically valid results, the alternatives should vary along a single dimension—for example, keeping the content and functionality constant, but altering the visual design or the navigation scheme—and the expected results of the test should be formulated as a hypothesis.
If conducted less formally as a more observational, qualitative study, the alternatives may vary on many dimensions. One needs to ascertain why one alternative is favored over another, and which aspects of each design are favorable and unfavorable. Inevitably, when comparing one or more alternatives in this fashion, one discovers that there is no “winning” design per se. Rather, the best design turns out to be a combination of the alternatives, with the best aspects of each design used to form a hybrid design.
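When the comparison is run as a between-subjects experiment as described above, the performance data for each alternative are summarized and the group means compared. The sketch below shows one conventional way to do that; the task times for each design are invented, and Welch's t statistic is offered as one standard comparison method, not as the only valid analysis.

```python
# Sketch: tallying performance data for a two-alternative comparison test.
# Task times (in minutes) for each design are hypothetical examples.
from statistics import mean, stdev
from math import sqrt

design_a = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5]   # participants who used Design A
design_b = [6.8, 7.2, 5.9, 8.1, 6.4, 7.0]   # participants who used Design B

mean_a, mean_b = mean(design_a), mean(design_b)
sd_a, sd_b = stdev(design_a), stdev(design_b)

# Welch's t statistic: a common way to compare two group means when each
# group of participants saw only one alternative (between-subjects design).
t_stat = (mean_a - mean_b) / sqrt(sd_a**2 / len(design_a) +
                                  sd_b**2 / len(design_b))

print(f"Design A mean: {mean_a:.2f} min, Design B mean: {mean_b:.2f} min")
print(f"Welch's t = {t_stat:.2f}")
```

A negative t here simply indicates the first group was faster on average; whether the difference is statistically significant would be judged against the t distribution, typically with a statistics package. For the informal, qualitative form of the comparison test, none of this machinery is needed.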
For exploratory comparison tests, experience has shown that the best results and the most creative solutions are obtained by including wildly differing alternatives, rather than very similar alternatives. This seems to work because:
- The design team is forced to stretch its conceptions of what will work rather than just continuing along in a predictable pattern. With the necessity for developing very different alternatives, the design team is forced to move away from predictable ways of thinking about the problem. Typically, this involves revisiting fundamental premises about an interface or documentation format that have been around for years. The result is often a design that redefines and improves the product in fundamental ways.
- During the test, the participant is forced to really consider and contemplate why one design is better and which aspects make it so. It is easier to compare alternatives that are very similar, but harder to compare very different ones. Why? Similar alternatives share the same framework and conceptual model, with only the lower-level operations working differently. Very different alternatives, however, are often based on different conceptual models of how each works and may challenge the user, especially one experienced with the product, to take stock of how the tasks are actually performed.
Now, having reviewed the basics of each type of test, let us explore how a series of tests might in fact work. Let's suppose that your company is developing a web-based software application and its associated documentation. The software is a personal information manager, consisting of calendar, contact, and task management functionality. You intend to conduct three usability tests at three different times in the product development lifecycle. Following is a hypothetical series of tests on this product throughout the lifecycle, complete with hypothetical outcomes at the end of each test. Understand that the details have been greatly simplified to provide an overview of iterative design in action.
Test 1: Exploratory/Comparison Test
Two early prototypes of the interface have been developed (see Figures 3.5 and 3.6). The interfaces use the same underlying architecture, programming languages, and functionality, although the layout of their navigation is considerably different from each other.
The prototypes have very limited working functionality (e.g., about 30 to 40 percent of the proposed functions work). There is no documentation, but during the test, a technical expert will be available to reveal limited but crucial information needed to use the product. (See the gradual disclosure technique in Chapter 13 for an explanation of how to use a technical expert in this way.) Primitive help topics, available on paper only, will be provided to the participant on demand; that is, when the participant clicks the appropriate prompt or asks a question, the test moderator will provide what would normally be embedded assistance, instruction prompts, or messages on paper as they would appear on the screen.
Main Research Questions
- Which of the two interface styles/concepts is the most effective? In which is the user better able to remain oriented within the program?
- What are the best and worst features of each approach?
- What are the main stumbling blocks for the user?
- After some period of initial learning, which style has the greatest potential for the power user?
- For which tasks will users need help, further instructions, or supporting documentation?
- What types of written information will be required? Theoretical or conceptual?
Brief Summary of Outcome
The test was conducted. As is typical of comparison tests at this point, there was no "winner" per se. Rather, the result was an interface with the best attributes of both prototypes. The scheme placing the navigation on the left was the most efficient and effective, but some of the available options did not seem to belong with the others, so those will remain in a navigation bar across the top of the main work area. The options remaining in the top navigation are apparently used less frequently.
There were many advanced features for use in a corporate setting that users needed additional information about. Because this personal information manager will be used throughout a large company, some functionality was added to support work group collaboration, which added complexity to the product. To remedy the complexity issue, the first line of defense is to develop a documentation set that includes, at minimum, a guide for setting up preferences, some self-paced training on interface operations, and a procedural user guide for more advanced, less frequent tasks.
Test 2: Assessment Test
Time has passed. A single prototype has now been expanded to approximately 60 to 80 percent of its eventual functionality. There are comprehensive help topics for working functions in a separate section of the web site. A first draft of simplified documentation, printed on 8 1/2" by 11" bond paper, is available for the test, with a table of contents but no index.
Main Test Objectives
- Confirm whether the findings of the original test adequately match interface operations with the user's workflow.
- Expose all major usability deficiencies and their causes for the most common tasks.
- Determine if there is a seamless connection of help topics, embedded assistance, and messaging with the functionality and user interface. Does the software give support at the right moments? Is the help center organized in a way that answers participants' questions?
- Is the documentation being utilized as designed? Is it accessible? Are graphics understood and at the appropriate level of detail? Are certain sections not read at all? Are additional sections required? Is all terminology clear? Are there areas that require more explanation?
- Where do participants still have questions? What are their questions?
Brief Summary of Test Outcome
Many difficulties in operations were identified, but the users' workflow matched that employed by the design team for the product's interface operations. Essentially, the high-level interface “works,” and the lower-level details remain to be implemented and refined. The help information was accurate and helpful, but users rarely invoked it unless prompted. There was a strong preference for trial and error with this particular user audience. When users were prompted to try the help, it was found that the organization of the help topics needs to be extensively revamped and made more task-oriented. Even more theoretical, contextual information needs to be included for the most advanced users. This last issue turned out to be very controversial because designers felt it was not their responsibility to force particular operational approaches on corporate working groups. It is possible that an interactive primer for users may be required for infrequent but important tasks.
Test 3: Verification Test
Some weeks have passed. For this last test, a fully functional product with comprehensive help topics has been prepared. All sections of the documentation have been through one draft, with half of the sections undergoing a second draft. The documentation has a rough index for the test. A small “tour” for users about quarterly and semi-annual tasks was developed. For the major tasks of the product, specified measurable time and accuracy criteria have been developed. For example, one criterion reads:
Using the setup guide, a user will be able to correctly implement View and Network preferences within 10 minutes, with no more than two attempts required.
Unbelievably, and for the first time in the recorded history of software development, there actually will be time to make minor modifications before release.
Main Test Objectives

- Verify that 70 percent of participants can meet established successful completion criteria for each major task scenario.
(The 70 percent benchmark is something that Jeff has personally evolved toward over time, and that Dana has used effectively. It provides a reasonably challenging test while still leaving the design team some work to do before product release to move that number toward a more acceptable and traditional 95 percent benchmark. A benchmark of 100 percent is probably not realistic except for tasks involving danger or damage to the system or possible loss of life, and should never be used lightly. In the 1960s NASA found that achieving 100 percent performance cost as much as 50 times the cost of achieving 95 percent performance. It is likely that such costs have gone down over 40 years, but the point is that you should only use the higher benchmark if you are willing to pay the piper.)
- Identify any tasks and areas of the product that risk dire consequences (e.g., are unusable, contain destructive bugs) if the product is released as is.
- Identify all usability deficiencies and sources of those problems. Determine which deficiencies must be repaired before release and which, if there is not time within the schedule, can be implemented in the next release.
Brief Summary of Test Outcome
Every major task passed the 70 percent successful completion criteria with the exception of two. The team felt that the problems associated with those tasks could be corrected prior to release, and wanted to schedule a very quick test to confirm. Twenty recommendations from the test were identified for implementation prior to release, and at least fifteen recommendations were diverted to future releases.
Providing a "tour" of advanced features prior to the test proved to be a stroke of genius. Participants loved it, and some even insisted on taking it back to their current jobs. One user suggested that the company market it, or a longer virtual seminar based on it, as a separate product for customers, and that is already in the works.
The revamped organization of the user guide was much more in tune with users' expectations than the previous set, although the index proved difficult to use. More task-oriented items must be added to the index to improve accessibility.
As you can tell from this condensed series of tests, the product evolved over time and reflected each test's findings. We strongly advocate such an iterative approach, but again, do not be discouraged if you can manage only one test to begin. Now let's talk about what it takes to be a good test moderator.
Skills for Test Moderators
The role of the test moderator or test administrator is the most critical of all the test team members, presuming that you even have the luxury of a test team. In fact, the moderator is the one team member that you absolutely must have in order to conduct the test. The moderator is ultimately responsible for all preparations including test materials, participant arrangements, and coordination of the efforts of other members of the test team.
During the test, the moderator is responsible for all aspects of administration, including greeting the participant, collecting data, assisting and probing, and debriefing the participant. After the test, he or she needs to collate the day's data collection, meet with and debrief other team members, and ensure that the testing is tracking with the test objectives. If the usability test were an athletic contest, the moderator would be the captain of the team. As such, he or she has the potential to make or break the test. An ineffective moderator can seriously negate test results and even waste much of the preliminary preparation work. This chapter discusses several alternatives for acquiring test moderators from inside and outside your organization, as well as the desired characteristics of an effective test moderator. Chapter 9.1 includes guidelines for moderating test sessions, including information about when and how to intervene, and the advantages and disadvantages of using a “think-aloud” protocol.
Who Should Moderate?
One of the basic tenets of usability testing—and of this book—is that it is almost impossible to remain objective when conducting a usability test of your own product. There is simply too strong a tendency to lead participants in a direction that you want the results to go, rather than acting as a neutral enabler of the process. This is even true for experienced test moderators who conduct the test from an external control room. In fact, asking someone to test his or her own product is like asking parents to objectively evaluate the abilities of their child. It is an impossible endeavor.
Having said that, if there is only you available to test your product, do so. In almost every case, it is still better to test than not to test, even if you must do the testing yourself. However, for the long term, you would want to be out of the self-testing business as soon as possible.
Imagine that you want to conduct a test on a product for which you have primary responsibility, and if possible you would like someone less involved with the product to conduct the test. You can help develop the test materials, make arrangements, and select participants, but you need a more objective person to handle the actual test moderating. Suppose also that your organization currently has no in-house testing staff and does not plan to introduce one shortly. To whom should you look for help?
The following sources represent a number of areas from which you can find candidates who possess the requisite skills to conduct a test, or who could head up the beginnings of an internal testing group. They may or may not already be working on your product.
Human Factors Specialist
A human factors specialist is the most likely candidate to conduct a usability test. This type of person typically has an advanced degree in psychology, industrial engineering, or similar discipline, and is familiar with experimental methodology and test rigor. Just as important, the human factors specialist is grounded in the basics of information processing, cognitive psychology, and other disciplines related to the development of usable products, systems, and support materials. This grounding is crucial in differentiating the important from the superficial usability factors in a product and ultimately in designing and conducting the test.
With the current focus on usability engineering and testing, it is highly probable that human factors specialists within your organization are already involved with testing in one form or another.
Marketing Specialist

A marketing specialist is typically customer-oriented, user-oriented, or both, has good interpersonal and communication skills, and would be very interested in improving the quality of products. This type of specialist may already be involved with your product, but usually not to the detailed level that would tend to disqualify him or her from conducting the testing.
Technical Communicator

Technical communicators, including technical writers and training specialists, often make wonderful test moderators. Many technical communicators already serve as user advocates on projects, and their profession requires them to think as a user in order to design, write, and present effective support materials.
Rotating Team Members
Let's suppose that no one from the disciplines listed previously is available to help on your project, and you are still determined not to test your own materials. Another alternative is to draw upon colleagues of similar disciplines, who are not working on the same product. An example of this approach is for technical communicators to test each other's manuals or for software engineers to test each other's program modules.
In such a scenario, the person whose product is being tested could help prepare many of the test materials and make the pretest arrangements, then turn over the actual moderating of the test to a colleague. One of the advantages of this approach is that two (or more) heads are better than one, and it is always beneficial to have someone other than yourself help prepare the test. The person acting as the test moderator would need time to become familiar with the specific product being tested and to prepare to test it, in addition to the time required to actually moderate the test.
Should you decide to implement this approach, you must plan ahead in order to build the test into your mutual schedules. You cannot expect your colleague to drop everything he or she is working on to help you. Of course, you would reciprocate and serve as test moderator for your colleague's product.
External Consultant

Another option is to hire an external consultant. Many human factors, industrial design, market research, and usability engineering firms now offer usability testing as one of their services, including the use of their test laboratories. You may simply want to outsource the usability test to such a firm, or use such a firm to "kick off" a testing program in your organization.
Using an external consulting company guarantees the objectivity that testing requires. Even some organizations that employ internal human factors specialists to work on the design and development of products still outsource the testing work for the greater sense of impartiality it provides.
If you know your organization is committed to eventually forming a long-term testing program on site, then seek out a consulting company that will work with you to transfer the knowledge of testing into your organization. Even if you are unsure about the long-term prospects for testing in your company, it still might be easier to have outside help with an initial test. Just make sure that if you conduct the test off-site, its location is physically close enough to allow development team members to attend the test sessions. Do not simply farm out the test to a remote location. (Although, in a pinch, team members could observe tests from their remote locations via Morae, Camtasia, or other electronic monitoring tools.) Viewing tests in person is much more effective than watching or listening to a recording, especially for those who are skeptical about the value of testing.
Characteristics of a Good Test Moderator

Regardless of who conducts the test, whether yourself or internal or external staff, and regardless of the background of that person, there are several key characteristics that the most effective test moderators share. These key characteristics are listed and described in the paragraphs that follow. If you are personally considering taking on the role of test moderator in your organization, use these key characteristics as a checklist of the skills you need to acquire. If you are considering using either an internal person or hiring an external person to perform this role, use these key characteristics to help evaluate the person's capabilities.
Grounding in the Basics of User-Centered Design
Grounding in the basics of human information processing, cognitive psychology, and user-centered design (essentially the domain of the human factors specialist) helps immensely because it enables the test moderator to sense, even before the test begins, which interactions, operations, messages, or instructions are liable to cause problems. Test moderators with this background have a knowledge of which problems can be generalized to the population at large and which are more trivial. This helps to ascertain when to probe further and what issues need to be explored thoroughly during the debriefing session. Additionally, this background can also prevent the need to test situations that are known to cause problems for users, such as the inappropriate use of color or the incorrect placing of a note in a manual. Lastly, a strong background in usability engineering helps the test moderator to focus on fixing the important issues after a test is complete.
An effective test moderator need not be expert in the intricacies of the specific product being tested. For example, if the product is a database management system, the moderator need not be an expert in database management. However, he or she must be able to absorb new concepts quickly and to integrate these concepts into his or her thinking and vocabulary. The moderator also needs to absorb all the peripheral issues surrounding a product, such as its positioning in the marketplace, competitors, and historical problems. During the test itself, the moderator must be able to understand the actions and comments of the participant quickly, as well as the implications behind those actions and comments. Being a quick learner enables the moderator to probe and question effectively.
Instant Rapport with Participants
Bringing in participants to evaluate your product is a valuable and opportune point in the development cycle that should not be squandered. If for some reason a participant is not at ease and is not able to function as he or she normally would, it represents a lost opportunity and potentially misleading results. If you are able to test only five participants, one uneasy participant represents a potential loss of 20 percent of your test data. The test moderator's ability to quickly size up each participant's personality, make friends, and put the person at ease is essential to getting the most from the testing process. Some participants need coddling, some need stroking, and some are businesslike and require a more formal approach. Whichever the case, the test moderator must make each person feel comfortable and secure.
Good Memory
Some might believe that because usability test sessions are recorded, the test moderator need not rely on memory for conducting and evaluating a test session. Actually, memory is called into play well before a test session has ended. Because a test session can be rather long, the test moderator needs to remember behaviors or comments that took place earlier in the session in order to cross-check and probe those behaviors later in the session. For example, a participant may attempt to perform the same task in two or three different ways, and the test moderator may want to probe to understand why the participant performed the task differently each time.
Memory is also required to recall the results of a test session after its completion. Because there is often very little time to devote to searching the videotapes after a test, except as insurance against missing some point entirely, the test moderator often must rely heavily on memory and notes.
Good Listener
Listening skills involve the test moderator's ability to hear with “new ears” during each session and to lay aside personal biases and strong opinions about what he or she is seeing and hearing. The test moderator needs to understand both the content and the implication of a participant's comments, as there are often mixed messages of all kinds during testing. The test moderator must pick up on the subtle nuances of speech and emphasis, as a participant's comments are often indirect and less than forthcoming. Understanding the rationale behind the participant's behavior is especially important, because the rationale often signals whether a change in the product is required.
Comfortable with Ambiguity
Usability is not a precise science consisting of formulas and black and white answers. Even if a usability test is conducted under the most rigorous conditions, which is atypical, you are still not assured that all of the results are valid and generalizable to your entire user population. Instead, usability testing can often be an imprecise, ambiguous enterprise, with varying and sometimes conflicting observations, not surprising for any venture that has human beings as its focus. A test moderator, then, must understand and be comfortable with ambiguity.
For example, prior to testing you may think that there are only two ways to perform a particular task. During testing though, you discover that the participants have found four other ways to perform the same task. Or, you discover that you are no closer to a clear-cut resolution of a product's problems after a week of testing than you were before you began. Or, when testing multiple versions of a product, no clear winner emerges. The versions are all equally bad or, if you are lucky, equally good. These situations require patience, perseverance, and very often skill at negotiation. Without tolerance for ambiguity and the patience to persevere, the test moderator tends to rationalize and to blame the participants for making unplanned choices during the test.
Flexibility
Another related characteristic of an effective test moderator is flexibility, which has to do with knowing when to deviate from the test plan (we discuss this further in Chapter 9.1). There are times when a particular participant does not have the expected skills or simply views the task in a completely different way than was originally intended. Jeff conducted a test when the entire high-level design of an interface became questionable after testing only two participants. He could see immediately that the premise for the design was flawed. At that time, he recommended that the company halt testing and go back to the drawing board. To continue ferreting out minor problems with the product would have been a waste of everyone's time. While this is an extreme case, the point is that one needs to be prepared for the unexpected, even if that has serious consequences.
Long Attention Span
Experienced test moderators share a secret: Usability testing can be tedious and boring. There are long stretches when seemingly nothing is happening, when participants are reading and absorbing, thinking, and sometimes just resting. The moderator cannot possess the type of personality that needs new stimulation every 5 to 10 minutes. The moderator must be able to pay attention for long periods of time because there is no predicting when a gem of a discovery will arise during a test session. In addition, because the moderator may view up to 10, 15, or 20 sessions, all of which involve observing the same or similar tasks, the ability to stay focused is vital.
Empathic “People Person”
Participants will relate more readily to a test moderator who is an empathic individual. This may not be all that critical during the test session itself, especially if the session requires little probing or exploration on the part of the test moderator. However, empathy can play a major part during the debriefing session when the test moderator is trying to elicit a participant's innermost thoughts and feelings about the previous two hours of work. Participants will tend to hold back if they feel that the test moderator cannot relate to their particular situation, especially if the session was unusually frustrating or difficult.
“Big Picture” Thinker
So much data is collected during a usability test, and so much more could be collected, that it is very easy to lose sight of the forest for the trees. The test moderator must be able to separate the significant from the insignificant, and this ability takes two concrete forms.
- The ability to draw together all of the various inputs, comments, and data from a single test to form a cohesive picture of a participant's performance.
- The ability to draw together the varied inputs from different test sessions and focus on the most important and critical findings.
It is very easy to get lost in the details and focus on trivial observations. It is also easy to be influenced by the last participant and forget all that has come before.
An effective test moderator, however, avoids these difficulties by staying focused on the big picture.
Good Communicator
Good communication skills are essential in usability testing. The test moderator must communicate with individual members of the development team, participants, observers, and other individuals who may be helping to administer the test in one way or another. The test moderator must be skillful at persuading others to make changes that are in their best interest, and he or she must be able to explain the implications behind the test results. Good writing skills are also essential because the test report is often the sole means of communicating test results to those who did not attend the test sessions. The written report is also the important historical document that is relied upon months or years later to review or revisit the test results.
Good Organizer and Coordinator
A usability test is a project within a project. Even a simple test requires the management of an astonishing number of small details, events, and milestones. Ensuring that equipment is in running order, getting all participants to the site on time, and making sure that the product is ready for testing are ultimately the responsibility of the test moderator. In addition, the test moderator is the focal point for the other test team members, and must coordinate their activities as well as those of any outside consultants into a unified effort. Therefore, the test moderator should be a good organizer and coordinator.
Let's explore some things that the test moderator can do to enhance the process. One of the best things a test moderator can do is to develop increased sensitivity to the plight of the participants. What does it feel like to be in the testing hot seat? Figure 4.1 shows a hypothetical example of one participant's point of view compiled from many of the participants observed over the years. While it is written tongue in cheek, its point is not. Participants are often placed in awkward, stressful situations where they have little control over events. The more you put them at ease, the greater are your chances for accurate results that are applicable to real-world situations. You should become familiar with the codes of ethics of the Usability Professionals' Association and the Human Factors and Ergonomics Society. The links for each are available on the web site that accompanies this book (www.wiley.com/go/usabilitytesting).
Choose the Right Format
Keeping in mind the research questions you are working with, the location of the test sessions, the time you have available, and the types of participants involved, you must consider the best way to get data from the people you have brought in to help you evaluate the product.
Sit-By Sessions versus Observing from Elsewhere
When the practice of usability testing started, most sessions were conducted in a highly clinical style, with the moderator seated in a control room to observe and collect data. Over the years, a less formal, “sit-by” style of conducting sessions has developed for circumstances under which one would want to be near participants during usability testing, including most of the types of tests we describe in this book. For example, being with the participant in the testing room is advantageous for exploratory situations in which one wants to be able to gather the first impressions of participants about the design or functionality of a product. Doing so allows the moderator to be able to ask follow-up questions easily in a more cooperative session.
Additionally, if there is very quick interaction that must be observed closely or if you are conducting paper prototyping sessions, the moderator must be with the participant to mimic the computer actions.
However, if there is a chance in an assessment or validation test that the moderator's presence may bias the data, or if the emphasis is on gathering quantitative rather than qualitative data, set the test up to let the participant work by him- or herself while you manage the sessions from another room. For a deeper discussion of the physical arrangements for testing and observation rooms, see Chapter 6.1.
“Think-Aloud” Advantages and Disadvantages
Some participants will naturally verbalize what they are going through as they work through a usability test session. These people are gold; having a running commentary (something researchers call a “verbal protocol”) from participants as they “think aloud” while they perform tasks offers many insights to why a problem exists and how someone tries to work around it. Asking participants to think aloud during their sessions also reveals important clues about how they are thinking about the product or system they are using and whether the way it works matches up with the way it was designed.
Participants always filter to some extent, however, so they may consciously or unconsciously leave things out as they talk. Likewise, it is impossible to know everything that is going through a participant's mind during the session (and, of course, you likely don't want to know everything, just what pertains to your product). Thinking aloud can also help participants think through the design problem and form ideas for recovering. While this isn't always what they would do in real life, their doing so will give you ideas about how to remedy design problems.
One important reason to avoid asking participants to think aloud is when you are measuring time on tasks. Thinking aloud slows performance significantly.
An alternative to having participants think aloud is to replay the test with the participant after all of the tasks are done. As the moderator, you will have noted where during tasks the participant had issues, questions, or concerns. After the tasks are done, you can jump to the point in the recordings where you want the participant to discuss what issues he or she was having, what the thought processes were, and how the workarounds were arrived at. Retrospective review can be a good tool to use with participants for whom talking and working might be difficult, such as small children, very old adults, and people with cognitive disabilities.
A serious drawback to performing a review is the time it takes. You have completed a full slate of tasks and now must spend more time reviewing the session from the beginning. The review can take as much time as, or more than, the main part of the session itself. Some practitioners also avoid retrospective review because it gives some participants an opportunity to revise and rationalize their behavior rather than simply reporting on what happened and why.
Give Participants Time to Work through Hindrances
Keeping in mind that the participant may be very nervous is the first step. Skillfully working with the person's sense of frustration is the next step. There will be times when the participant becomes exceedingly frustrated during the course of a test. When you see that this is occurring, recognize this as a critical point that can be advantageous. Many test moderators, at the first sign of user frustration, will immediately tell the participant to give up and go on to the next task. They do this in order to avoid confrontation, keep things on an even keel, and maintain the participant's interest. However, because the frustration is usually related to a critical deficiency in the product, moving on too quickly misses an important opportunity. The participant's behavior at this point can be more revealing than at any other time and can help the test team to understand how the participant learns to use the product. It is important to encourage the participant to continue rather than cutting the task short too quickly. The trick is to find just that point when the participant is frustrated but is still willing to try.
Offer Appropriate Encouragement
Another reason for encouraging the participant to keep trying is to show designers and developers watching the usability test the dire consequences of certain difficulties experienced by participants. It is important to let them see just how painful and frustrating the process of using their product can be. Actually seeing the participant struggle and get very frustrated and observing the serious consequences firsthand will do more to convince a designer to revise a product than weeks of discussions and negotiations. Sometimes just that extra amount of human struggle will convince a product designer that a change is needed and that the product should not be released as is.
So how should you encourage the participant? One way is to empathize with the participant and provide an end goal or end time frame. For example, you might say, “I can sense you're getting frustrated, but this is a particularly crucial part of the software/documentation. Would you please try a little bit longer?”
You might try stating that other participants in the past have also had their share of difficulty. While this runs the risk of slightly biasing the participant, you may lose the participant in any case if he or she has already experienced great difficulty. You might say, “I see you're having a difficult time with this. This isn't the first time that I've seen someone experience some difficulty here. Would you please continue on for five more minutes?”
The most skilled test moderator can encourage a participant to work with a smile. Make it seem like you and the participant are in this together, and that what is happening in the test is not a reflection of the participant's abilities. Often, the frustration level builds due to a sense of self-consciousness and a loss of self-esteem on the part of the participant. The test moderator can help immensely by deflecting the problem back to the product. For example, the test moderator might say, “Boy this sure is turning out to be a tough section. I really appreciate your efforts in trying to get through this.” Don't be afraid to encourage the participant to verbalize what's happening. Very often, if a participant is allowed to vent while performing, frustration can be minimized. It is up to the test moderator to gauge that fine line between rescuing the participant too early and pushing him or her beyond the point at which continuing the test is no longer possible. Through practice and experience, one can find the middle road between pushing too hard and giving up too early in order to get the best results.
Now that you have reviewed some of the important characteristics that a test moderator should possess, as well as some pointers for moderating the smoothest possible sessions with test participants, let's review some of the behaviors that test moderators should avoid. We describe the most common “errors” that test moderators make while conducting a test, many of which we have learned from experience. Consider it a list of “what not to do.” Even experienced test moderators can benefit from taking a few moments just prior to testing to review this list. As with the previous list of characteristics, you can use this list to evaluate and improve your own performance or to evaluate the performance of someone you hire to conduct usability testing for you.
Leading Rather Than Enabling
Behavior that leads rather than enables is usually caused by the test moderator being too close to the product and unintentionally providing cues to the participant about correct performance. The test moderator's tone of voice, a nod of the head, cutting short tasks when the participant struggles, even the type of question the test moderator asks can all influence the participant and the test results. This potential problem is the main reason why assessment and validation testing is often conducted from a control room, although subtle hints are still possible even from there. To avoid the problem of leading rather than enabling, remember that you are there to collect data and to ensure that the test environment remains as neutral as possible.
Too Involved with the Act of Data Collection
While the purpose of the test is to collect as much information as possible, the act of collecting that information should not interfere with direct observation of what is occurring. The test moderator needs to stay aware of what the participant is doing at all times, even if that means that every aspect of the participant's behavior is not written down. That's one purpose of recording the test, of developing coding categories, and of having others help with the more rote collection of such things as timings and number of references to an index or online help. All of these aids help ensure that the test moderator does not become so engrossed in the collection process that he or she misses important behaviors.
Acting Too Knowledgeable
This problem occurs when the test moderator and participant are in the same room during the test. Participants will tend to defer to the test moderator and ask many questions if they feel that the test moderator knows all the answers. Being too knowledgeable can also intimidate a participant who may be somewhat nervous and self-conscious about his or her abilities.
One simple way to counteract the problem of acting too knowledgeable is for the test moderator to “play dumb.” That is, the test moderator downplays any knowledge of the product and takes on the role of a research technician who is simply collecting data. Participants then change their entire demeanor when it becomes clear that they are not going to receive any assistance from the test moderator. They begin to try harder and to behave as if they were alone in their own home or office.
Too Rigid with the Test Plan
An experienced test moderator will know when to deviate from the test plan. It is important for the moderator to sense when the test design is not achieving its objectives and is not exposing the product's deficiencies to the fullest extent. At those times, it is up to the moderator to make the appropriate changes so that a participant's time and efforts are not wasted. Sometimes a participant with a different background than what was expected will appear. Sometimes the tasks are the wrong ones for addressing the research questions. Whatever the case, it is up to the moderator to revise the plan accordingly.
Not Relating Well to Each Participant
Participants come in all shapes, sizes, and demeanors. Regardless of whether a particular participant is shy, arrogant, moody, intimidated, self-conscious, or whatever, the test moderator needs to adjust his or her style in order to allow the participant to be comfortable and at ease. The test moderator should not get involved with battling, belittling, or in any way making a participant feel like anything but a guest. As far as the test moderator is concerned, the participant is always right.
Jumping to Conclusions
Inexperienced test moderators tend to overreact to early results. This can cause other members of the test team to act on the data prematurely. It is up to the test moderator to maintain a cool, steady demeanor and remind everyone to avoid forming conclusions until all of the results are in. One of the reasons for testing multiple participants is just for that purpose: to get a rounded, comprehensive view of the product through the eyes of different types of people with different types of backgrounds. While it is important for the test moderator to pick up patterns in the behavior of participants as early as possible, this does not necessarily mean reacting to that behavior. Avoiding premature conclusions helps to keep members of the test team from making major product changes before all the data is in.
Conducting a usability test is an extremely challenging and worthwhile endeavor on a variety of levels. On the most ordinary level, you are working very closely with people in an attempt to design a product for maximum utility and benefit. On a deeper level, it is a very profound experience that forces you to confront your own mind and its tendency to be biased, distracted, and flighty. Monitoring a test puts you on the spot and forces you to be extremely mindful and disciplined. You spend long periods of time maintaining concentration while observing people, all the time being as unobtrusive as possible. We both have found it to be delightful, frustrating, boring, and exhausting, and sometimes these feelings result from working with just the first participant.
If you are seriously considering acting as a test moderator on a regular basis in either some official or unofficial capacity, know that it can be a very rewarding and enlightening experience. Let's look at some ways for growing into this job.
Learn the Basic Principles of Human Factors/Ergonomics
Learn the basic principles of human information processing, experimental psychology, cognitive psychology, statistics, interface design, and usability engineering. Subscribe to and read the proceedings from the major societies (listed in Chapter 14.1). Attend seminars and study basic psychology courses. Many universities and community colleges offer certificate programs in usability testing or human factors. Attend conferences hosted by professional societies. (See Chapter 14.1 for a list.)
Learn from Watching Others
Watching other test moderators is a key to success. When you have an opportunity to watch an experienced test moderator at work, you get to see what works and what doesn't firsthand. If the opportunity presents itself, ask the test moderator why he or she uses a particular technique that caught your interest. Take notes about particular techniques, behaviors, and so forth that seem particularly effective and try them out yourself. Do not let your concern about making mistakes prevent you from exploring new techniques.
Watch Yourself on Tape
One of the benefits of recording your test sessions is that you have an ideal medium for reviewing your own performance. Take advantage of this technology by reviewing your sessions with the intent of improving your skills. Take notes on what you do well and on behaviors that need improvement. That way you will remember to work on those aspects the next time you moderate a test.
Work with a Mentor
Work closely with an experienced test moderator. Help the test moderator work on a test and have that moderator do the same for you. If it is a test with many participants, perhaps you can conduct some of the sessions. Have your mentor watch you and critique your performance. If you hire a consultant to help conduct a test, arrange for the consultant to work closely with you in a mentor/coaching relationship, so that you can learn faster than by just observing.
Practice Moderating Sessions
Start with the right attitude. Do not be a perfectionist. You are going to make mistakes, bias participants, reveal information you should not, and invent new ways to invalidate a test session's results. This is just par for the course. Usability testing has a twofold saving grace—testing multiple participants and iterative design. Testing multiple participants means that if you invalidate something in one session, there is always another opportunity to do it right. Iterative design also makes up for any mistakes you might make, because you have several chances throughout the product development lifecycle to catch problems with the product. The important thing is not to get discouraged. Continue to practice, continue to learn, continue to improve. Even the most experienced test moderators make mistakes.
Learn to Meditate
Meditation practice, specifically the type of meditation that fosters mindfulness and awareness, can be a valuable aid in learning to see clearly and in observing subtle nuances in behavior. This type of discipline is based on the belief that to understand another's mind, you first have to master your own.
Meditation practice or mindfulness training involves setting aside a period of time to sit down on a cushion and practice a simple breathing technique, while at the same time acknowledging thoughts that arise and letting them be. Over time, the result of this practice is a very personal and heartfelt recognition of how everything we perceive is filtered and biased by our version of things. Through continual practice, one's thoughts become more transparent, which in turn frees one to perceive more clearly and directly. During a test session, this is exactly what an excellent test moderator attempts to do: observe the participant's behavior free from the tyranny of his or her own expectations and biases.
Don't take up meditation strictly to become a better test moderator; that would be missing the point. However, if you are already inclined toward a discipline to quiet the mind and gain a clearer perception, meditation practice is a natural complement to the testing discipline.
Practice “Bare Attention”
“Bare attention” practice is an adjunct to meditation practice, except that it is done within one's normal daily routine. Practicing “bare attention” can heighten your ability to concentrate during test sessions. To practice “bare attention,” set aside a period of time (15–30 minutes is more than enough to begin) when you intentionally and very deliberately heighten your awareness of whatever you happen to be doing and of your surroundings. For example, if you are working at a computer, experience very deliberately the sense of your fingers hitting the keys, of your eyes looking from the paper to the screen, of your thought process. Notice when (and how often) your mind wanders from what you are doing, and when it does, gently bring it back to the present task at hand. The intent is to stay in the present moment 100 percent of the time. Try it sometime just to see how difficult it is. This practice, as with the previously described meditation practice, helps to foster mindfulness and awareness.