CONDUCTING AN ACCESSIBILITY EVALUATION

STEP 1: Definition of Test Objectives

The first step in evaluating accessibility is to clearly define the test objectives. Test objectives are statements of intent identifying the particular goals of the assessment. Each objective should be specific enough to be obtainable. A well-designed test objective might be of the form "Evaluate blind user task performance when using the front control panel on the Model 123 Facsimile Machine". In this example, the focus of the evaluation (task performance), the specific component to be evaluated (the front control panel), and the population or impairment type being considered (blind users) are all clearly identified. User populations and the components of the system under test (SUT) that are of interest should be considered in the generation of the test objectives.

Once the test objectives have been identified, the evaluator must decide how to address each of the test objectives. In some cases, a test objective may be met entirely by performing a subset of the items found in the checklist. Other cases may require user-in-the-loop testing or a combination of a checklist evaluation and user-in-the-loop testing. Development of a Method of Test document (STEP 3) is an effective way of documenting how each test objective will be addressed.

STEP 2: Performing the Task Analysis

Identification of Tasks

The first step in the task analysis is to identify the tasks. There are three potential sources of information that the analyst may use to identify tasks. The first source is to observe users as they use the device (or a comparable device). Users perform tasks with a device because of the requirements of their jobs, so it is beneficial to understand the jobs of the users. The second source of information is the technical documentation associated with the device, such as user manuals and training materials. These documents may provide information on tasks not readily observed, such as tasks performed only during initial setup of the device or in response to some malfunction. These documents may also provide information on alternative methods of performing tasks, beyond what current users may be observed doing.

The third source of information about user tasks is to develop design reference scenarios that describe how users are envisioned to use the device in various circumstances. This source is particularly important when analyzing a new product or addition of new interfaces to an existing product. In such cases it is not possible to observe users interacting with the new device, and technical documentation may still be immature. A design reference scenario is simply a narrative of the key elements in the work environment, and the usage of the device in that environment. A typical set of design reference scenarios would include perhaps 8-10 scenarios, each differing in the circumstances or aspects of functionality that are used.

Prioritization of Tasks

A prioritization of the task list should be developed based on an estimate of the essential or core features of the device, versus advanced features and features related to device set-up and maintenance. Priority may be divided into three levels. Priority 1 tasks must be able to be successfully completed irrespective of impairment in order for the product to be usable for all users. An inability to perform a priority 1 task because of an impairment would likely severely limit the accessibility of the product under evaluation for users with that impairment. Priority 2 tasks are secondary tasks that may be performed on an occasional basis to access advanced functionality. The inability to perform a priority 2 task because of an impairment, while not critical to the basic use of the product, may negate the value of advanced features of the product. Priority 3 tasks are tertiary tasks that are not necessarily performed by all users of the device, but must be performable by some operator on occasion. These tasks include initial setup tasks that are not ordinarily repeated, major troubleshooting tasks, and major maintenance tasks that users are expected to perform, albeit infrequently. The inability to perform a priority 3 task because of an impairment would not affect the basic accessibility of the product unless the product is to be used by a single user (or group of users all sharing the same impairment).

Maintenance tasks may be associated with any priority level. Some routine maintenance tasks that any user might perform, such as loading paper, are judged to be priority level 1. Maintenance tasks that are performed rarely or only by specially-trained users, however, are judged to be priority level 3.

Examples of typical Priority 1, 2, and 3 tasks for various devices are shown below. These tasks are typical for many situations, but actual task priority is driven by the combination of intended usage and specific user requirements, not the general type of device. For example, for most mobile phones, monitoring air time usage is probably a Priority 2 task. However, for a pre-paid mobile phone this task might be a Priority 1 task.

Task Priority by Device

Priority 1 (Core functionality of the device)
  • Office Copier: Power-up copier. Make one copy of a single page. Copy pages from a book. Make multiple 2-sided copies of a 2-sided original. Add paper.
  • Mobile Telephone: Power-on phone. Place a call. Receive a call. Power-off phone. Place phone on recharger.
  • PDA: Start PDA. Check daily schedule. Retrieve a phone number and address. Use calculator functions. Add a new appointment.

Priority 2 (Advanced functionality needed on occasion)
  • Office Copier: Make stapled copies. Make enlarged copies.
  • Mobile Telephone: Monitor airtime usage. Edit database of stored numbers. Speed dial. Access wireless websites.
  • PDA: Synchronize PDA and PC. Install new applications.

Priority 3 (Tertiary functionality performed by expert user, or functions not related to primary use)
  • Office Copier: Assemble copier and attachments. Setup user accounts. Clear major paper jams. Replace toner cartridges.
  • Mobile Telephone: Initialize phone for service. Enter personalized banner. Play built-in games. Clear call logs.
  • PDA: Replace batteries. Upgrade BIOS. Reformat storage media.
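
The three priority levels can be recorded in a simple task inventory. The following Python sketch is illustrative only: the Task class, the Priority names, and the grouping function are assumptions made for demonstration, and the sample entries are taken from the office copier examples above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    CORE = 1       # must be completable irrespective of impairment
    ADVANCED = 2   # occasional access to advanced functionality
    TERTIARY = 3   # setup, major troubleshooting, major maintenance


@dataclass(frozen=True)
class Task:
    device: str
    name: str
    priority: Priority


def tasks_by_priority(tasks):
    """Group task names by priority level, preserving input order."""
    grouped = {level: [] for level in Priority}
    for task in tasks:
        grouped[task.priority].append(task.name)
    return grouped


# Sample inventory (hypothetical selection from the copier examples).
copier_tasks = [
    Task("Office Copier", "Make one copy of a single page", Priority.CORE),
    Task("Office Copier", "Add paper", Priority.CORE),
    Task("Office Copier", "Make stapled copies", Priority.ADVANCED),
    Task("Office Copier", "Replace toner cartridges", Priority.TERTIARY),
]

grouped = tasks_by_priority(copier_tasks)
for level in Priority:
    print(f"Priority {level.value}: {grouped[level]}")
```

Because actual priority depends on intended usage, the priority is stored per task rather than derived from the device type.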

STEP 3: Development of the Method of Test

A Method of Test document (MOT) should be created to document the plan for addressing the test objectives. The MOT document also outlines resources that will be required to accomplish those objectives and serves as a coordination document between the various parties involved in the evaluation. There are three main sections of the MOT.

The first section of the MOT should identify the reason for performing the test and the system that is being tested. Evaluators should pay careful attention to documenting the configuration of the SUT. Any additional software or other modifications above the baseline configuration of the SUT should be clearly noted. In addition, each test objective should be described and a summary of planned test sessions should be outlined. The second section should describe, in considerable detail, the methods that will be used in the performance of the test. The following information should be provided for each test objective:

Purpose or Rationale of the Test. The purpose of performing the test should be documented in this section. In addition, evaluators should document how the evaluation will support the activity that prompted the evaluation. For example, if the evaluation was designed to evaluate accessibility in light of Section 508 requirements, then the rationale should describe how the test will be used to illustrate or document compliance with Section 508.
Subject Requirements. The number and type of subjects that are required by the test should be described. For example, a test objective that involves user-in-the-loop testing should describe the specific functional limitations that are required to participate in the test. Care should be taken to choose subjects that represent 1) the user population that is expected to interact with the SUT and 2) the variety of functional limitations that the test is designed to consider. For example, when studying users with low vision, it is desirable to consider several different levels of general impairment as well as specific impairments, such as central field degeneration, in order to evaluate a more complete range of impairments.
Resources Required. A section should be devoted to the identification of resources that may be required to perform the evaluation. Examples of required resources might include equipment or supplies necessary to perform the test, such as:
  • Blank videotapes
  • Task scenario descriptions
  • SUT training materials
Required resources might also include specialized equipment that is instrumental to the test being performed, such as:
  • Audio measuring equipment
  • Force meters
  • Assistive technologies
Procedure. A step-by-step documentation of the procedure should be included in the MOT. The procedure section should document participant training, the particular tasks that the participant will perform, and any debriefing activities performed as a part of the test.
Performance Measures. A detailed description of the data that will be recorded should be provided. Performance measures can include task durations, error rates, task completion frequencies, and subjective impressions.
Data Reduction and Analysis. This section should describe how the data will be interpreted. If any statistical analyses are planned, a description of each analysis should be included.
Criteria. The criteria section documents how the evaluator will determine if a test objective has been met. Important terms, such as compliant or acceptable, should be defined in this section.
Technical Risk. This section describes the factors that could prevent a test objective from being accomplished, and what will be done to mitigate the risk.

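The third section should summarize the subjects, instrumentation and facilities, and data required for the test. The following is an outline of a Method of Test document: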

  1. Introduction
    1. Background
    2. Identification of the System Under Test (SUT)
    3. Test Objectives
    4. Summary of Planned Test Sessions
  2. Methods
    1. Objective 1
      1. Purpose (or Rationale)
      2. Subjects (number and type)
      3. Resources Required (includes instrumentation, supporting materials, etc.)
      4. Procedure (describes the order of events, including tasks to be performed by the subjects. May reference a separate Test Procedure document.)
      5. Performance Measures
      6. Data Reduction and Analysis
      7. Criteria
      8. Technical Risk
    2. Objective 2 (repeat sub-sections 1-8 for all objectives)
  3. Summary of Requirements
    1. Summary of Subjects Required
    2. Summary of Instrumentation / Facility Requirements
    3. Summary of Data Requirements
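
One way to keep the Methods section complete is to track the eight sub-sections for each objective in a small record. The Python sketch below is a hypothetical aid, not part of the MOT itself: the ObjectiveMethod class, its field names, and the sample text are assumptions chosen to mirror the outline above.

```python
from dataclasses import dataclass, fields


@dataclass
class ObjectiveMethod:
    """One entry of the Methods section; each field is one sub-section."""
    purpose: str = ""
    subjects: str = ""
    resources_required: str = ""
    procedure: str = ""
    performance_measures: str = ""
    data_reduction_and_analysis: str = ""
    criteria: str = ""
    technical_risk: str = ""

    def missing_sections(self):
        """Return the names of sub-sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical draft with only two sub-sections written so far.
draft = ObjectiveMethod(
    purpose="Document task performance on the front control panel.",
    subjects="Number and type to be determined.",
)
print(draft.missing_sections())
```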

STEP 4: Performing the Checklist Evaluation

A qualified human factors engineer or accessibility professional should perform the checklist evaluation. The evaluator should select the items from the checklist in Appendix A that apply to the SUT. Additional items may be added to the checklist as appropriate. The evaluator should consider each of the selected guidelines and record a pass or fail rating for each guideline. If the evaluator determines that the SUT fails a specific guideline or standard, the evaluator should record the observations that led them to this conclusion. A second analyst should review the results of the evaluation.
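
The recording rule described above (a pass or fail rating per guideline, with observations required for every failure) can be sketched as follows. This is a minimal illustration: the ChecklistResult class and the guideline wording are hypothetical placeholders, not items from the Appendix A checklist.

```python
from dataclasses import dataclass


@dataclass
class ChecklistResult:
    guideline: str
    passed: bool
    observations: str = ""

    def __post_init__(self):
        # A fail rating must be accompanied by the supporting observations.
        if not self.passed and not self.observations.strip():
            raise ValueError(
                f"Fail rating for '{self.guideline}' needs observations"
            )


def failed_items(results):
    """Return the guidelines rated as fail, in checklist order."""
    return [r.guideline for r in results if not r.passed]


# Hypothetical checklist entries for illustration.
results = [
    ChecklistResult("Placeholder guideline A", passed=True),
    ChecklistResult(
        "Placeholder guideline B",
        passed=False,
        observations="Control could not be located by touch.",
    ),
]
print(failed_items(results))
```

Rejecting a failure without observations at entry time gives the second analyst something concrete to review.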

STEP 5: Performing the User-in-the-loop Evaluation

Selection of Evaluators. The evaluators should be properly trained in overall evaluation methodology, the specific protocol for a given test, and the special concerns that arise when dealing with users with impairments. The evaluator should be very well acquainted with the SUT and thus able to recognize and, if possible, rectify unusual problems that may arise, irrespective of whether the problem is a consequence of a participant's behavior. For example, if the SUT is a photocopier, the evaluator should know how to clear paper jams, how to restore the system to its proper configuration for the test, and how to tell whether the machine is malfunctioning in a way that will render the test invalid. Note that this means that the evaluator must have a considerable amount of practice using the SUT.

Facility Preparation. The evaluation facility should be configured according to the requirements of the SUT and the intended user population that will be participating in the evaluation. For example, if users are expected to utilize assistive technologies during the course of the test, the device should be configured to facilitate the use of specialized equipment. When testing software or hardware devices in an integrated network environment, assistive technology software, such as screen readers or voice recognition software, should be installed and configured prior to performing the test. In addition, proper accommodations must be provided for participants with disabilities. These accommodations include the physical space and any functional accommodations that are appropriate for the impairments of the user. For example, access to the building and maneuvering space around the SUT should be considered if the intended user population includes persons who use wheelchairs. If required, space should also be provided for personal assistants or interpreters.

In general, observers should not be present in the evaluation room while accessibility evaluations are taking place, unless such observers are playing a direct role in the evaluation protocol (e.g., watching for certain errors). Customer personnel and other interested parties may be accommodated by allowing them to observe the video feeds from the evaluation room, or if only one or two observers are to be accommodated, they may observe from the evaluator's control station area.

Video cameras should be positioned so as to give a good view of both the subject and the SUT during the test. An audio recording should be made for the purpose of cross-checking key event reports. The video recording should have a time-of-day counter, including seconds, that can be used to cross-check task performance times.

Selection of Participants. Accessibility evaluations should be conducted using participants who are properly trained and representative of the user population of interest. Although ease of access to users is always a consideration, care should be taken to ensure that the "user population of interest" is not defined based on the types of impairments that happen to be conveniently present. In general, it is best to test each participant separately. Evaluation sessions should be scheduled so that the pace of the session is not hurried, allowing participants to take breaks as desired.

Data Collection. The evaluator should record any pertinent observations that are made during the test session. For example, if the participant makes a mistake that is obvious to the evaluator but is not part of a participant comment, then the evaluator should make a note of the occurrence. The evaluator should also make a note of any system malfunction or other event that might influence the interpretation of the test results. For example, if the participant was inadvertently interrupted during the performance of the test then the evaluator should note the interruption.

Important events that are not necessarily verbalized should also be recorded. These events might include the beginning and end of task sequences, the number and type of errors, or a record of tasks that are successfully completed.

The evaluation should be videotaped to support post-hoc data analysis. In general, the tapes should not be made available to anyone else (e.g., for publicity or training). Evaluators should obtain the participants' permission to videotape in advance, and the participant should understand the purpose of the videotaping.

The thinking-aloud approach to usability testing, while popular, is not particularly well suited to summative evaluations because it may create unnatural task performance demands that affect the validity of task performance measures; the validity of thinking-aloud data is itself questionable. The thinking-aloud approach also makes it difficult to collect data when testing individuals with speech impairments. If a thinking-aloud procedure is used, participants should be given the opportunity to practice the procedure ahead of time and/or observe someone else practicing it.

As an alternative to continuous thinking aloud, which emphasizes the stream of consciousness of the participant, the key-event reporting method should be used. In this method, certain key events that are of interest to the evaluation are identified in advance. The evaluator should brief the participant about the key events and ask the participant to report them when they occur. Examples of key events that are often of interest are as follows:

  1. "I can't find X"
  2. "I can't figure out how to do Y"
  3. "I didn't expect that to happen"
  4. "I see that I have made an error" or "I didn't mean to do that"
  5. "I don't know why that happened"
  6. "I don't know what to do next"
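
A key-event log can be kept as a list of timestamped reports drawn from the events agreed in advance. The sketch below is an assumption about how such a log might be structured: the KeyEvent names, the KeyEventReport fields, and the sample session entries are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class KeyEvent(Enum):
    CANNOT_FIND = "I can't find X"
    CANNOT_FIGURE_OUT = "I can't figure out how to do Y"
    UNEXPECTED_RESULT = "I didn't expect that to happen"
    ERROR_NOTICED = "I see that I have made an error"
    UNKNOWN_CAUSE = "I don't know why that happened"
    UNSURE_NEXT_STEP = "I don't know what to do next"


@dataclass
class KeyEventReport:
    time_of_day: str   # as read from the recording, e.g. "10:16:02"
    task: str
    event: KeyEvent
    note: str = ""


def tally(reports):
    """Count how often each key event was reported."""
    return Counter(r.event for r in reports)


# Hypothetical session log for illustration.
session_log = [
    KeyEventReport("10:16:02", "Make stapled copies", KeyEvent.CANNOT_FIND, "staple option"),
    KeyEventReport("10:18:40", "Make stapled copies", KeyEvent.UNSURE_NEXT_STEP),
    KeyEventReport("10:25:11", "Add paper", KeyEvent.CANNOT_FIND, "paper tray release"),
]

counts = tally(session_log)
for event in KeyEvent:
    if counts[event]:
        print(f"{event.value}: {counts[event]}")
```

Restricting reports to a fixed set of events keeps the log comparable across participants.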

When the tasks associated with a given component of the SUT have been completed (or at whatever time specified by the test procedure), a user rating of accessibility should be collected. The rating should ordinarily be made using a Likert-type scale with an even number of rating points, unless there is a clear need to have a neutral point in the scale. A four-point scale is preferred unless there is a strong reason to use more scale points. Anchors are along the lines of:

  1 = Completely unacceptable
  2 = Marginally unacceptable
  3 = Marginally acceptable
  4 = Completely acceptable
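
Ratings on this scale can be summarized as a count per anchor rather than as a single score. The following sketch assumes the four-point scale above; the function name and the sample ratings are hypothetical.

```python
from collections import Counter

# Four-point scale with no neutral midpoint.
ANCHORS = {
    1: "Completely unacceptable",
    2: "Marginally unacceptable",
    3: "Marginally acceptable",
    4: "Completely acceptable",
}


def rating_distribution(ratings):
    """Return the number of ratings given at each anchor."""
    invalid = [r for r in ratings if r not in ANCHORS]
    if invalid:
        raise ValueError(f"Ratings outside the scale: {invalid}")
    counts = Counter(ratings)
    return {ANCHORS[point]: counts.get(point, 0) for point in ANCHORS}


# Hypothetical ratings from four participants for one SUT component.
distribution = rating_distribution([3, 4, 2, 3])
for anchor, count in distribution.items():
    print(f"{anchor}: {count}")
```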

An end-of-session debriefing in which comments and suggestions are explicitly solicited should be conducted. All serious comments offered by the participant during this debrief should be fully documented. Ad hoc comments made by the participant in earlier stages of the evaluation session, but not repeated by the participant during the debriefing, should be documented if the evaluator judges them to be important and germane to the accessibility of the product.

Pilot Testing. Pilot testing of the test methods and evaluation instruments should be conducted before proceeding. Documents, methods, and other materials should be updated as appropriate in accordance with the pilot testing results. Internal pilot testing need not be performed with actual or simulated impairments, although it is beneficial to do so when possible. When possible, external pilot testing should be conducted with users who have some impairment of interest and are not part of the evaluation team. Again, documents, methods, and other materials should be refined as appropriate in accordance with the pilot testing results.

Test Procedure. Proper informed consent must be obtained before proceeding with any aspect of the test. The participants' rights to discontinue participation at any time should be respected. Care should be given to ensure that each participant is fully capable of providing informed consent. Obtaining proper informed consent is especially critical when working with users with certain cognitive impairments. Evaluators should consult their Institutional Review Board's guidelines for obtaining informed consent from users with disabilities.

A test session should begin with participant orientation and training. The nature of the training is largely determined by the extent to which the SUT's learnability or intuitiveness is of interest. If the SUT's learnability or intuitiveness is not of interest, participants should be provided detailed training on the use of the SUT. The evaluator should point out each of the components that will be evaluated during the test session. Training should also include instruction and practice on key event reporting. The evaluator should also inform the participant about the purpose of the test and how the data collected during the test will be used. If an assistant (e.g., a sign-language interpreter) accompanies the participant, the individual giving the assistance should also be properly briefed about the procedures to be followed.

The evaluator should be particularly sensitive to the fact that a given SUT may not be very usable for all users with disabilities, and that this could lead to some degree of frustration on the user's part. Evaluators should be prepared to intervene and move on to the next task to avoid unnecessary frustration.

In general, the test session should be conducted with the evaluator directing the participant in the performance of the tasks. The evaluator should monitor participant performance and be prepared to intervene when necessary. However, the evaluator may be in a separate observation room if this arrangement is more appropriate for the specific procedures used.

The basic unit or component of an accessibility test is a task scenario (or task sequence). Task scenarios are operationally realistic arrangements of tasks. A series (approximately 8-10) of scenarios should be developed that exercises the tasks identified in the task analysis. Repetition of key tasks is encouraged. When developing the scenarios, the evaluator should ensure that every component of interest on the SUT is covered. For example, the task scenarios should cover maintenance tasks in addition to the more routine tasks. Some of the task conditions may have to be simulated. For example, when evaluating a printer it is important to evaluate the task of clearing a paper jam. The paper jam task can be simulated by instructing the participant to go through all of the steps involved in clearing a paper jam without actually having to induce a paper jam.

The evaluation should begin with the evaluator telling the participant about the scenario that is going to be performed. For example, if the purpose of the test is to evaluate the keypad of a cellular phone, the evaluator might explain to the participant that they will be retrieving a voice mail message and then returning an urgent phone call. Once the background for the scenario has been described, the evaluator should guide the participant through the scenario as required. There may be a tendency for an evaluator to wish to assist the participant in the completion of a task. Care should be exercised to ensure that any assistance offered by the evaluator does not unduly affect the outcome of the evaluation.

At the beginning of a task, there is a clear "start" event - which can be generated by the evaluator, by the participant, or by the SUT. The evaluator or an observer should use a stopwatch to measure task performance time. The end of the task is similarly identified by a clear event, which can be an SUT outcome, a participant-generated event, or an evaluator-generated event.
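
Task performance time is the interval between the start and end events. The sketch below computes it from two time-of-day readings, such as those taken from the video counter; it assumes an HH:MM:SS format and that both readings fall on the same day, and the function name is hypothetical.

```python
from datetime import datetime


def task_duration_seconds(start, end, fmt="%H:%M:%S"):
    """Return the elapsed seconds between two same-day time-of-day stamps."""
    started = datetime.strptime(start, fmt)
    ended = datetime.strptime(end, fmt)
    if ended < started:
        raise ValueError("End event precedes start event")
    return (ended - started).total_seconds()


# Hypothetical start and end readings for one task.
print(task_duration_seconds("10:15:30", "10:17:05"))  # 95.0
```

A value computed this way can be compared against the stopwatch reading for the same task.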

The final activity in the test session should be a debriefing of the participant. The debriefing should include an opportunity for the subject to share any comments on the SUT and/or the test methods. If possible and appropriate, the debriefing should include a quick review of the session, including a summary of the subject's performance (e.g., "You were able to complete 8 of the 10 tasks. The two tasks you did not complete were binding the notebooks and folding the 8 x 14 pullouts."). This review may help prompt the participant to make insightful comments.

Data Analysis. Data analysis involves the computation of task durations and the tabulation of errors. Each error should be documented and categorized. A summary tabulation of errors by error type should be generated. Analysis of data that involves interpretation of user comments should be cross-checked between analysts.
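
A summary tabulation of errors by type can be produced with a simple counter once each error has been categorized. In the sketch below, the error categories, the task names, and the tabulate_errors function are hypothetical; an actual evaluation would use the categories defined in its own Method of Test.

```python
from collections import Counter, defaultdict

# Hypothetical categorized errors: (task, error category) pairs.
observed_errors = [
    ("Make stapled copies", "selected wrong control"),
    ("Make stapled copies", "omitted step"),
    ("Add paper", "selected wrong control"),
]


def tabulate_errors(errors):
    """Return error counts by category, and by category within each task."""
    by_type = Counter(category for _, category in errors)
    by_task = defaultdict(Counter)
    for task, category in errors:
        by_task[task][category] += 1
    return by_type, by_task


by_type, by_task = tabulate_errors(observed_errors)
for category, count in by_type.most_common():
    print(f"{category}: {count}")
```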

STEP 6: Documenting the Results

The checklist evaluation and the user-in-the-loop test reports should be organized by test objective and should make clear statements of outcomes. In general, computation of accessibility scores or percentage compliance ratings from the test data is not advised because of the potential for over-simplification of the test results. The test report should contain summaries of the checklist and the user-in-the-loop testing. The checklist should be documented with clear pass/fail ratings associated with each guideline or standard. Evaluator comments should also be documented. All relevant observations, comments, and measurements should be documented for the user-in-the-loop testing. Details of the functional capabilities and limitations of each of the participants should be specified in the report. The test report should also contain a summary of the test results organized by SUT component.

A separate analysis portion of the test document may contain a projection of task performance based on the checklist results and the user-in-the-loop testing. The assessment of task performance would allow consumers of the test results to determine the likelihood that a user with a given impairment could perform a given task.


