Paper by Jakob Nielsen and Jan Maurits Faber
Originally published in IEEE Computer Vol. 29, No. 2 (February 1996), pp. 29-35.
Introduction
Companies are shortening software project schedules to introduce products more rapidly than competitors. At the same time, customers are demanding higher usability. Unfortunately, the goals of increased usability and decreased development time conflict with traditional usability engineering approaches.
The design process always requires several rounds in which the interface is confronted with users' needs and capabilities and is modified accordingly. This approach is called iterative design. [Ref. 1] Our experience indicates the need for at least two iterations, yielding three versions, before the product is good enough for release. However, three or more iterations are better.
Unfortunately, testing and redesigning take time, thus delaying product release. Because major delays are intolerable, much effort has gone into improving user interface design efficiency, prototyping, and evaluation. This has led to a cheaper and faster approach called discount usability engineering. [Ref. 2] However, even discount usability engineering relies on traditional linear iteration. To yield final designs faster, we want parts of the usability engineering life cycle to take place at the same time, in a process we call parallel user interface design.
Parallel Design vs. Iterative Design
Figure 1 shows the parallel design project model. Both iterative and parallel design are based on a version 0 concept, which describes generally what the interface is supposed to do. Parallel design speeds up time-to-market by exploring design alternatives simultaneously. The initial parallel design versions, denoted 1.a, 1.b, 1.c, and 1.d in Figure 1, do not have to be fully implemented. Merged version 2, based on each designer's best ideas, is the first to be implemented and tested.

Figure 1. Recommended parallel design project model. Note that this article's case study followed Figure 3's slightly different project model.
Each merged design element can be the best of the parallel versions' corresponding elements or a synthesis of several parallel versions' elements. The merging designer can also introduce new elements, although it is probably better to postpone the introduction of most new elements until the iterative part of the parallel process, which begins with version 3.
A weakness of parallel design is the waste of resources when several designers do the same work, even though some design ideas will not be used. Because of this, we do not recommend final polishing of parallel design versions. A skilled designer can often produce reasonable design descriptions in less than a day. For very large and complicated systems, more time is needed. But it may be possible to concentrate initial parallel design efforts on a subset of the total system to explore alternatives without fully designing the interface.
Parallel design's basic trade-off is that more staff work is invested up front in order to spend less time exploring design revisions. Therefore, parallel design is best suited for projects where reduced time-to-market is essential and makes the up-front investment acceptable. For projects where cost savings are more important than early release, traditional iterative design is preferable.
Some readers may object to parallel design, calling it design by committee. Indeed, as far back as the year 959, the Viking Olaf the Peacock said, "I want only the shrewdest to decide; in my opinion the council of fools is all the more dangerous the more of them there are." [Ref. 3] To make parallel design work best, designers should work independently and promote their ideas without the type of compromises that lead to the camel that is a horse designed by committee. The merged design can draw upon their best ideas but should not be beholden to any designer nor necessarily include ideas from all designs.
Case Study
In our previous projects, we found that parallel designs were excellent for exploring design options. [Ref. 4] However, we did not collect metrics for the resulting designs' usability or the method's impact on shipment schedule or on the need for future revisions. Therefore, we conducted a study that implemented all designs and subjected them to traditional, controlled-usability measurements, even though this involved considerable resource expenditure that could be justified only for research purposes.
The case study concerned screen-based user interfaces to advanced telephone services like call forwarding, where incoming calls are routed to another telephone, and call waiting, where you are notified if somebody calls while you are on the line. Instead of using a screen-based telephone, we simulated this type of telephone on a personal computer, as shown in Figure 2. Using simulation was more flexible than using the telephone network and allowed faster implementation of alternative designs.

Figure 2. Software mock-up of a screen-based advanced telephone service interface. The user would operate this interface by clicking on button images with a mouse. In this example, the user has selected call forwarding from the main menu and is entering the number to which calls should be forwarded.
Our software mock-up was patterned after realistic telephone hardware, using a small set of buttons and an alphanumeric screen with sixteen 40-character lines. Of course, test users could not place calls using the simulated telephone, but that was not a problem because our project concerned only features used to dial and set up calls, not the calls themselves. Also, we could easily use a computer to generate a log file with exact user-action timings.
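Such a timing log is straightforward to produce from an event-driven mock-up. The sketch below is purely illustrative; the event names and log format are our assumptions, not those of the original system:

```python
import time

class ActionLogger:
    """Appends one timestamped line per user action to a log file."""

    def __init__(self, path):
        self.path = path
        self.start = time.monotonic()  # monotonic clock: safe for elapsed time

    def log(self, action, detail=""):
        # Elapsed seconds since the session started, to millisecond precision.
        elapsed = time.monotonic() - self.start
        with open(self.path, "a") as f:
            f.write(f"{elapsed:8.3f}\t{action}\t{detail}\n")

# Hypothetical session: user forwards calls to a neighbor's number
logger = ActionLogger("session.log")
logger.log("button", "CALL FORWARDING")
logger.log("digits", "201-4232")
logger.log("button", "OK")
```

Post-processing such a log yields per-task times and error counts directly, which is what made exact usability measurement cheap in the simulated setting.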
For our parallel design project, we first defined a functional specification for the interface, which was easy, as our functionality was that offered by the telephone network. For most projects, though, task analysis and customer field visits would be needed for the specification stage. We defined a set of 16 user test tasks. One task was, "You are waiting for an important business call from 758-2307 or 758-2257. However, you are going to your neighbor's house for a while. Arrange it so that calls from either business number will go to your neighbor's house, but other calls will ring at your house. Your neighbor's phone number is 201-4232." We defined measured usability to be a function of how well our test users performed these tasks.
Parallel Design Stage
Figure 3 shows our project model. Four user-interface designers took part in our parallel design stage. We chose this number of designers based on earlier experience with projects where three designers were used with great success [Ref. 4] but where more designers would have improved results. We do not claim that four is the optimal number of designers for all projects.

Figure 3. The project model used in the case study. For practical projects, we recommend the model in Figure 1.
Our four designers worked independently and had access only to their own designs. Each designer was first given the desired functionality's specification and the equipment's limitations. Each then designed a user interface and described it to a research assistant, who implemented it. To save time, the designers provided their specifications in a mixture of natural language and hand-drawn screen layouts. To reduce the possibility that the research assistant misinterpreted the designer's intentions, the assistant showed the designer the interface's running version. Designers were asked not to engage in iterative design at this stage but only to identify and correct disparities between the implemented design and their original intentions.
Figure 4. The initial screens from each of the four parallel designs. Top: versions 1.a and 1.b. Bottom: versions 1.c and 1.d. For the user tests, each screen was displayed as part of the simulated telephone interface in Figure 2.
User Testing
Each version was tested with 10 people who were timed performing the 16 test tasks. To ensure that performance reflected each version's usability rather than skills learned during earlier tests, no person participated more than once. The designers were then given only their own test results and were told to modify their designs. Thus, their redesigns, versions 2.a-d, correspond to traditional iterative design and were produced only to compare iterative and parallel design.
We measured the time it took to perform the 16 test tasks and the number of times the user completed a task incorrectly and was told to try again. Task time is the classic usability metric. The number of errors was especially important to us because people generally use telephones without assistance.
To assess relative usability changes, we normalized the raw test results so that the overall usability score derived from the average task time and average error rate for all four designs would be 100. Improvements yielded higher scores. Overall usability was calculated as the geometric mean of the normalized scores for task time and number of errors, because the geometric mean weights proportional improvements and reductions in usability equally.
Test results for the initial parallel versions and their iterative design revisions are shown in Table 1. For the four designs, the usability improvement after one iteration ranged from 12 to 29 percent and averaged 18 percent. This was less than expected from our earlier survey, [Ref. 1] where iterative design improvement averaged 38 percent per iteration. However, the results were still within expected values, given that two projects in the earlier survey had improvements of 17 and 19 percent per iteration.
| Version | Task time in minutes | Normalized task time | Errors per task | Normalized errors | Overall usability |
|---|---|---|---|---|---|
| 1.a | 27 | 106 | 1.4 | 176 | 136 |
| 1.b | 41 | 70 | 3.3 | 75 | 73 |
| 1.c | 25 | 116 | 2.8 | 88 | 101 |
| 1.d | 26 | 110 | 2.7 | 91 | 100 |
| 1 average | 29 | 99 | 2.4 | 101 | 100 |
| 2.a | 24 | 118 | 1.2 | 205 | 156 |
| 2.b | 35 | 82 | 2.3 | 107 | 94 |
| 2.c | 25 | 116 | 2.0 | 123 | 119 |
| 2.d | 30 | 97 | 1.9 | 129 | 112 |
| 2 average | 28 | 102 | 1.8 | 137 | 118 |
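The scoring scheme described above can be sketched as follows. This is a minimal illustration using the rounded raw values from Table 1; the published averages were computed from unrounded per-user data, so scores derived here differ from the table by a few points:

```python
from math import sqrt

def normalized(value, group_mean):
    # Lower raw values (faster times, fewer errors) map to higher scores;
    # a value equal to the four-design mean maps to 100.
    return 100 * group_mean / value

def overall(norm_time, norm_errors):
    # The geometric mean weights proportional gains in speed and
    # accuracy equally.
    return sqrt(norm_time * norm_errors)

# Rounded raw results for the four initial designs (from Table 1)
times  = {"1.a": 27, "1.b": 41, "1.c": 25, "1.d": 26}      # minutes
errors = {"1.a": 1.4, "1.b": 3.3, "1.c": 2.8, "1.d": 2.7}  # errors per task

t_mean = sum(times.values()) / 4
e_mean = sum(errors.values()) / 4

for v in times:
    nt = normalized(times[v], t_mean)
    ne = normalized(errors[v], e_mean)
    print(f"{v}: time {nt:.0f}, errors {ne:.0f}, overall {overall(nt, ne):.0f}")
```

As a consistency check, the geometric mean of version 1.a's published normalized scores (106 and 176) reproduces its published overall score of 136.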
Figure 5 shows the measured usability of the original designs and their respective redesigns connected by arrows. The longer arrows indicate greater improvements. We used a logarithmic scale to give equal prominence to relative improvements, the most reasonable indication of redesign skill. Thus, all redesigns that reduced user errors by the same percentage would have equally long arrows.

Figure 5. Diagram showing the various designs' measured usability. Each circle indicates one design, and the placement of the circle shows that design's task time (how fast users were at using it) on the x-axis and its error rate on the y-axis (how many errors people made). Arrows connect iterative revisions of a design from one version to the next. Merged versions 2.x and 2.y were based on all four initial parallel versions, 1.a-d.
Optimal usability is in Figure 5's lower left corner, and most of the arrows point there, indicating task time and error rate improvements. The main exception is version 2.d, where the error rate dropped substantially but the task time increased slightly, because the design's help system was too extensive. Figure 5 shows it was easier to reduce user error rates than to speed up task performance, which is consistent with our experience from other development projects.
Merged Design Stage
Versions 2.x and 2.y represent the unifying step of the parallel design process. They are shown in Figure 5 with their subsequent iterative redesigns, versions 3.x and 3.y, and in Figure 6.

Figure 6. Merged designs based on the initial parallel versions shown in Figure 4. Left: version 2.x, produced by a designer who was shown the initial designs but not their user test results. Right: version 2.y, produced by a designer who was shown the initial designs and their user test results.
The designers of these merged versions did not participate in producing the original versions, but they were experienced user-interface professionals on about the same level as the four original designers. In other projects, one would often choose the most senior original designer to produce the merged design to avoid having to teach yet another designer about the project. A potential problem, though, is that some designers may become too enamored of their own ideas to fully appreciate competing designs. Under these circumstances, it is best to use senior designers and ask them to guard against bias.
The two designers who produced our merged versions worked independently and were not shown each other's designs. Normally, we would use only a single designer in this step, but we used two to gather additional research data. Version 2.x's designer had access to the four original parallel designs, versions 1.a-d, but was not given user test results. Version 2.y's designer had the four original designs and the user test results. Neither was shown the original designers' redesigns, since those were done only for research purposes and were not part of the parallel design process.
We did not give the designers of versions 2.x and 2.y any of the design rationale used in the initial versions, so we could keep their work as independent as possible for evaluation and comparison purposes. Normally, one would expect the designers of the merged versions to talk to the original designers. We expect that the availability of design rationale would help the designers of the merged versions make better decisions and make parallel design's relative performance even better than in our project. However, this would also entail additional time for discussions, meetings, and report writing.
User Testing
The two merged designs were subjected to the same user tests as the earlier versions. The designers then produced redesigns, and the resulting versions 3.x and 3.y were subjected to a final round of testing. The test results for versions 2.x, 2.y, 3.x, and 3.y are in Table 2. Versions 3.x and 3.y had very high measured usability and were on average 2.5 times better than the first versions. It is likely that further improvements could have been achieved by additional iterations, but we did not have resources for that.
| Version | Task time in minutes | Normalized task time | Errors per task | Normalized errors | Overall usability |
|---|---|---|---|---|---|
| 2.x | 19 | 152 | 0.9 | 274 | 204 |
| 2.y | 23 | 124 | 1.5 | 164 | 143 |
| 2 average | 21 | 137 | 1.2 | 212 | 170 |
| 3.x | 18 | 159 | 0.5 | 493 | 280 |
| 3.y | 20 | 147 | 0.7 | 352 | 228 |
| 3 average | 19 | 153 | 0.6 | 417 | 252 |
Version 2.x had higher measured usability than version 2.y, even though only version 2.y's designer had user test data. However, we do not have enough data to conclude that usability tests are unnecessary for the parallel process' merging step.
Note that the designer who moved from version 2.y to 3.y had access to the initial parallel versions' test data and achieved substantial measured usability improvements. This designer compared version 2.y's test results with those from the four original versions and looked closely at the original designs that tested better for specific tasks. This let the designer target productive redesign areas and avoid a time-consuming reading of all four usability test reports.
Comparing Test Results
Tables 1 and 2 show that version 2's average overall usability was 118 when designed by traditional iterative design and 170 when designed by merging four earlier parallel designs. Thus, parallel design was 44 percent better. During the next, iterative phase of the parallel design process, the measured usability improvement going from version 2.x to 3.x and from version 2.y to 3.y averaged 48 percent. This is in the expected range for improvements by iterative design, based on prior experience. [Ref. 1] Thus, parallel design did not sacrifice its improvement potential in later stages to achieve its usability boost in the merging stage.
Another question is whether it would be better simply to pick the single best of the original designs and proceed with it alone, discarding the "losing" designs. In our case study, version 1.a was the winner among the first-round designs, with a usability score of 136. Iterating on this design alone yielded a score of 156 (version 2.a), which is lower than the merged designs' average score of 170, a difference of 9 percent.
Since our system mirrored the existing telephone system, we can compare our designs' usability with that of the existing system. Earlier studies [Ref. 5] showed that people had an average success rate of 61 percent the first time they used one of our system's 10 services on a traditional telephone, 84 percent when they used our parallel versions (1.a-d), and 95 percent when they used our merged versions (2.x and 2.y).
The earlier study found that people who used these services for the first time on a screen-based telephone, instead of a traditional telephone, had a success rate of 75 percent. Since the screen-based telephone used in the earlier study had a much smaller screen (16 × 7 characters) than the one called for in our design (40 × 16 characters), it is not surprising that its measured usability was poorer than that found for our initial designs.
We conclude that we had good initial parallel designs. They performed better than traditional telephones and screen-based telephones. Therefore, improvement by the merged versions cannot be attributed to initial design inferiority.
Independent Iterative Design
Even though usability improved significantly, one might hypothesize that this was due not to the parallel design method but to the fresh designers who worked on the merged versions. First, we do not believe these designers were better than the four parallel version designers. Also, we tried a project that introduced fresh designers at each stage, using a method we call independent iterative design. The results were miserable.
The new designers in this project not only fixed the previous design's usability problems but also introduced new ideas. This caused a phenomenon we call design thrashing, where the design never became stable or polished. Each time the design changed direction, new problems were introduced, causing the overall design's measured usability to be no better than that of its predecessor. Thus, it is unlikely that our parallel design project's improvements were due simply to using fresh designers for the merged versions.
Cost Accounting
Table 3 shows cost estimates for the parallel design process' various stages. Fixed costs related to project start-up remain the same no matter how many versions are designed and tested, but variable costs are incurred for each version. The variable costs are larger for a first (1.a-d) or merged (2.x-y) version than for a revision (2.a-d and 3.x-y) because the designer, and subsequently the programmer, spend more time on initial design than on revision.
| Fixed cost of project start-up | Cost |
|---|---|
| Learning domain, writing specifications: 160 hours RA | $4,800 |
| Constructing test tasks, preparing pilot tests: 40 hours RA | $1,200 |
| Pilot testing: 3 hours RA + 3 users | $150 |
| Total | $6,150 |
| Variable cost per parallel or merged version | |
| Design session, confirming implementation: 9 hours RA + UP | $1,395 |
| Programming: 17 hours RA | $510 |
| User testing: 20 hours RA + 10 users | $800 |
| Writing test report: 6 hours RA | $180 |
| Total | $2,885 |
| Variable cost per iterative design version | |
| Redesign session, confirming implementation: 5 hours RA + UP | $775 |
| Programming: 9 hours RA | $270 |
| User testing: 20 hours RA + 10 users | $800 |
| Writing test report: 6 hours RA | $180 |
| Total | $2,025 |
Our overall costs were slightly lower than the estimates for earlier iterative design projects, where the estimated fixed start-up costs ranged from $7,000 to $16,500 and the estimated variable costs for each version after the first ranged from $2,800 to $7,000. Although we are better at controlling costs now, they were lower primarily because this study's interface was smaller and simpler than the interfaces we usually work with.
There is no doubt that parallel design is more expensive than standard iterative design. If we eliminate the cost of the additional versions we tested only for research purposes, we can calculate the two project models' costs as follows, for a design ending after version 3:
- Traditional iterative design: $6,150 fixed costs + $2,885 variable costs for one initial version + $4,050 variable costs for two subsequent versions = $13,085.
- Parallel design: $6,150 fixed costs + $11,540 variable costs for four initial parallel versions + $2,885 variable costs for one merged version + $2,025 variable costs for one subsequent version = $22,600.
In our project, parallel design was 73 percent more expensive than iterative design. We still recommend parallel design because it achieves major usability improvements very fast. We don't know how usability would have improved if we had used iterative design and extended the iterative process. If we assume usability improvements of 18 percent per iteration, which was the average for our four initial designers, version 3's usability level would have been 139. This compares very poorly with the measured usability of 252 for the parallel process' version 3. An average of 6.6 versions, costing $20,375 and considerable time, would have been required for standard iterative design to reach a usability level of 252.
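The cost comparison and the break-even projection above reduce to simple arithmetic, sketched here using the cost figures from Table 3 and the 18-percent-per-iteration improvement rate stated in the text:

```python
from math import log

FIXED = 6150      # project start-up (Table 3)
INITIAL = 2885    # per parallel or merged version
REVISION = 2025   # per iterative redesign

# Versions 1, 2, 3 in the traditional model
iterative = FIXED + INITIAL + 2 * REVISION
# Versions 1.a-d, merged version 2, and one revision in the parallel model
parallel = FIXED + 4 * INITIAL + INITIAL + REVISION

print(iterative, parallel)       # 13085 22600
print(parallel / iterative - 1)  # ~0.73, i.e., 73 percent more expensive

# Projected iterative usability after two 18-percent iterations
print(100 * 1.18 ** 2)           # ~139

# Versions needed for iterative design to reach the parallel score of 252
n = 1 + log(2.52) / log(1.18)
print(n)                                      # ~6.6 versions
print(FIXED + INITIAL + (n - 1) * REVISION)   # ~$20,300, near the $20,375 cited
```

The small gap between the computed cost and the cited $20,375 comes from rounding the number of versions to 6.6 before multiplying.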
Diversified Parallel Design
In a variant called diversified parallel design, each parallel-stage designer optimizes an interface for one specific platform or user group without considering other platforms or user groups that will have to be supported. For example, one designer can design a novice user interface, while another designs an expert user interface. This allows both designers to explore the design space more fully than if each had to work on an all-encompassing design. It also provides a refined set of ideas from which the eventual unified design can be built to support all intended platforms and users.
Deviations From Recommended Practice
We had at least 10 people take each test, even though we would normally recommend using no more than five, because almost all interface usability problems can be found with five tests. However, our project required many test users to get reasonably tight confidence intervals on our usability measures.
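The five-user recommendation rests on a problem-discovery model Nielsen published elsewhere, not on data from this study: assuming each test user independently uncovers a fixed proportion λ of an interface's usability problems, the expected proportion found by n users is 1 − (1 − λ)^n. A sketch with the often-cited λ ≈ 0.31:

```python
def proportion_found(n_users, lam=0.31):
    # Expected share of usability problems found by n test users, each of
    # whom independently uncovers a fraction lam of all problems.
    return 1 - (1 - lam) ** n_users

for n in (1, 3, 5, 10):
    print(f"{n:2d} users: {proportion_found(n):.0%}")
```

With λ ≈ 0.31, five users find roughly 85 percent of the problems, and the marginal yield of each additional user drops quickly, which is why larger samples pay off mainly for measurement precision rather than problem discovery.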
Our test tasks and system functionality were fixed for all versions. Some development projects require constant specifications, but the product functionality definition normally changes as one learns more about product use. We had to keep our test tasks and product specifications constant to get comparable results from the testing of the various versions.
Our parallel design phase deviated from what we would recommend. First, we would not ask each parallel designer to redesign without seeing other designers' versions. Thus, versions 2.a-d were produced only for research purposes. Generally, the merged version would be the only version 2. Second, we implemented all versions as complete running programs because we wanted to know exactly how long it took users to perform realistic tasks. It might normally be sufficient for exploration purposes to implement most early versions as low-fidelity prototypes.
Conclusion
Our case study supports the value of parallel design. It reflects one possible use of the method, which would probably be even more successful for traditional graphical user interfaces with more design space.
Parallel design is more expensive than traditional iterative design, but it explores the design space in less time. In our case study, the improvement in measured usability from version 1 to 2 was 18 percent with traditional iterative design and 70 percent with parallel design. Since parallel design is more expensive, we cannot recommend it for all projects, but it has value when time-to-market is of the essence.
Acknowledgments
This work was done while the authors were with Bellcore's Applied Research area. The authors thank Sheila Borack for substantial help in scheduling subjects and for other matters related to running experiments. We thank Richard Herring for letting us reuse his task descriptions for user testing of advanced telephone services. We also benefited from George Furnas' helpful comments on a previous version of this article. Finally, we especially thank the six Bellcore user-interface designers who spent significant time designing and redesigning our interface. Since these people were, in some sense, the real subjects for our study, ethical considerations prevent us from listing their names. But they know who they are.
References
- J. Nielsen, "Iterative User-Interface Design," IEEE Computer, Vol. 26, No. 11, Nov. 1993, pp. 32-41.
- J. Nielsen, Usability Engineering, Academic Press, Boston, 1994.
- M. Magnusson and H. Palsson, (translators), Laxdaela Saga, Penguin Books, London, 1969, p. 90. A translation of an anonymous Icelandic handwritten manuscript from about 1245.
- J. Nielsen et al., "Comparative Design Review: An Exercise in Parallel Design," Proc. ACM INTERCHI'93 Conf., Assoc. for Computing Machinery, New York, 1993, pp. 414-417.
- R.D. Herring, J.A. List, and E.A. Youngs, "Screen-Assisted Telephony and Voice Service Usability," Proc. 14th Int'l Symp. on Human Factors in Telecommunications, Darmstadt, Germany, 1993.