

A prototype of the system was tested in a field trial that took place at Camp Pendleton in June 1995. During the trial, three (male) mechanics performed partial LTI inspections. (Inspections of the engine plenum, a physically demanding procedure, were excluded.) Participants were assigned to the study by their supervisor and were individually introduced to the system in a structured training session.

The training approach combined modeling by an experienced user with explicit instruction of the novice in proper use. First, the user observed the experimenter using the system (on a separate notebook computer), then was invited to use it himself and become comfortable with its operation. At that point, the wearable system was given to the user to try out, and questions were answered. The training process was limited to 10 minutes and was paced by the individual's progress (no participant needed the entire period). At the conclusion of training, experimenter and participant proceeded to the vehicle and the inspection was carried out. Upon completion, the mechanic participated in a structured interview that assessed his impressions of the device.

The system was instrumented to collect a variety of data, including the actual utterances produced by the user, their decodings, decoder and task timings, and the sequence of links traversed. System response ran at a median of 4.2 xRT (times real time), producing a corresponding lag of 3.8 s per input (median utterance duration was 0.8 s). Recognition word error ranged from 12% to 15% across subjects. Detailed analysis of the errors (Table 1) suggests that the majority of the recognition errors were due to factors that can be brought under control through additional development. These include a better choice of microphone, a more complete domain language and more focussed user training.


Source of error            Amount
Signal processing / mic    30%
Language coverage          35%
Instructions               12%
Other                      23%

Table 1: Error analysis for field trial data
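
The word error figures above are computed from the logged utterances and their decodings. As an illustration only, the following minimal sketch shows the conventional word error rate calculation (word-level edit distance divided by reference length). The function name and the sample phrases are hypothetical, and this is not necessarily the scoring procedure used in the trial.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterance and decoding, for illustration only.
print(word_error_rate("check left front tire", "check left tire"))
```

One deleted word out of a four-word reference gives an error rate of 0.25 in this example.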

User interviews indicated that the participants came away with a favorable impression of the novel inspection device and would be willing to use it in regular work. At the same time, the users pointed out deficiencies; in particular, the device appeared subjectively slower than the traditional paper-and-pencil system. There is reason to believe that some of this impression may be based on a simple lack of experience with the system (users will typically experience long-term improvement in task completion time while using a speech system, e.g. [9]). It also became apparent that an interface capable of actively guiding users when they exhibit difficulties would be of value. We have since explored strategies for monitoring the input stream and detecting patterns that suggest the user is in trouble (for example, a sequence of identical inputs). Such a detection can in turn be used to trigger a separate clarification dialog.
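
To make the idea of monitoring the input stream concrete, the following is a minimal sketch of one possible detector for the "sequence of identical inputs" pattern. The class name, the threshold of three repetitions and the normalization step are assumptions made for illustration; they do not describe the strategy actually implemented in the system.

```python
from collections import deque


class RepeatDetector:
    """Flags possible user trouble when the same decoded input occurs
    `threshold` times in a row (hypothetical helper, not the system's code)."""

    def __init__(self, threshold: int = 3):
        if threshold < 2:
            raise ValueError("threshold must be at least 2")
        self.threshold = threshold
        # Only the most recent `threshold` inputs are kept.
        self._recent = deque(maxlen=threshold)

    def observe(self, decoded: str) -> bool:
        """Record one decoded input; return True if a clarification
        dialog should be triggered."""
        # Assumed normalization: ignore case and extra whitespace.
        normalized = " ".join(decoded.lower().split())
        self._recent.append(normalized)
        window_full = len(self._recent) == self.threshold
        in_trouble = window_full and len(set(self._recent)) == 1
        if in_trouble:
            # Start over so that one run of repeats triggers only one dialog.
            self._recent.clear()
        return in_trouble


detector = RepeatDetector(threshold=3)
for decoded in ["next item", "next item", "Next  item"]:
    if detector.observe(decoded):
        print("trigger clarification dialog")
```

Clearing the window after a trigger is a design choice in this sketch: it keeps a long run of repeats from reopening the dialog on every subsequent input.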

It was clear that the design of the system could be improved in a number of ways. In particular, a better microphone (which we have since identified) and more comprehensive coverage of the domain language (the task was designed without first-hand experience of the domain) could reduce the number of errors by roughly two-thirds. The excessive response lag could also be reduced by more careful exploitation of the constraints available in this domain and by tailoring the properties of the speech system to conform more closely to the task language (our current implementation runs at 2.6 xRT and continues to be improved).
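
The two-thirds estimate follows from the shares in Table 1: microphone and language coverage together account for 30% + 35% = 65% of the errors. The sketch below works through that arithmetic under the optimistic assumption (ours, for illustration) that the addressed sources are eliminated entirely and that the categories do not interact; the function name is hypothetical.

```python
# Shares of recognition errors by source, taken from Table 1.
ERROR_SHARE = {
    "Signal processing / mic": 0.30,
    "Language coverage": 0.35,
    "Instructions": 0.12,
    "Other": 0.23,
}


def residual_error(word_error: float, addressed: list[str]) -> float:
    """Word error remaining if every addressed source were fully removed.
    This is a best case; real improvements would likely be smaller."""
    removed = sum(ERROR_SHARE[source] for source in addressed)
    return word_error * (1.0 - removed)


addressed = ["Signal processing / mic", "Language coverage"]
for observed in (0.12, 0.15):  # range of word error observed in the trial
    remaining = residual_error(observed, addressed)
    print(f"{observed:.0%} word error -> about {remaining:.1%} remaining")
```

Under these assumptions, the observed 12% to 15% word error would fall to roughly 4% to 5%.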


Alex Rudnicky
Thu May 30 19:32:28 EDT 1996