Fault Management in Aviation Systems

William H. Rogers
Paul C. Schutte
Kara A. Latorella
NASA Langley Research Center
INTRODUCTION
The fault management framework presented here integrates: (a) the operational levels within which real-time fault management occurs, (b) cognitive control levels (Rasmussen, 1983), (c) information processing (IP) stages (Rasmussen, 1976), and (d) the definition of fault management as a set of operational tasks (detection, diagnosis, prognosis, compensation). Before presenting the integrated FM framework, each of the four component models is discussed separately.
Operational Levels
FIG. 14.1. The three operational levels within which real-time fault management
occurs in commercial aviation.
planned. The pilots may need to determine which engine has failed and shut
it down (FM at the systems level), but they also need to maintain the
stability of the aircraft by adding rudder input and power to another engine
(FM at the aircraft level), and they may need to divert to the nearest airport
rather than to continue the mission as planned (FM at the mission level).
Consideration of fault management as solely a systems level task simply
does not capture the interdependence and complexity of activities across
operational levels that result from a systems fault. Our integrated framework therefore describes fault management as it occurs across these three operational levels.
[Figure shows rule-based behavior (signs recognized as system state/task and associated with stored rules for the task) and skill-based behavior (sensory input processed as signals through feature formation into automated sensorimotor patterns, producing actions).]
FIG. 14.2. Simplified diagram of the three levels of control of human behavior in problem solving. (Reproduced from Rasmussen, 1983, with permission from IEEE.)
requires the operator to reason from first principles to plans for action, that is, from their knowledge of relationships between environmental states and variables (based on mental models) and from their priorities among performance criteria. Environmental information is interpreted and integrated to arrive at abstract, internal representations of functional properties of the environment. These "symbols" serve as inputs to the reasoning process. Although RBB is said to be goal oriented because rules are selected according to some objective, KBB is goal directed: sensory, motor, and cognitive behavior are driven by the explicit formulation of a goal. KBB involves planning, deciding among alternatives, and hypothesis testing.
Rasmussen (1986) noted that Fitts and Posner's (1962) skill-learning phases (the early cognitive stage, the intermediate associative stage, and the final autonomous stage) mirror the KBB, RBB, and SBB hierarchy. Operators in an unfamiliar situation begin by behaving in a goal-directed, knowledge-based manner. After identifying patterns in the environment and associating learned successful behaviors, operators begin to exhibit RBB. Finally, after sufficient practice and refinement, environmental patterns are not identified but directly perceived as signals, and behavior proceeds without attentional control at the skill-based level.
Information Processing
FIG. 14.3. The relationship between the decision ladder and the three levels of control
of human behavior. (Reproduced from Rasmussen, 1986, with permission from
Elsevier Science Publishing Co., Inc.)
FIG. 14.4. The four fault management tasks overlaid on Rasmussen’s more elemental
information processes. (Adapted from Rasmussen, 1986, with permission from Elsevier
Science Publishing Co., Inc.)
dealing with a failure that does not pose an immediate or future danger to
the aircraft. Prognosis is “thinking ahead of the aircraft” to determine the
urgency with which a particular fault must be managed.
FIG. 14.5. The integrated fault management framework. (Adapted from Rasmussen,
1986, with permission from Elsevier Science Publishing Co., Inc.)
There are many factors that affect human fault management performance
in operational environments. There are contextual factors such as the
from two basic failure modes: misapplication of good rules and application of bad rules. A rule is "good" if it has demonstrated utility within a particular situation. Although a rule itself might be good, its placement in the stack from which rules are selected might be inappropriate for a given situation. Thus, a good rule may be misapplied in the sense that another rule should be applied instead. Additionally, a good rule can be misapplied in the sense that no rule should be applied; that is, there is no rule that provides the correct mapping, and hence knowledge-based processing is required. The other failure mode of RB behavior is to apply bad rules, those with encoding deficiencies or action deficiencies, to a situation. Encoding-deficient rules are those in which the rule's conditional phrase is underspecified or inappropriately describes the environment. Action-deficient rules are those in which the consequence of the rule results in inappropriate, inefficient, or otherwise maladaptive responses.
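To make this taxonomy concrete, the following sketch (ours, not drawn from the chapter or the rule-based behavior literature) represents rules as condition-action pairs selected from an ordered stack; the rule names and toy conditions are entirely hypothetical.

```python
# Illustrative sketch: a rule as a condition-action pair, selected from an
# ordered "stack." All names and conditions here are hypothetical.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]   # encoding: when does the rule apply?
    action: str                         # consequence: the prescribed response

def select_rule(stack: list[Rule], state: dict) -> Optional[Rule]:
    """Return the first rule whose condition matches the current state.

    The failure modes from the text map onto this loop:
    - misapplied good rule: a matching rule sits higher in the stack than
      the rule that should fire, so ordering, not the rule, is at fault;
    - no correct mapping: no rule matches, so knowledge-based processing
      is required (signaled here by returning None);
    - encoding-deficient rule: an underspecified condition matches states
      it should not;
    - action-deficient rule: the condition is right but the prescribed
      action is inappropriate or inefficient.
    """
    for rule in stack:
        if rule.condition(state):
            return rule
    return None  # no rule provides the mapping: fall back to KB processing

# Toy usage:
stack = [
    Rule("engine fire", lambda s: s["fire_warning"], "shut down engine"),
    Rule("low oil", lambda s: s["oil_press"] < 40, "reduce thrust"),
]
print(select_rule(stack, {"fire_warning": False, "oil_press": 35}).action)
```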
Rule-based processing for deterministic faults has obvious advantages: RB errors are relatively predictable, RB processing is more efficient than KB processing, and, if formalized, RB processing allows a good deal of procedural standardization. However, the use of rule-based processing when knowledge-based processing is appropriate (classified previously as the misapplication of a good rule) can have serious consequences for management of novel faults. There is an interesting dilemma in managing novel faults, as pointed out by Reason (1990): "Why do we have operators in complex systems? To cope with emergencies. What will they actually use to deal with these problems? Stored routines based on previous interactions with a specific environment" (p. 183). Not only does there seem to be a natural tendency for humans to migrate to rule-based processing due to cognitive economy, but the explicit strategy of linking fault identification with prescribed procedures in both training and design also induces pilots to operate at a rule-based level. A recent study (Rogers, 1993) evaluated how pilots think about fault management information. Results suggested that pilots primarily want information that will help in detection, initial identification, and response first, and want deeper diagnostic and prognostic information last, if at all. This finding was interpreted as indicating a tendency to shortcut the deeper tasks of diagnosis and prognosis, particularly in time-stressed situations. This finding may be the result of the information pilots receive on current flightdecks, or of the frequency with which they predict they will need different types of information, but it reflects, nonetheless, a rule-based mind-set. NASA Langley Research Center conducted a simulation study to evaluate a fault management aid called "Faultfinder" (Rogers, 1994; Schutte, 1994). The experimental setup provided CRT-displayed checklists; however, there were no checklists for some faults that were introduced. Many of the pilots spent considerable time looking through checklists in an attempt to find one that seemed appropriate for the
and the typical "multiple errors" sequence of events that often results in
catastrophe. This accident summary, contained in NTSB file 5032, is
provided here and is referred to numerous times in the following sections as
the “Monte Vista” accident. It involved a DC-9 that departed the end of the
runway at Monte Vista, Colorado, in 1989. Fortunately, it resulted in only
minor aircraft damage and no injuries or deaths. It should be noted that
most of the problems illustrated by this accident are at the aircraft and
mission operational levels.
While in cruise at 35000 ft. the no. 2 generator constant speed drive (CSD)
failed and the crew inadvertently disconnected the no. 1 generator CSD (an
irreversible procedure). Attempts to start the auxiliary power unit (APU)
above the start envelope were made but aborted due to a hot start. A descent
was made on emergency electrical power and a landing was made at an airport
which was inadequate for the aircraft. The crew chose to make a no flap
approach and to lower landing gear by emergency method when both systems
were operable by normal means. This decision was influenced by the lack of
landing gear and flap position indicators when the aircraft is operated on
battery power. The landing speed was fast and the aircraft departed the end
of the runway damaging the landing gear and the no. 1 engine. The crew failed
to manually de-pressurize the aircraft and the evacuation was delayed until a
knowledgeable passenger went to the cockpit and de-pressurized the aircraft.
The aircraft operators manual did contain APU starting procedures but not
an APU start envelope chart. (NTSB, 1990)
Detection
The flight crew may not detect a fault that requires intervention for a variety of reasons. From a signal detection perspective (e.g., see Sorkin & Woods, 1985), it may be difficult to identify a signal (fault) because it occurs within a noisy environment and is hard to separate from the noise (i.e., poor system sensitivity), or because the threshold one sets for deciding that a signal is present (i.e., the detector's response criterion) is too high; that is, it takes a great deal of evidence before one is willing to say there is a signal.
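In the standard equal-variance signal detection model, sensitivity and response criterion can be estimated from hit and false-alarm rates as d' = z(H) - z(F) and c = -(z(H) + z(F))/2, where z is the inverse standard normal CDF. A minimal illustration (ours, not from the chapter):

```python
# Illustrative only: standard equal-variance signal detection measures.
# A monitor that rarely calls "fault" has a conservative (high) criterion.
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse standard normal CDF

def sdt_measures(hit_rate: float, false_alarm_rate: float):
    """Sensitivity d' and response criterion c from hit/false-alarm rates."""
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -(z(hit_rate) + z(false_alarm_rate)) / 2
    return d_prime, criterion

# A vigilant monitor vs. one with a conservative criterion:
print(sdt_measures(0.95, 0.05))  # d' = 3.29, c = 0 (neutral criterion)
print(sdt_measures(0.60, 0.01))  # d' = 2.58, c = 1.04 (misses more faults)
```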
In fault management, the nature of human perception and cognition can affect the human's sensitivity, or ability to distinguish the signal from the noise. Humans are generally not well suited to vigilance tasks (e.g., see Craig, 1984), and complacency induced by reliable automation can lead to detection inefficiency (Wiener, 1981). To the extent that systems are highly reliable, and/or humans monitor at a skill-based level, subtle deviations can easily be missed. For example, Molloy and Parasuraman (1992) noted that constant patterns of failures (e.g., highly reliable normal operation with extremely rare failures) lead to inefficient monitoring (see also Parasuraman
contained compressor failure within the left engine. The failure was accompanied by vibration that was indicated on the instruments as 2.5, versus 0.7 on the right engine. The crew reported reluctance to rely on the vibration indication because the engine had a history of unreliable vibration indications and the manufacturer had not set any limits that would require a precautionary shutdown. Many incidents have been identified in which parameters were not "normal," but the crew was not certain that a system fault requiring intervention had occurred. This problem can be thought of as a knowledge-based detection problem; that is, there are no specified rules for determining that an abnormal state exists, and the flight crew often does not have an adequate model of normality to determine the existence of a system fault.
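The notion of a "model of normality" can be made concrete as residual monitoring: compare each sensed parameter against a prediction for the current flight condition and flag sustained deviations. The sketch below is purely illustrative; the predicted value and tolerance are our assumptions, not figures from any aircraft system.

```python
# Minimal residual-based detection sketch (illustrative; the predicted
# value and tolerance are assumptions, not from the chapter or any system).

def detect_anomaly(sensed: float, predicted: float, tolerance: float) -> bool:
    """Flag a parameter whose residual exceeds a normality tolerance."""
    return abs(sensed - predicted) > tolerance

# Engine vibration example from the text: 2.5 units on the left engine
# versus 0.7 on the right. With a model predicting roughly 0.7 in cruise
# and an assumed tolerance of 1.0, the left engine would be flagged:
print(detect_anomaly(sensed=2.5, predicted=0.7, tolerance=1.0))  # True
print(detect_anomaly(sensed=0.7, predicted=0.7, tolerance=1.0))  # False
```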
Given the poor sensitivity of the human detector for low-frequency events
and the other problems discussed previously, it is not surprising, as
discussed in the automation section that follows, that early automated
systems provided assistance for detection to advance FM on the flightdeck.
Diagnosis
problem occurred: The flight crew believed they had bad sensor readings
when in fact they had a real problem.
In addition, other kinds of contextual factors limit human rationality in emergency situations (e.g., time stress and the cost of failure). Stress can decrease the number of diagnostic tests used (Morris & Rouse, 1985) and increase forgetting and distortion of symptom-state mappings (Embrey, 1986). Moray and Rotenberg (1989) found that operators in FM situations experienced "cognitive lockup," resulting in delayed responses to events on the normally operating systems and serial processing in FM. It must be recognized that in aviation, and in other real-time environments where there are time stresses and high costs associated with abnormal situations, humans' performance limitations and biases will be exacerbated. Morris and Rouse (1985) provided an excellent review of task requirements and human performance characteristics related to troubleshooting, that is, diagnosing equipment failures typically without time stress or other ongoing threads of activity. They also extended their discussion to the training of troubleshooting skills.
Prognosis
if what evolves is consistent with the prediction. Thus, “hypothesis and test”
symptomatic search cannot be applied to prognosis.
Knowledge-based rationality limitations and incomplete or inaccurate models of functional relationships among systems, aircraft control, and mission implications also bound the ability to prognose accurately. For example, in the Monte Vista accident, the pilots probably did not have the knowledge (i.e., a model of the altitude and speed envelope within which the APU will usually start in flight) that would have allowed them to predict that an APU start attempt would very likely be unsuccessful, or that it would be more likely to succeed later, when they reached a lower altitude. In aviation, prognosis errors are especially critical at the aircraft and mission levels, because the implications of future effects of faults at these levels can be flight-critical.
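The envelope knowledge the crew lacked is straightforward to state computationally. A minimal sketch follows, with an entirely hypothetical envelope; real APU start envelopes are type-specific and come from the manufacturer.

```python
# Hypothetical APU start-envelope check (illustrative only; the limits
# below are invented, not a real DC-9 envelope).

MAX_START_ALT_FT = 25_000   # assumed in-flight start ceiling
MAX_START_KIAS = 300        # assumed airspeed limit for start attempts

def apu_start_likely(altitude_ft: float, airspeed_kias: float) -> bool:
    """Predict whether an in-flight APU start attempt is inside the envelope."""
    return altitude_ft <= MAX_START_ALT_FT and airspeed_kias <= MAX_START_KIAS

# At the Monte Vista crew's cruise altitude, the attempt falls outside the
# (hypothetical) envelope; descending first would change the prognosis:
print(apu_start_likely(35_000, 280))  # False: start attempt likely to fail
print(apu_start_likely(20_000, 250))  # True: better odds after descent
```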
Compensation
for asymmetrical thrust, or the inappropriate rudder input could have been
a compensatory slip.
Human performance issues in fault management and in the individual
tasks of detection, diagnosis, prognosis, and compensation have been
briefly reviewed. Although human abilities are critical to management of
unusual and novel faults (particularly in stochastic environments such as
aviation), human limitations and biases can have catastrophic effects in the
management of faults. Aircraft designers have attempted to compensate for
these limitations through the use of automation and automated FM aids.
rather than directing the human’s processing. Included in this latter type
would be advances in graphic displays and smart interfaces that, in addition
to providing new types of information, can provide information in a more
useful form for the task that must be performed (e.g., see Malin et al., 1991;
Rouse, 1978). Another form of fault management automation is that which
provides error avoidance and error checking, such as systems that only
allow valid data entries, and systems that question the flight crew’s
compensation for the fault if it is inconsistent with the automation’s fault
assessment. It should be noted that human performance issues in FM can be addressed through training as well (e.g., see Landeweerd, 1979; Mumaw, Schoenfeld, & Persensky, 1992).
Recognizing that humans are not particularly good detectors of rare events, and that knowledge-based processing is more error-prone and KB errors are more difficult to detect and rectify, flightdeck fault management automation has focused primarily on improving performance of the detection task and reducing the requirement for knowledge-based processing. This section reviews past and current commercial flightdeck fault management automation. Although automation that has addressed these aspects of FM generally works well, other aspects of FM still require attention. Remaining problems are described here, and future fault management automated aids that could potentially reduce or eliminate the remaining problems are discussed.
A major research and development effort in the late 1970s and early 1980s (e.g., see Boucek, Berson, & Summers, 1986) aimed to reduce problems associated with the proliferation of fault alerts on early generation aircraft. This effort recognized the importance of simplicity, integration, standardization, and proceduralization in reducing pilot workload and error potential. It led to recommendations to integrate systems alerts so that the number of alerts was reduced and information could be obtained from a single location. It was also suggested that automation not only monitor aircraft system health, but provide improved guidance and status information as well. Fault identification should be easily associated with standardized procedures. Finally, it was suggested that faults and nonnormal states be categorized into three priority levels: high priority, medium priority, and low priority.
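A minimal sketch of the integration and prioritization recommendation, consolidating alerts into one ranked list for a single display location, follows; the alert names and the mapping of the three levels are illustrative, not a depiction of any particular system.

```python
# Sketch of integrated, prioritized alerting (illustrative; the alert
# messages are hypothetical and fielded systems differ in detail).
from enum import IntEnum

class Priority(IntEnum):
    HIGH = 3     # e.g., warning: immediate crew action required
    MEDIUM = 2   # e.g., caution: timely crew awareness or action required
    LOW = 1      # e.g., advisory: crew awareness required

def integrate_alerts(alerts: list[tuple[str, Priority]]) -> list[str]:
    """Consolidate alerts into one prioritized list for a single display."""
    ranked = sorted(alerts, key=lambda a: a[1], reverse=True)
    return [f"{p.name}: {msg}" for msg, p in ranked]

alerts = [
    ("R GEN OFF", Priority.MEDIUM),
    ("CABIN ALTITUDE", Priority.HIGH),
    ("FUEL IMBALANCE", Priority.LOW),
]
for line in integrate_alerts(alerts):
    print(line)  # HIGH first, so one location conveys urgency and order
```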
As a result of this work, commercial aircraft, beginning with Boeing’s
B-757 and B-767 and McDonnell Douglas’ MD-88, were designed to provide
integrated caution and warning systems. These integrated alerting systems,
along with standardized nonnormal and emergency procedures, greatly
reduced the number of aural and visual alerts, and provided the foundation
for standardizing and proceduralizing the whole FM function. Brief
Even with these significant advances in FM, problems still exist. We next outline four classes of problems facing pilots performing FM tasks and, therefore, flightdeck designers in their efforts to facilitate FM through the use of automation. The first is that skill-based slips and lapses will always occur, most commonly in compensation. The second problem is that pilots perform FM at a rule-based level when a knowledge-based level is appropriate. The third problem is that subtle, novel, complex, and multiple faults, although extremely rare, will occur, and when they do, human limitations in knowledge-based processing are problematic. The fourth problem is that aiding automation, operational procedures, and training thus far have addressed FM mainly as systems management, and often do not consider FM in the context of aircraft-level and mission-level management. The following sections discuss these problems and potential solutions.
FM at the Aircraft and Mission Levels. The final area that can be enhanced by improvements in FM, independent of the types of processing that are used, is fault management at the aircraft and mission levels. There are generally a number of options available in aircraft control and in mission replanning following a particular failure. Fault management at the mission level can obviously be critical to safety. For example, the selection of an appropriate airport in fuel-critical or flight-critical situations can mean the difference between a safe landing and a hull loss. In these situations, the flight crew is often time-pressured and cannot consider all of the relevant factors. In the Faultfinder study, the scenarios required pilots to manage the mission consequences of the fault, specifically to declare an emergency, contact air traffic control (ATC) and company dispatch, and divert to the closest airport. Although pilots did reasonably well in managing the faults per se, they made many errors in handling all of the flight planning, communication, and task management (e.g., scheduling, assigning, and coordinating tasks and task resources) activities necessitated by the faulted condition. It seems clear that this is a ripe area for future research and potential application of automated aiding.
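To illustrate the kind of aiding this suggests, consider a toy diversion ranker under assumed data and screening rules; everything in this sketch is hypothetical, and a real aid would also weigh weather, approach availability, terrain, fuel burn, and more.

```python
# Toy diversion-airport ranker (illustrative; airports, figures, and the
# screening rule are all invented for this sketch).

def feasible(airport: dict, fuel_range_nm: float, min_runway_ft: float) -> bool:
    """Screen out airports the aircraft cannot reach or land at."""
    return (airport["distance_nm"] <= fuel_range_nm
            and airport["runway_ft"] >= min_runway_ft)

def rank_diversions(airports, fuel_range_nm, min_runway_ft):
    """Order feasible diversion options by proximity."""
    options = [a for a in airports if feasible(a, fuel_range_nm, min_runway_ft)]
    return sorted(options, key=lambda a: a["distance_nm"])

airports = [
    {"name": "ALPHA", "distance_nm": 40, "runway_ft": 6000},
    {"name": "BRAVO", "distance_nm": 25, "runway_ft": 4500},  # too short
    {"name": "CHARLIE", "distance_nm": 90, "runway_ft": 9000},
]
# With 100 nm of range and an assumed 5,500-ft runway requirement, BRAVO
# is screened out even though it is closest; the Monte Vista landing at an
# inadequate airport is precisely this kind of screening failure.
for a in rank_diversions(airports, fuel_range_nm=100, min_runway_ft=5500):
    print(a["name"], a["distance_nm"], "nm")
```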
Management of the consequences of faults on these other flight functions has traditionally been left to the flight crew because of the greater

for the same reasons that fault management at the systems level benefited from integrated alerting systems.
SUMMARY
ACKNOWLEDGMENTS

The first author was supported for this work under NASA contract NAS1-18799, Task 15. The third author was supported for this work under NASA Graduate Student Researchers Program fellowship NGT 50992.
REFERENCES
Malin, J. T., Schreckenghost, D. L., Woods, D. D., Potter, S. S., Johannesen, L., Holloway, M., & Forbus, K. D. (1991). Making intelligent systems team players: Case studies and design issues. Volume 1: Human-computer interaction design (NASA-TM-104738). Houston, TX: NASA Johnson Space Center.
Molloy, R., & Parasuraman, R. (1992). Monitoring automation failures: Effects of automation reliability and task complexity. Proceedings of the Human Factors Society 36th Annual Meeting (pp. 1518-1521). Santa Monica, CA: Human Factors Society.
Montgomery, G. (1989, October). Abductive diagnostics. Paper presented at the AIAA Computers in Aerospace VII Conference, Monterey, CA.
Moray, N. (1981). The role of attention in the detection of errors and the diagnosis of failures in man-machine systems. In J. Rasmussen & W. B. Rouse (Eds.), Human detection and diagnosis of system failures (pp. 185-198). New York: Plenum Press.
Moray, N., & Rotenberg, I. (1989). Fault management in process control: Eye movements and action. Ergonomics, 32(11), 1319-1342.
Morris, N. M., & Rouse, W. B. (1985). Review and evaluation of empirical research in troubleshooting. Human Factors, 27(5), 503-530.
Morrison, D. L., & Duncan, K. D. (1988). Strategies and tactics in fault diagnosis. Ergonomics, 31(5), 761-784.
Mumaw, R. J., Schoenfeld, I. E., & Persensky, J. J. (1992, June). Training cognitive skills for severe accident management. Paper presented at the IEEE 5th Conference on Human Factors and Power Plants, Monterey, CA.
National Transportation Safety Board (NTSB). (1973). Aircraft accident report: Eastern Air Lines, Inc., L-1011, N310EA, Miami, Florida, December 29, 1972 (NTSB-AAR-73-14). Washington, DC: National Transportation Safety Board.
National Transportation Safety Board (NTSB). (1984). Aircraft accident report: Eastern Air Lines, Inc., Lockheed L-1011, N334EA, Miami International Airport, Miami, Florida, May 5, 1983 (NTSB-AAR-84-04). Washington, DC: National Transportation Safety Board.
National Transportation Safety Board (NTSB). (1986). Aircraft accident report: China Airlines B-747-SP, 300 nm northwest of San Francisco, CA, February 19, 1985 (NTSB-AAR-86-03). Washington, DC: National Transportation Safety Board.
National Transportation Safety Board (NTSB). (1990). [Summary of 1989 DC-9 Monte Vista, CO accident]. Unpublished raw data from NTSB field report No. DEN90IA012 (file 5032).
Norman, D. A. (1981). Categorization of action slips. Psychological Review, 88, 1-15.
Rasmussen, J. (1976). Outlines of a hybrid model of the process operator. In T. B. Sheridan & G. Johannsen (Eds.), Monitoring behavior and supervisory control (pp. 371-383). New York: Plenum.
Rasmussen, J. (1983). Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models. IEEE Transactions on Systems, Man, and Cybernetics, 13(3), 257-266.
Rasmussen, J. (1986). Information processing and human-machine interaction: An approach to cognitive engineering. New York: North-Holland.
Reason, J. (1990). Human error. New York: Cambridge University Press.
Riley, V. (1994). Human use of automation. Unpublished doctoral dissertation, University of Minnesota, Minneapolis.
Rogers, W. H. (1993). Managing systems faults on the commercial flight deck: Analysis of pilots' organization and prioritization of fault management information. In R. S. Jensen & D. Neumeister (Eds.), Proceedings of the 7th International Symposium on Aviation Psychology (pp. 42-48). Columbus, OH: The Ohio State University.
Rogers, W. H. (1994, April). The effect of an automated fault management decision aid on pilot situation awareness. Paper presented at the First Automation Technology and Human Performance Conference.
Wiener, E. L. (1989). Human factors of advanced technology ("glass cockpit") transport aircraft (NASA-CR-177528). Moffett Field, CA: NASA Ames Research Center.
Wiener, E. L., & Curry, R. E. (1980). Flight-deck automation: Promises and problems. Ergonomics, 23, 995-1011.
Wohl, J. G. (1983). Connectivity as a measure of problem complexity in failure diagnosis. Proceedings of the Human Factors Society 27th Annual Meeting (pp. 681-684). Santa Monica, CA: Human Factors Society.
Woods, D. D. (1992). The alarm problem and directed attention (CSEL Report No. 92-TR-05). Columbus, OH: The Ohio State University, Cognitive Systems Engineering Laboratory.
Woods, D. D. (1994). Cognitive demands and activities in dynamic fault management: Abductive reasoning and disturbance management. In N. Stanton (Ed.), Human factors of alarm design (pp. 63-92). London: Taylor & Francis.
Woods, D. D., Johannesen, L. J., Cook, R. I., & Sarter, N. B. (1994). Behind human error: Cognitive systems, computers, and hindsight (CSERIAC State-of-the-Art Report). Dayton, OH: Crew Systems Ergonomics Information and Analysis Center.
Woods, D. D., Roth, E. M., & Pople, H. E., Jr. (1990). Modeling the operator performance in emergencies. Proceedings of the Human Factors Society 34th Annual Meeting (pp. 1132-1137). Santa Monica, CA: Human Factors Society.
Yoon, W. C., & Hammer, J. M. (1988a). Aiding the operator during novel fault diagnosis. IEEE Transactions on Systems, Man, and Cybernetics, 18(1), 142-148.
Yoon, W. C., & Hammer, J. M. (1988b). Deep-reasoning fault diagnosis: An aid and a model. IEEE Transactions on Systems, Man, and Cybernetics, 18(4), 659-675.