
Safety Science 82 (2016) 393–398


Assessment of the Human Factors Analysis and Classification System (HFACS): Intra-rater and inter-rater reliability

Awatef Ergai e, Tara Cohen a, Julia Sharp c, Doug Wiegmann b, Anand Gramopadhye d, Scott Shappell a,*

a Department of Human Factors, Embry-Riddle Aeronautical University, United States
b Department of Industrial and Systems Engineering, University of Wisconsin-Madison, United States
c Department of Mathematical Sciences, Clemson University, United States
d College of Engineering and Science, Clemson University, United States
e College of Engineering, Northeastern University, United States

* Corresponding author at: Department of Human Factors, Embry-Riddle Aeronautical University, 600 S Clyde Morris Blvd, Daytona Beach, FL 32114-3900, United States. Tel.: +1 (405) 640 5479.

Article history:
Received 5 March 2015
Received in revised form 11 July 2015
Accepted 21 September 2015
Available online 11 November 2015

Keywords:
HFACS
Inter-rater reliability
Intra-rater reliability
Classification
Human error

Abstract

The Human Factors Analysis and Classification System (HFACS) is a framework for classifying and analyzing human factors associated with accidents and incidents. The purpose of the present study was to examine the inter- and intra-rater reliability of the HFACS data classification process.
Methods: A total of 125 safety professionals from a variety of industries were recruited from a series of two-day HFACS training workshops. Participants classified 95 real-world causal factors (five causal factors for each of the 19 HFACS categories) extracted from a variety of industrial accidents. Inter-rater reliability of the HFACS coding process was evaluated by comparing performance across participants immediately following training, and intra-rater reliability was evaluated by having the same participants repeat the coding process following a two-week delay.
Results: Krippendorff's Alpha was used to evaluate the reliability of the coding process across the various HFACS levels and categories. Results revealed the HFACS taxonomy to be reliable in terms of inter- and intra-rater reliability, with the latter producing slightly higher Alpha values.
Conclusion: Results support the inter- and intra-rater reliability of the HFACS framework but also reveal additional opportunities for improving HFACS training and implementation.

© 2015 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.ssci.2015.09.028

1. Introduction

The role that human factors plays in the genesis of accidents and incidents is indisputable. However, the approach one takes when investigating and identifying human factors with respect to accidents and incidents varies depending on an individual's workplace, background, training, and knowledge of the field. One methodology that has proven particularly useful across domains is the Human Factors Analysis and Classification System (HFACS; Wiegmann and Shappell, 2003). Indeed, the HFACS methodology has been employed in a variety of industrial settings such as aviation (Li and Harris, 2006), mining (Patterson and Shappell, 2010), maritime (Chen et al., 2013), rail (Reinach and Viale, 2006), and medicine (ElBardissi et al., 2007).

HFACS is a human error taxonomy based on Reason's well-known "Swiss Cheese" model of accident causation (Reason, 1990). The structure of HFACS is hierarchical, defining nineteen causal factor categories within four levels or tiers (see Fig. 1). The four tiers include unsafe acts, preconditions for unsafe acts, unsafe supervision, and organizational influences. Each level is dependent on the previous one, and factors are assumed to progress from active to latent conditions as they move up the hierarchy from unsafe acts to organizational influences. A brief description of the HFACS framework is presented in Table 1 (for a more complete description see Wiegmann and Shappell, 2003).

One of the fundamental applications of HFACS involves the classification of incident/accident causal factors into the HFACS causal categories. This classification or coding process is often performed on pre-existing causal factors associated with events that were not originally investigated using HFACS. Rather, the HFACS framework is applied post hoc in an attempt to identify meaningful trends in the human causal factors that were not apparent in the original data structure. The reliability of this HFACS coding process impacts the subsequent validity and utility of the HFACS output. If more than one person codes the same causal factors differently, or if the coding results vary for the same person over time, the final results become suspect.

[Fig. 1. The HFACS framework (Shappell et al., 2007). The figure is a hierarchical diagram of the four tiers and their nineteen causal categories: Organizational Influences (Resource Management; Organizational Climate; Organizational Process); Unsafe Supervision (Inadequate Supervision; Planned Inappropriate Operations; Failed to Correct Problem; Supervisory Violations); Preconditions for Unsafe Acts (Environmental Factors: Physical Environment, Technological Environment; Condition of Operators: Adverse Mental States, Adverse Physiological States, Physical/Mental Limitations; Personnel Factors: Crew Resource Management, Personal Readiness); and Unsafe Acts (Errors: Decision Errors, Skill-Based Errors, Perceptual Errors; Violations: Routine, Exceptional).]
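For readers implementing HFACS-based tools, the hierarchy in Fig. 1 maps naturally onto a nested data structure. The following is a minimal sketch (ours, not part of the original article) encoding the four tiers and nineteen categories in Python, with label strings taken from Fig. 1:

```python
# The HFACS taxonomy of Fig. 1 as a nested dict: tier -> categories.
# Useful as the fixed answer set when building a coding form or
# validating coded accident data.
HFACS_TAXONOMY = {
    "Organizational Influences": [
        "Resource Management", "Organizational Climate", "Organizational Process",
    ],
    "Unsafe Supervision": [
        "Inadequate Supervision", "Planned Inappropriate Operations",
        "Failed to Correct a Known Problem", "Supervisory Violations",
    ],
    "Preconditions for Unsafe Acts": [
        "Physical Environment", "Technological Environment",
        "Adverse Mental States", "Adverse Physiological States",
        "Physical/Mental Limitations", "Crew Resource Management",
        "Personal Readiness",
    ],
    "Unsafe Acts": [
        "Decision Errors", "Skill-Based Errors", "Perceptual Errors",
        "Routine Violations", "Exceptional Violations",
    ],
}

# Sanity checks: four tiers and nineteen causal categories in total.
assert len(HFACS_TAXONOMY) == 4
assert sum(len(c) for c in HFACS_TAXONOMY.values()) == 19
```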

Decisions based on such analyses may therefore lead to erroneous comparisons across industries and produce ineffective mitigation/prevention plans that have little meaningful impact on reducing risk or improving system safety.

In general, reliability refers to the extent to which a framework, experiment, test or measuring instrument yields the same result over repeated trials (Carmines and Zeller, 1979). Evaluating reliability is a primary concern for many fields, including the behavioral, psychological, medical, and social sciences, particularly as new methods, tests, devices, and instruments are developed. There are two types of reliability important in scientific research that requires data classification: inter-rater and intra-rater reliability. Inter-rater reliability refers to the framework's ability to produce the same results irrespective of who conducts the analysis, whereas intra-rater reliability refers to the consistency of each individual rater. Minor variations may occur in both cases; however, in general, the more stable the results, the more confident one can be that the results are reproducible and trustworthy.

A number of papers have previously examined the reliability of the HFACS methodology. A recent review of these studies indicates that the framework produces generally reliable results (Cohen et al., 2015). The Cohen review examined 111 HFACS manuscripts in detail, of which 14 peer-reviewed articles reported inter- and/or intra-rater reliability of HFACS. Notably, however, only six of these 14 articles (Li and Harris, 2006; O'Connor, 2008; O'Connor and Walliser, 2010; Olsen and Shorrock, 2010; O'Connor and Walker, 2011; Olsen, 2011) were specifically designed to test reliability. The others merely reported reliabilities as part of a quality check of data used in a larger study.

Among the six studies that were designed to assess inter- and/or intra-rater reliability, some demonstrated acceptable levels of reliability (i.e., Li and Harris, 2006; O'Connor, 2008; O'Connor and Walliser, 2010). However, this was not always the case, as others (Olsen and Shorrock, 2010; O'Connor and Walker, 2011; Olsen, 2011) showed less reliability. Accounting for these seemingly variable results, Cohen et al. (2015) suggested that lower levels of HFACS reliability were attributable to a variety of factors, including inadequate training of coders, the use of a small number of accident cases/causal factors, unnecessary modifications made to the HFACS framework, and inconsistency in the methods used to assess both inter- and intra-rater reliabilities. Unfortunately, many of these conclusions are speculative given that many of the articles reviewed lacked sufficient detail to discern the exact conditions under which the coding was conducted. To further address these issues, this study assessed the reliability of the HFACS framework as a general accident analysis tool using a large number of trained coders and multiple real-world accident causal factors from a variety of industries.

2. Methods

2.1. Participants

One hundred and twenty-five safety professionals from a variety of industries (e.g. aviation, mining, medicine and manufacturing) were recruited from a series of two-day training workshops designed and provided by the developers of HFACS (S.S. and D.W.).

Table 1
Description of the HFACS categories.

Organizational influences
Organizational climate: Prevailing atmosphere/vision within the organization, including such things as policies, command structure, and culture
Organizational process: Formal process by which the vision of an organization is carried out, including operations, procedures, and oversight, among others
Resource management: How the human, monetary, and equipment resources necessary to carry out the vision are managed

Unsafe supervision
Inadequate supervision: Oversight and management of personnel and resources, including training, professional guidance, and operational leadership, among other aspects
Planned inappropriate operations: Management and assignment of work, including aspects of risk management, crew pairing, operational tempo, etc.
Failed to correct known problems: Those instances when deficiencies among individuals, equipment, training, or other related safety areas are "known" to the supervisor, yet are allowed to continue uncorrected
Supervisory violations: The willful disregard for existing rules, regulations, instructions, or standard operating procedures by management during the course of their duties

Preconditions for unsafe acts
Environmental factors
Technological environment: Encompasses a variety of issues, including the design of equipment and controls, display/interface characteristics, checklist layouts, task factors, and automation
Physical environment: Includes both the operational setting (e.g., weather, altitude, terrain) and the ambient environment, such as heat, vibration, lighting, and toxins
Condition of the operator
Adverse mental states: Acute psychological and/or mental conditions that negatively affect performance, such as mental fatigue, pernicious attitudes, and misplaced motivation
Adverse physiological states: Acute medical and/or physiological conditions that preclude safe operations, such as illness, intoxication, and the myriad pharmacological and medical abnormalities known to affect performance
Physical/mental limitations: Permanent physical/mental disabilities that may adversely impact performance, such as poor vision, lack of physical strength, mental aptitude, general knowledge, and a variety of other chronic mental illnesses
Personnel factors
Crew resource management: Includes a variety of communication, coordination, and teamwork issues that impact performance
Personal readiness: Off-duty activities required to perform optimally on the job, such as adhering to crew rest requirements, alcohol restrictions, and other off-duty mandates

Unsafe acts
Errors
Decision errors: These "thinking" errors represent conscious, goal-intended behavior that proceeds as designed, yet the plan proves inadequate or inappropriate for the situation. These errors typically manifest as poorly executed procedures, improper choices, or simply the misinterpretation and/or misuse of relevant information
Skill-based errors: Highly practiced behavior that occurs with little or no conscious thought. These "doing" errors frequently appear as breakdowns in visual scan patterns, inadvertent activation/deactivation of switches, forgotten intentions, and omitted checklist items. Even the manner or technique with which one performs a task is included
Perceptual errors: These errors arise when sensory input is degraded, as is often the case when flying at night, in poor weather, or in otherwise visually impoverished environments. Faced with acting on imperfect or incomplete information, aircrew run the risk of misjudging distances, altitude, and descent rates, as well as responding incorrectly to a variety of visual/vestibular illusions
Violations
Routine violations: Often referred to as "bending the rules", this type of violation tends to be habitual by nature and is often enabled by a system of supervision and management that tolerates such departures from the rules
Exceptional violations: Isolated departures from authority, neither typical of the individual nor condoned by management

2.2. Training

Trainers with expertise in both human factors and the HFACS methodology provided participants with two days of intensive HFACS training. The first day included didactic training on HFACS, including a brief overview of human error. This was followed by an extensive discussion of each causal category within HFACS, including detailed descriptions and examples. Day one culminated with an in-class case study that illustrated the human factors associated with a real-world accident using HFACS.

Day two started with a comprehensive review of the HFACS system, including a question and answer period. This was followed by several hands-on group exercises coding human causal factors within the HFACS framework. Also included were additional case studies to further clarify the HFACS framework and its use in accident/incident investigation. The training concluded with a large group discussion of any issues or areas in need of clarification.

2.3. Coding assessment

Five causal factors associated with each of the 19 HFACS causal categories (n = 95) were extracted from actual accident reports from the National Transportation Safety Board (NTSB), the Occupational Safety and Health Administration (OSHA), and other accident databases covering industries such as aviation, maintenance, food service, healthcare, mining and lodging. Each causal factor was selected for clarity and was presented to participants in the format in which it was reported in the accident database. Some causal factors were modified for grammatical purposes only, and in a manner that did not affect the content of the statements. This approach was adopted to enhance the content and external validity of the causal factor statements. Given that the original accident reports were not coded using HFACS, a scoring key of correct answers was created for each of the 95 causal statements. Correct answers were determined a priori by consensus among the HFACS expert research team.

An online survey platform was used as the primary data-gathering instrument in this study so that the evaluation could be conducted away from the original training facility for purposes of the two-week follow-up assessment. In the form, each of the 95 causal factors was presented one at a time. The causal factor was formatted as a statement, followed by the 19 possible HFACS causal categories. The participants were required to read each causal factor and then attribute it to the causal category that best described it by checking the appropriate box. Participants were not provided feedback on their performance, nor were they provided a copy of their exam.
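As a concrete, hedged illustration of this scoring step, the sketch below compares each coder's category choices against an a priori consensus key and flags statements that fewer than half of the coders classify correctly (the data structures, function name, and 50% threshold are our hypothetical choices, not the study's instrument):

```python
from collections import defaultdict

def score_against_key(responses, key, flag_below=0.5):
    """Score coders' category choices against a consensus answer key.

    responses: dict of coder id -> {item id: chosen HFACS category}
    key:       dict of item id -> consensus (correct) category
    Returns (per-coder proportion correct, list of items classified
    correctly by fewer than `flag_below` of the coders).
    """
    coder_scores = {}
    correct_counts = defaultdict(int)
    for coder, answers in responses.items():
        hits = sum(1 for item, cat in answers.items() if key[item] == cat)
        coder_scores[coder] = hits / len(key)
        for item, cat in answers.items():
            if key[item] == cat:
                correct_counts[item] += 1
    hard_items = [item for item in key
                  if correct_counts[item] / len(responses) < flag_below]
    return coder_scores, hard_items

# Toy usage: two coders, three causal-factor statements.
key = {1: "Decision Errors", 2: "Skill-Based Errors", 3: "Supervisory Violations"}
responses = {
    "A": {1: "Decision Errors", 2: "Skill-Based Errors", 3: "Exceptional Violations"},
    "B": {1: "Decision Errors", 2: "Adverse Mental States", 3: "Exceptional Violations"},
}
scores, hard = score_against_key(responses, key)
print(scores)  # approx. {'A': 0.67, 'B': 0.33}
print(hard)    # [3]: correctly coded by fewer than half of the coders
```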

2.4. Procedures

At the conclusion of the second day of HFACS training, participants were given the opportunity to voluntarily participate in the study. Workshop attendees were provided a description of the assessment procedure and completed a consent form if they were willing to participate. The study required participants to complete two assessments: the first given at the conclusion of the second day of training, the second given 14 days later. During the first session, the 95 causal statements were presented via the Internet and participants completed the coding on their own personal laptop computer or tablet. Two weeks later, participants were emailed a link to the online form and the same 95 causal factors were randomly presented again for coding. There was no time limit for completing this coding process, and participants were allowed to use notes and reference materials to help complete the task. This process was used to ensure that the assessment was not simply a memory test but rather mimicked the coding process commonly utilized in the real world.

3. Results

Data from the 125 participants who completed the first coding activity were collected and used for the inter-rater reliability analysis. Of these 125 participants, 59 (47%) completed the activity two weeks following the training; therefore, data from these 59 participants were included in the intra-rater reliability analysis. Krippendorff's Alpha (a robust version of Cohen's Kappa applicable to both inter- and intra-rater reliability) was used to analyze the data. According to Krippendorff (2004), Alpha values of 0.80 and above are considered reliable, values between 0.667 and 0.80 are considered tentatively reliable, and values less than 0.667 are considered unreliable.

3.1. Inter-rater reliability

The overall inter-rater reliability for both the HFACS tier and causal category levels was computed and is presented in Table 2. When calculated across the four tier levels (unsafe acts, preconditions for unsafe acts, unsafe supervision and organizational influences), a Krippendorff's Alpha value of α = 0.79 was obtained. When calculated across the 19 categories, the Krippendorff's Alpha value was somewhat lower (α = 0.67), indicating that coders were less likely to agree on the particular category a specific causal factor belonged to within each level.

Table 2
Overall α for each HFACS tier and category: inter-rater reliability.

HFACS tier                      HFACS category                              α
Unsafe acts                                                                 0.82
                                Skill-based error                           0.56
                                Decision error                              0.46
                                Perceptual error                            0.72
                                Routine violation                           0.76
                                Exceptional violation                       0.63
Preconditions for unsafe acts                                               0.80
                                Physical environment                        0.82
                                Technological environment                   0.65
                                Adverse mental state                        0.68
                                Adverse physiological state                 0.63
                                Physical/mental limitations                 0.73
                                Communication, coordination & planning      0.78
                                Fitness for duty                            0.73
Unsafe supervision                                                          0.73
                                Inadequate supervision                      0.51
                                Planned inappropriate operations            0.49
                                Failed to correct a known problem           0.82
                                Supervisory violation                       0.53
Organizational influences                                                   0.80
                                Resource management                         0.62
                                Organizational climate                      0.80
                                Organizational process                      0.69

Krippendorff's Alpha values at the causal category level were lower than at the tier level (tier: α = 0.73-0.82; category: α = 0.46-0.82). The highest reliabilities were physical environment (α = 0.82) and organizational climate (α = 0.80). Skill-based error (α = 0.56), decision error (α = 0.46), inadequate supervision (α = 0.51), planned inappropriate operations (α = 0.49) and supervisory violation (α = 0.53) showed the lowest reliability levels.

Upon closer inspection of the 95 causal factors, the analysis identified six for which less than 50% of the coders properly identified the causal category. Many of these cases included compound causal factors, which could explain why inter-rater reliability was particularly low for certain categories. Causal factor 92, for example, was: "The electrical operator got distracted by an external noise and forgot to take readings on the main transformers". Most participants classified it as a skill-based error; however, many coders selected physical environment because of the term "external noise" or adverse mental state because of the word "distracted". Causal factor 64 was "The captain chose to continue fishing despite the severe weather predictions and the exposed location of the ship 'Katmai'". The correct response would have been decision error; however, many participants viewed the captain as a supervisor and instead chose planned inappropriate operations or supervisory violation. Causal factor 74 was "The chief engineer inspected the construction site without using personal protective equipment". The correct choice was supervisory violation because the actor was a chief engineer; however, many selected exceptional violation because they viewed the individual as an employee and not as a supervisor. Because factors like these were included, a number of people classified them differently, which could explain why there were such low levels of reliability for some of the categories.

3.2. Intra-rater reliability analysis

Of the original 125 participants who took the survey on the first day, 59 (47%) returned the survey two weeks later. Intra-rater reliability was determined by calculating Krippendorff's Alpha individually for each of the 59 participants who participated in both sessions, and the intra-rater agreement results were tabulated for each participant. Krippendorff's Alpha calculated across all 59 participants at the HFACS tier level and at the HFACS category level is presented in Table 3.

The overall average intra-rater agreement values were slightly yet consistently higher than those observed for inter-rater reliability. However, a similar pattern emerged: Krippendorff's Alpha values were generally higher at the tier level than at the HFACS category level (tier range α = 0.83-0.88; category range α = 0.57-0.89). With the exception of decision error (α = 0.57), intra-rater reliability was found to be above α = 0.62. Note also that, because the same causal factors were used in both sessions, the six causal factors that had low reliability in the first session subsequently had low reliability in the second session, which may account for the lower intra-rater reliability seen for decision errors and supervisory violations.
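To make the agreement statistic used above concrete, the sketch below computes Krippendorff's Alpha for nominal data using the coincidence-matrix formulation described in Krippendorff (2004). It is a minimal illustration on invented toy data, not the study's analysis code; in practice a vetted statistical package would be preferable:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha for nominal data with no missing values.

    units: list of items, each a list of the category labels assigned
    to that item by the coders (Krippendorff, 2004).
    """
    o = Counter()  # coincidence matrix o[(c, k)]
    for unit in units:
        m = len(unit)
        if m < 2:
            continue  # items coded by a single rater are uninformative
        for c, k in permutations(unit, 2):  # all ordered pairs of codes
            o[(c, k)] += 1.0 / (m - 1)
    n_c = Counter()  # marginal totals per category
    for (c, _), w in o.items():
        n_c[c] += w
    n = sum(n_c.values())
    observed = sum(w for (c, k), w in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c, k in permutations(n_c, 2))
    return 1.0 - (n - 1) * observed / expected

# Toy example: three coders classify four causal factors.
units = [
    ["Decision Errors"] * 3,
    ["Skill-Based Errors", "Skill-Based Errors", "Decision Errors"],
    ["Routine Violations"] * 3,
    ["Inadequate Supervision", "Supervisory Violations", "Inadequate Supervision"],
]
print(krippendorff_alpha_nominal(units))  # ~0.6, below the 0.667 cut-off
```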

Table 3
Average α at the tier and category levels: intra-rater reliability.

HFACS tier                      HFACS category                              Average α
Unsafe acts                                                                 0.88
                                Skill-based error                           0.66
                                Decision error                              0.57
                                Perceptual error                            0.82
                                Routine violation                           0.82
                                Exceptional violation                       0.75
Preconditions for unsafe acts                                               0.87
                                Physical environment                        0.87
                                Technological environment                   0.75
                                Adverse mental state                        0.78
                                Adverse physiological state                 0.68
                                Physical/mental limitations                 0.82
                                Communication, coordination & planning      0.83
                                Fitness for duty                            0.73
Unsafe supervision                                                          0.83
                                Inadequate supervision                      0.66
                                Planned inappropriate operations            0.64
                                Failed to correct a known problem           0.85
                                Supervisory violation                       0.62
Organizational influences                                                   0.87
                                Resource management                         0.75
                                Organizational climate                      0.89
                                Organizational process                      0.80

4. Discussion

In this study, HFACS was found to be generally reliable at both the tier and category levels. That is, the coders generally agreed upon which level of the HFACS hierarchy a causal factor belonged and were consistent upon retesting, albeit with slightly more difficulty agreeing upon the particular category a specific causal factor belonged to within each level.

The general reduction of reliability levels within HFACS from the tier level to the category level is not uncommon (e.g. Olsen, 2011). Research in other domains has shown that as the number of categories increases, reliability decreases. For example, Gwet (2010) demonstrated through a Monte-Carlo experiment that reliability decreases as the number of categories increases while the number of causal factors is kept constant.
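This category-count effect can be illustrated with a small simulation in the spirit of such Monte-Carlo experiments (our sketch under simplified assumptions, not Gwet's actual design): two simulated coders judge each item on a latent 0-1 scale with fixed noise, and their judgments are binned into progressively more categories while the number of items is held constant.

```python
import random

def simulated_agreement(n_categories, n_items=95, noise=0.05, seed=1):
    """Cohen's kappa for two noisy simulated coders who bin a latent
    0-1 judgement into `n_categories` equal-width categories."""
    rng = random.Random(seed)

    def code(t):
        # noisy observation of the latent value, clamped into [0, 1)
        x = min(max(t + rng.gauss(0, noise), 0.0), 0.999)
        return int(x * n_categories)

    pairs = [(code(t), code(t)) for t in (rng.random() for _ in range(n_items))]
    p_o = sum(a == b for a, b in pairs) / n_items
    # chance agreement from each coder's marginal category frequencies
    p_e = sum(
        (sum(a == k for a, _ in pairs) / n_items)
        * (sum(b == k for _, b in pairs) / n_items)
        for k in range(n_categories)
    )
    return (p_o - p_e) / (1 - p_e)

for k in (4, 8, 19):  # e.g. the 4 HFACS tiers vs. the 19 categories
    print(k, round(simulated_agreement(k), 2))
# With more categories, bin boundaries crowd together and the noisy
# coders disagree more often, so chance-corrected agreement drops.
```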
What separates this study from others that have examined HFACS reliability is that it used the original HFACS methodology, a larger sample of participants and causal factors, extensive coder training, and traditional metrics designed for the assessment of both inter- and intra-rater reliability. For example, the current study employed the original HFACS framework as presented by Wiegmann and Shappell (2003), while many others have used derivatives of the original model, such as DoD-HFACS or HFACS-ADF, that jeopardize the integrity and reliability of the original model. It seems somewhat unreasonable to assess the reliability of HFACS in general based on modified frameworks whose derivations may be based more on personal preference than on psychological theory.

It is also important to note that the current study used 95 unique causal factors for classification within the HFACS framework and a relatively large sample size of 125 participants. In contrast, other studies have varied considerably in the number of causal factors and participants employed. Indeed, some have used as few as two coders while others have used a relatively small number of causal factors (see Cohen et al., 2015, Table III). In cases where the actual distribution of causal factors within HFACS is the focus of a given study rather than inter-rater reliability, this might make sense. However, when assessing reliability on a larger scale, a larger number of causal factors and participants seems warranted.

It is also notable that the training involved in the current study was more robust than in most other studies reviewed. Specifically, two full days of didactic, hands-on HFACS training were provided by trainers with demonstrated expertise in HFACS. Others have reported less rigorous training when assessing HFACS reliability, including simply a "brief overview" of HFACS (Baysari et al., 2009).

Finally, the present study was able to demonstrate both inter- and intra-rater reliability using a conservative measure of agreement (Krippendorff's Alpha) that takes into account participant agreement by chance alone. Unfortunately, many authors rely on the more liberal measure of inter-rater reliability, percent agreement, which simply calculates the number of agreements relative to the total number of classifications. As a result, those instances where two coders agree simply by chance are included and artificially elevate reliability. By using Krippendorff's Alpha, we have chosen a more conservative, and perhaps more accurate, measure of reliability.

Although the data presented here suggest that HFACS is a reliable framework for classifying human factors associated with accidents and incidents, inter- and intra-rater reliability was not maximized. Clearly there are opportunities for improving the reliability of the HFACS coding process, including improving training quality, further clarifying causal factors, and coder selection, to name a few.

As past research has found, training is a significant factor affecting intra-rater and inter-rater reliability, with levels found to be higher for trained coders than for untrained ones, especially for intra-rater reliability (Weigle, 1998). More importantly, it has been found that reliability levels can be improved if coders receive classroom training (Shohamy et al., 1992). Teaching coders how to code is a fundamental component of any HFACS training; although this was addressed in the training aligned with this study, some coders still had issues with how to code. For example, some participants could not distinguish between certain categories belonging to the same tier, perhaps implying training weaknesses and a need for improvement.

Future training might also incorporate a classification tool or flowchart, with the intention of increasing the reliability of HFACS, as sketched below. This flowchart could assist coders in correctly classifying a mishap/accident causal factor into the appropriate HFACS causal category. It could begin by asking the coder sequential questions until he/she accurately identifies the tier to which the causal factor belongs; then, for each tier, this process is repeated with an additional set of questions until the correct HFACS causal category is determined. In this digital and distance-education age, such a tool could be adopted as an online tool and/or an application. One advantage of an online tool is that it can function as refresher training, enhancing the coding process when needed.
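A minimal sketch of such a flowchart-style aid follows. The questions and the partial category logic are hypothetical placeholders meant to illustrate the idea; they are not a validated instrument from this study or an official HFACS tool:

```python
# Hypothetical two-step coding aid: sequential yes/no questions first
# narrow a causal factor to a tier, then to a category within it.
TIER_QUESTIONS = [
    ("Does the factor describe an operator's own erroneous or rule-breaking act?",
     "Unsafe Acts"),
    ("Does it describe the operator's condition, the environment, or the team?",
     "Preconditions for Unsafe Acts"),
    ("Does it describe how supervisors planned, corrected, or violated rules?",
     "Unsafe Supervision"),
    ("Does it describe organizational resources, climate, or process?",
     "Organizational Influences"),
]

CATEGORY_QUESTIONS = {
    "Unsafe Acts": [
        ("Was a rule deliberately broken?", "Violation (routine or exceptional)"),
        ("Was it a conscious choice that proved inadequate?", "Decision Error"),
        ("Was it a slip or lapse in a highly practiced task?", "Skill-Based Error"),
        ("Was degraded sensory input misjudged?", "Perceptual Error"),
    ],
    # ... the remaining three tiers would be filled in the same way.
}

def ask(question):
    return input(question + " [y/n] ").strip().lower().startswith("y")

def classify():
    """Walk the coder through tier questions, then category questions."""
    for question, tier in TIER_QUESTIONS:
        if ask(question):
            for q, category in CATEGORY_QUESTIONS.get(tier, []):
                if ask(q):
                    return tier, category
            return tier, "category undetermined"
    return None, None

if __name__ == "__main__":
    print(classify())
```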
The causal factors used in this study were extracted from real-world accident reports; they were therefore not written for subsequent HFACS coding. Consequently, some of these causal factor statements may have been ambiguous or may have contained multiple causal factors within the same statement. While care was taken to ensure the intelligibility of statements by making sure they were grammatically correct, statements were not modified to incorporate HFACS nomenclature to make them more easily classifiable. This was done in order to test the reliability of the HFACS framework in an ecologically valid context. Presumably, reliability values would be highest when using causal factors from reports that actually employed the HFACS framework as the original coding scheme.

Previous studies related to coding and classifying causal factors have recruited participants from industries that match those of the causal factors (e.g. using pilots to classify causal factors from aviation accidents). For this study, participants from a variety of industries were used to code causal factors from multiple industries; therefore, the background of the coders did not necessarily match the industrial accidents from which the causal factors were extracted. Although the authors were careful to re-write any statements that may have been grammatically incorrect, it is possible that the nomenclature used in the description of the causal factors may have caused confusion for individuals outside of the industry from which the causal factor came. This may explain why some of the reliability levels were not as high as expected. Future research should focus on using a large number of coders and causal factors so that participants can be matched to their individual domain of expertise.

5. Limitations

One limitation of this study involves the number of participants who failed to complete the second assessment. Of the 125 participants who completed the first coding activity, only 59 (47%) completed the second assessment. Because of this drop-out rate, while the sample size for the analysis of intra-rater reliability was still large, it was significantly smaller than that for inter-rater reliability. Because participation was voluntary, those individuals who chose to participate in the second coding activity may have been inherently more interested in HFACS. These individuals also may have been more attentive and conscientious during both the training and the coding process. The nature of this study also precludes knowing about the activities that these participants engaged in during the two weeks between the first and second assessments. Some participants may have intermittently reviewed what they learned during training or actually applied HFACS as part of their work assignments. All of these factors could account for the slightly higher levels of intra-rater reliability compared to inter-rater reliability. In any case, however, trends in reliability levels between intra- and inter-rater reliability were generally similar.

6. Conclusions

This study furthers the research on the utility of HFACS as an accident causal factor classification tool with the goal of investigating both the inter-rater and intra-rater reliability of the HFACS coding process. This study supplements past research by using a large sample of coders and causal factors from actual incidents/accidents from a variety of industries. Results support the inter- and intra-rater reliability of the HFACS framework but also suggest additional avenues of continued improvement.

Reliable HFACS data are essential for empirical research on safety systems and on the effectiveness of any mitigation and/or accident prevention plans and strategies. Once a company has classified its accident and near-miss cases using the HFACS taxonomy, it can analyze these data searching for trends that point to weaknesses in certain areas of the system. In addition, conducting association analysis among HFACS categories can help identify additional areas for improvement. Information of this nature not only provides the safety professional with supplementary knowledge to guide limited resources toward a more focused intervention, but also offers benefits to employee health through lowering the frequency and severity of work accidents, all of which have a positive impact on cost and well-being at work.

References

Baysari, M.T., Caponecchia, C., McIntosh, A.S., Wilson, J.R., 2009. Classification of errors contributing to rail incidents and accidents: a comparison of two human error identification techniques. Saf. Sci. 47, 948-957.
Carmines, E., Zeller, R., 1979. Reliability and Validity Assessment. Sage Publications, Thousand Oaks, CA.
Chen, S., Wall, A., Davies, P., Yang, Z., Wang, J., Chou, Y., 2013. A human and organizational factors (HOFs) analysis method for marine casualties using HFACS-Maritime Accidents (HFACS-MA). Saf. Sci. 60, 105-114.
Cohen, T., Wiegmann, D., Shappell, S., 2015. Evaluating the reliability of the human factors analysis and classification system. Aviat. Space Environ. Med. 86 (8), 728-735.
ElBardissi, A.W., Wiegmann, D.A., Dearani, J.A., Daly, R.C., Sundt, T.M., 2007. Application of the human factors analysis and classification system methodology to the cardiovascular surgery operating room. Ann. Thorac. Surg. 83, 1412-1419.
Gwet, K.L., 2010. Handbook of Inter-Rater Reliability, second ed. Advanced Analytics LLC, Gaithersburg, MD.
Krippendorff, K., 2004. Content Analysis: An Introduction to Its Methodology. Sage Publications, Thousand Oaks, CA.
Li, W.C., Harris, D., 2006. Pilot error and its relationship with higher organizational levels: HFACS analysis of 523 accidents. Aviat. Space Environ. Med. 77 (10), 1056-1061.
O'Connor, P., 2008. HFACS with an additional layer of granularity: validity and utility in accident analysis. Aviat. Space Environ. Med. 79 (6), 599-606.
O'Connor, P., Walliser, J., Philips, E., 2010. Evaluation of a human factors analysis and classification system used by trained raters. Aviat. Space Environ. Med. 81 (10), 957-960.
O'Connor, P., Walker, P., 2011. Evaluation of a human factors analysis and classification system as used by simulated mishap boards. Aviat. Space Environ. Med. 82 (1), 44-48.
Olsen, N.S., 2011. Coding ATC incident data using HFACS: inter-coder consensus. Saf. Sci. 49 (10), 1365-1370.
Olsen, N.S., Shorrock, S.T., 2010. Evaluation of the HFACS-ADF safety classification system: inter-coder consensus and intra-coder consistency. Accid. Anal. Prev. 42, 437-444.
Patterson, J.M., Shappell, S.A., 2010. Operator error and system deficiencies: analysis of 508 mining incidents and accidents from Queensland, Australia using HFACS. Accid. Anal. Prev. 42, 1379-1385.
Reason, J., 1990. Human Error. Cambridge University Press, New York.
Reinach, S., Viale, A., 2006. Application of a human error framework to conduct train accident/incident investigations. Accid. Anal. Prev. 38, 396-406.
Shappell, S., Detwiler, C., Holcomb, K., Hackworth, C., Boquet, A., Wiegmann, D., 2007. Human error and commercial aviation accidents: a comprehensive, fine-grained analysis using HFACS. Hum. Factors 49, 227-242.
Shohamy, E., Gordon, C., Kraemer, R., 1992. The effect of raters' background and training on the reliability of direct writing tests. Mod. Lang. J. 76 (1), 27-33.
Weigle, S.C., 1998. Using FACETS to model rater training effects. Lang. Test. 15 (2), 263-287.
Wiegmann, D.A., Shappell, S.A., 2003. A Human Error Approach to Aviation Accident Analysis: The Human Factors Analysis and Classification System. Ashgate, Burlington, VT.
