
Safety Science 83 (2016) 59–73

Contents lists available at ScienceDirect

Safety Science
journal homepage: www.elsevier.com/locate/ssci

Evaluation of a safety culture intervention for Union Pacific shows improved safety and safety culture
Michael Zuschlag a,*, Joyce M. Ranney a, Michael Coplen b
a Office of Safety Management and Human Factors, US Department of Transportation John A. Volpe National Transportation Systems Center, 55 Broadway, Cambridge, MA 02142, USA
b FRA Human Factors Division, Office of Research, Development, and Technology, 1200 New Jersey Avenue, SE, Mail Stop 20, Washington, DC 20590, USA

Article info

Article history:
Received 1 February 2013
Received in revised form 26 August 2015
Accepted 1 October 2015

Keywords:
Peer-to-peer feedback
Railroad
Changing At-Risk Behavior (CAB)
Continuous improvement
Safety culture
Safety leadership

Abstract

The Federal Railroad Administration (FRA) sponsored a multiyear pilot demonstration of Clear Signal for Action (CSA), a safety culture intervention implemented with Behavioral Science Technology Inc., at a Union Pacific (UP) service unit. CSA combines peer-to-peer feedback, continuous improvement, and safety-leadership development. The US Department of Transportation John A. Volpe National Transportation Systems Center conducted an independent program evaluation of the pilot, using qualitative and quantitative measures. The evaluation found that, over two years, the site experienced significant improvements in safety outcomes, operations, and safety culture, including an 80% drop in at-risk behaviors, a 79% decrease in engineer decertification rates, an 81% decline in the rate of derailments and other incidents, and better labor–management relations. Comparison locations showed no improvements in the decertifications or derailments. The success of the pilot, in addition to successes UP had earlier with CSA-type processes, encouraged UP to expand these processes throughout its transportation department. The success of this pilot and other similar pilots led to the development and adoption of the FRA’s Railroad Safety Risk Reduction Program in the Rail Safety Improvement Act of 2008, and to the implementation of similar safety-culture programs by other carriers.

© 2015 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +1 617 494 3250.
E-mail address: michael.zuschlag@dot.gov (M. Zuschlag).

http://dx.doi.org/10.1016/j.ssci.2015.10.001
0925-7535/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

1.1. Background

Despite continued efforts by the Federal Railroad Administration (FRA), management, and unions, safety systems in the railroad industry have stagnated (Coplen, 1999). The current safety systems are embedded in the industry’s “safety culture.” In this article, safety culture is defined as the factors that determine an organization’s (labor and management) commitment, style, and proficiency in ensuring safety that result from safety-related beliefs, values, attitudes, competencies, and behavioral patterns (Reason, 1997, 2003). Railroad safety systems progressively improved safety performance until 1986, when improvements stalled (FRA, 2001, 2008). It is therefore possible that the safety culture is limiting further safety improvements (Ranney and Nelson, 2003).

The key aspect of an effective safety culture is a “trust culture,” where the organization’s members trust each other (Reason, 2003). This trust is necessary to open rich communication on safety issues, allowing an organization to identify and ultimately counteract systemic upstream causes of accidents and injuries (Reason, 1997, 2003; DeJoy, 2005). However, railroad culture has several characteristics that limit such trust, especially between labor and management (Coplen, 1999). The culture has a command-and-control management style, reactive tendencies, and inclinations to inflict punishment for accidents and injuries. These inclinations arise from a rule-and-discipline approach to safety (Gamst, 1982) and from litigious incentives. In practicing command and control, managers tend not to elicit input (including safety-related information) from workers but instead to issue orders. Reactive tendencies discourage proactively collecting information on conditions or trends that may lead to accidents or injuries. Instead, labor and management react to each injury or accident as a separate incident. Traditionally, management tends to blame accidents and injuries on rules violations, while labor tends to blame workplace deficiencies or management pressure for productivity. Managers characteristically respond to injuries and accidents by disciplining workers, including firing them, for safety rule infractions. Injured workers often sue the company for financial compensation under the 1908 Federal Employers Liability Act (FELA). Fear of discipline and lawsuits breeds distrust and chills cooperation and communication between workers and managers, stifling the sharing of safety information.

To improve safety, the FRA Human Factors Division is exploring new approaches that counteract these cultural tendencies. These approaches do so by incorporating the following features, which are characteristic of a positive safety culture (Reason, 1997; Phimister et al., 2004):

• Nondisciplinary: Seeking to improve safety without punishment or blame through protective elements such as worker anonymity.
• Proactive: Collecting data on at-risk behaviors and conditions to prevent associated accidents or injuries before they occur, and thus reducing the incentives for workers and managers to blame each other.
• Systems-safety-analysis orientation: Gathering and using rich objective data to identify underlying organizational factors in safety.
• Cooperative: Engaging stakeholders within both management and labor.
• Sustainable: Including mechanisms for long-term sustainment.

These features improve safety by creating an environment where individuals freely exchange information upward, downward, and laterally across the organizational hierarchy. This provides the open communication necessary to solve safety problems.

This paper presents an evaluation of one such approach, the FRA’s Clear Signal for Action (CSA), applied to a transportation department. With funding and sponsorship from the FRA, Behavioral Science Technology Inc. (BST) designed and implemented the demonstration pilot. The US Department of Transportation John A. Volpe Center, also with sponsorship from the FRA, independently conducted a formative and summative evaluation (Rossi et al., 1999). This article presents a summary of the summative evaluation.

1.2. Clear Signal for Action (CSA)

1.2.1. CSA implementation

CSA integrates three approaches that have previously been applied to improve safety proactively:

• Peer-to-peer feedback (PPF), where workers observe each other and exchange feedback about the safety of their behavior, conditions, and organizational factors (Geller, 2001; Krause, 1995).
• Continuous improvement (CI), where workers and managers cooperatively gather and analyze data to identify systemic causes of observed at-risk behaviors and conditions, and then implement corrective actions to address those causes (Harrington, 1987; Juran, 1964; Krause, 1995).
• Safety-leadership development (SLD), where managers are trained to promote proactive safety practices such as PPF and CI (Krause et al., 1999).

Fig. 1 illustrates a combined theory of action and theory of change (Funnel and Rogers, 2011) for CSA.

Detailed CSA activities are listed in the box headed Implementation, and their theoretical outcomes are depicted in the two columns of boxes designated Proximal Outcomes and Distal Outcomes. Proximal outcomes result directly from implementation activities, while distal outcomes are mediated by proximal outcomes. The arrows indicate the effects of prior activities and outcomes on subsequent ones, with influence moving primarily from left to right. Within the Implementation box, activities are grouped according to their primary association with PPF, CI, or SLD.

1.2.1.1. Peer-to-peer feedback (PPF). To initiate the PPF component, a local CSA process steering committee develops a checklist of safe and at-risk worker behaviors and working conditions. The committee is composed of workers and sometimes several managers, and it bases the checklist on analyses of injury reports and other sources of safety information (Krause, 1997). The steering committee recruits, trains, and coaches workers to be “peer observers.” Peer observers observe the safety of their coworkers (overtly, with their permission) and then conduct anonymous, nonconfrontational feedback sessions with them that are devoid of any disciplinary connection. The feedback includes both acknowledging any observed safe behavior and discussing any observed at-risk behavior. By focusing on the behavioral and conditional antecedents of accidents, CSA seeks to prevent accidents proactively, before they occur.

1.2.1.2. Continuous improvement (CI). Within the CI component, workers are trained to interview their coworkers during the feedback sessions about the coworkers’ explanations for any observed at-risk behaviors or conditions. Thus, feedback in a session flows from the observed peer to the observer as well as from the observer to the peer. The observing worker records on the checklist all data on the behaviors and the explanations. The steering committee aggregates and objectively analyzes these data through root-cause problem-solving to identify the systemic causes of barriers to enhancing safety. Potential systemic causes include organizational policy, training, tool design, environmental conditions, procedures, and cultural aspects. The steering committee executes corrective actions against barriers that it can remove, for example, through feedback to workers during PPF sessions. Some barriers require actions beyond the authority of the steering committee, such as new equipment purchases or procedure changes. For these, a joint labor–management barrier removal team reviews and prioritizes the barriers and then develops corrective actions, which management executes. Data-gathering continues after a corrective action is deployed, so that its effectiveness can be evaluated.

1.2.1.3. Safety-leadership development (SLD). Within the SLD component, managers are trained in effective nondisciplinary, proactive techniques for enabling employees to work safely. These techniques include, but are not limited to, supporting safety-related activities such as feedback sessions and barrier removal. These SLD processes are conducted in parallel with existing disciplinary processes. SLD activities are not a substitute for addressing rules violations.

1.2.2. Integration of behavioral and safety culture approaches

When used alone, PPF approaches have often placed too little emphasis on the influence of upstream managers, systems, policies, and procedures on at-risk behavior and conditions. This has resulted in negative reviews from several unions (Spigener and Hodson, 1997; Howe, 1999; Frederick and Lessin, 2000). Thus, recent variants of PPF, such as CSA, have acknowledged that behavior-oriented safety interventions can complement culture-oriented safety interventions such as CI and SLD (DeJoy, 2005). These new variants integrate PPF with CI, using the peer-to-peer sessions as opportunities to collect the data needed by CI. SLD encourages managers to implement corrective actions that need management support. It also targets “latent” factors in accidents and injuries that lie further back in the chain of causation, such as safety climate and culture (Reason, 1997). SLD trains organizational leadership to eliminate these causes, since leadership has the resources and authority to alter the direction of the organization. It also trains managers to provide the necessary resources and to integrate CSA into other safety programs so that it becomes institutionalized. SLD can therefore accelerate changes initiated by PPF and CI and make them lasting characteristics of the organization’s safety culture.

CSA’s combination of PPF, CI, and SLD distributes responsibility for safety among workers and managers. PPF activities are
predominantly or fully workers’ responsibility, since peers make the observations and provide the feedback. Because SLD changes organizational leadership, it is fully managers’ responsibility. CI is a joint responsibility. However, because management is accountable for ensuring that the systemic corrective action process is effective, managers have more responsibility than workers (Deming, 2000; Walton, 1986). Involving both parties in safety helps to embed a safety process in the culture by giving each party a stake in its successful outcome. In contrast, when a safety process is solely the responsibility of either labor or management, it tends to have a more limited impact on the non-responsible party. CSA is thus an integration of behavior-oriented and culture-oriented safety interventions (DeJoy, 2005). PPF improves safety and culture from the bottom up via the activities of workers, and SLD improves safety and culture from the top down via the activities of managers. CI represents the meeting of the top-down and bottom-up processes, where workers and managers share information on safety problems and solutions.

Fig. 1. Theory of action for CSA, showing activities and theoretical outcomes (PPE: personal protective equipment).

1.2.3. Theoretical outcomes on safety performance and culture

The theory of action for CSA shown in Fig. 1 specifies outcomes for the following:

• Worker practices: The at-risk and safe behaviors identified in the PPF checklist and targeted by feedback sessions. Examples include walking with eyes on the path and following crew-safety communication procedures.
• Systemic conditions: Aspects of the physical and organizational environment that affect safety, such as facilities, tools, training, and disciplinary policies. Examples include switches that are in good operating condition, availability of proper tools and personal protective equipment (PPE), and training in safe worker practices such as the proper way to align a coupler.
• Management practices: On-the-job manager behaviors that take the form of proactive, nondisciplinary promotion of safe behaviors, conditions, and processes. Examples include coaching employees on safe behavior and encouraging participation in safety processes. These practices are in addition to implementation activities such as consistently providing the necessary budget, equipment (e.g., computers for data analysis), and physical space for training and operations.
• Safety culture: The factors that determine an organization’s (labor and management) commitment, style, and proficiency in ensuring safety that result from safety-related beliefs, values, attitudes, competencies, and behavioral patterns (Reason, 1997, 2003). This evaluation is specifically concerned with attitudes and behavioral patterns related to labor–management trust (Reason, 1997, 2003). A lack of such trust has theoretically limited safety advances in the railroad industry (Coplen, 1999).
• Occurrences: Safety events associated with injuries; fatalities; damage to equipment, for example, from derailments; and associated close calls, for example, safety-rule violations such as engineer decertifications.

As depicted by the arrows in Fig. 1, the implementation directly promotes the targeted safety practices of both managers and workers through its SLD and PPF components, respectively. Workers’ practices change as a result of both the feedback sessions and the training required to perform them (Ranney et al., 2010). The CI process directly improves the behavioral and systemic conditions identified in data analysis.

There are reciprocal effects between specific practices and safety culture. Changes in safety attitudes encourage workers and managers to change their practices. Conversely, changes in practices may change attitudes (Krause, 1995). Safer practices also improve the culture by building better labor–management relations. As managers see workers encouraging each other to work more safely, managers trust workers more to perform tasks safely and perceive less of a need for discipline. With managers engaging in more proactive, nondisciplinary approaches to safety, workers are assured of management’s commitment to their safety. As discipline becomes less frequent and management’s commitment to safety becomes more apparent, workers trust managers more.

Improved systemic conditions also improve labor–management relations within the safety culture by demonstrating management’s commitment to safety. Improved labor–management relations can reciprocally improve systemic conditions by fostering labor–management and cross-trade cooperation in improving safety conditions.

Safer practices and improved systemic conditions reduce occurrences, as does the safety culture through increased dialogue about
safety. Collectively, these changes result in improved worker and manager safety practices beyond those specifically targeted by the implementation.

In summary, CSA improves safety culture simultaneously with safety performance through “boot-strapping” reciprocal causation.

1.3. CAB: An implementation of CSA

1.3.1. Context and motivations

Between 2005 and 2008, the FRA sponsored a CSA demonstration pilot in the transportation department of the San Antonio Service Unit of the Southern Region of Union Pacific Railroad (UP). Transportation departments in railroads comprise two kinds of operations. Road operations involve driving trains long distances on main-line track between terminals, and switching operations involve sorting cars in yards. Behavioral Science Technology Inc. (BST), a company that has implemented CSA-like programs in a broad range of industries, designed and guided the implementation of the San Antonio CSA process. Local stakeholders named the process Changing At-Risk Behavior (CAB).

Like all the service units of UP, the San Antonio Service Unit (SASU) comprised hundreds of transportation workers. Most worked out of a central hub (in this case, the city of San Antonio), and others worked out of peripheral terminals up to 150 miles away (e.g., Laredo and Del Rio). CAB covered approximately 1100 workers. The size of the workforce remained unchanged throughout the evaluation period, and turnover was low.

During the CAB implementation, UP and the FRA started a similar CSA demonstration pilot on the Livonia Service Unit (Coplen and Ranney, 2009). Meanwhile, UP, acting independently of the FRA, initiated a PPF process on the Houston Service Unit. BST provided consultation services for Livonia and, to a lesser extent, Houston. Livonia and Houston were also in the Southern Region, although their safety processes were limited to switching operations, while CAB covered both switching and road operations.

UP was interested in partnering with the FRA on these demonstration pilots because it had been working with the three elements of CSA (PPF, CI, and SLD) since 1988. In 1993, UP had established a successful CSA-type program entitled Total Safety Culture (Geller, 2001) in its mechanical department. After seeing it succeed there, UP wanted to see whether it could work in transportation, where most accidents occur. UP was particularly focused on the Southern Region because a series of high-profile accidents there had increased FRA scrutiny of the region’s safety. The severity of these particular accidents may have justified the scrutiny. However, the Southern Region’s levels of safety through that time were actually steady or improving (Zuschlag et al., 2012), and UP’s overall injury rate was lower than that of most US railroads (FRA, 2006).

Local managers and workers worked together to implement CAB, along with their unions, the Brotherhood of Locomotive Engineers and Trainmen (BLET) and the United Transportation Union (UTU). The recent serious accidents in the region raised both manager and worker awareness of safety. They also encouraged the two parties to cooperate on improvements despite the historic distrust between them. Labor support for CAB was also aided by an influx of new workers throughout the corporation prior to the evaluation period. The average years of service for UP transportation workers at SASU was 16 years, but the influx created a bimodal distribution. One side of the distribution, comprising two-thirds of the workers, averaged 7 years of service, while the other side averaged 33 years of service. The workers’ age distribution was likewise bimodal, with one side averaging 35 years of age and the other averaging 53 years of age. The influx of relatively new workers facilitated support for CAB because older workers had developed a deep distrust of management over decades of employment. The newer workers were somewhat less distrustful of management.

1.3.2. CAB structure and initiation

CAB leadership and day-to-day operations were carried out by two full-time “facilitators”: an engineer from BLET and a conductor from UTU. The steering committee comprised an additional eight workers, who met approximately once a month. A newly arrived superintendent (head of the service unit) supported the implementation. This was in part because of having seen the CSA-like Total Safety Culture program successfully implemented in a mechanical department in the UP Central Region. The superintendent appointed a senior service-unit manager to serve part-time as the local management sponsor for CAB, acting as chief liaison between the steering committee and management.

BST’s model of CSA includes customizing it to fit local contexts. For example, the stakeholders at San Antonio chose not to focus at first on traditional industrial safety threats, such as tripping hazards and pinch points. Instead, CAB initially focused on behaviors to improve alertness and teamwork for locomotive cab operations on the road. Its focus was limited to practices related to high-workload situations, such as operating under constraining signals, a condition that UP calls Cab Red Zone (CRZ). Fourteen months after its origination, CAB expanded to include safety in yard-switching operations, using a different checklist of behaviors. In the current paper, the two implementations are distinguished as CAB–CRZ and CAB-Switching. The evaluation of CAB thus comprised two phases:

• A first phase, from August 2005 through September 2006, which included only CAB–CRZ.
• A second phase, from October 2006 to about January 2008, which included CAB-Switching and the continuation of CAB–CRZ.

The evaluation also included a four-year baseline period prior to the first phase, whose data indicated the site’s state before CSA implementation.

The CAB process began in August 2005 with the initiation of regular peer-to-peer feedback sessions. Often the employees observed their peers while engaged in their normal work (e.g., a conductor might observe an engineer across the cab of a locomotive). However, many employees also dedicated entire days to conducting only PPF sessions (e.g., riding in several different cabs per day, observing both engineer and conductor). For each observation, the observer and peer completed the feedback exchange at a safe and convenient break in the work (e.g., while the train waited at a siding for another train to pass). Employees chose their observations primarily on the basis of available opportunities, especially for CAB–CRZ. CRZ conditions may or may not occur during a trip, and an employee’s ability to observe the crew of a particular train depended on arranging a rendezvous with that train. However, especially for CAB-Switching, the steering committee also chose to focus PPF sessions on locations where it felt they were needed most. The criteria for choosing locations did not change over the course of the evaluation period.

1.3.3. CAB evolution and challenges

Approximately halfway through the evaluation period, the superintendent was promoted and left the service unit. At UP, this tenure of one to two years is fairly typical for a railroad superintendent, and similar tenures were observed in other service units of the UP Southern Region. CAB facilitators briefed the newly appointed superintendent on the process, and he continued to strongly support the implementation. At about the same time, the local management sponsor was also promoted and left the service unit. His lieutenant had previously earned credibility with the workers for his support of CAB, among other safety work, and became the new local management sponsor.
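The phase boundaries described in Section 1.3.2 can be expressed as simple calendar arithmetic. Doing so also confirms that an expansion “fourteen months after” the August 2005 start falls in October 2006, the first month of the second phase. The sketch below is illustrative only and is not the evaluators’ procedure. It rests on three assumptions: monthly data keyed by year and month, a four-year baseline equal to the 48 months immediately preceding August 2005, and “about January 2008” treated as an inclusive end month. The names `add_months` and `evaluation_period` are hypothetical.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date by a whole number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


# Phase boundaries from Section 1.3.2, as first-of-month dates.
CAB_START = date(2005, 8, 1)                  # CAB-CRZ begins, August 2005
SWITCHING_START = add_months(CAB_START, 14)   # CAB-Switching begins 14 months later
STUDY_END = date(2008, 1, 1)                  # assumption: "about January 2008", inclusive
BASELINE_START = add_months(CAB_START, -48)   # assumption: 4-year baseline just before CAB


def evaluation_period(year: int, month: int) -> str:
    """Label a calendar month with the evaluation period it falls in (hypothetical helper)."""
    d = date(year, month, 1)
    if BASELINE_START <= d < CAB_START:
        return "baseline"
    if CAB_START <= d < SWITCHING_START:
        return "phase 1 (CAB-CRZ only)"
    if SWITCHING_START <= d <= STUDY_END:
        return "phase 2 (CAB-CRZ + CAB-Switching)"
    return "outside evaluation"


if __name__ == "__main__":
    print(SWITCHING_START)               # 2006-10-01
    print(BASELINE_START)                # 2001-08-01
    print(evaluation_period(2006, 9))    # phase 1 (CAB-CRZ only)
    print(evaluation_period(2006, 10))   # phase 2 (CAB-CRZ + CAB-Switching)
```

Labeling monthly records this way would let baseline, phase 1, and phase 2 rates be compared on a common footing. Under these assumptions, phase 1 ends in September 2006, matching the dates given in the text.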

By the end of the evaluation period, over half the workforce had been trained to conduct PPF sessions, a rate somewhat greater than expected at the start of implementation. The higher rate was achieved because the superintendent arranged for additional training sessions when rail traffic was low and manpower demands were reduced (e.g., in January of each year). Originally the steering committee conducted training only at a terminal in the city of San Antonio, but over the course of the evaluation it expanded training to the peripheral terminals. During this time, three of the original steering committee members were systematically replaced by other workers, consistent with the BST consultant’s advice on term limits and rotation. Two of the new members were from the outlying terminals, to provide balanced representation.

To support CAB across the service unit, the superintendent set a fixed monthly “budget” of worker-hours for CAB training, PPF sessions, analysis, and other CAB activities. In an unusual move for a railroad, the superintendent delegated responsibility for allocating the budget among CAB activities to the facilitators. This demonstrated commitment to the program and trust in the facilitators. Normally, a manager (e.g., the management sponsor), rather than a worker, would receive such responsibility. Through the first year of the implementation, management was pleased with the way in which the facilitators allocated the budget.

However, in the second year of implementation, CAB expanded to peripheral terminals and to switching operations, and the facilitators had increasing difficulty completing all activities. The rate of PPF sessions, a metric of implementation strength, was falling far below the planned level. The facilitators requested a larger budget commensurate with the expansion of CAB. However, external economic and organizational pressures meant that the superintendent could barely maintain the existing budget. The facilitators appealed to the superintendent’s superior, the transportation vice president for the region. The typical railroad management response to such a crisis would be to exercise “command and control” and take over budgeting responsibility from the workers. However, the vice president instead issued the facilitators a challenge: for every worker-day of CAB administration the facilitators eliminated, the vice president would increase the budget by one-and-a-half worker-days. The steering committee rose to the challenge and reduced administration costs by 20%, earning a 30% budget increase. The combined increase in efficiency and budget size allowed an increase in the rate of PPF sessions and the expansion to the entire service unit.

1.3.4. Implementation strength

For the sake of relative brevity, this paper excludes a formal evaluation of implementation. It instead focuses on the evaluation of outcomes, especially the effects of CSA on safety and safety culture. Details of the context, pre-existing mechanisms (Pedersen et al., 2012), and the implementation evaluation are in a full government report on the demonstration pilot (Zuschlag et al., 2012). Only a brief summary is provided herein.

The implementation evaluation, using quantitative and qualitative measures, found CAB to be sufficiently strong to proceed with the outcome evaluation (Rossi et al., 1999). Management acted as “safety climate engineers” (Simard and Marchand, 1994, 1997), providing the necessary resources for the process and enlisting labor officials and workers to support the program. Worker support for the process grew over the evaluation period. This was in part because workers knew that the all-worker steering committee maintained control of the feedback session data to ensure no negative repercussions for participating workers.

At the end of the evaluation period, the workers had participated in approximately 4800 PPF sessions. This corresponds to more than four PPF sessions per worker across the approximately 1100 workers covered by CAB. Even so, the total number of PPF sessions remained below what had been planned at the start of the implementation. However, by other measures, the implementation was very strong. For example, as related above, management was motivated to make CSA successful and supplied the worker-hours and other resources to support it. In addition, workers were trained faster than planned. These characteristics, combined with the completion of the program activities (e.g., data analysis and corrective action execution), implied a strong implementation (Egan et al., 2009).

Trust between management and labor at the site was problematic, a condition that augurs poorly for safety process success (Reason, 1997). However, joint concern over the recent serious accidents made the parties more inclined to cooperate than is usual for a railroad. The distrust was also counteracted by the role behaviors of the leadership (Pedersen et al., 2012). Both the superintendent and the regional vice president resisted the railroad inclination to exert managerial control over the process, which would likely have led to worker distrust and disenchantment. Instead, the leadership empowered the workers by giving them control over the process and budget, encouraging them to find their own solutions to problems. This concrete and visible way of going above and beyond normal procedures demonstrated commitment to the program and good faith. It likely invested the workers in the process and helped build their trust in the leadership.

In summary, the implementation, while not perfect, was reasonably strong and complete. Any lack of observable outcomes for safety or safety culture cannot be attributed to implementation failure. Conversely, any observed outcomes may reasonably be attributed to the CSA implementation (Egan et al., 2009).

1.4. Evaluation question

This paper is a summative evaluation of a pilot demonstration of CSA. It focuses on outcomes for safety and safety culture and provides a “bottom-line” assessment of the effectiveness of CSA at the site (Rossi et al., 1999). For this purpose, the evaluation question was, “What are the effects of the CSA process on safety and safety culture?” The evaluation looked for improvements in:

• Worker practices.
• Occurrences, specifically incidents of material damage and engineer decertifications.
• Safety culture, specifically the presence of a “trust culture” in labor–management relations.

Additional evaluation questions and the formative evaluation are addressed in Zuschlag et al. (2012).

2. Evaluation design, procedures, and data analysis

Randomized controlled trials (RCTs) were infeasible for this evaluation, as is often the case for studies of occupational safety programs (Pedersen et al., 2012). Instead, the evaluation used a mixed-methods design, with both quantitative and qualitative methods (Creswell, 2003). A mixed-methods design has advantages over RCTs for assessing the effectiveness of organizational change efforts, such as safety culture interventions, which are embedded in a complicated system. For example, CAB operated amid complications such as customized program implementations and leadership turnover. The evaluation period for CAB spanned approximately two and a half years (August 2005 to about January 2008), providing ample time for extraneous events. By combining qualitative and quantitative methods for
multiple data sources, the mixed-methods design can account for the impact of context and extraneous events that influence the ability of the program to produce outcomes. Specifically, in this evaluation, we used qualitative methods to peer inside the “black box” of the program and directly observe the influence of the context on the mechanism comprising the program (Pawson and Tilly, 1997; Pawson, 2002). In addition, when a detailed theory of change drives selection of the measures, the mixed-methods design rivals randomized controlled trials in its ability to support plausible inferences of causation (GAO, 2009). This summative evaluation used a concurrent nested research strategy (Creswell, 2003): the quantitative data were considered the primary measure for improvements in safety or safety culture, while the qualitative data provided explanations and insight into the processes associated with the quantitative results. Qualitative data also provided an opportunity to capture unexpected outcomes.
Qualitative measures of outcomes, utilizing a case-study methodology (Yin, 2009), consisted primarily of open-ended interviews with workers and managers and were used for explanatory purposes. Quantitative measures included:

• PPF feedback session data.
• Corporate safety data supplied by UP.
• Closed-ended attitude and behavior surveys of workers and managers.

In addition to the above measures for the outcomes of CSA, the evaluation included measures for an implementation evaluation and for assessing the role of initial conditions and events in establishing CSA at the site (as summarized above in Section 1.3 and detailed in Zuschlag et al., 2012). These measures included field notes, process artifacts, and project records as additional qualitative data sources for assessing the context and mechanisms. By using a theory of change, combining qualitative and quantitative methods, and performing an implementation evaluation to assess whether an implementation failure occurred, the method is thus consistent with a realistic model of evaluation (Pedersen et al., 2012; Pawson et al., 2005).

2.1. Feedback session data
In accordance with BST training and materials, the CAB steering committee collected the checklists completed during feedback sessions and scored the data on each checklist to record which behaviors were performed safely and which were at-risk. For purposes of evaluation, the overall percentage of at-risk behaviors for the service unit in each month of the evaluation period provided an indication of the safety of that month's worker practices targeted by CAB–CRZ and CAB-Switching. Autocorrelations of the data (Cohen et al., 2003) were not significant, indicating that the at-risk percentages could be treated as independent from month to month and that ordinary regression analyses, rather than time-series analyses, were suitable. If CAB is effective at improving CRZ or switching practices, then the percentage of at-risk behaviors should, on average, decrease over the months of the evaluation period.
An analysis of data from training videos assessed the consistency of worker recordings of at-risk behaviors. CAB–CRZ trainers showed one of two videos at the end of peer-observer training. Each scripted video portrayed a typical mix of safe and at-risk behaviors for five CRZ events. Trainees used their checklists to record the safe and at-risk behaviors seen in the videos. The same videos were used for the coaching of many peer observers some time after training, but during coaching each peer observer saw a different video from the one seen during training.
Having members of the same population rate the same videos allowed calculation of the consistency of the ratings, as shown in Table 1.
Reliability is necessary for actual changes in practices to be statistically detectable with CAB feedback session data. Time drift represents a systematic tendency for the same practices to be rated as safer (positive drift) or more at-risk (negative drift) in later training than in earlier training. Retest drift represents a tendency for the same practices to be rated as safer (positive) or more at-risk (negative) depending on how long ago an individual worker was trained.
Complete ratings of training videos were collected from 108 peer observers at the end of training and from 13 peer observers during subsequent coaching. Reliability was significantly higher than 80% (83.06%, t(107) = 4.56, p < 0.0001), which is above the minimal standard for inter-judge reliability often used in evaluation research (Rossi et al., 1999). Time drift was not significant (r = 0.0312, n = 108, p = 0.749), indicating that peer observers trained at the beginning of the evaluation period rated the same videos as equally at-risk as those trained at the end of this period. However, the analysis for retest drift showed that peer observers rated the videos as more at-risk during coaching than during training (average difference, 7.50%; t(106) = 2.05; p < 0.0427), suggesting that experienced peer observers became stricter over time. The proportion of experienced peer observers increased throughout the evaluation period, since observers trained at the beginning stayed on to conduct feedback sessions at the end. This suggests that the true average at-risk scores may be lower than reported for later periods in the evaluation, and that any improvement in practices that may be found would in fact be greater than indicated.
Overall, the feedback session ratings appeared to be adequate, or perhaps even conservative, for measuring positive changes in practices.

2.2. Corporate safety data
UP provided corporate safety data from its own tracking systems to allow the calculation of changes in the rates of relevant safety occurrences since the start of CAB. The occurrences in the dataset were too few, and the information on them too sparse, to allow useful sub-categorization of the occurrences in order to analyze changes in their character. Interviews with managers revealed no evidence of changes in the character of occurrences at any site.

2.2.1. Engineer decertifications
The railroad industry defines an engineer decertification as an occurrence in which a locomotive engineer loses FRA authorization to run trains due to a serious safety violation. In this evaluation, decertifications served as a leading indicator of catastrophic road accidents, which can easily result from the associated safety violations.
Decertifications are a relatively objective measure, avoiding the measurement problems associated with minor injuries (Pedersen et al., 2012). Automated electronic devices aboard the trains and along the tracks detect most of the decertifications the evaluation analyzed, making such occurrences resistant to over-reporting and under-reporting. The decertification process involves an investigation including the railroad management, the union, and the FRA, and includes documented evidence and a hearing, where all parties crosscheck the evidence and conclusions. This crosschecking by parties with different interests further reduces the chance of bias.
The data were limited to three types of decertifications that are associated with CRZ practices because each often results from a
loss of crew attention under CRZ conditions. These types are: passing a stop signal, moving outside an authorized stretch of track, and exceeding train or track speed limits by more than 10 mph. Decertification data were further filtered to occurrences in which the formal investigation of the decertification found the engineer to be at fault; these comprise the vast majority of the decertification data. If CAB–CRZ is effective in reducing occurrences, CRZ-related decertifications should decrease beginning in the first phase, after the initiation of CAB–CRZ.

Table 1
Measures of consistency for CAB feedback session data.
Measurement | Calculation
Reliability | Percentage of agreement between a peer observer and the standard observation for each CRZ event in the video, as determined by the steering committee
Time drift | Trend in average at-risk scores over time, as indicated by the correlation of the scores with the date on which the video was rated in training
Retest drift | Difference in average at-risk scores between training and coaching

2.2.2. Incidents of material damage
The railroad industry defines “incidents” as occurrences involving operation of on-track equipment that result in property damage (e.g., see FRA, 2011). These occurrences include collisions, fires, and derailments, with the latter accounting for approximately 84% of the incidents in the data. The incident data in this evaluation are resistant to under-reporting because by definition they involve physical hardware damage and its repair, which are difficult or impossible to hide or ignore. For example, a derailment is unambiguous: a wheel of a train is either on the track or off. If it is off, then special equipment must be ordered to get it back on before the train can be moved. A given transportation crew might not report damage from a collision on its shift, but the subsequent crew or maintenance personnel will then discover the damage and report it. While there may be disciplinary repercussions for causing a derailment, a derailment cannot be cleared without management intervention.
UP provided both FRA-reportable and non-reportable incidents for this evaluation, the difference being the cost of the incidents. The FRA states that any incident that costs the carrier more than a certain amount (for example, $7700 in 2006) must be reported to the government and is hence a reportable incident. The inclusion of non-reportable incidents substantially increases the sample size and thereby the statistical power of the analyses. UP provided data only on incidents attributed in a UP investigation to an error by a transportation worker. Incidents attributed to physical faults, such as a rail breaking, were not included in the data. An incident investigation involves union officials and management representing all major departments. The FRA audits the investigation process to check its accuracy, and may conduct its own investigation of more serious incidents. Thus, the checks and balances are similar to those in a decertification investigation.
The vast majority of incidents are associated with switching rather than with CRZ operations, so if CAB-Switching is effective in reducing occurrences, incidents should decrease beginning in the second phase, after the initiation of CAB-Switching.

2.2.3. Analysis strategies for corporate safety data
2.2.3.1. Comparison data. As this is a field study, strict experimental control is not possible, and there is always the possibility that a confounding factor, rather than the safety process, is the true cause of changes in safety. However, confidence in the analysis of the effectiveness of the safety process can be increased by performing a quasi-experiment and contrasting the data showing an apparent safety effect with no-treatment “comparison data” that, in theory, should not be affected by the process (Rossi et al., 1999; Shadish et al., 2002).
The evaluation team worked with UP management to select comparison data such that the associated operations were similar to those of the treatment (e.g., type of work, number of workers, style of management). For this paper, comparison data were obtained for other service units or yards that did not implement the safety process at the same time. For CAB–CRZ, decertification performance at the San Antonio Service Unit was compared with that at the other service units in UP's Southern Region: Fort Worth, Houston, and Livonia. According to UP management, all four service units shared the following characteristics:

• Approximate size of territory and number of workers.
• Age and tenure of the workers.
• Unions representing the workers.
• Type of work and traffic.
• Corporate and regional organizational culture and management, including the same regional vice president.
• Role behaviors of local management regarding safety (management at all sites were interested in cooperating with workers and starting CSA or similar programs, but only San Antonio implemented CSA for road work during the evaluation period).
• Level of FRA scrutiny (owing to recent high-profile accidents).
• Definitions and reporting procedures for safety occurrences.

In addition, the service units all had approximately the same baseline level of decertifications, varying between 1.2 and 1.5 per 200,000 worker-hours. San Antonio had the lowest baseline rate, but the differences among the service units were not significant (multiple Poisson occurrence rate comparison (Nelson, 1982), χ²(3) = 1.67, p = 0.644). Qualitative data from the implementation evaluation revealed no unusual events or changes at any of the sites during the evaluation period (e.g., no catastrophic accidents). This absence of site-specific changes is most crucial for the validity of the quasi-experiment, since an effect of treatment is indicated by the performance of the treatment site relative to the comparison sites over time. In summary, the comparison sites were a suitable contrast for San Antonio.
For CAB-Switching, comparison with the other three service units in the region was problematic because two of these service units had similar safety programs for switching that coincided with CAB-Switching, and therefore they were not used. Instead, comparisons within the San Antonio Service Unit were made among the three locations with the largest yards in the service unit: Eagle Pass, Laredo (comprising two yards), and the San Antonio Complex (comprising three yards in the city of San Antonio).
At the end of the evaluation period, different locations had different strengths of implementation of CAB-Switching due to different start dates and rates of training. Managers at all three locations wanted to initiate CAB-Switching as soon as possible; there was equal interest in improving safety at all yards. However, logistics forced the different strengths of implementation. Laredo was a peripheral terminal, requiring the San Antonio-based trainers to travel; thus the steering committee deferred its implementation until the end of the evaluation period, resulting in a weak implementation at that time. Eagle Pass was also a peripheral terminal, but had relatively few workers (approximately 20), who were all trained in just a few visits in under two months at the start of CAB-Switching. With all workers trained from the beginning of CAB-Switching, Eagle Pass had the strongest implementation. The San Antonio Complex required no trainer travel, but its size was comparable to Laredo's, so San Antonio's training took much
longer than Eagle Pass's. By the end of the evaluation period, about half the workers at the San Antonio Complex were trained, giving it an intermediate implementation strength at that time.
In addition to having different numbers of workers, the three locations had different baseline occurrence rates (χ²(2) = 17.25, p = 0.0002). The San Antonio Complex had the highest rate (12.15 incidents per 100,000 car-moves), but this was not significantly different from the rate at Eagle Pass (9.69; two Poisson occurrence rate comparison (Nelson, 1982), F(50, 196) = 1.204, p = 0.188). The rates for the San Antonio Complex and Eagle Pass were each significantly higher than the rate for Laredo (5.93; F(98, 196) = 2.008, p < 0.0001, and F(98, 48) = 1.601, p = 0.036, respectively).
Thus, as a group, the three locations were more similar than different: two out of three were approximately the same size, and two out of three had approximately the same baseline occurrence rate. In other areas, all three locations shared the same characteristics as listed above for the four service units of the Southern Region. In addition, the three locations shared the same superintendent and service-unit-level management. Most crucially, the qualitative data from the implementation evaluation revealed no unusual events or changes at any of the locations during the evaluation period. The three locations were thus suitable for contrasting against each other.
Data were gathered for the baseline years prior to the CAB implementation as well as during it, constituting a pre-post design with no-treatment comparisons. With such comparison data available, the chief statistical analysis is the relative performance of the treatment and comparison data from baseline through intervention, that is, the presence of a statistical interaction. If the changes in safety are different for the treatment and comparison data, the implication is that some factor associated with only the treatment data, such as the safety process, is specifically affecting the treatment data. On the other hand, if there are no differences in safety changes, there is a strong possibility that a single factor other than the treatment is responsible for changes in both treatment and comparison data.

2.2.3.2. Gap times. The occurrence data were analyzed as gap times (Cook and Lawless, 2010; Maguire et al., 1952) to determine the occurrence rate changes associated with CAB. A gap time (also known as an inter-arrival time) is the time between a pair of adjacent occurrences. “Time” is expressed in units that represent the site's exposure to the risk of an occurrence. In the railroad industry, the unit of exposure for incidents is typically car-moves (the number of cars moved through a location), while the unit of exposure for decertifications is typically worker-hours at the site. Because the gap times are calculated as elapsed exposure rather than calendar time, the data are effectively normalized for different levels of exposure at different times and places (e.g., yards of different sizes).
The timeline in Fig. 2 shows the gap times between five hypothetical decertifications. For instance, the gap between the April 20th and April 23rd decertifications is 24,000 worker-hours, the gap between the April 23rd and April 28th decertifications is 40,000 worker-hours, and so forth.
Gap times are the mathematical inverse of a rate. For example, if 24,000 worker-hours were completed since a previous decertification, then there was 1 decertification per 24,000 worker-hours for that time period, equal to a rate of (1/24,000) × 200,000, or 8.33 decertifications per 200,000 worker-hours.1 Thus, if gap times increase, then the rate must decrease, and vice versa.
1 The constant of 200,000 is included to be consistent with the railroad convention of representing rates as roughly the frequency per 100 workers working 40 h a week for one year.
Analyses of gap times and other statistical analyses of elapsed time until an event are routine in medical research (Lee and Wang, 2003; Bradburn et al., 2003) and reliability engineering (Department of Defense, 1996). For example, mean time between failures is a familiar statistic from reliability engineering. In this evaluation, safety occurrences are treated no differently than failures in a population of replaceable components (Nelson, 2003). In contrast, analysis of railroad safety data traditionally represents data points as occurrence rates for a convenient block of time (e.g., the monthly decertification rate) rather than as gap times. However, for relatively sparse occurrences, such as those found at a single railroad service unit, analyses of such block-time data have lower statistical power than analysis of gap times. Block-time data can also violate the normality and homoscedasticity assumptions of parametric statistical analyses (Zuschlag et al., 2012). Gap time data are thus preferred for this data set.

2.2.3.3. Survival analysis. Most inferential statistical analyses require that the gap times be statistically independent and identically distributed. All gap time data were checked for independence by calculating the lag-1 and lag-2 autocorrelations (Cohen et al., 2003). The data were checked for being identically distributed by inspecting Weibull probability plots (Nelson, 1982). When gap times are statistically independent and identically distributed, survival analysis techniques are suitable for inferential analysis (Cook and Lawless, 2010). Survival analysis comprises a suite of statistical techniques to analyze the effects of variables on the times until an occurrence, which may include analysis of gap times. Survival analysis is preferred over conventional least squares techniques, such as analysis of variance, because, compared to conventional analyses, survival analysis:

• Is better able to address the approximately exponential distributions typical of gap times.
• Produces more accurate parameter estimates due to its use of maximum likelihood estimation.
• Allows inclusion of a “right-censored” gap time: the exposure elapsed between the last observed occurrence and the end of the evaluation period, when the next occurrence had not yet been observed.

This evaluation employs two forms of survival analysis to ascertain the relationship between the CAB safety process and the occurrence rates: Cox regression and Weibull regression (Lee and Wang, 2003; Nelson, 1982). The Weibull regression fits a specific parametric distribution to the data, while the Cox regression derives a nonparametric baseline distribution from the data.
Like any regression, Cox and Weibull regressions produce coefficients, b_i, representing the magnitude of the effect of each predictor variable. Specifically, exp(b_i) is the ratio by which the “hazard,” or chance of an occurrence at any given moment, is multiplied for each one-unit increment of the predictor variable (Bradburn et al., 2003). With a predictor variable representing the presence of CAB, one can calculate the percent change in the chance of occurrences associated with CAB as 100 × (exp(b_i) − 1) (Hosmer et al., 2008).

2.3. Practices and safety-culture survey
A forced-choice survey included measures of practices and safety culture, specifically self-reported CRZ work practices and labor–management relations. Table 2 describes the scales, along with their sources and supporting research, and the outcome related to each.2
2 Please contact the first author for more detailed information about the scales.
The scales were evaluated for inter-item reliability using Cronbach's alpha because this was the first time such scales had been used in the railroad industry. These scales were added to 11
Fig. 2. Example of gap times.
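The gap-time bookkeeping described in Section 2.2.3.2 can be sketched in Python. This is a minimal illustration under stated assumptions, not the evaluation's own code: the input format (cumulative exposure recorded at each occurrence), the helper names `gap_times`, `rate_from_gap`, and `pooled_rate`, and all numbers are hypothetical, chosen only to reproduce the 24,000 and 40,000 worker-hour gaps described for Fig. 2. The pooled rate assumes exponentially distributed gap times, for which the maximum likelihood rate is the number of observed occurrences divided by total exposure, including the right-censored final gap.

```python
from dataclasses import dataclass

# Railroad convention (footnote 1): express rates per 200,000 worker-hours.
RATE_BASE = 200_000


@dataclass
class Gap:
    exposure: float  # exposure elapsed during the gap (worker-hours or car-moves)
    observed: bool   # False for the right-censored gap that ends the record


def gap_times(cumulative_at_events, cumulative_at_end=None):
    """Turn cumulative exposure at each occurrence into gap times (hypothetical helper).

    The first occurrence starts the clock; each later occurrence closes one gap.
    If cumulative_at_end is given, the exposure after the last occurrence is
    kept as a right-censored gap.
    """
    events = sorted(cumulative_at_events)
    gaps = [Gap(later - earlier, True) for earlier, later in zip(events, events[1:])]
    if cumulative_at_end is not None and events and cumulative_at_end > events[-1]:
        gaps.append(Gap(cumulative_at_end - events[-1], False))
    return gaps


def rate_from_gap(exposure, base=RATE_BASE):
    """One occurrence per `exposure` units, rescaled to the rate base."""
    return base / exposure


def pooled_rate(gaps, base=RATE_BASE):
    """Exponential-model MLE: observed occurrences / total exposure (incl. censored)."""
    observed = sum(1 for g in gaps if g.observed)
    total = sum(g.exposure for g in gaps)
    return base * observed / total


if __name__ == "__main__":
    # Hypothetical cumulative worker-hours at three decertifications, plus a
    # hypothetical 90,000 worker-hours at the end of observation.
    gaps = gap_times([0, 24_000, 64_000], cumulative_at_end=90_000)
    for g in gaps:
        print(g.exposure, "observed" if g.observed else "censored")
    print(round(rate_from_gap(24_000), 2))  # 8.33 per 200,000 worker-hours
    print(round(pooled_rate(gaps), 2))      # 4.44 per 200,000 worker-hours
```

Keeping the censored tail in the denominator is what separates the pooled estimate from one based on observed gaps alone; dropping the 26,000 censored worker-hours here would raise the estimate from 4.44 to 6.25 per 200,000 worker-hours.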

Table 2
Forced-choice survey scales and their relation to outcomes.
Scale | Description | No. of items | Item source | Outcome
Unsafe behaviors – CRZ | Self-reported extent to which employees follow CRZ rules and procedures (a) | 6 | UP code of operating rules, checked by UP subject-matter experts | CRZ practices
Labor–management relations | Perceived trust and friendliness between labor and management, with particular attention to safety | 6 | Dastmalchian et al. (1989) | Safety culture
(a) Conceptually based on Hofmann and Stetzer (1996). Managers reported on the practices of their subordinates, whereas workers reported for themselves. Yard employees skipped this scale.
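The text reports that the Table 2 scales were checked for inter-item reliability with Cronbach's alpha. A minimal standard-library sketch of the usual formula, α = k/(k − 1) × (1 − Σ item variances / variance of summed scores), follows. The function name and the rating matrix are hypothetical: the matrix shares only the six-item length of the Table 2 scales and contains no survey data from this evaluation.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances (n - 1 denominator) throughout.
    """
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items and equal-length rows")
    items = list(zip(*responses))  # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Hypothetical 1-5 ratings from five respondents on a six-item scale.
    ratings = [
        [4, 4, 5, 4, 4, 5],
        [3, 3, 3, 4, 3, 3],
        [5, 5, 4, 5, 5, 5],
        [2, 3, 2, 2, 3, 2],
        [4, 3, 4, 4, 4, 3],
    ]
    print(round(cronbach_alpha(ratings), 3))
```

Sample variances are used for both the items and the totals; because the same n/(n − 1) factor scales the numerator and denominator of the variance ratio, population variances would give the same alpha.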

proprietary scales in an organizational climate survey used by BST for evaluating its implementation sites. Service-unit management distributed the survey by mail to workers and managers on two occasions: early in the first phase, in December 2005, and after the end of the second phase, in April 2008. Table 3 lists the numbers of respondents and return rates. Qualitative data suggested that the lower worker response rate for the second phase was due to the new superintendent (see Section 1.3.3) coordinating less with the unions on administering the survey than his predecessor had (while continuing to strongly support CAB). The distributions of the self-reported years of service and ages of the respondents from each administration were compared with the actual respective distributions from data provided by the service unit's human resources department. There were no significant differences, implying that respondents from each administration were representative of the workers and managers at the site.

Table 3
Numbers of respondents for the CRZ scale for each phase and respondent type.
Phase | Workers (a) | Managers (a)
First | 179 (19%) | 16 (32%)
Second | 86 (9%) | 26 (52%)
(a) Return rates are in parentheses.

If CAB has an effect as predicted, improvements in average scale scores should be seen from the first to the second phase.

2.4. Interviews
Interviews provided qualitative data concerning perceptions of the outcomes (Patton, 2002). In this evaluation, the interviews served primarily to illuminate the processes behind any observed quantitative results. Respondents were asked about outcomes attributed to CAB and probed on the role CAB played in any reported changes to safety and labor–management relations.3 The interviews were confidential and were conducted one interviewee at a time. To allow comparisons as CAB was implemented, interviews were conducted approximately annually, for a total of three times: in the winters of 2005–2006 (“initial” interviews), 2006–2007 (“midterm”), and 2007–2008 (“final”). The interviews were open-ended and semistructured; respondents were allowed to reply freely, and questions changed to follow each respondent's unique thoughts. Thus, respondents were not all asked the same questions.
3 An interview guide may be obtained by contacting the first author.
To obtain the richest information on outcomes, purposive sampling was used to identify the interviewees (Crano and Brewer, 2002). That is, sampling was deliberately weighted toward selecting respondents with the knowledge to provide rich insight. The respondents therefore tended to be those very knowledgeable about CAB and the workforce's reactions to it. Nonetheless, efforts were made to include representatives of various viewpoints. Interviewees included workers and managers, BLET and UTU members, yard and road workers, and workers with various levels of involvement in CAB, such as steering committee members and workers trained or not trained in CAB. Participants included respondents who were supportive of CAB and respondents who were skeptical of it.
While representative sampling is preferred for answering narrow research questions, purposive sampling is preferred for obtaining broad insight about a phenomenon (Crano and Brewer, 2002; Patton, 2002). As mentioned above, the interviews served to illuminate the processes responsible for any observed quantitative outcomes, so purposive sampling was more appropriate.
Table 4 shows the types and numbers of respondents. Facilitators and the local management sponsor were interviewed on all three occasions. The top three rows represent labor members; therefore, at the initial interviews there were 14 employees and five managers. CAB facilitators and top management helped to identify prospective respondents, selecting people who were respected, credible, and neutral rather than pro-labor or pro-management. Other than those directly involved in CAB, the interviewees were different for each phase. Stakeholder involvement throughout the lifecycle of the program, such as through annual interviews and participation in respondent selection, has been shown to be a critical factor in meaningful use of evaluation findings (Johnson et al., 2009).
The evaluation team analyzed interview data by breaking the responses into comments, then identifying and sorting the comments into themes representing the most frequent comments. To accomplish this, two or three team members sorted comments separately into themes and then reviewed each other's work to arrive at a consensus on the themes (Miles and Huberman, 1994; Patton, 2002).
Comparing interview data across the evaluation period provided detailed insight into changes in management practices, systemic conditions, and safety culture, the latter also measured by the forced-choice survey. For this paper, safety culture is defined to include safety-related norms, competencies, beliefs, values,
attitudes, relations, and interaction patterns (Reason, 1997). Thus, interview responses relevant to changes in any of these, singly or in combination, were used to explain the process of safety culture change under CSA.

Table 4
Respondents to periodic interviews: 19 at the initial, 18 at the midterm, and 16 at the final interviews.
Respondents | Initial (Winter of 2005–2006) | Midterm (Winter of 2006–2007) | Final (Winter of 2007–2008)
CAB steering committee members | 4 | 4 | 4
Trained CAB peer observers | 5 | 4 | 3
Workers untrained in CAB | 5 | 3 | 3
Management:
Frontline/middle | 3 | 3 | 4
Upper | 2 | 2 | 1
Corporate/regional | 0 | 2 | 1
Total | 19 | 18 | 16

3. Findings

3.1. Worker practices

3.1.1. Feedback session data
3.1.1.1. CAB–CRZ. Fig. 3 depicts the average at-risk scores for CAB–CRZ feedback sessions between August 2005 and December 2007. The dark blue straight line represents the least-squares linear fit to the monthly percentages.

Fig. 3. Monthly percent at-risk scores for CAB–CRZ and CAB-Switching feedback sessions.

Throughout the evaluation period, PPF sessions by CAB peer observers showed a decreasing tendency for at-risk CRZ practices (r = −0.797, n = 29, p < 0.0001). Based on a linear regression of these data, the percentage of at-risk practices decreased from 7.14% in August 2005 to 1.05% in December 2007. In other words, the rate of at-risk practices at the end of the evaluation period was less than one-sixth the rate at the beginning.

3.1.1.2. CAB-Switching. Fig. 3 shows average at-risk scores for CAB-Switching feedback sessions from the start of CAB-Switching in October 2006 through December 2007, with the light magenta straight line representing the least-squares linear fit to the monthly percentages. Following the trend for CAB–CRZ scores, PPF sessions by CAB peer observers showed a decreasing tendency for at-risk switching practices from the beginning of CAB-Switching (r = −0.569, n = 17, p = 0.0157). Based on a linear regression of these data, the percentage of at-risk practices decreased from 5.13% in August 2005 to 1.05% in January 2008. In other words, the rate of at-risk practices at the end of the evaluation period was approximately one-fifth that at the start of CAB-Switching.

3.1.2. Survey: Unsafe Behaviors–CRZ scale
The Unsafe Behaviors–CRZ scale had a Cronbach's alpha of 0.837, indicating acceptable reliability (Rosenthal and Rosnow, 1991).
A two-by-two analysis of variance (ANOVA) evaluated changes in unsafe CRZ behaviors, with respondent type (worker and manager) and phase (first and second) as independent variables. The reported use of CRZ practices was significantly more frequent for the second phase (M = 3.988) than for the first phase (M = 3.749, F(1, 294) = 5.739, p = 0.017). There was no significant effect of respondent type (F(1, 294) = 0.049, p = 0.825) and no significant interaction between respondent type and phase (F(1, 294) = 1.079, p = 0.300), suggesting that workers and managers perceived a similar degree of improvement in workers' behavior.

3.1.3. Summary findings for worker practices
Evidence from both the feedback session data and the survey indicates that the worker practices targeted by CAB improved with the introduction of the process. Feedback session data suggest that at-risk practices fell to a fraction of their rates at the beginning of the implementation. Survey data indicate that workers and managers both saw these improvements.

3.2. Occurrences
The lag-1 and lag-2 autocorrelations for the gap times of decertifications and incidents were all low (|r| < 0.2) and on average not significantly different from zero, indicating that the gap times were adequately independent for inferential statistical analysis. Weibull probability plots indicated that the data were from a single distribution (Nelson, 1982). Gap times were thus suitable for survival analysis.

3.2.1. Decertifications
To evaluate the outcome of CAB–CRZ on occurrences, the San Antonio Service Unit was compared to the other three Southern Region service units combined. A Cox regression was performed on the gap times of CRZ-related decertifications since the start of the first phase. Predictor variables were date, service unit (San Antonio versus the others), and the date-by-service-unit interaction. This regression found a significant interaction (χ²(1) = 4.68, p = 0.030), indicating that, as time progressed since the start of CAB–CRZ, the San Antonio rate of decertifications changed significantly more than that of the other service units.
Table 5 shows the results of Cox regressions of date on the decertification gap times for each service unit separately. The percent changes in the chance of decertifications were calculated from the regression coefficients. A positive percent represents a percent increase in the chance of a decertification at any given moment, while a negative percent represents a percent decrease in that chance.
The San Antonio Service Unit, the site that implemented CAB, showed a significant 79% decrease in the chance of decertifications through the first and second phases, from September 2005 (when CAB started) through the end of available data in February 2008. The other three service units of the Southern Region (Fort Worth,
Houston, and Livonia) showed no significant change in the chance
of decertifications. During the four-year baseline period, no service
unit showed a significant trend in decertifications.

Table 5
Cox regressions of date on gap times of decertifications for phases and each service unit.

Phase              Statistical value   San Antonio   Fort Worth   Houston   Livonia
Baseline           Change in chance    0%            13%          5%        83%
                   n                   45            50           54        37
                   p                   0.997         0.817        0.924     0.965
First and second   Change in chance    −79%          0%           22%       193%
                   n                   42            72           74        37
                   p                   0.014         1.000        0.543     0.109

3.2.2. Incidents
To evaluate the outcome of CAB-Switching on occurrences, the
major yards of the San Antonio Service Unit were divided into
three locations, each corresponding to a different strength of
implementation due to different start dates and rates of
CAB-Switching training:

• Eagle Pass: a yard with a strong CAB implementation.
• San Antonio Complex: three yards with moderately strong
  implementations.
• Laredo: a pair of yards with relatively weak implementations.

Fig. 4 shows the incident rates at the three locations during the
second phase and before the second phase (baseline and first
phase).

Fig. 4. Rates of human factors incidents per 100,000 car moves at site
yards. Dashed lines represent no significant difference.

A Weibull regression was performed on the gap times of incidents.
Predictor variables were phase (baseline and first phase versus
second phase), location, and the phase-by-location interaction.
This regression found a significant location-by-phase interaction,
indicating that the three yard locations had different changes in
rates (χ²(2) = 6.16, p = 0.046). Both Eagle Pass and the San Antonio
Complex had significant decreases in their incident rates
(χ²(1) = 7.54, p = 0.006, and χ²(1) = 4.13, p = 0.042, respectively), while
Laredo, which had the weakest implementation, did not significantly
change (χ²(1) = 0.01, p = 0.909). Eagle Pass, which had the
strongest implementation, experienced an 81% drop in the rate
of incidents over a 1-year period, which was significantly greater
than the modest 32% drop at the San Antonio Complex
(χ²(1) = 4.19, p = 0.041) and also significantly greater than the change
at Laredo (χ²(1) = 6.06, p = 0.014). The change at the San Antonio
Complex was not significantly different from the change at Laredo
(χ²(1) = 1.20, p = 0.2733).
In baseline and the first phase, the incident rates for Eagle Pass
and the San Antonio Complex were not significantly different from
each other (F(50, 196) = 1.204, p = 0.188), but each was
significantly higher than the rate of Laredo (F(98, 48) = 1.601, p = 0.036
and F(98, 196) = 2.008, p < 0.0001). In the second phase, the rate
for Eagle Pass was significantly lower than the rates of Laredo
(F(8, 40) = 2.186, p = 0.049) and San Antonio (F(8, 74) = 3.097,
p = 0.0045), while San Antonio and Laredo were not significantly
different (F(44, 92) = 1.349, p = 0.129).
Prior to the second phase, no location showed a significant
trend in incident rates, except that Laredo had an unexplained
spike in incidents straddling the transition between the first and
second phases. Fisher exact tests of the frequencies of reportable
and non-reportable incidents found no significant change in the
proportion of reportable and non-reportable incidents for any site
since the start of CAB-Switching (lowest p = 0.259). There is no
evidence that the rate of the more costly incidents changed differently
than the rate of the less costly incidents.

3.2.3. Summary findings for occurrences
Occurrences decreased in the presence of CSA. From the
beginning of CAB–CRZ until the end of the evaluation period, there was a
gradual decline in the chance of engineer decertifications most
closely related to the practices that CAB–CRZ promoted. This decline
appears to have begun at the same time as CAB–CRZ rather than
having continued a trend preceding it. Other service units of the
Southern Region, which had no CSA process for CRZ practices, did
not experience a decline, implying that the cause was unique to
the San Antonio Service Unit, the only unit with a CAB–CRZ
process. The decrease implies a reduction in serious safety violations,
such as passing a stop signal, which are often precursors to
catastrophic road accidents. Thus, a reduced chance of decertification
implies a reduced probability of such accidents.
Human factors incident rates dropped significantly at Eagle Pass
and the San Antonio Complex after CAB-Switching began at those
locations. Rates did not significantly change at Laredo, which did
not start CAB-Switching until the final months of the second phase.
The decreases in incident rates were not the product of a general
improving trend that began prior to CAB-Switching. While the
improvement at Eagle Pass was significantly better than that of
the San Antonio Complex and Laredo, the improvement of the
San Antonio Complex was not significantly better than that of
Laredo. This lack of significance is not surprising given the relatively
short evaluation period (about one year) for CAB-Switching.
For both decertifications and incidents, the chance of
occurrences associated with CSA fell to about a fifth of that before
implementation of CSA.

3.3. Safety culture

In this section, quantitative survey results are followed by the
related interview data to illuminate the apparent processes behind
the scale score changes.

3.3.1. Survey: Labor–Management Relations scale
The Labor–Management Relations scale had a Cronbach’s alpha
of 0.809, indicating modest reliability.
A two-by-two ANOVA evaluated changes in the Labor–Management
Relations scale, with respondent type (worker and manager)
and phase (first and second) as independent variables. The analysis
showed that labor–management trust was significantly better in
the second phase (M = 2.881) than in the first phase (M = 2.324,
F(1, 303) = 11.123, p = 0.001) for both workers and managers. The
mean improvement on the scale was not insubstantial (accounting
for 10% of the variance), but remained below the 3.0 midpoint of
the 5-point Likert response stem used by the scale. This suggests
that while safety culture improved, some labor–management
issues remained. Managers generally saw relations as being
significantly better than did workers for both phases (Ms = 3.472
compared with 2.378, F(1, 303) = 56.296, p < 0.001). This is a
commonly found difference in perspective that may reflect labor
and management’s different bargaining roles (K. Bell, BST, personal
communication, August 26, 2013; M. Mangan, BST, personal
communication, August 27, 2013). There was no significant interaction
between respondent type and phase (F(1, 303) = 0.050, p = 0.823),
indicating that workers and managers saw the same degree of
change, and that there was no change in the gap between workers
and managers.

3.3.2. Interviews: Labor–Management Relations
In this section concerning the interviews, italic text designates
themes from the qualitative analysis of the interviews. All
interview quotes shown in this paper were representative of typical
responses for the corresponding theme, rather than unusual or
extreme responses. These interview themes suggest changes in
attitudes and behavioral patterns related to trust producing
changes in commitment, style, and proficiency in ensuring safety
(Reason, 1997).
From the initial interviews through the midterm and final
interviews, respondents reported improvements in relations between
labor and management, echoing the quantitative results found by
the survey. The interviews suggested some factors that may have
contributed to the improvements in labor–management relations
indicated by the survey.
In the initial interviews, workers were asked to describe their
chief safety concerns. Among the top concerns were the conditions
of facilities and equipment, as indicated by the worker quoted
below:⁴

Worker, Initial Interviews
There is debris all over the yard, the gondola cars are overfull,
and then there are gigantic pieces of metal lying around. There
are holes two feet deep from work done by Maintenance of
Way. You could break a leg if you fell in one of these holes.

⁴ The other top concern was fatigue, especially as related to the work schedule.
Workers would later report improvements regarding this issue, thanks to more
reliable train arrival information, but these advances could not be directly linked to
the CAB process.

Consistent air conditioning for the train cabs was also
mentioned as an equipment problem:

Worker, Initial Interviews
AC [air conditioning] in cabs – this is a problem. It has
improved, but not to where it should be. . . . There is a big
difference when one gets off a train with AC than one without it. Lots
of the fleet are not equipped.

Management is responsible for conditions of facilities and
equipment. Thus, workers’ safety concerns implied they perceived
poor organizational support for safety. Workers felt management
practices inadequately promoted safety.
As part of CAB, the barrier removal team identified barriers
related to the conditions of facilities and equipment, which
management removed. Workers appear to have noticed the difference.
Most workers and managers reported improvements in facilities
and equipment since the initial interviews. Locomotive air
conditioning, yard clean-up, and building repair in particular were
mentioned:

Worker, Final Interviews
I have seen a lot of improvements in the facilities, in providing
employees with more stuff, such as locker rooms, more stuff in
the engine, such as AC in it. That has totally changed in the last
three years. I’ve seen the facilities are now clean.

Management practices such as barrier removal may have
contributed to improving safety at the site. However, the change in
management practices likely had the additional effect of improving
worker perceptions of management. In the interviews, workers
(and managers) reported seeing that management commitment
to safety had improved since the baseline period:

Worker, Initial Interviews
Management has a great safety attitude if it benefits them. They
push cars through even if they are not safe. Then they strong-arm
employees with safety when they want to. All management
at all levels is the same.

Worker, Final Interviews
I see that 80 percent management have commitment . . . but . . .
lower management can only execute what upper management
says.

As the worker’s quote above from the final interviews indicates,
commitment may have improved, but workers still see issues
in translating that commitment into action. This may account for
labor–management relations improving yet still remaining
relatively low as measured by the survey.
Management’s commitment toward safety implies that rules
are enforced consistently and that managers can be relied on for
safety. Some workers and many managers reported greater fairness
by management and greater trust between workers and managers, an
improvement in behavioral patterns since the initial interviews:

Worker, Final Interviews
I have a lot more trust with my managers [than I did a few years
ago]. I can go to all of them. I know that a lot of the older guys
still don’t trust managers. I know that my managers want me to
work safe.

Managers reciprocated the changes in worker perceptions:

Manager, Final Interviews
When I first started, the mentality between managers and trainmen
was that [managers] considered trainmen as oxen that have to be
whipped to get them going. . . . Over the years, management’s
thoughts have changed. The workers are smart people and have
smart things to say, too. . . . There are things that they could teach me.

With workers seeing managers as having greater commitment
to safety, and workers and managers each trusting the other more,
it is understandable that labor–management relations scale scores
would improve.
Greater mutual trust facilitates more communication and
cooperation (Reason, 2003). Furthermore, the joint labor–management
barrier removal team provided a venue to exercise communication
and cooperation. Consistent with these factors, respondents
reported improved communication and cooperation between
management and labor on safety:

Worker, Initial Interviews
A worker was told to pull 100-plus cars out under certain
conditions that broke the train in two. It was a manager’s idea to
not cut away a smaller set. Could have avoided it but he
wouldn’t listen to the worker’s idea.

Worker, Final Interviews
I have experienced one incident myself, where managers
approached me after letting them know about problems in
getting a switch lined up. The four managers went there to
investigate; they talked to us and said thank you for bringing
it to our attention.

Such openness in turn was associated with more reports of
workers initiating communication with managers:

Manager, Final Interviews
It used to be that the employees wouldn’t even talk to a manager
without their union local chair, but now I can deal with
the employees directly. Before, they wouldn’t talk to us even
on the smallest issue, but now the conversations are more open
and frequent.

Manager, Final Interviews
I think CAB has opened up the relationship between management
and workers. It may be the one avenue that opened doors
that hadn’t been available before.

3.3.3. Summary findings for safety culture

Quantitative survey data indicate that labor–management
relations improved with a successful CSA implementation. Consistent
with previous research (see DeJoy, 2005), qualitative data suggest
that the increase in the labor–management relations scale score
could be related to the apparent improvements in safety
communication and cooperative behavioral patterns mentioned in
interviews. These improvements were, in turn, related to the
improved perceived management commitment to safety that was
associated with the barrier removal team’s identification of barriers
and management’s subsequent improvement of facilities and
equipment. CSA thus may improve labor–management trust by
providing a means for ongoing cooperation and communication.
Specifically, the barrier-removal process provides sustainable
opportunities for workers and managers to address safety through
cooperation rather than bargaining. The quantitatively measured
improvement in labor–management relations, bolstered by the
explanatory power of the qualitative analysis, indicates a shift in
the safety culture toward a “trust culture” (Reason, 2003), a key
feature of an effective safety culture that is traditionally lacking in
the railroad industry (Coplen, 1999). The improvement in
labor–management relations is consistent with predictions for safety
processes like CSA that feature a nondisciplinary, proactive,
cooperative, systems-safety-analysis orientation (Reason, 1997).

4. Discussion

4.1. Outcome summary

The theory of change, presented in Section 1.2.1, usefully
guided the outcomes to measure. Quantitative results from process
metrics, surveys, and corporate safety data, depicted in Fig. 1, all
point toward safety-related improvements from implementing
CSA. Interview data did not contradict the quantitative findings,
but rather provided consistent explanations for them. To summarize:

• Worker practices targeted by the CSA process improved with
  the program’s introduction. Process metrics indicated that the
  commission of at-risk behaviors dropped by about 80%.
• The chance of engineer decertification gradually declined by
  79% after the introduction of CSA at the service unit, whereas
  the chance of decertification at comparison service units did
  not decline.
• The rate of incidents of material damage dropped by 81% at a
  yard with a strong CSA implementation, and by 32% at yards
  with a moderately strong implementation, whereas there was
  no significant rate change at yards with a relatively weak CSA
  implementation.
• Labor–management relations, specifically trust, improved at the
  demonstration site, indicating the development of a more
  effective safety culture.

In summary, CAB has demonstrated the potential for CSA to
improve safety and safety culture in railroad transportation
departments.

4.2. Evaluation limits

4.2.1. Context of the CAB process
Despite the comprehensive nature of the evaluation, there are
some limitations to take into account when drawing conclusions.
The findings were from a specific service unit at a specific point
in history. This site had the following features, which may be
relevant to the generalizability of the results:

• The number of employees at the treatment site ranged from 20
  (for CAB-Switching) to 1100 (for CAB–CRZ).
• The pilot was conducted on a US freight railroad, which has
  certain operational and organizational characteristics (e.g., two
  crew members in a cab, which allows them to observe each
  other; FELA, which normally discourages labor–management
  trust).
• The US freight railroad industry in general was experiencing
  growth, owing to increased demand for coal (McPhee, 2005).
• Labor and management were relatively motivated to cooperate
  on safety due to recent high-profile accidents and an influx of
  new workers, although this was counteracted by a long history
  of labor–management distrust and a US legal system that
  provides incentives to clash over safety (Zuschlag et al., 2012).
• Top managers were inclined to empower employees with
  control over the safety process, even when the process
  encountered difficulties.
• There were no unusual changes in leadership, organizational
  structure, or other highly disruptive events.
• The baseline level of safety was relatively high, consistent with
  the theoretical potential of CSA to move railroads to the next
  level of safety.

Whether or not the same results can be observed at other sites
and times depends on the similarity of the sites, the implementation
support, management’s role behavior and motivation, the
level of success in adapting the CSA process to conditions at the
locality, and the selection of safety measures that are appropriate
for the evaluation (e.g., decertifications, rather than catastrophic
road accidents) and resistant to over-reporting or under-reporting.
Similar results might be more likely at similar US freight
sites than at passenger or freight sites outside the US.

4.2.2. Rigor and confounds
As a field study that could not randomly assign participants to
conditions, this evaluation nonetheless achieved a high degree of
rigor through the use of mixed research methods, a pre-post
quasi-experimental design with no-treatment comparison, and
systematic employment of a theory of change (GAO, 2009). It is nearly
inevitable that confounds or events will occur simultaneously with
safety-process implementation, rendering ambiguous the apparent cause
of any observed safety improvements. In the case of CAB, however,
regional, corporate, or industry-wide confounds with time are
implausible alternative explanations for improvements in
occurrences because such improvements would also be observed at
comparison sites. The treatment and comparison sites were similar in
size, initial safety level, type of work, organizations (for both labor
and management), worker tenure, leadership and leadership role
behaviors, motivation for safety, regulator oversight, and definitions
of safety occurrences. Yet, this evaluation found safety occurrence
improvements uniquely at the treatment sites.
This leaves local confounds as alternative explanations for
safety improvements at the demonstration site. The full report of
this demonstration (Zuschlag et al., 2012) documents all relevant
confounds and analyzes their potential as alternative explanations.
However, any field setting has multiple factors occurring and
influencing each other, and it is ultimately misleading to search for one
feature that is responsible for all change. Anyone interested in
replicating the outcomes of CAB at another site ought to appraise
the unfolding events at San Antonio, from the initial context
through the constant evolution of CAB, and consider how to
achieve or exploit equivalent conditions and processes at the
intended site. For example, CAB likely benefited from the formative
evaluation, where the ongoing evaluation was used to improve
CAB’s implementation. Thus, to maximize the chances of successful
implementation, CSA processes should include equivalent
activities, specifically, stakeholder involvement in:

• Collection and analysis of data on implementation performance
  and outcomes.
• Feedback on implementation performance to improve the
  overall implementation.⁵

⁵ BST’s process included precisely these activities, and the evaluators’ formative
activities are best regarded as a supplement to what was happening anyway.

4.2.3. Implications of baseline occurrence rates

It is natural for safety interventions to be applied first at sites
where safety is worst. However, confounding an intervention with
the worst baseline safety complicates the interpretation of the
results of a pre-post quasi-experiment, such as the one used in this
evaluation. The treatment site may have the worst safety at baseline
for multiple reasons, including some unusual, random, transitory
factors. Thus, improvements may appear at follow-up due to
regression to the mean, because some of the unusual transitory
factors are probabilistically likely to become absent (Rosenthal and
Rosnow, 1991).
Also, organizational reactions to a poor safety record can
complicate interpretations. Precisely because the treatment site has the
worst safety record, organizations may formally or informally
initiate multiple simultaneous interventions on the site, resulting in
confounds with the intervention. For example, in addition to initiating
CSA, a manager may increase the frequency of operational safety
testing of employees, or the FRA may conduct more thorough safety
inspections. More insidiously, organizational reactions can yield an
illusory improvement: a manager with the worst safety record may
be under pressure to improve the apparent record, which may
translate into under-reporting (Pedersen et al., 2012).
None of these complications was plausibly operating for the
occurrence results reported in this paper. The safety measures in
this evaluation, decertifications and incidents of material damage
(mostly derailments), are resistant to under-reporting, unlike
minor injuries. Qualitative data collected as part of the
implementation evaluation indicated that the only safety interventions at
the treatment sites were those connected to CAB. In the case of
decertifications, the treatment site, which was the San Antonio Service
Unit, had approximately the same chance of decertifications at
baseline as the comparison sites. Qualitatively, the treatment
and comparison sites were under the same FRA scrutiny following
some high-profile accidents, and management at all sites was
interested in improving safety (e.g., two out of three comparison sites
started CSA-related processes for their yards during the evaluation
period). Thus, improvements in decertifications at the treatment
site cannot be attributed to either regression to the mean or
organizationally induced confounds. They are most plausibly attributed
to CAB–CRZ, which was unique to the San Antonio Service Unit.
In the case of incidents, Eagle Pass showed the greatest
improvement, but at baseline its occurrence rate was significantly higher
than that of the comparison site Laredo, which may cast suspicion on the
results. However, the rate for Eagle Pass at baseline was not
significantly different from the rate for the San Antonio Complex, yet Eagle
Pass had a significantly lower rate than the San Antonio Complex at
follow-up. If there were regression to the mean or organizational
confounds, then the San Antonio Complex should have improved
at least as much as Eagle Pass. Indeed, because the San Antonio
Complex is larger and had a greater raw number of incidents, it
arguably should have had more organizational motivation than Eagle
Pass to improve its safety record (in fact, qualitative data indicated
that all yards in the service unit were motivated to improve safety).
Finally, with regard specifically to regression to the mean, Eagle Pass
had the lowest occurrence rate at follow-up, significantly lower than
Laredo. Its occurrence rate did not merely approach the mean over
time; it fell significantly below the mean. This is inconsistent with
regression to the mean. The most plausible and parsimonious
interpretation is that increasingly strong CAB-Switching
implementations led to greater improvements in incident rates.

4.3. Conclusions and recommendations

This demonstration pilot shows the potential for CSA to
improve safety performance and culture in the US freight railroad
industry. However, safety programs cannot be expected to have
impact unless they are implemented successfully. This article
focused on the impact of the Union Pacific demonstration pilot at
San Antonio, Texas, while a separate report covers the substantial
amount of data collected and analyzed concerning the strength
of the implementation (Zuschlag et al., 2012). Questions nonetheless
still remain. Therefore, future research could fruitfully focus on
explaining the key elements of a strong implementation to help
managers learn how to leverage the money spent on safety.
The FRA is committed to finding and promoting new safety
approaches that reduce risk by focusing on unsafe conditions
and practices that might lead to occurrences rather than taking the
traditional approach of focusing on occurrences themselves (FRA,
2008). CSA is one such risk-reduction method that is apparently
workable for the railroad industry.
Furthermore, CSA appears to have the ability to transform the
safety culture away from the characteristics that limit safety
improvements and toward proactive, non-disciplinary, cooperative,
systems-safety approaches. Indeed, there is reason to believe that
such transformation is underway beyond the demonstration site.
Based on the success of CAB in San Antonio and TSC in UP’s
mechanical departments, UP elected to expand its TSC safety process to
the transportation departments at all of its locations (Grimaila, 2007).
The success of CSA at UP and similar pilot implementations also
led to the development and adoption of the FRA’s Railroad Safety
Risk Reduction Program in the Rail Safety Improvement Act
(2008). These pilots also encouraged the implementation of similar
safety-culture programs by other carriers, including Amtrak,
Toronto Transit, and BNSF.

Acknowledgments

This evaluation would not have been possible without the
cooperation of a large number of managers and employees at Union
Pacific Railroad.
The authors would especially like to thank Brian Gorton,
Michael Byars, and Kelvin Phillips for their considerable assistance,
along with Lance Fritz, Joe Santamaria, Roby Brown, Michael
Mitchell, Mark Barnum, Ted Lewis, Shane Keller, Ronald Tindall,
Greg Burger, John Dunn, Russell Elley, Mike Araujo, Paul Dillon,
Carl Eddington, José Gutierez, Wil Hardiman, Chad Jistel, Oscar
“Doctor” Mayfield, Fernando Nanez, Pat Pino, Martin Vacca, Mario
Valadez, John Paul Schuster, and Andy Wright. Thanks also to
George Wollard and Jay Finney of Behavioral Science Technology
Inc. (BST) for providing education and insights into the
implementation of methods like Clear Signal for Action in the railroad
industry. Along with Kelly Johnson and others at BST, George also
gathered data for us from a survey customized to our
requirements. Jonny Morell from Fulcrum Corporation and Demetra Collia
from the Bureau of Transportation Statistics provided additional
technical assistance, and Christopher Nelson of the RAND
Corporation provided insight into realist program evaluation. Special
thanks to Wayne Nelson for instructing and advising on gap time
analysis and survival analysis. Shuang Wu of Computer Sciences
Corporation assisted in the processing and analyses of the data.
Katherine Blythe, Alison Stieber, Cassandra Oxley, and other
MacroSys staff provided editorial assistance.
The work was performed under an interagency agreement
between the FRA Human Factors Division and the Office of Safety
Management and Human Factors of the US Department of
Transportation John A. Volpe National Transportation Systems Center.

References

Bradburn, M.J., Clark, T.G., Love, S.B., Altman, D.G., 2003. Survival analysis Part III: Basic concepts and first analyses. Br. J. Cancer 89, 605–611.
Cohen, J., Cohen, P., West, S.G., Aiken, L.S., 2003. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates, Mahwah, NJ.
Cook, R.J., Lawless, J.F., 2010. The Statistical Analysis of Recurrent Events. Springer Science + Business Media, New York, NY.
Coplen, M.K., 1999. Compliance with Railroad Operating Rules and Corporate Culture Influences. Results of a Focus Group and Structured Interviews. Federal
Gamst, F.C., 1982. The development of operating rules. Proc. Railway Fuel Operat. Officers Assoc. 46 (155), 162–171.
Geller, E.S., 2001. The Psychology of Safety Handbook. CRC Press, Boca Raton, FL.
Grimaila, R., 2007. Testimony Before U.S. House of Representatives Committee on Transportation and Infrastructure, October 25, 2007.
Harrington, J.J., 1987. The Improvement Process: How America’s Leading Companies Improve Quality. McGraw-Hill, New York, NY.
Hofmann, D.A., Stetzer, A., 1996. A cross-level investigation of factors influencing unsafe behaviors and accidents. Pers. Psychol. 49 (2), 307–339.
Hosmer, D.W., Lemeshow, S., May, S., 2008. Applied Survival Analysis: Regression Modeling of Time-to-event Data, second ed. John Wiley & Sons, Hoboken, NJ.
Howe, J., 1999. Debunking behavior based safety. Occupational Health & Safety. Newslett. UAW Health Saf. Department 1 (5).
Juran, J.M., 1964. Managerial Breakthrough: A New Concept for the Manager’s Job. McGraw-Hill, New York, NY.
Krause, T.R., 1995. Employee-Driven Systems for Safe Behavior. Integrating Behavioral and Statistical Methodologies. Van Nostrand Reinhold, New York, NY.
Krause, T.R., 1997. The Behavior-Based Safety Process. Managing Involvement for an Injury-Free Culture, second ed. John Wiley & Sons, New York, NY.
Krause, T.R., Seymour, K.J., Sloat, K.C.M., 1999. Long-term evaluation of a behavior-based method for improving safety performance: a meta-analysis of 73 interrupted time-series replications. Saf. Sci. 32, 1–18.
Johnson, K., Greenseid, L.O., Toal, S.A., King, J.A., Lawrenz, F., Volkov, B., 2009. Research on evaluation use: a review of the empirical literature from 1986 to 2005. Am. J. Eval. 30 (3), 377–410.
Lee, E.T., Wang, J., 2003. Statistical Methods for Survival Data Analysis. John Wiley & Sons, Hoboken, NJ.
Maguire, B.A., Pearson, E.S., Wynn, A.H.A., 1952. The time intervals between industrial accidents. Biometrika 39 (1–2), 168–180.
McPhee, J., 2005. Coal Train—I: Disassembling the Planet for Powder River Coal. New Yorker, October 3.
Miles, M.B., Huberman, A.M., 1994. Qualitative Data Analysis: An Expanded Sourcebook, second ed. Sage, Thousand Oaks, CA.
Nelson, W.B., 1982. Applied Life Data Analysis. John Wiley & Sons, Hoboken, NJ.
Nelson, W.B., 2003. Recurrent Events Data Analysis for Product Repairs, Disease Recurrences, and other Applications. Society for Industrial and Applied Mathematics, Philadelphia, PA.
Patton, M.Q., 2002. Qualitative Research and Evaluation Methods, third ed. Sage, Thousand Oaks, CA.
Pawson, R., 2002. Evidence-based policy: the promise of realist synthesis. Evaluation 8 (3), 340–358.
Pawson, R., Greenhalgh, T., Harvey, G., Walshe, K., 2005. Realist review – a new method of systematic review designed for complex policy interventions. J. Health Serv. Res. Policy 10 (suppl. 1), 21–34.
Railroad Administration, DOT/FRA/ORD-99/09, DOT-VNTSC-FRA-97-7. Pawson, R., Tilly, N., 1997. Realistic Evaluation. Sage, Thousand Oaks, CA.
Coplen, M.K., Ranney, J., 2009. Safe Practices, Operating Rule Compliance, and Pedersen, L.M., Nielsen, K.J., Kines, P., 2012. Realistic evaluation as a new way to
Derailment Rates Improve at Union Pacific Yards with STEEL Process – A Risk design and evaluate occupational safety interventions. Saf. Sci. 50, 48–54.
Reduction Approach to Safety. Research Results. Federal Railroad Phimister, J.R., Bier, V.M., Kunreuther, H.C., 2004. The accidents precursors project:
Administration, RR09-08. <http://www.fra.dot.gov/eLib/Details/L04248> overview and recommendations. In: Phimister, J.R., Bier, V.M., Kunreuther, H.C.
(10.10.14). (Eds.), Accident Precursor Analysis and Management: Reducing Technological
Crano, W.D., Brewer, M.B., 2002. Principles and Methods of Social Research, second Risk through Diligence. National Academies Press, Washington, DC.
ed. Lawrence Erlbaum Associates, Mahwah, NJ. Railroad Safety Risk Reduction Program in the Rail Safety Improvement Act of 2008,
Creswell, J.W., 2003. Research Design: Qualitative, Quantitative, and Mixed 2008. 49 U.S.C. §20156.
Methods Approaches, second ed. Sage, Thousand Oaks, CA. Ranney, J., Nelson, C., 2003. Impacts of Participatory Safety Rules Revision in U.S.
Dastmalchian, A., Blyton, P., Adamson, R., 1989. Industrial relations climate: testing Railroad Industry: An Exploratory Assessment. Federal Railroad Administration,
a construct. J. Occup. Psychol. 62, 21–32. DOT-VNTSC-FRA-02-05.
Dejoy, D.M., 2005. Behavior change versus culture change: divergent approaches to Ranney, J., Zuschlag, M., Coplen, M., Nelson, C., 2010. Behavior-based Safety at
managing workplace safety. Saf. Sci. 43, 105–129. Amtrak-Chicago, Associated with Improved Safety Culture and Reduced Injuries
Deming, W.E., 2000. Out of the Crisis. MIT Press, Cambridge, MA. and Costs. Manuscript in Preparation (copy on file with author).
Department of Defense, 1996. Handbook for Reliability Test Methods, Plans, and Reason, J., 1997. Managing the Risks of Organizational Accidents. Ashgate,
Environments for Engineering, Development, Qualification, and Production. Aldershot, UK.
MIL-HDBK-781A. Reason, J., 2003. Managing Maintenance Error. Ashgate Publishing Company, UK.
Egan, M., Bambra, C., Petticrew, M., Whitehead, M., 2009. Reviewing evidence on Rosenthal, R., Rosnow, R.L., 1991. Essentials of Behavioral Research: Methods and
complex social interventions: appraising implementation in systematic reviews Data Analysis, second ed. McGraw-Hill, New York, NY.
of the health effects of organisational-level workplace interventions. J. Rossi, P.H., Freeman, H.E., Lipsey, M.W., 1999. Evaluation: A Systematic Approach,
Epidemiol. Community Health 63 (1), 4–11. seventh ed. Sage, Thousand Oaks, CA.
Federal Railroad Administration, 2001. Switching Operations Fatality Analysis. Shadish, W.R., Cook, T.D., Campbell, D.T., 2002. Experimental and Quasi-
Severe Injuries to Train and Engine Service Employees: Data Description and Experimental Designs for Generalized Causal Inference. Houghton Mifflin,
Injury Characteristics. Switching Operations Fatality Analysis (SOFA) Working Boston.
Group. <www.fra.dot.gov/downloads/safety/sofa/SOFA_Injury.pdf> (05.01.11). Simard, M., Marchand, A., 1994. The behaviour of first-line supervisors in accident
Federal Railroad Administration, 2006. Railroad Safety Statistics 2005 Annual prevention and effectiveness in occupational safety. Saf. Sci. 17, 169–185.
Report. FRA Office of Public Affairs. <http://safetydata.fra.dot.gov/officeofsafety/ Simard, M., Marchand, A., 1997. Workgroups’ propensity to comply with safety
ProcessFile.aspx?doc=bull2005-book.exe> (May, 2011). rules: the influence of macro-micro organizational factors. Ergonomics 40, 172–
Federal Railroad Administration, 2011. FRA guide for preparing accident/incident 188.
reports. FRA Office of Safety, DOT/FRA/RRS-22. <http://safetydata.fra.dot.gov/ Spigener, J.B., Hodson, S.J., 1997. Are labor unions in danger of losing their
OfficeofSafety/ProcessFile.aspx?doc= leadership position in safety? Prof. Saf. (12), 37–39, 1997
FRAGuideforPreparingAccIncReportspubMay2011.pdf> (July, 2013). U.S. Government Accountability Office (GAO), 2009. Program Evaluation: A Variety
Federal Railroad Administration, 2008. The FRA Risk Reduction Program: A New of Rigorous Methods Can Help Identify Effective Interventions. Report to
Approach for Managing Railroad Safety. White Paper. FRA Office of Research and Congressional Requesters. GAO-10-30, November 23, 2009.
Development Division, Human Factors Program. <http://www. Walton, M., 1986. The Deming Management Method. Berkley Publishing, New York,
fra.dot.gov/downloads/safety/ANewApproachforManagingRRSafety.pdf> NY.
(05.01.11). Yin, R.K., 2009. Case Study Research, fourth ed. Sage, Thousand Oaks, CA.
Frederick, J., Lessin, N., 2000. Blame the worker: the rise of behavioral-based safety Zuschlag, M., Ranney, J., Coplen, M., Harnar, M., 2012. Transformation of Safety
programs. Multinat. Monit. 21 (11), 10–17. Culture on the San Antonio Service Unit of Union Pacific Railroad. Federal
Funnel, S.C., Rogers, P.J., 2011. Purposeful Program Theory: Effective Use of Theories Railroad Administration, DOT/FRA/ORD-12/16, DOT-VNTSC-FRA-10-07. <http://
of Change and Logic Models. Jossey-Bass, San Francisco. www.fra.dot.gov/eLib/details/L04121> (17.12.12).
