Safety Science: Michael Zuschlag, Joyce M. Ranney, Michael Coplen
Safety Science
journal homepage: www.elsevier.com/locate/ssci
Article info

Article history:
Received 1 February 2013
Received in revised form 26 August 2015
Accepted 1 October 2015

Keywords:
Peer-to-peer feedback
Railroad
Changing At-Risk Behavior (CAB)
Continuous improvement
Safety culture
Safety leadership

Abstract

The Federal Railroad Administration (FRA) sponsored a multiyear pilot demonstration of Clear Signal for Action (CSA), a safety culture intervention implemented with Behavioral Science Technology Inc., at a Union Pacific (UP) service unit. CSA combines peer-to-peer feedback, continuous improvement, and safety-leadership development. The US Department of Transportation John A. Volpe National Transportation Systems Center conducted an independent program evaluation of the pilot, using qualitative and quantitative measures. The evaluation found that, over two years, the site experienced significant improvements in safety outcomes, operations, and safety culture, including an 80% drop in at-risk behaviors, a 79% decrease in engineer decertification rates, an 81% decline in the rate of derailments and other incidents, and better labor–management relations. Comparison locations showed no improvements in decertifications or derailments. The success of the pilot, in addition to successes UP had earlier with CSA-type processes, encouraged UP to expand these processes throughout its transportation department. The success of this pilot and other similar pilots led to the development and adoption of the FRA’s Railroad Safety Risk Reduction Program in the Rail Safety Improvement Act of 2008, and the implementation of similar safety-culture programs by other carriers.

© 2015 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.ssci.2015.10.001
0925-7535/© 2015 Elsevier Ltd. All rights reserved.
60 M. Zuschlag et al. / Safety Science 83 (2016) 59–73
To improve safety, the FRA Human Factors Division is exploring new approaches that counteract these cultural tendencies. These approaches achieve this by incorporating the following features, which are characteristic of a positive safety culture (Reason, 1997; Phimister et al., 2004):

Nondisciplinary: Seeking to improve safety without punishment or blame through protective elements such as worker anonymity.
Proactive: Collecting data on at-risk behaviors and conditions to prevent associated accidents or injuries before they occur, and thus reduce the incentives for workers and managers to blame each other.
Systems-safety-analysis orientation: Gathering and using rich objective data to identify underlying organizational factors in safety.
Cooperative: Engaging stakeholders within both management and labor.
Sustainable: Including mechanisms for long-term sustainment.

These features improve safety by creating an environment where individuals freely exchange information upward, downward, and laterally across the organizational hierarchy, providing the open communication necessary to solve safety problems.

This paper presents an evaluation of one such approach, the FRA’s Clear Signal for Action (CSA), applied to a transportation department. With funding and sponsorship from the FRA, Behavioral Science Technology Inc. (BST) actively designed and implemented the demonstration pilot. The US Department of Transportation John A. Volpe Center, also with sponsorship from the FRA, independently conducted a formative and summative evaluation (Rossi et al., 1999). This article presents a summary of the summative evaluation.

1.2. Clear Signal for Action (CSA)

1.2.1. CSA implementation

CSA integrates three approaches that have been applied previously to improve safety proactively:

Peer-to-peer feedback (PPF), where workers observe each other and exchange feedback about the safety of their behavior, conditions, and organizational factors (Geller, 2001; Krause, 1995).
Continuous improvement (CI), where workers and managers cooperatively gather and analyze data to identify systemic causes of observed at-risk behaviors and conditions, and then implement corrective actions to address the causes (Harrington, 1987; Juran, 1964; Krause, 1995).
Safety-leadership development (SLD), where managers are trained to promote proactive safety practices such as PPF and CI (Krause et al., 1999).

Fig. 1 illustrates a combined theory of action and theory of change (Funnel and Rogers, 2011) for CSA.

Detailed CSA activities are listed in the box headed Implementation, and their theoretical outcomes are depicted in the two columns of boxes designated Proximal Outcomes and Distal Outcomes. Proximal outcomes result directly from implementation activities, while distal outcomes are mediated by proximal outcomes. The arrows indicate the effects of prior activities and outcomes on subsequent ones, with influence moving primarily in a left-to-right direction. Within the Implementation box, activities are grouped according to their primary association with PPF, CI, or SLD.

1.2.1.1. Peer-to-peer feedback (PPF). To initiate the PPF component, a local CSA process steering committee, composed of workers and sometimes several managers, develops a checklist of safe and at-risk worker behaviors and working conditions based on analyses of injury reports and other sources of safety information (Krause, 1997). The steering committee recruits, trains, and coaches workers to be “peer observers,” who observe the safety of their coworkers (overtly, with their permission) and then conduct with them anonymous, nonconfrontational feedback sessions devoid of any disciplinary connections. The feedback includes both acknowledging any observed safe behavior and discussing any observed at-risk behavior. By focusing on the behavioral and conditional antecedents of accidents, CSA seeks to prevent accidents proactively, before they occur.

1.2.1.2. Continuous improvement (CI). Within the CI component, workers are trained to interview their coworkers during the feedback sessions about the coworkers’ explanations for any observed at-risk behaviors or conditions. Thus, a feedback session has feedback from the observed peer to the observer, in addition to feedback from observer to peer. The observing worker records on the checklist all data on the behaviors and the explanations. The steering committee aggregates and objectively analyzes these data through root-cause problem-solving to identify the systemic causes for barriers to enhancing safety. Potential systemic causes include organizational policy, training, tool design, environmental conditions, procedures, and cultural aspects. The steering committee executes corrective actions against barriers that it can remove, for example, through feedback to workers during PPF sessions. If the barriers require actions beyond the authority of the steering committee, such as new equipment purchases or procedure changes, a joint labor–management barrier removal team reviews and prioritizes the barriers, then develops corrective actions, which management executes. Data-gathering continues after a corrective action is deployed, to allow its effectiveness to be evaluated.

1.2.1.3. Safety-leadership development (SLD). Within the SLD component, managers are trained in effective nondisciplinary, proactive techniques for enabling employees to work safely, including but not limited to supporting safety-related activities such as feedback sessions and barrier removal. These SLD processes are conducted in parallel with existing disciplinary processes. SLD activities are not a substitute for addressing rules violations.

1.2.2. Integration of behavioral and safety culture approaches

When used alone, PPF approaches have often placed too little emphasis on the influence of upstream managers, systems, and policies and procedures on at-risk behavior and conditions, resulting in negative reviews from several unions (Spigener and Hodson, 1997; Howe, 1999; Frederick and Lessin, 2000). Thus, recent variants of PPF, such as CSA, have acknowledged that behavioral-oriented safety interventions can complement culture-oriented safety interventions such as CI and SLD (DeJoy, 2005). These new variants integrate PPF with CI, utilizing the peer-to-peer sessions as opportunities to collect the data needed by CI. SLD encourages managers to implement corrective actions that need management support and otherwise targets “latent” factors in accidents and injuries that are further back in the chain of causation, such as safety climate and culture (Reason, 1997). SLD trains organizational leadership to eliminate these causes since it has the resources and authority to alter the direction of the organization. It also trains managers to provide the necessary resources and to integrate CSA into other safety programs so that it becomes institutionalized. SLD can therefore accelerate changes initiated by PPF and CI and make them lasting characteristics of the organization’s safety culture.

By combining PPF, CI, and SLD, responsibility for safety is distributed among workers and managers. PPF activities are
predominantly or fully workers’ responsibility since peers make observations and provide feedback. Because SLD changes organizational leadership, it is fully managers’ responsibility. CI is a joint responsibility, but because management is accountable for ensuring that the systemic corrective action process is effective, managers have more responsibility than workers (Deming, 2000; Walton, 1986). Involving both parties in safety helps to embed a safety process into the culture by giving each party a stake in its successful outcome. In contrast, when a safety process is solely the responsibility of either labor or management, it will tend to have a more limited impact on the non-responsible party. CSA is thus an integration of behavioral-oriented and culture-oriented safety interventions (DeJoy, 2005). PPF improves safety and culture from the bottom up via the activities of workers, SLD improves safety and culture from the top down via the activities of managers, and CI represents the meeting of the top-down and bottom-up processes, where workers and managers share information on safety problems and solutions.

Fig. 1. Theory of action for CSA, showing activities and theoretical outcomes (PPE: personal protective equipment).

1.2.3. Theoretical outcomes on safety performance and culture

The theory of action for CSA shown in Fig. 1 specifies outcomes for the following:

Worker practices: The at-risk and safe behaviors identified in the PPF checklist and targeted by feedback sessions. Examples include walking with eyes on the path and following crew-safety communication procedures.
Systemic conditions: Aspects of the physical and organizational environment that impact safety, such as facilities, tools, training, and disciplinary policies. Examples include switches that are in good operating condition, availability of proper tools and personal protective equipment (PPE), and training in safe worker practices such as the proper way to align a coupler.
Management practices: On-the-job manager behaviors that take the form of proactive, nondisciplinary promotion of safe behaviors, conditions, and processes. Examples include coaching employees on safe behavior and encouraging participation in safety processes. These practices are in addition to implementation activities such as consistently providing the necessary budget, equipment (e.g., computers for data analysis), and physical space for training and operations.
Safety culture: The factors that determine an organization’s (labor and management) commitment, style, and proficiency in ensuring safety that result from safety-related beliefs, values, attitudes, competencies, and behavioral patterns (Reason, 1997, 2003). This evaluation is specifically concerned with attitudes and behavioral patterns related to labor–management trust (Reason, 1997, 2003), the lack of which has theoretically limited safety advances in the railroad industry (Coplen, 1999).
Occurrences: Safety events associated with injuries; fatalities; damage to equipment, for example, from derailments; and associated close calls, for example, safety-rule violations such as engineer decertifications.

As depicted by the arrows in Fig. 1, the implementation directly promotes the targeted safety practices of both managers and workers through its SLD and PPF components, respectively. Workers’ practices change as a result of both the feedback sessions and the training required to perform them (Ranney et al., 2010). The CI process directly improves behavioral and systemic conditions identified in data analysis.

There are reciprocal effects between specific practices and safety culture. Changes in safety attitudes encourage workers and managers to change their practices. Conversely, changes in practices may change attitudes (Krause, 1995). Safer practices also improve the culture by building better labor–management relations. As managers see workers encouraging each other to work more safely, their trust that workers will perform tasks safely increases and they perceive less of a need for discipline. With managers engaging in more proactive, nondisciplinary approaches to safety, workers are assured of management’s commitment to their safety. As discipline becomes less frequent and management’s commitment to safety becomes more apparent, workers trust managers more.

Improved systemic conditions also improve labor–management relations within the safety culture by demonstrating management’s commitment to safety. Improved labor–management relations can reciprocally improve systemic conditions by fostering labor–management and cross-trade cooperation in improving safety conditions.

Safer practices and improved systemic conditions reduce occurrences, as does the safety culture through increased dialogue about
safety. Collectively, these changes result in improved worker and manager safety practices beyond those specifically targeted by implementation.

In summary, CSA improves safety culture simultaneously with safety performance through “boot-strapping” reciprocal causation.

1.3. CAB: An implementation of CSA

1.3.1. Context and motivations

Between 2005 and 2008, the FRA sponsored a CSA demonstration pilot in the transportation department on the San Antonio Service Unit of the Southern Region of Union Pacific Railroad (UP). Transportation departments in railroads comprise road operations, which involve driving trains long distances on main line track between terminals, and switching operations, which involve sorting cars in yards. Behavioral Science Technology Inc. (BST), a company that has implemented CSA-like programs in a broad range of industries, designed and instructed the implementation of the San Antonio CSA process, which local stakeholders named Changing At-Risk Behavior (CAB).

Like all the service units of UP, the San Antonio Service Unit comprised hundreds of transportation workers, most working out of a central hub (in this case, the city of San Antonio), with others working out of peripheral terminals up to 150 miles away (e.g., Laredo and Del Rio). CAB covered approximately 1100 workers; the size of the workforce remained unchanged throughout the evaluation period, and turnover was low.

During the CAB implementation, UP and the FRA started a similar CSA demonstration pilot on the Livonia Service Unit (Coplen and Ranney, 2009), and UP, acting independently from the FRA, initiated a PPF process on the Houston Service Unit. BST provided consultation services for Livonia and, to a lesser extent, Houston. Livonia and Houston were also in the Southern Region, although their safety processes were limited to switching operations, while CAB covered both switching and road operations.

UP was interested in partnering with the FRA on these demonstration pilots because it had been working with the three elements of CSA – PPF, CI, and SLD – since 1988. In 1993, UP had established a successful CSA-type program entitled Total Safety Culture (Geller, 2001) in its mechanical department. After seeing it succeed there, UP wanted to see if it could work in transportation, where most accidents occur. UP was particularly focused on the Southern Region because a series of high-profile accidents there had increased FRA scrutiny of the region’s safety. The severity of these particular accidents may have justified the scrutiny, but the Southern Region’s levels of safety through that time were actually steady or improving (Zuschlag et al., 2012), and UP’s overall injury rate was lower than that of most US railroads (FRA, 2006).

Local managers and workers, along with their unions, the Brotherhood of Locomotive Engineers and Trainmen (BLET) and the United Transportation Union (UTU), worked together to implement CAB. The recent serious accidents in the region raised both manager and worker awareness of safety and encouraged the two parties to cooperate for improvements despite the historic distrust between them. Labor support for CAB was also abetted by an influx of new workers throughout the corporation prior to the evaluation period. While average UP years-of-service for transportation workers at the San Antonio Service Unit (SASU) was 16 years, the influx created a bimodal distribution. One side of the distribution, comprising two-thirds of the workers, averaged 7 years of service, while the other side averaged 33 years of service. The workers’ age distribution, likewise, was bimodal, with one side averaging 35 years of age and the other averaging 53 years of age. The influx of relatively new workers facilitated support for CAB because older workers had developed a deep distrust of the management over decades of employment. The newer workers were somewhat less distrustful of management.

1.3.2. CAB structure and initiation

CAB leadership and day-to-day operations were executed by two full-time “facilitators”: an engineer from BLET and a conductor from UTU. The steering committee comprised an additional eight workers, who met approximately once a month. A newly arrived superintendent (head of the service unit) supported the implementation, in part due to seeing CSA-like Total Safety Culture successfully implemented in a mechanical department in the UP Central Region. The superintendent appointed a senior service-unit manager to serve part-time as the local management sponsor for CAB, acting as chief liaison between the steering committee and management.

BST’s model of CSA includes customizing it to fit with local contexts. For example, the stakeholders at San Antonio chose not to focus at first on traditional industrial safety threats, such as tripping hazards and pinch points. Instead, CAB initially focused on behaviors to improve alertness and teamwork for locomotive cab operations on the road. Its focus was limited to practices related to high-workload situations, such as operating under constraining signals, a condition that UP calls Cab Red Zone (CRZ). Fourteen months after its origination, CAB expanded to include safety in yard-switching operations, using a different checklist of behaviors. In the current paper, the implementations are distinguished as CAB–CRZ and CAB-Switching. The evaluation of CAB thus comprised two phases: a first phase, from August 2005 through September 2006, which included only CAB–CRZ, and a second phase, from October 2006 to about January 2008, which included CAB-Switching and the continuation of CAB–CRZ. The evaluation also included a four-year baseline period prior to the first phase, for which data indicated the site’s state prior to CSA implementation.

The CAB process began in August 2005 with the initiation of regular peer-to-peer feedback sessions. Often the employees observed their peers while engaged in their normal work (e.g., a conductor may observe an engineer across the cab of a locomotive). However, many employees also dedicated entire days to conduct only PPF sessions (e.g., ride in several different cabs per day, observing both engineer and conductor). For each observation, the observer and peer completed the feedback exchange at a safe and convenient break in the work (e.g., while the train waited at a siding for another train to pass). Employees would choose to observe primarily based on available opportunities, especially for CAB–CRZ. CRZ conditions may or may not occur during a trip, and the ability of an employee to observe the crew of a particular train depended on the ability to arrange rendezvouses with the train. However, especially for CAB-Switching, the steering committee also chose to focus PPF sessions on locations where they felt it was needed most. The criteria for choosing locations did not change over the course of the evaluation period.

1.3.3. CAB evolution and challenges

Approximately halfway through the evaluation period, the superintendent was promoted and left the service unit. In UP, this tenure of one to two years is fairly typical for a railroad superintendent, and was observed in other service units of the UP Southern Region. CAB facilitators briefed the newly appointed superintendent on the process, and he continued to strongly support the implementation. At about the same time, the local management sponsor was also promoted and left the service unit. His lieutenant, who had previously earned credibility with the workers for his support of CAB among other safety work, became the new local management sponsor.
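As a quick consistency check on the years-of-service figures reported in Section 1.3.1 (two-thirds of workers averaging 7 years, the other side averaging 33 years, and an overall average of 16 years), the overall mean of a two-group mixture can be computed directly. The sketch below is illustrative only: the function name `mixture_mean` is hypothetical, and treating the older group as exactly one-third of the workforce is an assumption, since the text states only the two-thirds share.

```python
def mixture_mean(weights, means):
    """Return the overall mean of subgroups, given each subgroup's share and mean."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * m for w, m in zip(weights, means))


# Figures from Section 1.3.1. The one-third share for the older group is an
# assumption (the complement of the stated two-thirds share).
overall_years = mixture_mean([2 / 3, 1 / 3], [7, 33])
print(round(overall_years, 1))  # 15.7
```

The result, about 15.7 years, agrees with the reported 16-year average after rounding. The same check cannot be run on the age figures, because the text does not give the share of workers on each side of the age distribution.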
By the end of the evaluation period, over half the workforce had been trained to conduct PPF sessions, a rate somewhat greater than expected at the start of implementation. The higher rate was achieved because the superintendent arranged that additional training sessions be held when rail traffic was low and manpower demands were reduced (e.g., in January of each year). Originally the steering committee conducted training only at a terminal in the city of San Antonio, but over the course of the evaluation it expanded training to the peripheral terminals. During this time, three of the original steering committee members were systematically replaced by other workers, consistent with the BST consultant’s advice for term limits and rotation. Two of the new members were from the outlying terminals to provide balanced representation.

To support CAB across the service unit, the superintendent set a fixed monthly “budget” of worker-hours for CAB training, PPF sessions, analysis, and other CAB activities. In an unusual move for a railroad, the superintendent delegated responsibility for allocating the budget among CAB activities to the facilitators, demonstrating commitment to the program and trust in the facilitators. Normally, a manager (e.g., the management sponsor), rather than a worker, would receive such responsibility. Through the first year of the implementation, management was pleased with the way in which the facilitators allocated the budget.

However, as the implementation expanded to peripheral terminals and to switching operations in the second year of implementation, the facilitators had increasing difficulty completing all activities. The rate of PPF sessions, a metric of implementation strength, was falling far below the planned level. The facilitators requested a larger budget commensurate with the expansion of CAB, but, due to external economic and organizational pressures, the superintendent could barely maintain the existing budget. The facilitators appealed to the superintendent’s superior, the transportation vice president for the region. The typical railroad management response to such a crisis would be to exercise “command and control” and take over budgeting responsibility from the workers. However, the vice president instead issued the facilitators a challenge: for every worker-day of CAB administration the facilitators eliminated, the vice president would increase the budget by one-and-a-half worker-days. The steering committee rose to the challenge and reduced administration costs by 20%, earning a 30% budget increase. The combined increase in efficiency and budget size allowed an increase in the rate of PPF sessions and the expansion to the entire service unit.

1.3.4. Implementation strength

For the sake of relative brevity, this paper excludes a formal evaluation of implementation, and instead focuses on the evaluation of the outcomes, especially the effects of CSA on safety and safety culture. Details of the context, pre-existing mechanisms (Pedersen et al., 2012), and the implementation evaluation are in a full government report on the demonstration pilot (Zuschlag et al., 2012). Only a brief summary is provided herein.

The implementation evaluation, using quantitative and qualitative measures, found CAB to be sufficiently strong to proceed with the outcome evaluation (Rossi et al., 1999). Management acted as “safety climate engineers” (Simard and Marchand, 1994, 1997), providing the necessary resources for the process and enlisting labor officials and workers to support the program. Worker support for the process grew over the evaluation period in part due to knowledge that the all-worker steering committee maintained control of the feedback session data to ensure no negative repercussions for participating workers.

At the end of the evaluation period, the workers had participated in approximately 4800 PPF sessions. While this corresponds to more than four PPF sessions per worker, the total number of PPF sessions remained below what had been planned at the start of the implementation. However, by other measures, the implementation was very strong. For example, as related above, management was motivated to make CSA successful and supplied the worker-hours and other resources to support it. In addition, workers were trained faster than planned. These characteristics, combined with the completion of the program activities (e.g., data analysis and corrective action execution), implied a strong implementation (Egan et al., 2009).

Trust between management and labor at the site was problematic, a condition that augurs poorly for safety process success (Reason, 1997). However, the parties were inclined to cooperate more than normal for a railroad due to joint concern over the recent serious accidents. The distrust was also counteracted by the role behaviors of the leadership (Pedersen et al., 2012). Both the superintendent and regional vice president resisted the railroad inclination to exert managerial control over the process, which would have likely led to worker distrust and disenchantment. Instead, the leadership empowered the workers by giving them control over the process and budget, encouraging them to find their own solutions to problems. This concrete and visible way of going above and beyond the normal procedures demonstrated commitment to the program and was a show of good faith, which likely invested the workers in the process and helped grow their trust in the leadership.

In summary, the implementation, while not perfect, was reasonably strong and complete. Any lack of observable outcomes for safety or safety culture cannot be attributed to implementation failure, and, conversely, any observed outcomes may be reasonably attributed to the CSA implementation (Egan et al., 2009).

1.4. Evaluation question

This paper is a summative evaluation of a pilot demonstration of CSA for outcomes for safety or safety culture, comprising a “bottom-line” assessment of the effectiveness of CSA at the site (Rossi et al., 1999). For this purpose, the evaluation question was, “What are the effects of the CSA process on safety and safety culture?” The evaluation looked for improvements in:

Worker practices.
Occurrences, specifically incidents of material damage and engineer decertifications.
Safety culture, specifically the presence of a “trust culture” in labor–management relations.

Additional evaluation questions and the formative evaluation are addressed in Zuschlag et al. (2012).

2. Evaluation design, procedures, and data analysis

Randomized controlled trials (RCTs) were infeasible for this evaluation, as is often the case for studies concerning occupational safety programs (Pedersen et al., 2012). Instead, the evaluation used a mixed-methods design, with both quantitative and qualitative methods (Creswell, 2003). A mixed-methods design has advantages over RCTs for assessing the effectiveness of organizational change efforts, such as safety culture interventions, which are embedded in a complicated system. For example, CAB operated in the context of complications such as customized program implementations and leadership turnover. The evaluation period for CAB spanned approximately two and a half years (August 2005 to about January 2008), providing ample time for extraneous events. By combining qualitative and quantitative methods for
multiple data sources, the mixed-methods design can account for the impact of context and extraneous events that influence the ability of the program to produce outcomes. Specifically in this evaluation, we used qualitative methods to peer inside the “black box” of the program and directly observe the influence of the context on the mechanism comprising the program (Pawson and Tilly, 1997; Pawson, 2002). In addition, when a detailed theory of change drives selection of the measures, the mixed-methods design rivals randomized controlled trials in its ability to make plausible inferences of causation (GAO, 2009). This summative evaluation used a concurrent nested research strategy (Creswell, 2003): the quantitative data were considered the primary measure for improvements in safety or safety culture, while the qualitative data provided explanations and insight into the processes associated with the quantitative results. Qualitative data also provided an opportunity to capture unexpected outcomes.

Qualitative measures of outcomes, utilizing a case-study methodology (Yin, 2009), included primarily open-ended interviews with workers and managers and were used for explanatory purposes. Quantitative measures included:

PPF feedback session data.
Corporate safety data supplied by UP.
Closed-ended attitude and behavior surveys of workers and managers.

In addition to the above measures for the outcomes of CSA, the evaluation included measures for an implementation evaluation and the assessment of the role of initial conditions and events in establishing CSA at the site (as summarized above in Section 1.3 and detailed in Zuschlag et al., 2012). These measures included field notes, process artifacts, and project records as additional qualitative data sources for assessing the context and mechanisms. Because it uses a theory of change, combines qualitative and quantitative methods, and performs an implementation evaluation to assess if an implementation failure occurred, the method is consistent with a realistic model of evaluation (Pedersen et al., 2012; Pawson et al., 2005).

2.1. Feedback session data

In accordance with BST training and materials, the CAB steering committee collected the checklists completed during feedback ses-

Having members of the same population rate the same videos allowed calculation of the consistency of the ratings, as shown in Table 1.

Reliability is necessary in order for actual changes in practices to be statistically detectable with CAB feedback session data. Time drift represents a systematic tendency for the same practices to be rated as safer (positive drift) or more at-risk (negative drift) in later than in earlier training. Retest drift represents a tendency for the same practices to be rated as safer (positive) or more at-risk (negative) relative to how long ago an individual worker was trained.

Complete ratings of training videos were collected from 108 peer observers at the end of training and from 13 peer observers during subsequent coaching. Reliability was significantly higher than 80% (83.06%, t(107) = 4.56, p < 0.0001); this is above the minimal standard for inter-judge reliability that is often used in evaluation research (Rossi et al., 1999). Time drift was not significant (r = 0.0312, n = 108, p = 0.749), indicating that peer observers trained at the beginning of the evaluation period rated the same sessions as equally at-risk as those trained at the end of this period. However, the analysis for retest drift showed that peer observers rated the videos as more at-risk during coaching than during training (average difference, 7.50%; t(106) = 2.05; p < 0.0427), suggesting that experienced peer observers became stricter over time. The proportion of experienced peer observers increased throughout the evaluation period since observers trained in the beginning stayed to conduct feedback sessions at the end. This suggests the possibility that the true average at-risk scores may be lower than reported for later periods in the evaluation and that any improvement in practices that may be found would in fact be greater than indicated.

Overall, the feedback session ratings appeared to be adequate or perhaps even conservative for measuring positive changes in practices.

2.2. Corporate safety data

UP provided corporate safety data from its own tracking systems to allow the calculation of changes in the rates of relevant safety occurrences since the start of CAB. The number of occurrences in the dataset was too small and the information on them too sparse to allow useful sub-categorization of the occurrences in order to analyze for changes in the character of the occurrences. Interviews with managers revealed no evidence of changes in the
sions and scored the data on each checklist to record which behav- character of occurrences at any site.
iors were performed safely and which were at-risk. For purposes of
evaluation, the overall percentage of at-risk behaviors for the ser- 2.2.1. Engineer decertifications
vice unit for each month of the evaluation period provided an indi- The railroad industry defines an engineer decertification as an
cation of that month’s safety of worker practices targeted by CAB– occurrence in which a locomotive engineer loses FRA authorization
CRZ and CAB-Switching. Autocorrelations of the data (Cohen et al., to run trains due to a serious safety violation. In this evaluation,
2003) were not significant, indicating that the at-risk percentages decertifications served as a leading indicator of catastrophic road
were independent month-to-month, and that ordinary regression accidents, which can easily result from the associated safety
analyses, rather than a time-series analyses, were suitable. If CAB violations.
is effective at improving CRZ or switching practices, then the per- Decertifications are a relatively objective measure avoiding the
centage of at-risk behaviors should, on average, decrease over the measurement problems associated with minor injuries (Pedersen
months of the evaluation period. et al., 2012). Automated electronic devices aboard the trains and
An analysis of data from training videos assessed the consis- along the tracks detect most of decertifications the evaluation ana-
tency of worker recordings of at-risk behaviors. CAB–CRZ trainers lyzed, making such occurrences resistant to over-reporting and
showed one of two videos at the end of peer-observer training. under-reporting. The decertification process involves an investiga-
Each scripted video portrayed a typical mix of safe and at-risk tion including the railroad management, the union, and the FRA,
behaviors for five CRZ events. Trainees used their checklists to and includes documented evidence and a hearing, where all parties
record safe and at-risk behaviors seen in the videos. The same crosscheck the evidence and conclusions. This crosschecking by
videos were used for the coaching of many peer observers some- parties with different interests further reduces the chance of bias.
time after training, but during coaching each peer observer saw a The data were limited to three types of decertifications that are
different video than that seen during training. associated with CRZ practices because each often results from a
M. Zuschlag et al. / Safety Science 83 (2016) 59–73 65
longer than Eagle Pass’s. By the end of the evaluation period, about half of the workers at the San Antonio Complex were trained, giving it an intermediate implementation strength at that time.
In addition to having different numbers of workers, the three locations had different baseline occurrence rates (χ²(2) = 17.25, p = 0.0002). The San Antonio Complex had the highest rate (12.15 incidents per 100,000 car-moves), but this was not significantly different than the rate at Eagle Pass (9.69, two Poisson occurrence rate comparison (Nelson, 1982) F(50, 196) = 1.204, p = 0.188). The rates for the San Antonio Complex and Eagle Pass were each significantly higher than the rate for Laredo (5.93, F(98, 196) = 2.008, p < 0.0001, and F(98, 48) = 1.601, p = 0.036, respectively).
Thus, as a group, the three locations were more similar than different: two out of three were approximately the same size, and two out of three had approximately the same baseline occurrence rate. In other areas, all three locations shared the same characteristics as listed above for the four service units of the Southern Region. In addition, the three locations shared the same superintendent and service-unit-level of management. Most crucially, the qualitative data from the implementation evaluation revealed no unusual events or changes to any of the locations during the evaluation period. The three locations were thus suitable for contrasting against each other.
Data were gathered for the baseline years prior to, in addition to during, the CAB implementation, constituting a pre-post design with no-treatment comparisons. With such comparison data available, the chief statistical analysis is the relative performance of the treatment and comparison data from baseline through intervention, that is, the presence of a statistical interaction. If the changes in safety are different for the treatment and comparison data, the implication is that some factor associated with only the treatment data, such as the safety process, is specifically affecting the treatment data. On the other hand, if there are no differences in safety changes, there is a strong possibility that a single factor other than the treatment is responsible for changes in both treatment and comparison data.

2.2.3.2. Gap times. The occurrence data were analyzed as gap times (Cook and Lawless, 2010; Maguire et al., 1952) to determine the occurrence rate changes associated with CAB. A gap time (also known as inter-arrival time) is the time between a pair of adjacent occurrences. “Time” is expressed in units that represent the site’s exposure to the risk of an occurrence. In the railroad industry, the unit of exposure for incidents is typically car-moves (the number of cars moved through a location), while the unit of exposure for decertifications is typically worker-hours at the site. Because the gap times are calculated as the elapsed exposure rather than calendar time, the data are corrected or effectively normalized for different levels of exposure at different times and places (e.g., different size yards).
The timeline in Fig. 2 shows the gap times between five hypothetical decertifications. For instance, the gap between the April 20th and April 23rd decertifications is 24,000 worker-hours, and the gap between the April 23rd and April 28th decertifications is 40,000 worker-hours, and so forth.
Gap times are the mathematical inverse of a rate. For example, if 24,000 worker-hours were completed since a previous decertification, then there was 1 decertification per 24,000 worker-hours for that time period, equal to a rate of 1/24,000 × 200,000 or 8.33 decertifications per 200,000 worker-hours.¹ Thus, if gap times increase, then the rate must decrease, and vice versa.
Analyses of gap times and other statistical analyses of elapsed time until an event are routine in medical research (Lee and Wang, 2003; Bradburn et al., 2003) and reliability engineering (Department of Defense, 1996). For example, mean time between failures is a familiar statistic from reliability engineering. In this evaluation, safety occurrences are treated no differently than failures in a population of replaceable components (Nelson, 2003). In contrast, analysis of railroad safety data traditionally represents data points as the occurrence rates for a convenient block of time (e.g., the monthly decertification rate) rather than gap times. However, for relatively sparse frequencies of occurrences, such as found at a single railroad service unit, analyses of such block-time data have lower statistical power than analysis of gap times. Block-time analysis can also invalidate the normality and homoscedasticity assumptions of parametric statistical analyses (Zuschlag et al., 2012). Gap time data are thus preferred for this data set.

2.2.3.3. Survival analysis. Most inferential statistical analysis requires that the gap times be statistically independent and identically distributed. All gap time data were checked for independence by calculating the lag-1 and lag-2 autocorrelations (Cohen et al., 2003). The data were checked for being identically distributed by inspecting Weibull probability plots (Nelson, 1982).
When gap times are statistically independent and identically distributed, survival analysis techniques are suitable for inferential analysis (Cook and Lawless, 2010). Survival analysis comprises a suite of statistical techniques to analyze the effects of variables on the times until an occurrence, which may include analysis of gap times. Survival analysis is preferred over conventional least squares techniques such as analysis of variance because, compared to conventional analyses, survival analysis:

• Is more capable of addressing the approximately exponential distributions typical of gap times.
• Produces more accurate parameter estimates due to the use of maximum likelihood estimation.
• Allows inclusion of a “right-censored” gap time, being the first occurrence that occurs after the end of the evaluation period.

This evaluation employs two forms of survival analysis to ascertain the relationship between the CAB safety process and the occurrence rates: Cox regression and Weibull regression (Lee and Wang, 2003; Nelson, 1982). The Weibull regression fits a specific parametric distribution to the data, while the Cox regression derives a nonparametric distribution from the data.
Like any regression, Cox and Weibull regressions produce coefficients, b_i, representing the magnitude of the effect for each predictor variable. Specifically, exp(b_i) is the ratio of the “hazard” or chance of an occurrence at any given moment for each integer increment of the predictor variable (Bradburn et al., 2003). With a predictor variable representing the presence of CAB, one can calculate the percent change in the chance of the occurrences associated with CAB (Hosmer et al., 2008).

2.3. Practices and safety-culture survey

A forced-choice survey included measures of practices and safety culture, specifically self-reported CRZ work practices and labor–management relations. Table 2 describes the scales, along with their sources and supporting research, and the outcome related to each.²
The scales were evaluated for inter-item reliability with use of Cronbach’s alpha because this is the first time that such scales have been used in the railroad industry. These scales were added to 11

¹ The constant of 200,000 is included to be consistent with the railroad convention of representing rates as roughly the frequency per 100 workers working 40 h a week for one year.
² Please contact the first author for more detailed information about the scales.
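
The gap-time arithmetic of Section 2.2.3.2 and the lag-1 and lag-2 independence check of Section 2.2.3.3 can be illustrated with a minimal sketch. The function names and the cumulative worker-hour figures below are hypothetical, chosen only to reproduce the 24,000 and 40,000 worker-hour gaps of the worked example; the autocorrelation formula is the common sample estimator and is not necessarily the exact procedure the evaluation used.

```python
def gap_times(cumulative_exposure):
    """Elapsed exposure between each pair of adjacent occurrences."""
    return [later - earlier
            for earlier, later in zip(cumulative_exposure, cumulative_exposure[1:])]


def rate_per_200k(gap_worker_hours, scale=200_000):
    """Occurrence rate implied by one gap time: (1 / gap) * scale."""
    return scale / gap_worker_hours


def lag_autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum((values[i] - mean) * (values[i + lag] - mean)
                    for i in range(n - lag))
    return numerator / denominator


# Hypothetical cumulative worker-hours at which three decertifications occurred.
exposure = [100_000, 124_000, 164_000]
gaps = gap_times(exposure)                           # [24000, 40000]
rates = [round(rate_per_200k(g), 2) for g in gaps]   # [8.33, 5.0]
```

The longer gap yields the lower rate, matching the inverse relationship described in Section 2.2.3.2. Applying `lag_autocorrelation` to a series of gap times with `lag=1` and `lag=2` gives the two quantities that were checked for independence.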
Table 2
Forced-choice survey scales and their relation to outcomes.
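
The survey scales were checked for inter-item reliability with Cronbach’s alpha. As a minimal sketch of that statistic, the code below uses the standard formula with sample variances; the function name and the item scores are hypothetical, since item-level survey data are not given here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores holds one list per item; each list has one score per
    respondent, in the same respondent order.
    """
    k = len(item_scores)
    sum_of_item_variances = sum(variance(item) for item in item_scores)
    # Total scale score for each respondent (sum across items).
    totals = [sum(responses) for responses in zip(*item_scores)]
    return (k / (k - 1)) * (1 - sum_of_item_variances / variance(totals))


# Hypothetical 5-point responses from four respondents on a three-item scale.
scores = [
    [2, 3, 4, 5],
    [1, 3, 3, 5],
    [2, 2, 4, 4],
]
alpha = cronbach_alpha(scores)
```

Values of alpha closer to 1 indicate that the items of a scale vary together; identical items give exactly 1.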
Fig. 3. Monthly percent at-risk scores for CAB–CRZ and CAB-Switching feedback sessions.

attitudes, relations, and interaction patterns (Reason, 1997). Thus, interview responses relevant to changes in any of these, singly or in combination, were used to explain the process of safety culture change under CSA.

3. Findings

3.1. Worker practices

3.1.1. Feedback session data
3.1.1.1. CAB–CRZ. Fig. 3 depicts the average at-risk scores for CAB–CRZ feedback sessions between August 2005 and December 2007. The dark blue straight line represents the least-squares best-fit linear relation to the monthly percentages.
Throughout the evaluation period, PPF sessions by CAB peer observers showed a decreasing tendency for at-risk CRZ practices (r = –0.797, n = 29, p < 0.0001). Based on a linear regression of these data, the percentage of at-risk practices decreased from 7.14% in August 2005 to 1.05% in December 2007. In other words, the rate of at-risk practices at the end of the evaluation period was less than one-sixth the rate at the beginning.

3.1.1.2. CAB-Switching. Fig. 3 shows average at-risk scores for CAB-Switching feedback sessions from the start of CAB-Switching in October 2006 through December 2007, with the light magenta straight line representing the least-squares best-fit linear relations

3.2. Occurrences
The 1-lag and 2-lag autocorrelations for the gap times of decertifications and incidents were all low (|r| < 0.2) and on average not significantly different from zero, indicating that gap times appear to be adequately independent for inferential statistical analysis. Weibull probability plots indicated the data were from a single distribution (Nelson, 1982). Gap times were thus suitable for survival analysis.

3.2.1. Decertifications
To evaluate the outcome of CAB–CRZ on occurrences, the San Antonio Service Unit was compared to the other three Southern Region service units combined. A Cox regression was performed on the gap times of CRZ-related decertifications since the start of the first phase. Predictor variables were date, service unit (San Antonio versus the others), and the date-by-service-unit interaction. This regression found a significant interaction (χ²(1) = 4.68, p = 0.030), indicating that, as time progressed since the start of CAB–CRZ, the San Antonio rate of decertifications changed significantly more than that of the other service units.
Table 5 shows the results of Cox regressions of date on the decertification gap time for each service unit separately. The percent changes in chance of decertifications were calculated from the regression coefficients. A positive percent represents a percent increase in the chance of a decertification at any given moment while a negative percent represents a percent decrease in the chance.
The San Antonio Service Unit, the site that implemented CAB, showed a significant 79% decrease in chance of decertifications through the first and second phases, from September 2005 (when CAB started) through the end of available data in February 2008. The other three service units of the Southern Region (Fort Worth,
Table 5
Cox regressions of date on gap times of decertifications for phases and each service
unit.
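
The percent changes reported from the Cox regressions follow from the regression coefficients through the hazard ratio exp(b). A minimal sketch of that conversion is given below; the function name and the illustrative coefficient are hypothetical (the coefficient is chosen only so that the hazard ratio equals 0.21), and the `increment` parameter is an assumption for scaling the change over several units of the predictor.

```python
import math


def percent_change_in_hazard(coefficient, increment=1.0):
    """Percent change in hazard for a given increment of the predictor.

    The hazard ratio for one unit of the predictor is exp(coefficient);
    over `increment` units it is exp(coefficient * increment).
    """
    hazard_ratio = math.exp(coefficient * increment)
    return (hazard_ratio - 1.0) * 100.0


# Hypothetical coefficient chosen so that the hazard ratio is 0.21.
b = math.log(0.21)
change = percent_change_in_hazard(b)   # about -79, i.e. a 79% decrease
```

A coefficient of zero gives no change, a negative coefficient gives a percent decrease in the chance of an occurrence, and a positive coefficient gives a percent increase.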
mean improvement on the scale was not insubstantial (accounting for 10% of the variance), but remained below the 3.0 midpoint of the 5-point Likert response stem used by the scale. This suggests that while safety culture improved, some labor–management issues remained. Managers generally saw relations as being significantly better than did workers for both phases (Ms = 3.472 compared with 2.378, F(1, 303) = 56.296, p < 0.001). This is a commonly found difference in perspective that may reflect labor and management’s different bargaining roles (K. Bell, BST, personal communication, August 26, 2013; M. Mangan, BST, personal communication, August 27, 2013). There was no significant interaction between respondent type and phase (F(1, 303) = 0.050, p = 0.823), indicating that workers and managers saw the same degree of change, and that there was no change in the gap between workers and managers.

3.3.2. Interviews: Labor–Management Relations
In this section concerning the interviews, italic text designates themes from the qualitative analysis of the interviews. All interview quotes shown in this paper were representative of typical responses for the corresponding theme, rather than unusual or extreme responses. These interview themes suggest changes in attitudes and behavioral patterns related to trust producing changes in commitment, style, and proficiency in ensuring safety (Reason, 1997).
From the initial interviews through the midterm and final interviews, respondents reported improvements in relations between labor and management, echoing the quantitative results found by the survey. The interviews suggested some factors that may have contributed to the improvements in labor–management relations indicated by the survey.
In the initial interviews, workers were asked to describe their chief safety concerns. Among the top concerns were the conditions of facilities and equipment, as indicated by the worker quoted below:⁴

Worker, Initial Interviews
There is debris all over the yard, the gondola cars are overfull, and then there are gigantic pieces of metal lying around. There are holes two feet deep from work done by Maintenance of Way. You could break a leg if you fell in one of these holes.

⁴ The other top concern was fatigue, especially as related to the work schedule. Workers would later report improvements regarding this issue, thanks to more reliable train arrival information, but these advances could not be directly linked to the CAB process.

Consistent air conditioning for the train cabs was also mentioned as an equipment problem:

Worker, Initial Interviews
AC [air conditioning] in cabs – this is a problem. It has improved, but not to where it should be. . .. There is a big difference when one gets off a train with AC than one without it. Lots of the fleet are not equipped.

Management is responsible for conditions of facilities and equipment. Thus, workers’ safety concerns implied they perceived poor organization support for safety. Workers felt management practices inadequately promoted safety.
As part of CAB, the barrier removal team identified barriers related to the conditions of facilities and equipment, which management removed. Workers appear to have noticed the difference. Most workers and managers reported improvements in facilities and equipment since the initial interviews. Locomotive air conditioning, yard clean-up, and building repair in particular were mentioned:

Worker, Final Interviews
I have seen a lot of improvements in the facilities, in providing employees with more stuff, such as locker rooms, more stuff in the engine, such as AC in it. That has totally changed in the last three years. I’ve seen the facilities are now clean.

Management practices such as barrier removal may have contributed to improving safety at the site. However, the change in management practices likely had the additional effect of improving worker perceptions of management. In the interviews, workers (and managers) reported seeing that management commitment to safety had improved since the baseline period:

Worker, Initial Interviews
Management has a great safety attitude if it benefits them. They push cars through even if they are not safe. Then they strong-arm employees with safety when they want to. All management at all levels is the same.

Worker, Final Interviews
I see that 80 percent management have commitment . . .but . . . lower management can only execute what upper management says.

As the worker’s quote above from the final interviews indicates, commitment may have been improved, but workers still see issues in translating that commitment into action. This may account for labor–management relations improving yet still remaining relatively low as measured by the survey.
Management’s commitment toward safety implies that rules are enforced consistently and that managers can be relied on for safety. Some workers and many managers reported greater fairness by management and greater trust between workers and managers, an improvement in behavioral patterns since the initial interviews:

Worker, Final Interviews
I have a lot more trust with my managers [than I did a few years ago]. I can go to all of them. I know that a lot of the older guys still don’t trust managers. I know that my managers want me to work safe.

Managers reciprocated the changes in worker perceptions:

Manager, Final Interviews
When I first started, the mentality between managers and trainmen was that [managers] considered trainmen as oxen that have to be whipped to get them going. . .. Over the years, management’s thoughts have changed. The workers are smart people and have smart things to say, too. . .. There are things that they could teach me.

With workers seeing managers as having greater commitment to safety, and workers and managers each trusting the other more, it is understandable that labor–management relations scale scores would improve.
Greater mutual trust facilitates more communication and cooperation (Reason, 2003). Furthermore, the joint labor–management barrier removal team provided a venue to exercise communication and cooperation. Consistent with these factors, respondents reported improved communication and cooperation between management and labor on safety:

Worker, Initial Interviews
A worker was told to pull 100-plus cars out under certain conditions that broke the train in two. It was a manager’s idea to not cut away a smaller set. Could have avoided it but he wouldn’t listen to the worker’s idea.

Worker, Final Interviews
I have experienced one incident myself, where managers approached me after letting them know about problems in
getting a switch lined up. The four managers went there to investigate; they talked to us and said thank you for bringing it to our attention

Such openness in turn was associated with more reports of workers initiating communication with managers:

Manager, Final Interviews
It used to be that the employees wouldn’t even talk to a manager without their union local chair, but now I can deal with the employees directly. Before, they wouldn’t talk to us even on the smallest issue, but now the conversations are more open and frequent.

Manager, Final Interviews
I think CAB has opened up the relationship between management and workers. It may be the one avenue that opened doors that hadn’t been available before.

3.3.3. Summary findings for safety culture
Quantitative survey data indicate that labor–management relations improved with a successful CSA implementation. Consistent with previous research (see DeJoy, 2005), qualitative data suggest that the increase in the labor–management relations scale score could be related to the apparent improvements in safety communication and cooperative behavioral patterns mentioned in interviews. These improvements were, in turn, related to the improved perceived management commitment to safety that was associated with the identification of barriers and subsequent management improvement of facilities and equipment. CSA thus may improve labor–management trust by providing a means for ongoing cooperation and communication. Specifically, the barrier-removal process provides sustainable opportunities for workers and managers to address safety through cooperation rather than bargaining. The quantitatively measured improvement in labor–management relations, bolstered by the explanatory power of the qualitative analysis, indicates a shift in the safety culture toward a “trust culture” (Reason, 2003), a key feature of an effective safety culture that is traditionally lacking in the railroad industry (Coplen, 1999). The improvement in labor–management relations is consistent with predictions for safety processes like CSA that feature a nondisciplinary, proactive, cooperative, systems-safety-analysis orientation (Reason, 1997).

4. Discussion

4.1. Outcome summary

The theory of change, presented in Section 1.2.1, usefully guided the outcomes to measure. Quantitative results from process metrics, surveys, and corporate safety data, depicted in Fig. 1, all point toward safety-related improvements from implementing CSA. Interview data did not contradict the quantitative findings, but rather provided consistent explanations for the quantitative findings. To summarize:

• Worker practices targeted by the CSA process improved with the program’s introduction. Process metrics indicated that the commission of at-risk behaviors dropped by about 80%.
• The chance of engineer decertification gradually declined by 79% after the introduction of CSA at the service unit, whereas the chance of decertification at comparison service units did not decline.
• The rate of incidents of material damage dropped by 81% at a yard with a strong CSA implementation, and by 32% at yards with a moderately strong implementation, whereas there was no significant rate change at yards with a relatively weak CSA implementation.
• Labor–management relations, specifically trust, improved at the demonstration site, indicating the development of a more effective safety culture.

In summary, CAB has demonstrated the potential for CSA to improve safety and safety culture in railroad transportation departments.

4.2. Evaluation limits

4.2.1. Context of the CAB process
Despite the comprehensive nature of the evaluation, there are some limitations to take into account when drawing conclusions. The findings were from a specific service unit at a specific point in history. This site had the following features, which may be relevant to the generalizability of the results:

• The number of employees at the treatment site ranged from 20 (for CAB-Switching) to 1100 (for CAB–CRZ).
• The pilot was conducted on a US freight railroad, which has certain operational and organizational characteristics (e.g., two crew members in a cab, which allows them to observe each other; FELA, which normally discourages labor–management trust).
• The US freight railroad industry in general was experiencing growth, owing to increased demand for coal (McPhee, 2005).
• Labor and management were relatively motivated to cooperate on safety due to recent high-profile accidents and an influx of new workers, although this was counteracted by a long history of labor–management distrust and a US legal system that provides incentives to clash over safety (Zuschlag et al., 2012).
• Top managers were inclined to empower employees with control over the safety process, even when the process encountered difficulties.
• There were no unusual changes in leadership, organizational structure, or other highly disruptive events.
• The baseline level of safety was relatively high, consistent with the theoretical potential of CSA to move railroads to the next level of safety.

Whether or not the same results can be observed at other sites and times depends on the similarity of the sites, the implementation support, the management’s role behavior and motivation, the level of success in adapting the CSA process to conditions at the locality, and the selection of safety measures that are appropriate for the evaluation (e.g., decertifications, rather than catastrophic road accidents) and resistant to over-reporting or under-reporting. Similar results might be more likely at similar US freight sites than at passenger or freight sites outside the US.

4.2.2. Rigor and confounds
As a field study that could not randomly assign participants to conditions, this evaluation nonetheless achieved a high degree of rigor through the use of mixed research methods, pre-post quasi-experimental design with no-treatment comparison, and systematic employment of a theory of change (GAO, 2009). It is nearly inevitable that confounds or events will occur simultaneously with safety-process implementation, rendering ambiguous the apparent cause of any observed safety improvements. In the case of CAB, however, regional, corporate, or industry-wide confounds with time are implausible alternative explanations for improvements in occurrences because such improvements would also be observed at comparison sites. The treatment and comparison sites were similar in size, initial safety level, type of work, organizations (for both labor
and management), worker tenure, leadership and leadership role behaviors, motivation for safety, regulator oversight, and definitions of safety occurrences. Yet, this evaluation found safety occurrence improvements uniquely at the treatment sites.
This leaves local confounds as alternative explanations for safety improvements at the demonstration site. The full report of this demonstration (Zuschlag et al., 2012) documents all relevant confounds and analyzes their potential as alternative explanations. However, any field setting has multiple factors occurring and influencing each other, and it is ultimately misleading to search for one feature that is responsible for all change. Anyone interested in replicating the outcomes of CAB at another site ought to appraise the unfolding events at San Antonio, from the initial context through the constant evolution of CAB, and consider how to achieve or exploit equivalent conditions and processes at the intended site. For example, CAB likely benefited from the formative evaluation, where the ongoing evaluation was used to improve CAB’s implementation. Thus, to maximize the chances of successful implementation, CSA processes should include equivalent activities, specifically, stakeholder involvement in:

• Collection and analysis of data on implementation performance and outcomes.
• Feedback on implementation performance to improve the overall implementation.⁵

site cannot be attributed to either regression to the mean or organizationally-induced confounds. It is most plausibly attributed to CAB–CRZ, which was unique to the San Antonio Service Unit.
In the case of incidents, Eagle Pass showed the greatest improvement but at baseline its occurrence rate was significantly higher than the comparison site Laredo, which may cast suspicion on the results. However, the rate for Eagle Pass at baseline was not significantly different than the rate for the San Antonio Complex, yet Eagle Pass had a significantly lower rate than the San Antonio Complex at follow-up. If there were regression to the mean or organizational confounds, then the San Antonio Complex should have improved at least as much as Eagle Pass. Indeed, because the San Antonio Complex is larger, it should arguably have had more organizational motivation to improve its safety record than Eagle Pass because the San Antonio Complex had a greater raw number of incidents (in fact, qualitative data indicated that all yards in the service unit were motivated to improve safety). Finally, with regard specifically to regression to the mean, Eagle Pass had the lowest occurrence rate at follow-up, significantly lower than Laredo. Its occurrence rate did not merely approach the mean over time; it fell significantly below the mean. This is inconsistent with regression to the mean. The most plausible and parsimonious interpretation is that increasingly strong CAB-Switching implementations led to greater improvements in incident rates.

4.3. Conclusions and recommendations
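The regression-to-the-mean reasoning applied to Eagle Pass can be illustrated with a minimal sketch. It assumes the textbook linear model, in which the expected follow-up rate equals the grand mean plus a reliability coefficient (between 0 and 1) times the baseline deviation from that mean. The function name, the reliability parameter, and all numbers are hypothetical; none are taken from the study's data or methods.

```python
def expected_followup(baseline_rate: float, grand_mean: float, reliability: float) -> float:
    """Expected follow-up rate if regression to the mean were the only force at work.

    reliability is the assumed baseline/follow-up correlation, between 0 and 1:
    1 means no regression, 0 means full regression to the grand mean.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return grand_mean + reliability * (baseline_rate - grand_mean)


# Hypothetical numbers for illustration only; they are NOT from the evaluation.
grand_mean = 10.0      # assumed mean incident rate across yards
baseline_rate = 14.0   # assumed above-average baseline rate at one yard

# Sweep the reliability from 0.0 to 1.0 in steps of 0.1.
predictions = [expected_followup(baseline_rate, grand_mean, r / 10) for r in range(11)]

# Under this model every prediction stays between the grand mean and the baseline.
assert all(grand_mean <= p <= baseline_rate for p in predictions)
print(min(predictions), max(predictions))
```

Under this assumed model, no reliability value yields an expected follow-up rate on the opposite side of the mean from the baseline, so a rate that falls significantly below the mean points to something other than regression alone.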
along with Lance Fritz, Joe Santamaria, Roby Brown, Michael Mitchell, Mark Barnum, Ted Lewis, Shane Keller, Ronald Tindall, Greg Burger, John Dunn, Russell Elley, Mike Araujo, Paul Dillon, Carl Eddington, José Gutierez, Wil Hardiman, Chad Jistel, Oscar “Doctor” Mayfield, Fernando Nanez, Pat Pino, Martin Vacca, Mario Valadez, John Paul Schuster, and Andy Wright. Thanks also to George Wollard and Jay Finney of Behavioral Science Technology Inc. (BST) for providing education and insights into the implementation of methods like Clear Signal for Action in the railroad industry. Along with Kelly Johnson and others at BST, George also gathered data for us from a survey customized to our requirements. Jonny Morell from Fulcrum Corporation and Demetra Collia from the Bureau of Transportation Statistics provided additional technical assistance, and Christopher Nelson of the RAND Corporation provided insight into realist program evaluation. Special thanks to Wayne Nelson for instructing and advising on gap time analysis and survival analysis. Shuang Wu of Computer Sciences Corporation assisted in the processing and analyses of the data. Katherine Blythe, Alison Stieber, Cassandra Oxley, and other MacroSys staff provided editorial assistance.

The work was performed under an interagency agreement between the FRA Human Factors Division and the Office of Safety Management and Human Factors of the US Department of Transportation John A. Volpe National Transportation Systems Center.