2019 - FDA Quality Metrics Research - ST Gallen
Prof. Dr. Thomas Friedli, Dr. Stephan Köhler, Julian Macuvele, Steffen Eich, Marten Ritz (University of St.Gallen)
Prabir Basu, PhD (OPEX and cGMP Consultant)
Nuala Calnan, PhD (Regulatory Science Adjunct Research Fellow at TU Dublin)
TABLE OF CONTENTS
10 SUMMARY
REFERENCES
APPENDIX
Appendix 1 ICH Q10 Enabler Questionnaire
Appendix 2 Operationalization of operational excellence enabler dimensions and link to quality behavior/maturity
Funding for this report was made possible, in part, by the Food and Drug Administration through
grant [1U01FD005675-01]. The views expressed in written materials or publications and by speakers
and moderators do not necessarily reflect the official policies of the Department of Health and Hu-
man Services; nor does any mention of trade names, commercial practices, or organization imply
endorsement by the United States Government.
We express our gratitude to the FDA Quality Metrics Team for three years of outstanding collabo-
ration. An additional thank you goes to the following peer reviewers for their comments on earlier
versions of this report: Clive Brading (Sanofi), Kira Ford (Eli Lilly) and Tina Morris (PDA).
1. The approach to build PQS Effectiveness (and also other scores) on a specific number of metrics is named operationalization. The conclusions built from the aggregated scores (e.g. PQS Effectiveness) have to be linked to the specific operationalization that was used in this research. The operationalization is a construct built on a match between the content of each metric and the aggregate, but it might have some limitations due to the reliance on the metrics available in the St.Gallen benchmarking databases.
This report presents the findings from three years of Quality Metrics Research and builds on seminal outcomes from earlier operations and quality management research, e.g. Voss et al. (Voss, Blackmon, Hanson, & Oak, 1995), Ferdows and De Meyer (Ferdows & De Meyer, 1990), and Deming (Deming, 1986). The work undertaken in year 3 has deepened the insights and enhanced the models developed in the first two years of Quality Metrics Research by the University of St.Gallen (Friedli, Köhler, Buess, Basu, & Calnan, 2017, 2018). The following main results are highlighted below and are described in more detail in the relevant chapters of this report.
The FDA Quality Metrics initiative has emerged directly from the FDA Safety and Innovation Act (US Congress, 2012). It aims to ‘support the modernization of pharmaceutical manufacturing as part of the FDA’s mission to protect and promote public health’ (Research, 2016), by encouraging both the industry and the regulators to implement modern, risk-based pharmaceutical quality assessment systems.

As part of this initiative, the FDA in 2016 awarded a research grant to the University of St.Gallen, Switzerland, to help establish a scientific basis for relevant pharmaceutical manufacturing performance metrics, which may prove useful in assessing the current state of quality and in predicting risks of quality failures and/or drug shortages. Results from the first year of this collaboration were published in a research report issued2 in October 2017, and key findings were also presented at the ISPE annual meeting in San Diego (Friedli et al., 2017). In July 2017, the grant was extended for a second year. The results from this second year were again summarized in a comprehensive report3, issued in November 2018 (Friedli, Köhler, Buess, Basu, et al., 2018).

This report, building upon the findings of the first two years of research (see chapter 3 for a comprehensive summary), provides an account of the research activities undertaken in the third year and presents an outlook for future research on this topic. The research in year 3 focused on enhancing the Pharmaceutical Production System Model (PPSM) used for structuring and analyzing the available data, by broadening the enabler base and further operationalizing the outcome calculation (see chapters 4 & 5). This modification strengthens the PPSM model to act as a true excellence model providing the basis for an in-depth evaluation of a case for quality (see chapter 5). Additionally, research in year 3 focused on deepening the understanding of the role of the QC Labs (chapter 7), enhancing the evaluation of the importance of the role of culture (chapter 8), and identifying early operational indicators or predictors of quality issues that are likely to occur in the future (chapter 9).

The analyses in this report are mainly based on the St.Gallen OPEX database(s) (cf. Figure 1 and Figure 2). The participating sites either initiate contact with the St.Gallen team or embark on the project after the research team proposes the benchmarking opportunity and benefits. The motives to participate are quite broad: While some sites seek to validate the success of longstanding Operational Excellence initiatives, others use it as a baseline assessment before initiating such an improvement initiative. Large companies that participate with several sites often aim to cover a broad spectrum from lighthouse or flagship sites to others that are lagging in their Operational Excellence journey. The team therefore assumes the sites are not more advanced than the industry average.

The research team gathers these datasets remotely, in very close collaboration with the corresponding manufacturing sites and labs. The data gathering process involves several steps for data validation and feedback to the sites to ensure data quality. At the time of writing this report, the St.Gallen benchmarking databases consisted of 381 manufacturing sites and 66 QC lab locations.
2. The final report of the first year of the FDA Quality Metrics research project can be requested at: www.item.unisg.ch/fda
3. The final report of the second year of the FDA Quality Metrics research project can be requested at: www.item.unisg.ch/fda
[Figures 1 and 2: Structural profile of the participating sites: site size in FTEs (up to 200, 201 to 400, 401 to 600, above 600), regional distribution, drug substance (DS) and drug product (DP) sites, number of drug product types per site (1 drug product type vs. any 2 drug product types), and degree of centralization.]
[Figure: The St.Gallen PPSM with its building blocks: Cultural Excellence (A), Enabling System with CAPA Effectiveness (B), operational levels (C), and PQS Effectiveness / PQS Efficiency (D) combining into PQS Excellence (E); the FDA metrics Customer Complaint Rate, Lot Acceptance Rate (1 - Rejected Batches) and Invalidated OOS Results are highlighted.]
4. https://www.fda.gov/drugs/pharmaceutical-quality-resources/quality-systems-drugs
Although “QC Lab Robustness” had already been designed into the PPSM analysis developed in year one, there was little or no QC Lab data available in the original benchmarking database, so a specific QC Lab Robustness score could not be calculated or considered in the overall assessment of the effectiveness of the Pharmaceutical Quality System (PQS Effectiveness Score).

A high-level view of the PPSM is depicted in Figure 4, and Figure 5 shows the model including all the metrics that went into it. The three FDA metrics from the draft guidance are highlighted in Figure 4 and Figure 5.

The main characteristics of this PPSM house are:

1. The foundation of the PPSM house is Cultural Excellence6, which combines employee engagement, quality behavior and quality maturity of a company.

2. The second level of the house is referred to in the model as Corrective Action and Preventive Action (CAPA) effectiveness. This element incorporates broader quality management system capabilities, which are required to correct, prevent and bring about improvements to an organization’s processes by proactively and reactively eliminating causes of non-conformities or other under-performance outcomes, and is an integral part of any GMP environment.

3. The third level considers the stability of production. In the very center of this level, the model includes a performance assessment of Operational Stability. As this stability is also impacted by the reliability of the supply of critical materials and components, Supplier Reliability is the second building block on level 3. The third component at this level comprises the measures associated with Lab Robustness. As mentioned earlier, this score could not be calculated in year 1 because of the lack of appropriate data but was considered in the year 2 research. A deeper analysis of the role of labs can be found in Chapter 7 of this report.

Based on Cultural Excellence (Level A), robust CAPA processes (Level B) and high Supplier Reliability, Operational Stability and Lab Robustness (Level C), the model has provided evidence of how a company can achieve higher PQS Effectiveness. Related efforts in the assessment of costs and headcounts provide one means to calculate a PQS Efficiency score. These two aggregated measures build the two pillars of assessing PQS Excellence. Another approach for the efficiency calculation is introduced in chapter 4 of this report. The ultimate goal of PQS Excellence can only be achieved by sustaining improvement of each of the PPSM building blocks, and Cultural Excellence is the basis for delivering that improvement. While specific controls and safeguards may influence an improvement in specific high-level KPIs like OTIF (on time in full)7, maximum benefit is gained when organizations are fully committed to excellence and use a superior management of equipment and processes (effectiveness) to drive down costs too (efficiency). For overall PQS Effectiveness, the most significant category is Operational Stability.

The model serves several aims:

1. The PPSM provides a structured and holistic depiction of the relevant, available data from the St.Gallen OPEX Database, including: Key Performance Indicators8 (e.g. metrics within the C-categories), Enabler implementation9 (e.g. qualitative enablers within the category Cultural Excellence) and the Structural Factors10 of the given organization (e.g. site structure, product mix, technology employed).

2. The model facilitates positioning of the three metrics suggested in the revised FDA Draft Guidance (2016) within the broader context of the holistic St.Gallen understanding, in order to test them for significance from a system perspective. By doing so:

a. The KPI Lot Acceptance Rate was assigned to the C-category Operational Stability.

b. The KPI Invalidated OOS was assigned to the C-category Lab Quality and Robustness.

c. Customer Complaint Rate is considered as an outcome metric within the PPSM and therefore is located in the D-category PQS Effectiveness.

3. The model facilitates the grouping and discussion of the elements within the PPSM as well as the examination of the relationships between elements. For instance, the proposal to examine the “Relationship of individual Operational Stability metrics with PQS Effectiveness” clearly defines the scope of the analysis to be discussed.

4. The PPSM provides a structure for the overall research project as it facilitates the tracking and communication of each analysis already performed as well as indicating any potential blank spots between the different PPSM elements, thereby supporting the identification of future potentially interesting analyses.
» Justify the additional reporting effort and highlight the benefits to the business

» Further systematize continuous improvement activities within organizations

» Improve the understanding of the actual performance of the company’s production system in general, and the reported FDA metrics in particular.

This will lead to the integration of business and quality. Analysis and measurement of quality will be process oriented. This analysis will lead to prioritization of work processes, changes in the culture on and beyond the shop floor, as well as change in how management views quality as a benefit to improving efficiency and productivity. To deal with this radical shift, organizations would need to change their mindset. Where Quality is driving Operational Excellence there is a Culture of Quality, with strong Executive Leadership and Organizational Structure, integrated with corporate objectives, integrated processes, and real-time traceable metric measurement.

3.1.2 St.Gallen Research validates the FDA Quality Metrics Program

I. Lot Acceptance Rate and Customer Complaint Rate are reasonable measures to assess Operational Stability and PQS Effectiveness and should remain part of the Quality Metrics Program.

II. The level of detail of the FDA suggested quality metrics definitions is appropriate given the limited number of metrics requested.

VI. Without considering the level of inventory range (measured as Days on hand), the program’s ability to assess the risk for drug shortages is limited.12

VII. Evaluating the advantages and disadvantages of other voluntary reporting programs (such as (OSHA, 2019)) versus mandatory participation is recommended.

The chosen focus on measurable KPIs in the draft guidance is reasonable and does make sense for a regulator. However, the research has demonstrated that a focus on the level of integration and level of enabler implementation should also be included for any targeted discussions with industry or during the inspection process. In particular, the importance of cultural excellence should not be underestimated and is best considered together with any KPI-led performance assessment. The complexity in the patterns examined, especially the interdependencies, would benefit from further critical understanding of the specific context for each organization. The existence of significant numbers of “out of pattern” groups indicates that it is useful to explore the potential for engaging in direct discussions with the pharmaceutical industry about the metrics and metrics systems in use within organizations.

Based on the findings and the comparison of the FDA Quality Metrics with the St.Gallen PPSM Approach, the research team can confirm that the FDA proposed metrics are key metrics of the Pharmaceutical Quality System.
5. The selection of metrics is often influenced and limited by the metrics available in the OPEX benchmarking database. The operationalization of subsystems like ‘Supplier Reliability’ could use additional KPIs for a broader representation.
6. Cultural Excellence was investigated in year 1 of this research. The importance of culture for superior quality could be shown based on the available data. For deeper insight, see Friedli et al. (2017).
7. We showed in year 1 that high inventory levels might have a positive impact on OTIF (Friedli et al., 2017)
8. Key performance indicators (KPIs) are a set of quantifiable measures that a company uses to gauge its performance over time. These metrics are used to
determine a company's progress in achieving its strategic and operational goals, and also to compare a company's finances and performance against other
businesses within its industry.
9. Enablers are production principles (methods & tools but also observable behavior). The values show the degree of implementation based on a self-assess-
ment on a 5 point Likert scale.
10. Structural factors provide background information on the site, such as size and FTEs, technology, and product program. Structural factors make it possible to build meaningful peer groups for comparisons (“compare apples with apples”).
11. Doing this bears some complexity: first, risk has to be operationalized, and then a certain amount of data is needed to find relations between the metric values and the risk exposure. As the FDA intends to do the analysis only in combination with other data it already has available, other patterns may arise that serve the aim of identifying the respective risks.
12. This conclusion has not been derived from data analysis but from theory and from the study of sources like the Drug Shortages report by the International Society for Pharmaceutical Engineering [ISPE] and The Pew Charitable Trusts [PEW] (The Pew Charitable Trusts & International Society for Pharmaceutical Engineering, 2017).
c. Operational Stability has been found to have a significant impact on PQS Effectiveness.

d. Supplier Reliability has been found to have a significant impact on Operational Stability.

e. PQS Effectiveness high performing sites have a significantly higher Cultural Excellence compared to PQS Effectiveness low performing sites.

f. A newly developed Inventory-Stability Matrix (ISM) allows for a better understanding of the impact of inventory on PQS performance at a site.

g. A high level of Inventory (Days on Hand) can compensate for stability issues experienced on sites, including insufficient process capability.

I. Sites with Low Stability and Low Inventory have the highest risk profile regarding Rejected Batches, Customer Complaint Rate and Service Level Delivery (OTIF) (PQS Effectiveness surrogate).

II. Operational Stability high performing sites have a significantly lower level of Customer Complaints and a significantly lower level of Rejected Batches compared to Operational Stability low performing sites.

III. Fostering Quality Maturity will have a positive impact on the Quality Behavior at a site, leading to superior Cultural Excellence and subsequently providing the foundation of PQS Excellence.

I. A high level of operational stability seems to be the major lever to achieve high levels of Service Level Delivery. Service Level Delivery (OTIF) is deemed to be a good surrogate for PQS Effectiveness measured by the aggregated PQS Effectiveness Score.

II. Sites with low operational stability show significantly higher levels of Rejected Batches.

III. Sites with low stability and low inventory show a weak performance for both Rejected Batches and Customer Complaint Rate.

Customer Complaint Rate

I. Sites with low stability and low inventory show a weak performance for both metrics, Rejected Batches and Customer Complaint Rate.

II. A higher Customer Complaint Rate is accompanied by a low aggregated PQS Effectiveness Score.
13. Today, in most OPEX and Production System implementations the main focus is on introducing the various tools and enablers. What is usually controlled is the degree to which the different components of an OPEX program or a production system have been implemented. Unfortunately, the link to performance is not always made; that is, no KPI systems are introduced to measure the impact of these implementations and adapt the programs accordingly.
Quality Maturity
I. Quality Maturity is strongly correlated with Quality
Behavior.
II. We identified the ten individual Quality Maturity attributes
that differentiate sites most regarding their overall Quality
Maturity score (Details included in Chapter 8).
[Figure 7 shows the ICH Q10 categories Management Review, CAPA System, Management Responsibilities and Quality Risk Management feeding CAPA Effectiveness within PQS Excellence.]
Figure 7: Number of legacy PPSM Enablers assigned per ICH Q10 category
14. In order to describe Supplier Reliability the research team used all available measures from the benchmarking questionnaire that help assess supplier reliability. However, overall supplier reliability could encompass more aspects than the ones captured with the benchmarking questionnaire.
Complaint Rate Supplier: Quality of input material
Service Level Supplier: Reliability of supply
Table 3: Overview Supplier Reliability Metrics and Purpose of Measure

QC lab metrics and units of measure:
Analytical Right First Time: %
Customer Complaint Investigation Rate: No./100,000 Tests
Invalidated OOS Rate: No./100,000 Tests
Lab CAPAs Overdue: %
Lab Deviation Rate: No./1,000 Tests
Lab Investigation Rate: No./1,000 Tests
Product Re-Tests due to Complaints: %
Recurring Lab Deviations: %
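The lab metrics above are event counts normalized to a common test volume (per 1,000 or per 100,000 tests). A minimal sketch of that normalization; the function name and the example counts are illustrative, not taken from the St.Gallen database:

```python
def rate_per(events: int, tests: int, basis: int) -> float:
    """Normalize an event count to a rate per `basis` tests
    (e.g. basis=100_000 for Invalidated OOS Rate, 1_000 for Lab Deviation Rate)."""
    if tests <= 0:
        raise ValueError("test count must be positive")
    return events / tests * basis

# e.g. 12 invalidated OOS results across 240,000 tests, expressed per 100,000 tests
print(rate_per(12, 240_000, 100_000))
```

The same helper covers both bases used in the table, so sites with very different test volumes stay comparable.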
15. A moderator is a variable that affects a relation between two dimensions A and B. It can affect the direction and/or strength of the relation between A and
B. A moderator analysis to understand the moderating effects of Lab Robustness on the relation between PQS Effectiveness and PQS Efficiency is part of
on-going research that will be published in the future.
In their seminal work, Voss et al. examine competitiveness by focusing on the relation between operational performance and OPEX practice implementation (Voss et al., 1995). The study aims, among other things, at identifying how companies benefit from adopting successful practices from key areas, such as quality, production, and product development. For this purpose, performance is operationalized in terms of productivity, market and financial measures. Voss et al. suggest that higher practice implementation will lead to increased levels of operational performance, which will in turn result in superior business performance (Voss et al., 1995). In a similar fashion, the research team has replicated the logic of Voss et al. by correlating enabler implementation and PQS Excellence based on the established St.Gallen OPEX Benchmarking database (cf. Figure 9). Based on the level of practice implementation and performance, Voss et al. distinguish six categories of plants (Voss et al., 1995), of which the research team adopts the four categories ‘punchbags’, ‘won’t go the distance’, ‘promising’ and ‘contenders/world class’.

The future-oriented character of these hypotheses complicates a reliable verification or falsification. Furthermore, the available data neither provides insights concerning when an OPEX program was launched at a plant nor allows adequate longitudinal analysis of performance developments. Nevertheless, two plausible explanations can be made based on the available data from the St.Gallen OPEX Benchmarking database.

To identify why ‘won’t go the distance’ plants are vulnerable to performance drops, the research team draws on OPEX literature, which stresses the importance of empowered employees for sustainable performance (Zhang, Narkhede, & Chaple, 2017). When comparing the reasons for launching OPEX of ‘won’t go the distance’ plants with the reasons of ‘contenders/world class’ plants, the analysis shows significant differences in only three of the thirteen surveyed reasons, namely to empower employees, to transition towards process organization and to increase cost awareness (cf. Table 6). A subsequent statistical analysis comparing the level of
Meet FDA regulations: A (n=21, M=2.52, SD=1.436) vs. B (n=9, M=2.78, SD=1.787); Sig. (2-tailed)=0.683, t=-0.413
Transition from functional organization to process organization: A (n=22, M=3.23, SD=1.270) vs. B (n=9, M=2.11, SD=1.269); Sig. (2-tailed)=0.034, t=2.222
Increase employee empowerment: A (n=22, M=4.45, SD=0.739) vs. B (n=9, M=3.67, SD=1.118); Sig. (2-tailed)=0.028, t=2.315
Increase employee involvement: A (n=22, M=4.64, SD=0.581) vs. B (n=9, M=4.22, SD=0.667); Sig. (2-tailed)=0.095, t=1.727
Increase cost awareness: A (n=22, M=4.50, SD=0.673) vs. B (n=9, M=3.56, SD=1.333); Sig. (2-tailed)=0.013, t=2.639
Initiate cultural change for continuous improvement: A (n=22, M=4.64, SD=0.727) vs. B (n=9, M=4.78, SD=0.441); Sig. (2-tailed)=0.593, t=-0.541
Introduce standardized methodologies: A (n=22, M=4.45, SD=0.739) vs. B (n=9, M=4.11, SD=0.782); Sig. (2-tailed)=0.257, t=1.156
Fulfill site targets: A (n=21, M=4.29, SD=0.845) vs. B (n=8, M=4.63, SD=0.744); Sig. (2-tailed)=0.328, t=-0.996
Table 8: Comparison of reasons for launching OPEX (excerpt of selected reasons): ‘Contenders/world class’ vs. ‘Promising’
[Figure: Two-step cluster analysis yielding Cluster A and Cluster B; sites are grouped into quadrants along PQS Effectiveness (x-axis) and PQS Efficiency (y-axis), with G2 and G1 in the upper quadrants and G3 and G4 in the lower quadrants.]
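The quadrant grouping of sites into G1 to G4 along the two score axes can be sketched as a simple median split. The label orientation follows the figure (G1 high/high, G3 low/low); how sites exactly at a median are assigned is an assumption, as the report does not spell this out:

```python
from statistics import median

def assign_groups(effectiveness, efficiency):
    """Median-split sites along PQS Effectiveness (x-axis) and PQS Efficiency
    (y-axis). G1 = high/high, G2 = low effectiveness / high efficiency,
    G3 = low/low, G4 = high effectiveness / low efficiency.
    Values at the median count as 'high' (an assumption)."""
    x_med, y_med = median(effectiveness), median(efficiency)
    groups = []
    for x, y in zip(effectiveness, efficiency):
        if x >= x_med:
            groups.append("G1" if y >= y_med else "G4")
        else:
            groups.append("G2" if y >= y_med else "G3")
    return groups

# four illustrative sites (scores are made up, not database values)
print(assign_groups([0.9, 0.2, 0.8, 0.1], [0.9, 0.8, 0.1, 0.2]))  # ['G1', 'G2', 'G4', 'G3']
```

This is the same split later used to contrast the effective sub-samples (G1, G4) with the less effective ones (G2, G3).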
Differences between clusters concerning PQS Effectiveness metrics (Cluster, n, Average, Std. Deviation, Sig. (2-tailed), t-value):
Unplanned Maintenance: A (n=31, M=0.533, SD=0.277) vs. B (n=25, M=0.469, SD=0.285); Sig.=0.409, t=0.834
OEE: A (n=26, M=0.561, SD=0.164) vs. B (n=25, M=0.586, SD=0.319); Sig.=0.728, t=-0.350
Rejected batches: A (n=32, M=0.619, SD=0.234) vs. B (n=30, M=0.584, SD=0.260); Sig.=0.578, t=0.559
Deviations per batch: A (n=32, M=0.601, SD=0.232) vs. B (n=30, M=0.323, SD=0.245); Sig.=0.000, t=4.589
Yield: A (n=31, M=0.553, SD=0.231) vs. B (n=27, M=0.598, SD=0.314); Sig.=0.546, t=-0.608
Scrap Rate: A (n=27, M=0.670, SD=0.245) vs. B (n=20, M=0.739, SD=0.214); Sig.=0.319, t=-1.007
Release Time: A (n=31, M=0.591, SD=0.258) vs. B (n=28, M=0.435, SD=0.295); Sig.=0.034, t=2.169
Deviation Closure Time: A (n=32, M=0.619, SD=0.244) vs. B (n=30, M=0.447, SD=0.268); Sig.=0.010, t=2.657
Complaint Rate Supplier: A (n=32, M=0.482, SD=0.298) vs. B (n=30, M=0.567, SD=0.264); Sig.=0.644, t=0.456
Service Level Supplier: A (n=32, M=0.567, SD=0.286) vs. B (n=30, M=0.557, SD=0.269); Sig.=0.892, t=0.136

PQS Efficiency components (Cluster, n, Average, Std. Deviation, Sig. (2-tailed), t-value):
Direct labor cost / batches produced: A (n=32, M=0.671, SD=0.1865) vs. B (n=30, M=0.274, SD=0.2075); Sig.=0.000, t=7.955
Indirect labor cost / batches produced: A (n=32, M=0.619, SD=0.1752) vs. B (n=30, M=0.324, SD=0.2199); Sig.=0.000, t=5.875
Cost for machines & tools / batches produced: A (n=32, M=0.644, SD=0.1962) vs. B (n=30, M=0.304, SD=0.2636); Sig.=0.000, t=5.793
Cost for property & plant / batches produced: A (n=32, M=0.575, SD=0.2196) vs. B (n=30, M=0.339, SD=0.2859); Sig.=0.001, t=3.653
Corporate allocations / batches produced: A (n=32, M=0.608, SD=0.3007) vs. B (n=30, M=0.511, SD=0.3979); Sig.=0.280, t=1.089
Other cost / batches produced: A (n=32, M=0.659, SD=0.2837) vs. B (n=30, M=0.434, SD=0.3425); Sig.=0.007, t=2.813
Table 12: Differences between efficient and less efficient plants concerning PQS Efficiency components
The analyses further reveal that efficient plants are both more effective and efficient while having significantly lower inventory levels compared to less efficient plants. Inventory levels are operationalized as the average Inventory less write-downs, multiplied by 365 and divided by the COGS. The more efficient plants also launched significantly more SKUs in the last three years. Table 14 shows the actual average values for all depicted metrics, which reveals that plants from Cluster A have on average inventory levels of 71.9 days compared to 189.9 days for Cluster B.

5.5.1 Research Approach

To study potential differences between effective and less effective plants, the sample is broken down across the PQS Effectiveness median as outlined in chapter 5.3. To identify key differences between effective and less effective plants, an independent sample T-test is conducted subsequently, comparing the two resulting sub-samples (G2, G3) with (G1, G4) (cf. Figure 11).
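The inventory operationalization described above (average inventory less write-downs, times 365, divided by COGS) can be sketched as follows; the input figures are illustrative, not values from the benchmarking database:

```python
def days_on_hand(avg_inventory_less_writedowns: float, cogs: float) -> float:
    """Inventory range in days: average inventory net of write-downs,
    multiplied by 365 and divided by annual cost of goods sold (COGS)."""
    if cogs <= 0:
        raise ValueError("COGS must be positive")
    return avg_inventory_less_writedowns * 365 / cogs

# e.g. net average inventory of 5.0 MUSD against annual COGS of 25.4 MUSD
print(round(days_on_hand(5.0, 25.4), 1))  # 71.9
```

Both inputs must cover the same reporting period and currency for the day count to be meaningful.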
FTE Direct labor / # of batches produced: A (n=31, M=0.572, SD=0.231) vs. B (n=30, M=0.283, SD=0.197); Sig.=0.000, t=-5.253
FTE Indirect labor / # of batches produced: A (n=31, M=0.582, SD=0.244) vs. B (n=30, M=0.271, SD=0.204); Sig.=0.000, t=-5.373
Table 13: Differences in productivity in terms of FTE/Batch

Structural characteristics (Cluster, n, Average, Std. Deviation, Sig. (2-tailed), t-value):
Inventory level (days on hand): A (n=31, M=71.9, SD=56.4) vs. B (n=25, M=189.9, SD=175.5); Sig.=0.001, t=-3.231
% of manually operated machines: A (n=32, M=40, SD=40) vs. B (n=29, M=30, SD=30); Sig.=0.303, t=1.039
# of Total FTEs: A (n=31, M=450.9, SD=441.3) vs. B (n=30, M=341.9, SD=156.8); Sig.=0.204, t=1.294
# of different market products produced: A (n=32, M=55.7, SD=87.5) vs. B (n=30, M=27.5, SD=45.4); Sig.=0.115, t=1.607
# of different SKUs: A (n=22, M=258.2, SD=219.4) vs. B (n=19, M=177.9, SD=370.7); Sig.=0.415, t=0.827
# of new drug introductions in last 3 years: A (n=32, M=22.0, SD=48.4) vs. B (n=30, M=5.5, SD=7.3); Sig.=0.066, t=1.900
# of SKU at site in last 3 years: A (n=31, M=114.3, SD=156.5) vs. B (n=23, M=30.3, SD=54.5); Sig.=0.017, t=2.770
Table 20: Differences between excellent and the worst performing plants
concerning headcount structure
Table 21: Differences between excellent and the worst performing plants
concerning employee productivity
dicator for underlying stability problems (cf. Table 20). The headcount shares depicted in Table 20 are normalized, whereby higher values indicate a lower share of FTE per component.

Finally, in addition to differences in headcount structure, excellent plants show significant differences in employee productivity in terms of FTE per batch. A T-test reveals that excellent plants show significantly higher employee productivity in all direct and indirect areas, indicated by higher normalized values for FTE per batch. Table 21 depicts the differences between the two groups concerning employee productivity.

5.6.3 Conclusion

The results outlined in this chapter support the view of enabler implementation as a key lever for driving PQS Excellence. Plants from group G1 excel for the enabler category Overall, but also for the enabler sub-categories TQM, JIT and EMS. In contrast, plants from group G3, which represents the worst performing group, show the lowest enabler implementation levels. Table 22 provides a descriptive overview of key characteristics of the four groups G1 to G4. For each criterion the respective highest and lowest scoring values are highlighted in bold. The objective of this table is to provide an overview of mean values for each of the criteria discussed in this chapter and for each group. Therefore, the table does not focus on ranking the different groups. By definition, excellent plants show high levels of both PQS Effectiveness and PQS Efficiency. Consequently, these plants excel for most of the underlying metrics of both the PQS Effectiveness Score and the PQS Efficiency Score. More detailed information on the significance of the group differences is therefore not provided in Table 22.

The results of the descriptive analysis outlined in Table 22 show clear differences in enabler implementation levels between the best and the worst performing groups (group G1 and group G3, respectively). It also shows that all groups but group G3 have similarly high Overall enabler implementation scores. A more detailed look at the underlying enabler categories, however, reveals that the more effective groups (group G1 and group G4) also show the highest TQM enabler implementation levels. Furthermore, the very best group (group G1) separates itself from the other groups by showing the highest JIT and EMS enabler implementation levels, which underlines the importance of an effective management system and of JIT capabilities based on high Overall enabler implementation.

The results further suggest that excellent plants and the worst performing plants are structurally different. However, these differences are viewed as the outcome of PQS performance rather than as drivers. Examples include the significantly lower inventory levels and the higher number of different SKUs produced of excellent plants (cf. Table 19) as well as the lower share of indirect labor (cf. Table 20). Therefore, companies should not try to improve performance by reducing inventory or headcount without having achieved sustainable operational stability first. Since OPEX literature emphasizes the importance of continuous improvement based on an engaged workforce as a key driver of sustainable and superior performance, companies should further strive for enhanced employee involvement in CI as opposed to headcount cost reduction.
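Several of the comparisons above use normalized scores in which a higher value indicates a lower raw share or cost (e.g. FTE share per component, cost per batch). The report does not spell out the exact scheme, so the inverted min-max scaling below is only an assumed, generic form of such a normalization:

```python
def inverted_minmax(values):
    """Min-max scale to [0, 1], then invert, so that higher normalized scores
    correspond to lower raw values (e.g. a lower FTE share or cost per batch).
    This generic form is an assumption; the report's exact scheme is not given."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # degenerate case: no spread across sites
    return [1 - (v - lo) / (hi - lo) for v in values]

# three illustrative raw cost-per-batch values; the highest raw value maps to 0.0
print(inverted_minmax([10.0, 20.0, 40.0]))
```

Under this convention a site with the lowest raw cost per batch receives the best (highest) score, matching the reading of Tables 12, 13, 20 and 21.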
Table 23: Propositions related to the operating context and QC lab effectiveness
16. A centralized lab conducts tests for its own site and for other internal or external sites. A decentralized lab only tests products that are manufactured on-site where the lab is located.
No. Hypothesis
H1 QCHPs do not show a significantly higher enabler implementation in Preventive Maintenance compared to QCLPs.
H2 QCHPs do not show a significantly higher enabler implementation in Technology Assessment & Usage compared to QCLPs.
H3 QCHPs do not show a significantly higher enabler implementation in Housekeeping compared to QCLPs.
H4 QCHPs do not show a significantly higher enabler implementation in Process Management compared to QCLPs.
H5 QCHPs do not show a significantly higher enabler implementation in Standardization & Simplification compared to QCLPs.
H6 QCHPs do not show a significantly higher enabler implementation in Set-up Time Reduction compared to QCLPs.
H7 QCHPs do not show a significantly higher enabler implementation in Pull Approach compared to QCLPs.
H8 QCHPs do not show a significantly higher enabler implementation in Layout Optimization compared to QCLPs.
H9 QCHPs do not show a significantly higher enabler implementation in Planning Adherence compared to QCLPs.
H10 QCHPs do not show a significantly higher enabler implementation in Visual Management compared to QCLPs.
H11 QCHPs do not show a significantly higher enabler implementation in Management Commitment & Company Culture compared to QCLPs.
H12 QCHPs do not show a significantly higher enabler implementation in Employee Involvement & Continuous Improvement compared to QCLPs.
H13 QCHPs do not show a significantly higher enabler implementation in Functional Integration & Qualification compared to QCLPs.
Table 27: Hypotheses to test significant difference in the enabler implementation between QCHPs and QCLPs
[Figure 17 (chart residue): scatter plot of Enabler Implementation vs. QC Lab Effectiveness with three labeled clusters: Low Effectiveness, Low Enabler (C1); Low Effectiveness, High Enabler (C2); High Effectiveness (C3)]
Figure 17: Scatter plot of enabler relation with QC lab effectiveness for three clusters
17. FDA Quality Metrics Research – 2nd Year Report (Friedli, Köhler, Buess, Calnan, & Basu, 2018)
[Figure 18 (chart residue): scatter plot; x-axis: Enabler Implementation (.50 to 1.00); y-axis: QC Lab Effectiveness (.20 to .60)]
Figure 18: Scatter plot of enabler relation with QC lab effectiveness for two clusters
Dependent Variable (QCHPs: n=17; QCLPs: n=18) | QCHPs avg (SD) | QCLPs avg (SD) | Sig. (2-tailed) | t-value
Preventive Maintenance | 0.72 (.15) | 0.69 (.12) | .536 | .626
Technology Assessment & Usage¹ | 0.64 (.12) | 0.56 (.06) | .027 | 2.372
Housekeeping | 0.86 (.13) | 0.73 (.19) | .027 | 2.315
Process Management | 0.79 (.12) | 0.69 (.12) | .021 | 2.420
Standardization & Simplification | 0.83 (.13) | 0.73 (.11) | .022 | 2.399
Set-up Time Reduction | 0.66 (.19) | 0.55 (.16) | .076 | 1.830
Pull Approach | 0.74 (.19) | 0.60 (.17) | .025 | 2.353
Layout Optimization | 0.75 (.14) | 0.64 (.11) | .020 | 2.436
Planning Adherence | 0.78 (.14) | 0.67 (.13) | .016 | 2.542
Visual Management | 0.69 (.30) | 0.64 (.26) | .601 | .528
Management Commitment & Company Culture¹ | 0.79 (.11) | 0.75 (.06) | .294 | 1.073
Employee Involvement & Continuous Improvement¹ | 0.69 (.10) | 0.60 (.06) | .005 | 3.098
Functional Integration & Qualification | 0.79 (.14) | 0.68 (.11) | .016 | 2.528
¹ Equal variances not assumed
Table 28: T-test results for all enabler dimensions comparing QCHPs and QCLPs
18. That no significant difference was found for these enabler dimensions does not mean that they are irrelevant. That, for the available dataset of QC labs, these dimensions did not differ significantly between QCHPs and QCLPs is linked to the operationalization of these elements (see Appendix 2) and may be subject to change as the database grows. Please review the definition of each dimension carefully; we can only draw conclusions based on the outlined questions per enabler dimension.
19. Technical Enabler System: Preventive Maintenance, Technology Assessment & Usage, Housekeeping, Process Management, Standardization & Simplifica-
tion, Set-up Time Reduction, Pull Approach, Layout Optimization, Planning Adherence, and Visual Management
20. Management Enabler System: Management Commitment & Company Culture, Employee Involvement & Continuous Improvement, and Functional
Integration & Qualification.
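Table 28 compares QCHPs and QCLPs with independent-samples t-tests; for the dimensions marked with footnote 1, equal variances are not assumed, i.e. Welch's variant applies. As a minimal sketch of how such a t-statistic is computed from group summary statistics (the helper functions are ours, not from the report, and small deviations from the published t-values stem from the rounded means and standard deviations):

```python
import math

def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the mean difference
    return (m1 - m2) / se

def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Employee Involvement & Continuous Improvement, rounded values from Table 28;
# the result is close to the reported t = 3.098.
t = welch_t(0.69, 0.10, 17, 0.60, 0.06, 18)
```

With roughly 25 degrees of freedom, a t-value above about 2.06 corresponds to significance at the .05 level, which matches the Sig. column in Table 28.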
[Figure (cluster overview; chart residue, only fragment labels recoverable):
Cluster 1 (Low Enabler & Performance): limited CI resources, average training effort, frequently new products, basic MES, very high variation, low span of control, high customer complaints, low planning adherence, low process robustness, low predictability; follower regarding traditional quality and new business.
Cluster 2 (High Enabler & Low Performance): low integration, high variation, constantly new products, low training effort, high span of control, TES & MES, limited CI resources.
Cluster 3 (High Enabler & High Performance): basic & advanced MES, high homogeneity, high predictability, high integration, infrequently new products, high training effort, high planning adherence, low customer complaints, low variation, high process robustness, low testing variety, proactive CI, low span of control; pioneer regarding new quality business.]
[Figure 21 (chart residue): scatter plot with fitted line y = 0.24 + 0.68*x; x-axis: QC Quality Maturity (.50 to 1.00); y-axis: QC Quality Behavior (.50 to .90)]
Figure 21: Relation between Quality Maturity and Quality Behavior in QC
21. In the year 1 report, we analyzed and published the individual Quality Maturity attributes that are most responsible for the variance of overall Quality Maturity. We also showed that Quality Maturity and Quality Behavior strongly correlate. Here, we show which individual Quality Maturity attributes most strongly and directly foster Quality Behavior.
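The line in Figure 21 (y = 0.24 + 0.68*x) is an ordinary least-squares fit of Quality Behavior on Quality Maturity. A minimal sketch of such a fit, using illustrative points that lie exactly on the reported line (the actual site-level scores are not reproduced here):

```python
def ols_fit(xs, ys):
    """Ordinary least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariance of x and y divided by variance of x
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Illustrative points constructed on the reported line y = 0.24 + 0.68*x
a, b = ols_fit([0.5, 0.7, 0.9], [0.58, 0.716, 0.852])
# a is approximately 0.24, b approximately 0.68
```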
[Figure 24 (chart residue): only the values 36%, 42%, 51% and 61% are recoverable]
Figure 24: Matched FDA inspection outcomes compared to overall population (2009 – 04/2019)
[Chart residue: inspection outcome shares (NAI vs. *AI) for groups defined by Overall Enabler Score (n=55), TQM Enabler Score (n=55), JIT Enabler Score (n=55), Quality Maturity (n=55), OTIF service level (n=45) and Release Time (n=51); surviving data points include bottom-half and quartile counts such as bottom half 6/19, bottom half 6/21, bottom quartile 3/11 and top quartile 9/5]
Figure 26: Inspection outcomes by detailed Total Quality Management enabler implementation level
[Figure 28 (pie-chart residue): technology classification shares for API, Other DP, Sterile DP and Drug-Device combination across the two groups; recoverable values: 13%, 19%, 19%, 36%, 8%, 4%, 47%, 54%]
Figure 28: Technology classification of site data records with good and bad inspection
Table 34: Operational KPIs and previous compliance results for groups defined by current inspection

The preliminary score is then processed as follows: Since the FDA does not provide citations with a classification of severity, the calculation is simplified and illustrated in the following fictional example: the given FDA inspection of an API site was concluded with three citations. The maximum number of citations of the FDA for any API site in the sample is seven. The overall population of preliminary FDA inspection scores for API sites shall be [...]. Using the percentrank function (PERCENTRANK.INC in MS Excel ranks values by how many values in the sample are bigger/smaller on a scale from 0 to 1), this computes to the following inspection score: [...]. At 0.6, the given inspection ranks in the better half of inspections. In consequence, all inspections are equipped with a comparable score.

There are no significant differences between the two groups in terms of their previous internal quality compliance audits. This might be due to several factors. Audits and inspections can have different focus areas. Also, a site evaluated with many findings in an internal audit can correct these deficits and be evaluated well in a following inspection, so that the results are very different. These interlinks are not yet fully uncovered.

9.2.3 Conclusion
In addition to the work on the St.Gallen Operational Excellence benchmarking database, the research team investigates quality compliance risk in a single case study based on proprietary longitudinal compliance and operational data from one of the top five global pharma companies:
» Sites that receive fewer citations in the current inspection likely also had fewer citations in previous inspections. Similarly, before an inspection with more findings, typically more time has passed since the previous inspection. This confirms some of the factors used in the FDA's risk-based site selection methodology (FDA, 2018).
» Better quality compliance evaluations, meaning inspections with fewer (weighted) citations, go together with better performance in operational metrics. This confirms the findings from the first sub-chapter.
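The percentile-ranking step described above can be sketched as follows; the function mirrors Excel's PERCENTRANK.INC behavior (our own implementation, assuming no duplicate sample values when interpolating):

```python
def percentrank_inc(sample, x):
    """Rank x within sample on a 0..1 scale, like Excel's PERCENTRANK.INC."""
    s = sorted(sample)
    n = len(s)
    if x < s[0] or x > s[-1]:
        raise ValueError("x must lie within the sample range")
    below = sum(1 for v in s if v < x)  # values strictly smaller than x
    if x in s:
        return below / (n - 1)
    # linearly interpolate between the neighbouring sample values
    lo = max(v for v in s if v < x)
    hi = min(v for v in s if v > x)
    frac = (x - lo) / (hi - lo)
    return (below - 1 + frac) / (n - 1)

# A score ranking in the better half of a hypothetical sample, as in the
# fictional example above:
# percentrank_inc([0.1, 0.2, 0.4, 0.43, 0.6, 0.9], 0.43) -> 0.6
```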
9.3.2 Results
The performance evaluation of the resulting models is displayed in Figure 29. For each model, the figure displays the data used as inputs, the ROC (Receiver Operating Characteristic) curve chart and the associated AUC (Area Under the Curve) value.
The ROC calculation requires the model to provide a probability for the predicted outcome event between 0 and 1. The algorithm then tests several probability cut-offs and evaluates the number of true and false predictions based on the probabilities and the varied cut-off values. The area under the curve then provides a simplified measure of prediction performance between 0.5 and 1. An ideal model would result in a curve that stretches to the top-left corner of the chart.
[Figure 29 (input summary, left to right): Model 1: inspection + audit history (AUC = 0.54). Model 2: inspection + audit history, production type (AUC = 0.593). Model 3: inspection + audit history, production type, rejected batches, right first time, product complaints (AUC = 0.677). Model 4: same inputs as Model 3, restricted to data records with KPIs (AUC = 0.715).]
Figure 29: Performance evaluation of Logistic Regression models by input data used
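The AUC reported for each model in Figure 29 can equivalently be computed via the rank formulation of ROC analysis: it equals the probability that a randomly chosen record with the outcome event is scored higher than a randomly chosen record without it. A minimal sketch (our own helper, not the software used in the study):

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # compare every positive against every negative
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) -> 0.75
# 0.5 corresponds to random guessing, 1.0 to a perfect model.
```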
Management Responsibilities (each item is rated on a scale from 1 to 5)

N01 To what degree does your site management participate in the design, implementation, monitoring and maintenance of an effective pharmaceutical quality system?
1 = Management does not participate in these activities; 2 = Management rarely participates in these activities; 3 = Management sometimes participates in these activities; 4 = Management regularly participates in these activities; 5 = Management always participates in these activities

N02 To what degree does your quality policy facilitate continual improvement of the PQS?
1 = Quality policy represents a major barrier for continual improvement of the PQS; 2 = Quality policy sometimes represents a barrier for continual improvement of the PQS; 3 = Quality policy does not represent a barrier for continual improvement of the PQS; 4 = Quality policy in part facilitates continual improvement of the PQS; 5 = Quality policy is designed to facilitate continual improvement of the PQS

N03 To what degree does your site management determine and provide adequate and appropriate resources (e.g. human, financial) to implement and maintain the PQS and to continually improve its effectiveness?
1 = Site management does not determine and provide adequate and appropriate resources; 2 = Site management rarely determines and provides adequate and appropriate resources; 3 = Site management sometimes determines and provides adequate and appropriate resources; 4 = Site management regularly determines and provides adequate and appropriate resources; 5 = Site management always determines and provides adequate and appropriate resources

N06 To what degree does your management conduct reviews of process performance and product quality on a regular basis?
1 = Management does not conduct reviews of process performance and product quality; 2 = Management rarely conducts such reviews; 3 = Management sometimes conducts such reviews; 4 = Management regularly conducts such reviews; 5 = Management always conducts such reviews

N07 Is product & process knowledge managed comprehensively along the product lifecycle?
1 = Knowledge is not managed systematically; 2 = Knowledge is managed within departments; 3 = Knowledge is managed across R&D and production; 4 = Knowledge is managed across R&D, production and QC; 5 = Knowledge is managed along the entire product lifecycle, incl. commercial life & discontinuation

N09 Does your approach for managing manufacturing process knowledge cover the following tasks: acquire, analyse, store, disseminate knowledge?
1 = None of the tasks covered; 2 = At least one of the tasks covered; 3 = At least two of the tasks covered; 4 = At least three of the tasks covered; 5 = All four tasks covered

N10 How many of the following knowledge sources do you use systematically: development studies, technology transfer activities, process validation studies, manufacturing experience?
1 = None of the sources used systematically; 2 = One of the sources used systematically; 3 = At least two of the sources used systematically; 4 = At least three of the sources used systematically; 5 = All four sources used systematically

N13 Do you have a process to ensure that information regarding nonconforming product, quality problems and appropriate CAPAs is properly disseminated, including dissemination for management review?
1 = There is no process to disseminate this information; 2 = There is a process to disseminate this information, but it is not used in practice; 3 = There is a process to disseminate this information, but it is rarely used in practice; 4 = There is a process to disseminate this information and it is used in practice; 5 = There is a process to disseminate this information; it is used in practice and continuously reviewed

N14 Please describe your current CAPA documentation process.
1 = No formal process for documenting CAPA; 2 = Formal process for documenting CAPA exists; 3 = Process for documenting CAPA is applied, however it may have inconsistencies in execution from record to record; 4 = Process for documenting CAPA is executed consistently from record to record; 5 = Process for documenting CAPA results in robust documentation of all CAPA records

N20 Are you aware of existing risks, do you review existing risks on a regular basis, and do you prioritize & communicate them proactively?
1 = We do not document existing risks; 2 = We document existing risks if we come across them; 3 = We follow a structured approach to identify & document risks; 4 = We follow a structured approach to identify risks, document them and communicate them across the organization; 5 = We follow a structured approach to identify risks, document them and communicate them; all documented risks are reviewed periodically

N21 Is your quality risk management process integrated with the overall Quality System, and do you assess the effectiveness of the QRM?
1 = We do not have a formal QRM process; we do not measure its effectiveness; 2 = The QRM process is not integrated with the overall Quality Management System; 3 = The QRM process is not integrated with the overall Quality Management System, or its effectiveness is not measured; 4 = The QRM process is integrated with the overall Quality Management System and its effectiveness is measured; 5 = The QRM process is integrated with the overall QMS, its effectiveness is measured, and the interaction with the QMS is continuously reviewed

N28 Do you have an effective accident reporting and recording system?
1 = We do not have an accident reporting and recording system; 2 = We have an accident reporting and recording system, but it is rarely used in practice; 3 = We have an accident reporting and recording system; 4 = We have an accident reporting and recording system and measure the system's effectiveness; 5 = We have an accident reporting and recording system, measure the system's effectiveness and review/update the system continuously

Housekeeping

D17 To what degree are housekeeping checklists used to continuously monitor the condition and cleanness of our machines and equipment?
1 = We do not have housekeeping checklists; 2 = Housekeeping checklists exist but are not widely visible; 3 = Housekeeping checklists exist and are visible, but are adhered to unevenly; 4 = Housekeeping checklists are adhered to across the site; 5 = Regularly-updated housekeeping checklists are adhered to across the site

N30 To what degree do you identify sources of variation affecting process performance and product quality?
1 = We do not identify these sources of variation; 2 = We rarely identify these sources of variation; 3 = We sometimes identify these sources of variation; 4 = We regularly identify these sources of variation; 5 = We continuously identify these sources of variation

N32 To what degree does your process performance and product quality monitoring system provide knowledge to enhance process understanding?
1 = Our monitoring system does not provide knowledge to enhance process understanding; 2 = Our monitoring system rarely provides such knowledge; 3 = Our monitoring system sometimes provides such knowledge; 4 = Our monitoring system regularly provides such knowledge; 5 = Our monitoring system continuously provides such knowledge

N34 To what degree do you monitor, measure and control process quality of outsourcing activities?
1 = We do not monitor, measure and control process quality of outsourcing activities; 2 = We rarely do so; 3 = We sometimes do so; 4 = We regularly do so; 5 = We continuously do so

N36 To what degree are proposed changes evaluated by expert teams from relevant areas (e.g., Pharmaceutical Development, Manufacturing, Quality, Regulatory Affairs and Medical), to ensure the change is technically justified?
1 = Proposed changes are not evaluated by expert teams that ensure the change is technically justified; 2 = Proposed changes are rarely evaluated by such expert teams; 3 = Proposed changes are sometimes evaluated by such expert teams; 4 = Proposed changes are regularly evaluated by such expert teams; 5 = Proposed changes are always evaluated by such expert teams

N38 To what degree do you evaluate that there was no deleterious impact on quality across the entire product lifecycle, after implementation of changes?
1 = We do not evaluate the impact of changes on product quality and have no change management system in place; 2 = We evaluate the impact of changes on product quality, but do not have a formalized risk-based change management system; 3 = We evaluate the impact of changes based on a formalized, but not risk-based, change management system; 4 = We evaluate the impact of changes based on a formalized change management system that includes product quality risks; 5 = We evaluate the impact of changes based on a formalized change management system that includes all product and process related risks

N40 How often is the quality department actively involved in the change management process?
1 = The quality department is not actively involved in the change management process; 2 = The quality department is rarely actively involved; 3 = The quality department is sometimes actively involved; 4 = The quality department is usually actively involved; 5 = The quality department is always actively involved

N41 To what degree does management measure effectiveness of pharmaceutical quality system (PQS) objectives (including process performance and product quality) at your site?
1 = Management does not measure the achievements of PQS objectives; 2 = Management rarely measures the achievements of PQS objectives; 3 = Management sometimes measures the achievements of PQS objectives; 4 = Management regularly measures the achievements of PQS objectives; 5 = Management continuously measures the achievements of PQS objectives

N42 To what degree does Management Review at your site include a timely and effective communication and escalation process to raise appropriate quality issues to senior levels of management for review?
1 = It does not include a structured approach for timely and effective communication and escalation; 2 = It includes a structured approach for timely and effective communication and escalation, but it is not widely visible; 3 = It includes a structured approach for timely and effective communication and escalation, which is visible, but is adhered to unevenly; 4 = It includes a structured approach for timely and effective communication and escalation, which is visible and adhered to across most of the site; 5 = It includes a structured approach for timely and effective communication and escalation, which is visible and adhered to across the entire site

N43 To what degree does your site management review the results of regulatory inspections and findings, audits and other assessments, and commitments made to regulatory authorities?
1 = Site management does not review the results of such inspections; 2 = Site management rarely reviews the results of such inspections; 3 = Site management sometimes reviews the results of such inspections; 4 = Site management regularly reviews the results of such inspections; 5 = Site management continuously reviews the results of such inspections

N44 To what degree does your site management perform periodic reviews that include measures of customer satisfaction such as product quality complaints and recalls?
1 = Periodic reviews do not include measures of customer satisfaction; 2 = Periodic reviews rarely include measures of customer satisfaction; 3 = Periodic reviews sometimes include measures of customer satisfaction; 4 = Periodic reviews usually include measures of customer satisfaction; 5 = Periodic reviews always include measures of customer satisfaction

N45 To what degree does your site management perform periodic reviews that include conclusions of process performance and product quality monitoring?
1 = Periodic reviews do not include conclusions of process performance and product quality; 2 = Periodic reviews rarely include such conclusions; 3 = Periodic reviews sometimes include such conclusions; 4 = Periodic reviews usually include such conclusions; 5 = Periodic reviews always include such conclusions

N46 To what degree is quality policy at your site reviewed periodically for continuing effectiveness?
1 = Quality policy at our site has not been reviewed in the last 5 years for continuing effectiveness; 2 = Quality policy has been reviewed within the last 5 years and is since rarely reviewed; 3 = Quality policy has been reviewed within the last 5 years and is since sometimes reviewed; 4 = Quality policy has been reviewed within the last 5 years and is since regularly reviewed; 5 = Quality policy has been reviewed within the last 5 years and is since continuously reviewed
The following table shows each enabler assigned to quality behavior, quality maturity, or neither category. As the analysis intends to follow up on the corresponding research in manufacturing (Friedli et al., 2017), only enablers that are part of both St.Gallen Operational Excellence Questionnaires (Manufacturing and QC Lab) are considered. New enablers represented in the lab questionnaire only are highlighted as "New in QC lab benchmarking".
Housekeeping
- To what degree do employees strive to keep the lab neat and clean? (x)
- To what degree are tools and consumables put in their place (e.g. usage of a shadow board)? (x)
- To what degree are housekeeping checklists used to continuously monitor the condition and cleanness of our equipment? (x)
- To what degree do you do a regular review of the "As-Is" situation (e.g. by doing a walkthrough) in your lab in order to identify potential improvement areas (e.g. by doing a gap analysis)? (New in QC lab benchmarking)

Process Management
- To what degree are direct and indirect processes documented? (x)
- To what degree is process quality continually measured using process metrics? (x)
- To what degree are dedicated process owners defined and responsible for planning, managing, and improving their processes? (x)
- What proportion of the equipment in the lab is currently under statistical process control (SPC)? (x)
- To what degree are standardized tools in place for root cause analysis, to get a deeper understanding of the influencing factors (e.g. DMAIC)? (x)

Set-up Time Reduction
- To what degree do you continuously work to lower set-up and cleaning times in your lab?
- To what degree do analysts practice set-ups to reduce the time required?
- What proportion of equipment set-ups are scheduled so that the testing process is not affected (e.g. to shorten lead time)?
- To what degree are optimized set-up and cleaning procedures documented as best practices and rolled out throughout the whole lab? (x)

Pull Approach
- Do you use a pull system (Kanban squares, containers or signals) for your consumables?
- To what degree do you test according to forecast?
- To what degree do you have tools installed for a regular demand and FTE capacity analysis? (New in QC lab benchmarking)

Layout Optimization
- To what degree are your processes located close together so that material handling and consumable storage are minimized?
- What proportion of testing substances/products are classified into groups with similar processing requirements to reduce set-up times?
- To what degree does the layout of the lab facilitate low inventories and fast throughput?
- To what degree can your lab layout be characterized as separated into "mini-labs", if testing substances/products have been classified based on their specific requirements?
- To what degree do your testing processes, from incoming testing material to release, involve almost no interruptions and can be described as a full continuous flow?
- To what degree do you use "Value Stream Mapping" as a methodology to visualize and optimize processes?

Planning Adherence
- To what degree do you meet your daily lab testing plans?
- To what degree do you know the root causes of variance in your lab working schedule and continuously try to eliminate them?
- To what degree does your lab have flexible working shift models in order to easily adjust labor capacity according to current demand changes?
- Beyond flexible working shifts, do you assign extra resources within the lab for testing during peak loads or do you outsource activities? (New in QC lab benchmarking)
- To what degree do you prefer to increase productivity over short lead time or vice versa?