
Expert Systems With Applications 190 (2022) 116208

journal homepage: www.elsevier.com/locate/eswa
Generating decision support for alarm processing in cold supply chains using a hybrid k-NN algorithm
Iurii Konovalenko *, André Ludwig
Kühne Logistics University, Großer Grasbrook 17, Hamburg 20457, Hamburg, Germany
ARTICLE INFO
Keywords: k-nearest neighbors; Fuzzy set; Recommendation; Decision Support; Pharmaceutical supply chain; Temperature deviation

ABSTRACT
Real-time temperature monitoring is necessary in cold pharmaceutical supply chains (SCs), where exposures to extreme temperatures can lead to product quality deterioration. Temperature alarms (TAs) triggered by the current rule-based systems still require lengthy examinations before a suitable corrective measure (CM) can be chosen. However, provision of additional information relevant to TAs can expedite the examination process.
In the related areas of recommender systems and false alarm/anomaly detection, the k-nearest neighbors (k-NN) algorithm has proven to be successful because of its interpretability and ease of use. However, in the context of TA processing, it may suffer from some inherent limitations (i.e., varying neighborhood radius, unreliable classifications in sparse and noisy regions, and blindness to natural class boundaries). To overcome these limitations, we propose a hybrid k-NN (Hk-NN) algorithm based on the principles of local similarity and neighborhood homogeneity. It incorporates a two-step voting procedure with an entropy-optimized k-NN radius, decision trees with k-constrained leaves, and nearest neighbor predictions.
We investigate 16,525 comments by alarm personnel for TAs in a pharmaceutical SC and encode them in terms of deviation causes and CMs (target features). We use SC data on cargo location, SC phase, sensor role, and temperature characteristics as predictor features for TA similarity estimation. In eight experimental setups, Hk-NN consistently outperforms k-NN with an optimized k in terms of accuracy, balanced accuracy, macro-average precision, recall, and specificity. At the same time, Hk-NN refrains from predicting observations for which k-NN’s accuracy is close to a random guess.
* Corresponding author.
E-mail addresses: iurii.konovalenko@the-klu.org (I. Konovalenko), andre.ludwig@the-klu.org (A. Ludwig).
https://doi.org/10.1016/j.eswa.2021.116208
Received 7 September 2020; Received in revised form 21 December 2020; Accepted 6 November 2021
Available online 17 November 2021
0957-4174/© 2021 Elsevier Ltd. All rights reserved.

1. Introduction
Maintenance of permissible temperatures in a cold supply chain (SC) is necessary not only from the regulatory perspective (European Commission, 2001; World Health Organization, 2003), but also for securing the quality and safety of perishable products. This is especially relevant in the context of temperature-sensitive pharmaceuticals that are subject to quality deterioration following exposures to extreme temperatures (Haan, Hillegersberg, de Jong, & Sikkel, 2013). Considering the long and complex way that pharmaceuticals have to traverse on their way from the manufacturer to the end consumer, the importance of temperature maintenance and its associated challenges have become evident. Temperature control in modern SCs poses additional difficulties because of its complex geographically-distributed multi-actor nature (Serdarasan & Tanyas, 2012). Moreover, the problem of pharmaceutical quality loss still remains unresolved (Matthias, Robertson, Garrison, Newland, & Nelson, 2007). With the worldwide sales of temperature-sensitive pharmaceuticals projected to exceed $416 billion by 2022, marking a 53% growth against 2016 (Schaefer, 2019), the challenge of reliable temperature control will gain relevance in the future.
Fortunately, the abundance and affordability of sensor devices and the advances in mobile technologies have enabled the real-time collection and transmission of relevant cold SC measurements. This has made post-delivery or occasional underway temperature monitoring possible. At the same time, event-driven and rule-based systems have paved the way to further automate monitoring thanks to the real-time TA triggering. However, a variety of real-world scenarios, in which TAs are triggered, cannot be effectively encompassed by static alarm rules (Mousheimish, Taher, & Zeitouni, 2016; 2017). This results in a high number of TAs not requiring human involvement. Consequently, much effort is still needed from alarm personnel for examining temperature deviations. Further, a suitable corrective measure (CM) can be chosen, and the TA examination process can be documented. Obviously, this necessitates research on how decision support (DS) can be provided to
the personnel involved in TA processing to expedite their decision-making process.
Nowadays, SC operations generate massive amounts of data for logistic service providers (LSPs). In this way, historical data associated with temperature monitoring and documented TA comments can provide a good basis for deriving DS in TA processing. However, extant literature on temperature monitoring, alarm classification, and recommender systems in critical industries still falls short of offering a ready-to-use solution for DS in TA processing.
For example, research on temperature monitoring focuses either on the applicability of technology in a cold SC (Shafiq et al., 2019; K. Yang et al., 2018), or potentially gives the opportunity to derive DS by presenting a set of rules triggering an alarm (Lang et al., 2011; Zakeri, Saberi, Hussain, & Chang, 2018). However, such solutions are labor-intensive in maintenance and very static (Mousheimish et al., 2016; 2017). Alarm classification literature is typically focused on binary predictions, that is, alarm or no alarm, and overlooks the cases where an alarm or its absence may have multiple subcategories laden with operational meanings (e.g., Hsieh, Huang, Liu, Chu, & Chan, 2016; Su, 2011b). Finally, solutions in the form of critical recommender systems have so far been very specific to a problem or a field, not allowing for a smooth transfer to the TA processing context (Dao et al., 2015; Gómez, Goron, Groza, & Letia, 2016; Pilarski, 2014).
At the same time, we encounter numerous studies applying k-NN in similar settings, that is, alarm classification, recommender systems, or even for the generation of DS in healthcare (Ruiz, Berenguer, Soriano, & Sánchez, 2011; D. West, Mangiameli, Rampal, & West, 2005). Moreover, the interpretability of predictions by k-NN makes it much more attractive in more critical settings, in contrast to other black-box models (Farnaaz & Jabbar, 2016; Gaikwad & Thool, 2015). Despite the wide popularity of k-NN in similar prediction settings, it is fraught with limitations that are especially disadvantageous in our TA processing context, because of its higher misclassification costs. These limitations are as follows: a) a varying neighborhood size that may allow distant neighbors to vote; b) blindness to natural class boundaries that can be problematic for predictions close to such boundaries; and c) the absence of a mechanism for the identification of problematic data regions where no classification should be made.
These limitations motivate the reconsideration of the algorithm’s suitability for a setting with higher misclassification costs and the preference for abstinence from unreliable classifications over TA examinations. Therefore, in this study, we propose a modified k-NN algorithm considering the peculiarities of DS generation in TA processing. For this purpose, we focus on the context-specific limitations of k-NN, and propose an algorithm that addresses these limitations. We conduct a series of evaluations to demonstrate the suitability of our algorithm for DS generation from the collected TA processing data.
Our paper is structured as follows. In Section 2, we review the related work on temperature monitoring, alarm detection, and critical recommender systems to identify what solutions extant studies offer for the problem at hand and assess their suitability. In the next section, we describe the materials/data used and propose a new hybrid classification algorithm for DS generation in TA processing. In the methodological part of the section, we delineate the limitations of k-NN widely applied in similar settings, and propose the adjusted hybrid k-NN (Hk-NN) algorithm to target the identified limitations in our specific context. Additionally, we provide a subsection on the time complexity of the new algorithm. In the materials part of the section, we describe the business case of a cold SC, in which the used data were generated, and show what features we engineered and how they were encoded. Further, in Section 4, we present the evaluation results for Hk-NN vs. k-NN. In particular, we present the general classification performance, evaluation on publicly available data sets, simulation tests, and extension to the second-best prediction. Finally, Section 5 summarizes the paper highlighting its importance, describes the limitations of our algorithm, and proposes the directions of future research.

2. Related work
We single out three main pillars of research literature on the generation of DS in TA processing. These are temperature monitoring and control, alarm filtering and anomaly detection, and recommender systems in critical industries. We briefly investigate the literature in these three fields, identify what existing approaches are applicable to our problem, and derive the research gap that we will bridge through our contributions by means of this study.

2.1. Temperature monitoring in a cold SC
Literature on temperature monitoring is mainly focused on solutions for food products, with pharmaceuticals being relatively scarcely addressed (J. Liu, Higgins, & Tan, 2010). Investigation of extant literature allows for distinguishing a couple of related research streams. In particular, they range from enabling technology-centric works to traceability/visibility solutions, and to more refined solutions with the rules declared for business logic.
The first stream of research is represented by works that demonstrate the application of sensor technologies in a cold SC. For example, Emenike, Van Eyk, & Hoffman (2016) tested the RFID temperature sensing for predictive modeling with fewer sensors. The efficiency of RFID technology for product identification in an agricultural food SC was verified by Leng, Jin, Shi, & Van Nieuwenhuyse (2018). An unclonable, environmentally sensitive chip-less RFID tag that was capable of tracking commodities and their temperatures in an SC was developed by K. Yang et al. (2018). A passive RFID temperature sensor to monitor the surrounding environment of perishable goods in a cold SC was demonstrated by Shafiq et al. (2019). Finally, Mondal et al. (2019) proposed a blockchain-inspired RFID information architecture for food SCs.
The second group is comprised of event-driven solutions, mainly focusing on tracking and tracing of products along an SC. For example, Thakur & Forås (2015) evaluated a pilot test of an event-driven online temperature monitoring and traceability system. An intelligent value stream-based approach to food traceability enabled by fog computing in a cyber-physical system was proposed by R.-Y. Chen (2017). The implications of SC traceability based on the data from smart containers and pallets were discussed by Wattanakul, Henry, Bentaha, Reeveerakul, & Ouzrout (2017). An IoT-based risk monitoring system for controlling product quality in cold SCs was proposed by Tsang et al. (2018). Finally, Y. Zhang, Wang, Yan, Glamuzina, & Zhang (2019) developed and evaluated an intelligent traceability system for a cold SC.
The third stream of research is represented by (event-driven) rule-based solutions, in which DS or business logic is based on rule specifications. For example, Lang et al. (2011) proposed a cognitive sensor network for transport management that is based on sensor data, and in which background product knowledge enables decentralized decision making. A unified IoT modeling framework to reflect the dynamics of distributed IoT processes, devices, and objects that function based on the underlying rules was developed by Tu, Lim, & Yang (2018). A rule-based early detection system for proactive management of raw milk quality was proposed by Zakeri et al. (2018). A rule-based framework for predicting disruptions under incomplete and uncertain information in event data was created by Nawaz, Janjua, & Hussain (2019). Finally, Pal & Kant (2019) presented a layered architecture for perishable logistics operations, in which infrastructure sharing is based on rule declarations.
Irrespective of its focus or underlying logic, extant literature in this area does not feature studies that address the provisioning of DS to personnel involved in TA processing or temperature monitoring. This necessitates research into possible solutions with the corresponding functionality.
2.2. Alarm and anomaly detection
Machine learning (ML) algorithms have been widely used in the context of detection of anomalies and classification of alarms (e.g., Shang & Chen, 2020; Su, 2011a; 2011b), and have assisted in manual processing of overwhelming sensor data. This practice indicates their potential application in other areas with similar prediction settings.
In the abovementioned context, it is especially important to be able to back-engineer the classification logic of an algorithm for interpreting the classification output. In the extant research, however, black-box models have also enjoyed widespread implementation, in particular, when a higher classification accuracy was an important goal in model building. Below, we provide an account of areas where various ML algorithms were successfully implemented for alarm and anomaly detection. We start our overview with interpretable models and move on to the application of black-box algorithms.
Interpretable models are represented in the literature by decision trees (DTs), k-NN, logistic regression, and naïve Bayes. For instance, DTs were applied in the intrusion detection context, and were shown to achieve a reduction in false positives (Anuar, Sallehudin, Gani, & Zakaria, 2008). The algorithm was also used for solving the same problem over Big Data in a fog environment (K. Peng et al., 2018), and in combination with rule-based models (Ahmim, Maglaras, Ferrag, Derdour, & Janicke, 2019). k-NN was successfully applied to fall detection (Hsieh et al., 2016), identification of univariate time-series anomalies (Ishimtsev, Bernstein, Burnaev, & Nazarov, 2017), network intrusion detection (Li, Yu, Bai, Hou, & Chen, 2017), industrial fault detection (Zhu, Sun, & Romagnoli, 2018), identification of faults in photovoltaic systems (Harrou, Taghezouit, & Sun, 2019), and early classification of alarm floods (Shang & Chen, 2020).
Logistic regression was successfully used for polymorphic malware detection and achieved an accuracy of 0.977 (B. J. Kumar, Naveen, Kumar, Sharma, & Villegas, 2017). The algorithm also fared well in the early detection of health issues (P. M. Kumar & Devi Gandhi, 2018) and driver’s alertness (Allach, Ahmed, & Boudhir, 2019). Naïve Bayes combined with k-means fared well in the reduction of false alarm rate in intrusion detection (Om & Kundu, 2012), detection of payload-based anomalies (Swarnkar & Hubballi, 2016), and software defect prediction (Arar & Ayan, 2017).
Hardly interpretable models were represented by ensemble or hybrid algorithms and deep neural networks. For example, Farnaaz & Jabbar (2016) applied a random forest classifier for network intrusion detection. A system for the same purpose using bootstrap aggregating was built by Gaikwad & Thool (2015). A hybrid classifier for credit card fraud detection based on DTs, random forest, Bayesian network, naïve Bayes, and support vector machine in three voting scenarios was composed by Kültür & Çağlayan (2016). Finally, Jokanovic & Amin (2017) solved the problem of fall detection using deep learning in range-Doppler radars.
In light of the importance of prediction interpretability in our TA processing setting, and given the previous success of interpretable models in alarm and anomaly detection tasks, this study pursues the development of a solution that makes it possible for a user to understand the final prediction. For this reason, we will consider the deployment of interpretable models as components of our proposed solution (e.g., k-NN, nearest neighbor, and DT predictions).

2.3. Recommender systems in critical industries
Recommender systems in critical industries are a special type of interactive systems that combine the elements of critical systems and interactive systems engineering to provide DS in settings with high safety or security requirements. Critical recommender systems originate from widely popular e-commerce collaborative recommender systems, and their engineering is still in its infancy (Bouzekri et al., 2019).
Extant research primarily suits the needs of aviation or military industry, and proposes solutions that are either quite specific to a certain problem or provide only general guidelines on the engineering of such systems. For example, Bouneffouf, Bouzeghoub, & Ganarski (2013) proposed an algorithm that considers the risk level of a user situation and adaptively balances an exploration-exploitation trade-off in context-aware recommender systems. Naderpour, Lu, & Zhang (2014) also worked with situation-aware systems and presented a cognition-driven DS system to help manage abnormal situations in safety-critical environments. Evaluation results of a situation-aware computerized operator support system in nuclear plants that helps detect faults earlier than with conventional control room technologies were provided by Ulrich et al. (2017).
Dao et al. (2015) discussed the quality of a system developed to help the pilot of a distressed aircraft choose a diversionary airport. In a similar field, Gómez et al. (2016) illustrated a recommender framework for assisting flight controllers. The framework combines model checking and argumentation theory in the evaluation of trade-offs under the condition of incomplete and inconsistent information. Pilarski (2014) worked on a more general solution and elaborated on a recommender system concept that can support decision making in a hierarchical military organization.

2.4. Research gap
Although the research presented in the previous sections offers many helpful developments, we still see the need for further work that would enable DS generation in TA processing. First, literature on temperature monitoring, in particular on sensor technologies or event-driven solutions, enables data collection for further decision making. Such literature still fails to provide any solutions for informative DS in temperature monitoring. Although rule-based solutions are also potential sources of DS (e.g., provision of rules that trigger an action is a good basis for making better-informed decisions), such systems are still very static, inflexible, and labor-intensive in terms of rule base maintenance (Mousheimish et al., 2016; 2017). Moreover, any variation in specified rules leads to completely different system behavior and alarm processing results (Hafliðason, Ólafsdóttir, Bogason, & Stefánsson, 2012).
Second, alarm classification literature that relies on easily interpretable models has so far focused on classification settings with two classes with highly imbalanced data (e.g., intrusion detection). In this case, false positives may not pose a big problem and the application of interpretable models without significant modifications is justifiable. A multi-class setting, in which the identification of each class is relevant, necessitates an investigation into the suitability of off-the-shelf algorithms for the task.
Finally, the literature on critical recommender systems has so far been relatively field- and problem-specific, and does not allow for a smooth transfer to other fields. At the same time, in the literature on traditional collaborative recommender systems, where k-NN is widely used for item- or user-based recommendations, the desired serendipity and novelty of predictions impede the development of analogous models for settings with higher misclassification costs (Ricci, Rokach, & Shapira, 2015).
Poor informative qualities of the current temperature monitoring solutions, a focus on rare single-class predictions in alarm detection literature, and a rather problem-specific nature of existing critical recommender systems urge us to reconsider the applicability of available research to the context of DS generation in TA processing. Convinced by the successful applicability of k-NN in alarm detection and in recommender systems, as well as the interpretability of its prediction results, we propose certain adjustments to the algorithm that are necessary in our prediction scenario with higher misclassification costs.

3. Material and methods
In this section, we focus on the new hybrid classification algorithm
motivated by the limitations of k-NN in the scenario with higher misclassification costs. The methodological part is followed by the description of the data used for the evaluation of the new algorithm.
Fig. 1. Limitations of k-NN algorithm (k = 15).

3.1. New hybrid k-NN algorithm
In the following subsections, we first consider the limitations of k-NN in the scenario of higher misclassification costs (in particular, in TA processing in a cold pharmaceutical SC). We then address these limitations by proposing a hybrid version of k-NN. Finally, we describe the time complexity of the proposed hybrid algorithm.
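For orientation before the detailed subsections, the two-step voting rule that Section 3.1.2 describes in prose can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reproduction of Algorithm 1: the function name `hknn_vote`, the `ABSTAIN` marker, and the argument names are hypothetical, and the four component predictions (radius neighbors, k-NN, DT, and nearest neighbor) are assumed to have been computed elsewhere.

```python
from typing import Hashable, Optional

# Hypothetical marker: None means "abstain and defer to alarm personnel".
# Consequently, None must not be used as a class label in this sketch.
ABSTAIN = None


def hknn_vote(
    rn_pred: Optional[Hashable],
    knn_pred: Hashable,
    dt_pred: Hashable,
    nn_pred: Hashable,
) -> Optional[Hashable]:
    """Combine the component predictions for a single test sample.

    rn_pred  : radius neighbors prediction, or None if the fixed radius is empty
    knn_pred : k-NN prediction
    dt_pred  : prediction of the decision tree with k-constrained leaves
    nn_pred  : prediction of the single nearest neighbor
    """
    if rn_pred is not None:
        # Step 1: the fixed radius is not empty. Accept the local prediction
        # only if k-NN or the DT confirms it; otherwise abstain without
        # consulting the nearest neighbor.
        return rn_pred if rn_pred in (knn_pred, dt_pred) else ABSTAIN
    # Step 2: the fixed radius is empty. Relax local similarity to the
    # nearest neighbor and again require confirmation by k-NN or the DT.
    return nn_pred if nn_pred in (knn_pred, dt_pred) else ABSTAIN
```

Keeping the voting rule separate from the component classifiers makes the abstention behavior easy to inspect: a class is returned only when the local prediction is confirmed by at least one of the two neighborhood-level predictions.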
3.1.1. Limitations of k-NN
As mentioned in the previous section (2.4), k-NN was successfully used in alarm classification settings, which are similar to the context of DS generation for TA processing. Moreover, the algorithm was also applied to engineer critical recommender systems, specifically in healthcare (Bhatti et al., 2019; Ruiz et al., 2011; D. West et al., 2005). The popularity of this algorithm is largely attributed to its advantages. In particular, k-NN is easy to train, is effective for large training data sets, and is robust to noisy training data (Bhatia & Ashev, 2010; Jadhav & Channe, 2016). Its results are also easy to interpret on a case-to-case basis in contrast to black-box models (Dreiseitl & Ohno-Machado, 2002), and its application does not require the establishment of a predictive model before classification (Yeh & Lien, 2009).
In the context of recommender systems, where novelty and serendipity of recommendations are welcome (Ricci et al., 2015), some limitations of k-NN that are critical in settings with higher costs of misclassifications have not yet been addressed. Such limitations can also affect the quality of predictions in the context of DS generation for TA processing.
Our exploration of multiple special cases of k-NN classifications for various data structures allows us to derive three main challenges that k-NN may face: a) radius search restricted by a certain number of neighbors, but no other distance metric (for example, this issue may become more evident for large k values (e.g., Aldayel, 2012; Domeniconi, Peng, & Gunopulos, 2001; Wu et al., 2008)); b) potentially biased predictions close to natural class boundaries; c) unreliable predictions in noisy and sparse data regions (in particular, for smaller k (e.g., Aldayel, 2012; Wu et al., 2008)). Fig. 1 illustrates these limitations for a scenario with 15 neighbors. The remainder of this section is devoted to a detailed description of identified k-NN limitations and how they are addressed in our Hk-NN algorithm.
First, k-NN looks for the nearest k neighbors in the vicinity of a test observation. The radius of search (ball or hypersphere) is increased until it contains k neighbors. In locally dense regions, it should not pose any problems, since relatively similar observations would eventually participate in voting. In locally sparse regions, or regions with heterogenous density, it leads to certain situations in which more remote and more dissimilar samples influence voting. At the same time, although a true class may equal the class of the closest neighbors, the voting outcome for k-NN may still differ from it because of a denser representation of a neighboring class (see Fig. 1a for an illustration).
To demonstrate how the changeable radius of k-NN can affect predictions, we show how accuracy changes with a growing size of a k-NN hypersphere, with one of our datasets as an example (see Fig. 2). For this purpose, we first find an optimal number of nearest neighbors (k) via grid search; then, we run a classification with a leave-one-out cross-validation and associate each prediction with the size of k-NN radius. Finally, we calculate a cumulative accuracy with respect to the increasing radius. We observe that the accuracy is higher for smaller radius sizes, and drops with increasing radius sizes. Finally, it is at its lowest, and remains almost unchanged for the largest radius sizes. This observation indicates that keeping the neighborhood boundary for k-NN changeable may affect classification reliability where only a few or no neighbors lie close to a new sample. In the context of DS generation for TA processing, this may turn out to be a critical limitation, especially for new TAs associated with a few similar past deviations.
Fig. 2. Accuracy rate and radius size of neighborhood.
Secondly, k-NN is blind to class boundaries. If a new observation lies close to a class boundary and its true class is represented by fewer training observations than a competing class, then such an observation can be misclassified by k-NN (e.g., see Fig. 1b). Coupled with the limitation in Fig. 1a, this blindness to class boundaries can lead to
misclassifications even if a new observation lies well within a true class boundary. For this to happen, a competing class should be represented by a denser region in the k-NN hypersphere. Fig. 1b shows that the application of a classifier in constructing a decision boundary (in this particular case, DTs) can help provide an additional criterion in making a classification decision for problematic regions.
Lastly, k-NN does not have any mechanism for handling problematic classifications, that is, in regions with much noise or where decision boundaries of multiple classes meet and overlap. Such problematic regions can also be represented by heterogenous neighborhoods with distant nearest neighbors, where classifications cannot be reliable. For an illustrative example, see Fig. 1c. In such cases, either abstaining from classification or relying on other insights from the local data structure would be more helpful. Consideration of the labels of 15 nearest neighbors that works best globally on training data can fail in such problematic neighborhoods.

3.1.2. Description of Hk-NN algorithm
Obviously, one can identify other limitations of k-NN for specific data structures or in light of computational requirements (e.g., memory requirements, sensitivity to redundant attributes (Aldayel, 2012)). However, we focus on the ones mentioned above (Section 3.1.1), which are particularly important for predictions with high misclassification costs. To overcome these limitations, we propose a hybrid algorithm based on the principles of local similarity and neighborhood homogeneity. The first principle rests on the assumption that (most) training samples in the close vicinity of a new observation share the same class label with it (as suggested by the relationship between the radius size of k-NN and the accuracy). The second principle serves as a (final) confirmation or dismissal of a prediction based on the first principle. This second principle requires that the neighborhood of a new observation be homogenous, that is, the class in the closest vicinity of a new observation coincides with a prevailing class in the neighborhood. Here, the neighborhood notion is not necessarily defined as the k-NN hypersphere. We provide the procedure for our hybrid algorithm, hereafter referred to as the Hk-NN, in Algorithm 1.
For input, we need training (X_hist) and test data (X_new), parameter k for an optimal number of nearest neighbors, and a fixed size of the neighbor search radius (radius_opt). For subsequent evaluation setups, we find the k parameter via random search with a 10-fold cross-validation (Bergstra & Bengio, 2012). For the estimation of a fixed neighborhood radius, we run a classification with a leave-one-out cross-validation on the training data and, just as in the data for Fig. 2, calculate the search radius associated with each prediction. We then sort the data by radius size and use the latter with a target variable for prediction success (zero for misclassifications and one for correct classifications) to derive intervals with maximum information gains based on the minimum description length principle (Fayyad & Irani, 1993). The cut point marking the end of the first interval is chosen as radius_opt, since it offers a higher accuracy than the following cut points (if multiple cut points are found).
Following our first principle of local similarity, we first make a prediction for a test sample with a radius neighbors classifier that considers a fixed neighborhood radius (lines 3 through 6). This prediction should help deal with the limitation illustrated in Fig. 1a. Situations where such a fixed search radius has no neighbors are unavoidable, irrespective of the data structure. When no prediction is possible (RN_pred is assigned the value “null”), we deal with this in the second step later. To fulfill the second principle of neighborhood homogeneity, we make two other predictions with k-NN and DTs (lines 7 and 8). Thus, we do not confine ourselves to the notion of neighborhood extending equidistantly from a test observation to capture a possible limitation illustrated in Fig. 1b. At the same time, we constrain DTs with a parameter k for the maximum size of a terminal node to avoid deep DTs prone to overfitting and to ensure that k-NN contributes to the final prediction on an equal footing with DTs. Although decision boundaries of DTs are axis-aligned and cannot guarantee a complete class purity in a leaf because of the k constraint, they are still easily interpretable and, for a smaller k, can isolate very local concentrations of data points belonging to one class.
As mentioned in the previous paragraph, a fixed search radius can contain no neighbors, especially in the areas with sparser data. In the context of TA processing, this is possible for deviations with extreme conditions. In such cases, we relax the first principle to the nearest neighbor prediction that lies outside the fixed search radius, but is still
the most similar data point to a test sample (line 9).
In the first step of voting, we consider the predictions of k-NN, DTs, and a radius neighbors classifier. Both conditions are fulfilled if a prediction by a radius neighbors classifier coincides either with the prediction of k-NN, DTs, or both (lines 11 and 12). In this case, we do not proceed to the second step of voting and predict the class upon which at least two classifiers agree. If the fixed search radius is not empty and the prediction of a radius neighbors classifier coincides with neither that of k-NN nor that of DTs, then we have strong reasons to believe that training samples within a fixed radius represent noise and/or hint at a rare deviation that should be addressed with caution. In this case, we do not proceed to the second step of the voting procedure; we prefer to abstain from prediction and delegate the final decision to an alarm employee (line 14).
If the fixed search radius is empty, we proceed to the second voting step with a relaxed restriction for the local similarity principle. We use the nearest neighbor prediction instead of a radius neighbors prediction and look for an agreement with the k-NN or DTs (lines 15 through 19). This situation is common in sparser regions with a higher distance between data points. In this case, we believe that if the most similar training observation coincides with a prevailing class in the neighborhood of a test observation, we can still rely on this prediction despite local sparsity. If such a nearest neighbor lies in or close to the DT leaf where a new observation is, or corresponds to the plurality vote within a k-NN hypersphere, we believe that it fulfills both conditions as long as the predictions match (line 16). Otherwise, we abstain from prediction in the second step (line 19).
In Fig. 3, we show the possible voting outcomes of Hk-NN. For example, in situations where a fixed search radius is not empty (Fig. 3a, b, e), a new observation is classified in conformance with a prediction of k-NN and/or DTs. Otherwise, a decision to abstain from prediction is made if a noisy heterogeneous region is identified (Fig. 3e). For cases with an empty fixed search radius (Fig. 3c, d, f), the nearest neighbor of a test observation can agree with a prediction by k-NN and/or DTs (Fig. 3c, d), or the algorithm abstains from classification if a problematic region is signaled by the lack of prediction agreement (Fig. 3f).
Fig. 3. Illustrative examples of voting outcomes of the Hk-NN (k = 15).

3.1.3. Time complexity of Hk-NN
In this section, we elaborate on the time complexity of Hk-NN considering the worst-case asymptotic performance of the component algorithms (k-NN, DT, radius neighbors classifier, and nearest neighbor classifier) and procedures (numerical sorting and entropy-based discretization for a fixed radius calculation). However, we do not include the search for an optimal k for k-NN into the overall time complexity, since it heavily depends on the procedure used for it (e.g., random or grid search), the number of folds in the cross-validation procedure, and finally the search domain of candidate values for k.
As input, Hk-NN requires, among other things, the parameter radiusopt (see Algorithm 1). The procedure for its calculation is described in paragraph 2 of Section 3.1.2. Time complexity of this procedure
amounts to $O(sn\log n + n^2 + kn)$, where s is the number of splits for intervals identified in the course of the minimum description length procedure (Fayyad & Irani, 1993). Given an optimal value of k and n (the number of samples in a training set), running a classification with a leave-one-out cross-validation would require $O(kn)$ computations. Sorting a list of n samples in the ascending order by the size of k-NN radius would require $O(n^2)$ computations in the worst case. Finally, each split considered in the minimum description length procedure takes $O(n\log n)$ time. If the procedure finally outputs s thresholds, at most $2s + 1$ computations are done, which results in the complexity of $O(sn\log n)$ (Kohavi & Sahami, 1996).
In an ensuing two-step voting procedure, predictions by k-NN, DT, radius neighbors classifier, and nearest neighbor are used. Using these classifiers for predictions requires additional computations. For example, for the number of features m, k-NN has a time complexity of $O(mnk)$ (computing distances would require $O(m)$ time and looping through the training set for k neighbors would take $O(nk)$ time). Growing a DT would require the complexity of $O(mn^2)$ in the worst case (here, we consider critically unbalanced DTs that split data into partitions of 1 and $n - 1$ in each node). Calculation of the time complexity for the nearest neighbor classifier is analogous to k-NN for k = 1, i.e., $O(mn)$. Radius neighbors classifier would require $O(mn^2)$ computations in the worst case, i.e., $O(n^2)$ for distance calculations (Bentley, Stanat, & Williams, 1977) multiplied by the number of features.
Correspondingly, the upper bound of time complexity for training Hk-NN would be equal to $O(sn\log n + mn + kn + kmn + n^2 + mn^2)$. It should be noted, however, that the required computations reflect rather pessimistic scenarios and do not consider more efficient implementations of some algorithms/procedures that have been proposed in the extant literature. For example, numerical sorting preceding the entropy-based discretization procedure can be performed by block merge sort algorithm within $O(n\log n)$ or, in the best case, within $O(n)$ time (Kim & Kutzner, 2008). Using a k-d tree space-partitioning data structure, nearest neighbors’ search can be reduced from $O(n)$ to $O(\log n)$ (Brown, 2015). Similarly, k-NN’s $O(mnk)$ can be simplified to $O(mn + kn)$ if the implementation computes and stores distances, which on the other hand requires the additional space complexity of $O(n)$. Finally, if we consider a less pessimistic case of balanced binary splitting DTs, their growing complexity can be reduced to $O(mn\log n)$ (Cormen, Leiserson, Rivest, & Stein, 2009). Higher values of k for the minimum number of samples in a leaf could eventually constrain the depth of DT even below $\log n$.

3.2. Used data and its preparation
In the following subsections, we first describe a real-world cold SC scenario, in which TAs are currently processed manually, and from which the data for the evaluation of our algorithm are generated and collected. Then, we focus on the preparation and representation of data in the feature generation subsection.

3.2.1. Cold SC scenario
Air pharmaceutical SC is a special case of a cold SC, where the maintenance of a permissible temperature regime is just as important as in other modes or for other perishables, but where speedy transportation and delivery are prioritized. We have temperature monitoring and TA processing data at our disposal from a large international LSP rendering temperature-controlled air transportation services to over 100 pharmaceuticals manufacturers in four continents. As of the time of our research, the LSP has collected over 20 million temperature measurements from 70,874 sensor devices.
A simplified view of the LSP’s cold air SC is illustrated in Fig. 4. A starting point of a cold SC is in the premises of a manufacturer, where pharmaceuticals are packed with proper insulation and stored until they are picked up. Upon pick-up by the freight forwarder, they are shipped (by truck) either to the airport facilities or to the LSP facilities, where consolidation, deconsolidation, repackaging, and temporary storage are performed prior to customs processing and flight. Following one more leg of road transportation to the airport facilities, customs clearance and the subsequent security checks happen before a shipment can be forwarded to the temperature-controlled storage facilities at the airport of origin. After the receipt of a shipment booking list, cargo is collected at storage facilities and moved to the hold area for ramp transport, and then to parking position, where it is loaded on a flight as per the load plan.
After the flight arrival at the destination airport, the cargo is unloaded and moved to appropriate storage facilities, and to appropriate locations for transfer to another carrier in the case of transit; or if the cargo arrives at the final destination airport, the freight forwarder is notified and the cargo is stored for import. Following customs clearance, the cargo is prepared for handover to the freight forwarder, who collects and transports it to the premises of a consignee. In a real-world scenario, air SCs are usually more complex and may contain multiple layovers and transshipments, where the above-described operations are repeated.
Fig. 4. Cold air SC.
In the entire course of storage, transportation, and handling operations, correct temperatures should be maintained to guarantee the quality and safety of use of pharmaceuticals. For this reason, constant
temperature monitoring is ensured with the help of sensing devices that transmit measurement event data in real-time. Sensors can already be attached to the shipment on the manufacturer’s premises and assigned (activated) in the monitoring system of the LSP for ongoing measurement collection. Additional sensors are attached by the LSP to its own packaging, doors, and/or walls of a refrigerated container. They build a sensor mesh and transmit the event data via a dedicated gate. The data are constantly transmitted with a predefined sampling rate whenever network coverage is available (in our study, we work with measurements collected at 10-minute intervals). In the case of network unavailability, the data are stored locally with the corresponding timestamps and transmitted later. The event data are collected until sensors are deactivated after delivery.
The shipment of cargo is associated with the initial assignment and constant collection of relevant temperature monitoring data. For example, the permissible temperature range is determined in the course of laboratory stability studies (Ammann, 2011; European Medicines Agency, 2003), and is associated with each shipment in the form of rules for an upper and a lower temperature threshold that cannot be exceeded. Along with the two thresholds, a setpoint temperature is also defined, which usually corresponds to the average of the two thresholds and signifies the temperature that a cooling unit is set to maintain. Each shipment is associated with a number of regular (internal) sensors and ambient (external) sensors that register and transmit measurements typical of a pharmaceutical or container doors/walls. When the shipment passes certain milestones along the SC, that is, cargo pick-up, arrival at the airport, delivery to a consignee, etc., shipment status update event data are generated with the time and location of the shipment progress.
Temperature event data and rule specifications for a permissible temperature range form the basis for a TA triggering logic. Whenever a current measurement satisfies one of the conditions in a rule base, for example, internal temperature being higher than an upper threshold for at least 10 min, a TA is triggered. Alarm personnel are notified of a deviation, and begin examining the case to decide on a CM. They study recent temperature data, look into which sensors registered the deviation, consider where the shipment is according to the shipment status data, contact a responsible party to get or confirm additional information, and decide on a suitable CM. At the same time, alarm personnel are expected to document the examination process and a final decision on a CM.
Given the involvement of multiple parties along the cold air SC and the constant possibility of exposure to temperatures outside a permissible range, many triggered TAs may require an examination; however, not all of them require particular CMs (human involvement) to combat the deviations. For example, when sensors are attached to the packed pharmaceutical products at the manufacturer’s facilities, they may not have cooled down (preconditioned) properly to deliver true measurements, which leads to TAs without necessary CMs. If a shipment is subject to a brief physical handling (loading, movement to a warehouse, etc.) and the exposure to a higher temperature is registered by external sensors, TAs not requiring human involvement may be triggered again. Finally, failing to deactivate sensors after delivery can lead to the transmission of unrealistic measurements. Conversely, TAs usually require immediate CMs because of deviations during transport or storage, when cargo may have been erroneously stored within a wrong temperature range or when a cooling unit malfunctions.
In the period from August 2013 through April 2020, 37,517 TAs were triggered for transported pharmaceuticals, of which 21,338 received comments from alarm personnel. Following rigorous text and context data quality checks, that is, timeliness of comments, chronological correspondence of the TA comments with the temperature and shipment status data, etc., 16,525 samples could be finally used for evaluations.

3.2.2. Feature engineering
In this subsection, we focus on the semantic evaluation of documented alarm comments in terms of DCs and CMs and the resulting derivation of features. Then, a way of encoding a cargo location at the time of alarm triggering is proposed. It is followed by the engineering of features standing for temperature characteristics and a sensor role.

3.2.2.1. Semantic evaluation of alarm comments. A starting point of successful predictions in a classification or regression is the correct representation of available data through relevant features. The dividing line between a capable classifier and a good data representation is blurry. An ideal feature engineering would make the classification task trivial and, conversely, an omnipotent classifier would make feature extraction superfluous (Duda, Hart, & Stork, 2000). Given the lack of an omnipotent classifier, we understand the importance of data preparation and investigate the body of knowledge on TAs to gain insights into why TAs are triggered. This knowledge is reflected in the documented TA comments.
We have 16,525 documented TAs with SC context data at our disposal. After receiving a notification of a deviation, alarm personnel are requested to document the context in which the TA was triggered and what action was undertaken to rectify the deviation. There is neither a template nor a list of required components that should comprise a full-fledged comment. Moreover, alarm personnel may sometimes fail to state either a deviation cause (DC) or a CM taken. Examples of TA comments we worked with are listed in Table 1 (column TA comment).

Table 1
Sample TA comments and their corresponding encoding.
TA comment | DC | CM
“[…] received this alarm and issue resolution in progress. Shipment is now in SHA carrier warehouse. We are confirming with airline to ensure they have put our cargo into correct temperature range.” | NA | action
“[…] received this TM alarm. The shipment is preconditioning and the temperature has been back in range. No action is required.” | preconditioning | no action
“Alarm occurred due to handling procedure. We are in contact with […] office for this case.” | handling | investigation
“[…] received alarm. That temperature deviation was due to tarmac time transfer from aircraft. Please note that temperature is ok. We will keep on eye.” | handling | monitoring

We semantically evaluated documented TAs and identified whether and which parts of the comments contained the DCs and CMs. We started with specific descriptions and groups of related descriptions (for instance, “low sensor battery level” or “shipment still on tarmac waiting for being loaded”) and made our way up to more general categories by tagging and grouping separate DCs and CMs. After several rounds of generalizations, we ended up with the following seven DCs. Preconditioning made up the first group and was often represented by “shipment building,” “shipment preparation,” “sensor assignment,” etc. For handling, the following descriptions were typical: “physical handling,” “cargo movement,” “ground handling,” etc. Storage was mostly represented as “deviation during storage,” “wrong temperature room,” or “shipment misplaced”. The fourth category, transportation, was described by “flight,” “deviation during transport,” “deviation on truck,” “cargo still on the way,” etc. The next group, malfunctioning, was often described by “sensors stopped working,” “low battery,” “defect sensor,” etc. The category ambient temperature was described by “ambient sensor,” “internal readings still in range,” “high ambient temperature,” etc. Finally, we formed a category for DC not stated because such cases were numerous and also to allow for the identification of TAs whose DC was hard to identify.
After the same grouping procedure for CMs, we arrived at the following categories: action (“check packaging,” “add dry ice,” “move into proper storage room,” etc.), investigation (“still in contact with carrier,” “asked for storage conditions,” “confirming cargo location,” “investigate the case,” etc.), monitoring (“monitoring temperature,”
“closely following shipment,” etc.), no action (“no action required,” “temperature back in range,” “normal deviation,” etc.), and CM not stated. Examples of classifications for selected TA comments are shown in Table 1 (columns DC and CM).

3.2.2.2. Encoding of a shipment location. When we look at the final DCs, we notice their connection with the closeness of TAs to physical handling points (PHPs) or their typical occurrence only in some phases along the SC. PHPs are illustrated in Fig. 4 as shipment status update events according to the master operating plan of the standard CargoIQ of IATA (three-letter abbreviated statuses). They implicitly divide the whole SC into characteristic phases, in which only certain DCs can be observed. For example, preconditioning can be a DC only at the beginning of shipping, or before the statuses REW (arrival at LSP origin) or REH (arrival at LSP export gate). Similarly, delivery can be listed as a DC only after DLV (collected from airline) and shortly before or after POD. This exclusive occurrence of DCs in some specific SC phases leads to a pragmatic division of the SC of each shipment into implicit phases that are separated by digital shipment status updates. Consequently, we come up with four SC phases, namely preconditioning (up to REW or REH), delivery (after DLV), flight (between DEP (flight departure) and ARR (arrival at destination)), and handling (for all other phases between the other three, for which only physical handling, road transportation, and storage on the premises of the LSP or airport are typical).
Fig. 5 illustrates the relationship between TAs and closeness of shipment to a PHP (shipment status event that is manually triggered and presupposes the physical handling of a shipment). We observe that more deviations are registered in the vicinity of such PHPs. Moreover, such deviations tend to have DCs such as delivery, preconditioning, handling, or high temperature registered by an ambient sensor, and less frequently require CMs such as action or investigation. On the other hand, most of the TAs chronologically farther from shipment status events tend to be caused during transportation and/or storage. Depending on the extent of deviation, they are associated more often with CMs action or investigation.
Fig. 5. TAs triggered with respect to the vicinity of a PHP.
Obviously, it is important to have features representing the location of cargo when a TA is triggered. It should be noted that the data at our disposal were collected by sensors, which were rarely or never equipped with a GPS module. Therefore, we need a general way of encoding a cargo location that does not rely on GPS data.
We mentioned above that a majority of DCs chronologically farther away from the PHPs are storage and/or transportation. Obviously, duration of storage and transportation operations differs across and within origin-destination pairs and is determined by multiple factors (closeness of manufacturer facilities to an airport, customs clearance delays, pre-flight storage at the airport, etc.). At the same time, operations in different locations may have different durations, but on average, SC parties try to handle cooled products with the shortest possible delays. We observe this in the more frequent DCs close to PHPs (handling, preconditioning, delivery, or high temperature registered by an ambient sensor). These vanish as we progress further to the middle of an SC phase.
In light of these peculiarities, we propose the encoding of cargo location with membership degrees in a fuzzy set storage and/or transportation with respect to the prior and subsequent shipment status update event. We generate a membership function by inductive reasoning with the entropy minimization principle (Ross, 2010). It was shown that a global measure of indefiniteness for fuzzy sets can be obtained with entropy without the use of probabilistic constructs (De Luca & Termini, 1972).
A procedure for encoding the location of cargo at the time of TA triggering for a particular SC phase is presented in Algorithm 2. As an input, information on the initiation and ending of each shipment (Si) is needed along with all commented TAs (Ai). Finally, DCs typical of a particular SC phase are required. We differentiate between the DCs other than transportation or storage right after a PHP (startRphase) and before the next one (endRphase). This distinction between the start and end of an SC phase is important for the subsequent construction of a membership function. It is also practically motivated by the fact that some DCs can be observed only earlier or later in particular SC phases. Thus, list startRphase is made up of the following DCs: a) for preconditioning: preconditioning, NA; b) for handling, flight, and delivery: handling, ambient, NA. List endRphase comprises the following DCs: a) for preconditioning, handling, and flight: handling, ambient, NA; b) for delivery: handling, ambient, delivery, NA.
We begin by calculating the maximum duration of a shipment (lines 2 through 5). Then, for each TA, we check whether it was registered closer to the previous or the next PHP (line 9) and append the list corresponding to the previous shipment status update event (startList), or otherwise invert the time of triggering and append the list for the next shipment status update event (endList) (lines 10 through 14). By generating these aggregate lists relative to PHPs, we preserve the comparability of SC phases with different durations.
As a next step, we order TAs by time (lines 15 and 16). After regarding DCs transportation and storage as members of one class and the rest as members of the other, we derive points in time (lines 17 and 18) with the minimum class entropy based on the minimum description length principle (Fayyad & Irani, 1993). Then, we build two partial membership functions for a fuzzy set transportation and storage, that is, one with respect to the previous and the other with respect to the next PHP (lines 19 through 37). If the number of identified points in time equals one, the resulting partial membership function is trapezoidal; if more points are found, the function is piecewise linear. The coordinates of pieces of the latter function are found by calculating the class purity within an interval (lines 24 and 33). The partial functions are normalized (lines 27 and 37) and added to produce a prototype membership function transportation and storage for the longest duration of a particular SC phase (membFuncWholephase). See Fig. 6 for an example of a piecewise linear membership function transportation and storage.
For each TA, we try to estimate the shipment’s membership degree in a fuzzy set transportation and storage with respect to the beginning and end of an SC phase. We first calculate how chronologically distant a TA is to both PHPs (lines 41–42 and 50). Then, we find intercepts with the pieces of partial membership functions (lines 46 and 53), which finally serve as encodings of the shipment location.
The prototypical membership function is generated for the longest duration of an SC phase. Therefore, by finding the intercepts with regard to the beginning and end of an SC phase, we can map any shorter phase onto the prototypical function. At the same time, this encoding retains information on the possible duration of physical handling operations (as an inverted membership degree in transportation and storage). This way, similarity calculation is not affected by SC phases of different durations.

3.2.2.3. Representation of temperature characteristics and a sensor role.
Fig. 6. Example of a possible linear piecewise membership function.
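To make the location encoding of Section 3.2.2.2 more concrete, the following minimal sketch evaluates a piecewise linear membership function of the kind shown in Fig. 6 and returns two membership degrees for a TA, one relative to the previous PHP and one relative to the next. It is an illustration under stated assumptions rather than a reproduction of Algorithm 2: the breakpoints, the membership values, and the names `piecewise_linear` and `encode_location` are hypothetical, and the entropy-based derivation of the breakpoints is not shown.

```python
from bisect import bisect_right

# Hypothetical breakpoints (hours from a PHP) and membership degrees in the
# fuzzy set "transportation and storage". Illustrative values only; in the
# paper such points come from entropy-based discretization.
START_HOURS, START_DEGREES = [0.0, 1.0, 3.0], [0.0, 0.4, 1.0]
END_HOURS, END_DEGREES = [0.0, 2.0], [0.0, 1.0]


def piecewise_linear(x, xs, ys):
    """Interpolate linearly through (xs, ys); stay flat outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # index with xs[i - 1] <= x < xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def encode_location(t_alarm, t_prev_php, t_next_php):
    """Return (degree w.r.t. previous PHP, degree w.r.t. next PHP).

    All times are in hours on a common clock.
    """
    since_prev = t_alarm - t_prev_php   # time elapsed since the previous PHP
    until_next = t_next_php - t_alarm   # time remaining until the next PHP
    return (
        piecewise_linear(since_prev, START_HOURS, START_DEGREES),
        piecewise_linear(until_next, END_HOURS, END_DEGREES),
    )
```

With these assumed breakpoints, an alarm raised shortly after a PHP receives a low first degree (it is more likely related to handling), whereas an alarm far from both PHPs receives degrees close to one for both features.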
Alarm personnel study a temperature trajectory before a TA and, considering other contextual information, decide on a CM and investigate a possible DC. Obviously, temperature data is an important factor in estimating the similarity between TAs. We investigated temperature curves prior to TAs to get insights into how such temperature data can be represented in our feature space.
We found that most of the TAs had temperature curves that started demonstrating a (slight) deviation from a setpoint about one hour in advance. To see what typical temperature trajectories prior to a TA look like, we derived the main underlying groups of trajectories by identifying a natural number of clusters. We started with centering temperature vectors containing six measurements to achieve their comparability for over 30 temperature regimes with different setpoints. Then, we ran a k-means clustering algorithm for a different number of clusters and chose the one for which a silhouette measure was maximized (Rousseeuw, 1987). Means of identified clusters are illustrated in Fig. 7 as six lines standing for typical temperature changes before a TA.
Fig. 7. Cluster means of temperature curves prior to registered deviations.
We see that the clusters show various dynamics of deviations and we believe that this differentiation might influence how alarm personnel react to a deviation and, in combination with other contextual information, hint at what caused it. However, for computational reasons, it is worth finding a way of representing a temperature curve with fewer points. Upon a closer examination of cluster means, we observed that each line differs from the others in terms of a setpoint deviation (how far away the latest measurement is from a predefined setpoint), slope (the most recent change in temperature prior to a TA), and average setpoint deviation one hour prior to a TA.
Basically, the knowledge of these characteristics provides a compressed representation of the temperature curve before a TA. For example, a higher value of slope and setpoint deviation with a relatively low value of the average setpoint deviation encodes a curve that is at first close to a setpoint, but deviates from it to a large extent before a TA. Similarly, a low value of slope and relatively high values of absolute and average setpoint deviation mean that a temperature curve was close to one of the thresholds during the last hour and finally exceeded it.
To verify our observation, we performed the k-NN classification for each curve based on its membership in a cluster. As long as the cluster means represent visually different curves and have three different temperature characteristics, we assume that the latter can represent such curves well if their cluster membership can be predicted based on such characteristics. As a target variable, we used the class to which each curve was assigned in k-means clustering. For features, we calculated the three mentioned characteristics from the curve data. Then, we found an optimal number of neighbors via random search with a 10-fold cross-validation (Bergstra & Bengio, 2012). We achieved a classification accuracy of 0.9818 (±0.0089), which gives us reasons to believe that these three temperature characteristics can replace a curve representation with six data points.
We encode a setpoint deviation, slope, and average setpoint deviation relative to a setpoint temperature (e.g., for a temperature regime of +2 to +8 °C with a setpoint at +5 °C, a temperature measurement of +9 °C is represented as 1.33, which is consistent with dividing the deviation from the setpoint by the distance between the setpoint and the threshold, (9 − 5)/(8 − 5) ≈ 1.33). This way, comparability of deviations for pharmaceuticals with different temperature regimes is guaranteed.
Finally, we include the feature for a sensor role that was suggested in the TA comments analysis and in the extant literature on wireless sensors in a pharmaceutical SC (Haan et al., 2013). This feature indicates whether temperature readings were registered by an internal sensor attached to a pharmaceutical, or an external sensor on a door or a side of a container. This differentiation can be important, since ambient sensors are usually more sensitive to temperature changes outside of a container (e.g., direct sunlight), and in some settings may be disregarded if exposure to extreme temperatures is brief and the deviation is not critical.
In the end, our features comprise the role of sensor (ambient and internal), cargo location (two membership degrees in the fuzzy set transportation and storage), and temperature characteristics before a TA (setpoint deviation, slope, and average setpoint deviation). Target features are DCs and CMs. Following the logic of division of SC into phases, we split our initial data set into four sets, one for each SC phase. Although such division increases the number of evaluation setups and reduces the sample complexity per data set, it reduces the dimensionality of data (no categorical feature SC phase) and avoids potential misclassifications of TAs from different SC phases. Evaluation results are presented in the next section.

4. Results and discussion

In this section, we first provide the results of Hk-NN and k-NN classification performance with an optimal k in the context of alarm processing (Section 4.1) and on seven publicly available ML data sets (Section 4.2). We compare their performance in terms of accuracy, precision, recall, specificity, and accuracy in the case of abstention from prediction. We also simulate a TA processing scenario manually vs. one with the help of k-NN and Hk-NN to highlight the cumulative exposure to deviations (Section 4.3). Finally, we justify the extension to the second-best prediction (SBP) and describe how it was realized; this is followed by the updated accuracy scores for two calculation strategies (Section 4.4).
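Because Hk-NN may abstain, evaluation has to separate how often a prediction is made from how good the predictions are. The sketch below shows one plausible way to compute classification coverage, accuracy on the covered samples, and the accuracy of a baseline classifier on exactly those samples where the hybrid classifier abstains. The function names, the use of `None` as the abstention marker, and these particular definitions are assumptions made for illustration; they are not taken from the paper's evaluation code.

```python
def coverage_and_accuracy(y_true, y_pred, abstain=None):
    """Return (coverage, accuracy on samples with a prediction).

    Coverage is the share of samples for which a prediction was made;
    accuracy ignores abstentions (assumed definitions).
    """
    covered = [(t, p) for t, p in zip(y_true, y_pred) if p != abstain]
    coverage = len(covered) / len(y_true)
    if not covered:
        return coverage, float("nan")
    accuracy = sum(t == p for t, p in covered) / len(covered)
    return coverage, accuracy


def accuracy_where_abstained(y_true, y_baseline, y_hybrid, abstain=None):
    """Accuracy of a baseline classifier on samples where the hybrid abstains."""
    pairs = [
        (t, b)
        for t, b, h in zip(y_true, y_baseline, y_hybrid)
        if h == abstain
    ]
    if not pairs:
        return float("nan")
    return sum(t == b for t, b in pairs) / len(pairs)


# Toy labels, purely illustrative.
y_true = ["handling", "storage", "handling", "transportation"]
y_hybrid = ["handling", None, "handling", "storage"]
y_knn = ["handling", "handling", "handling", "storage"]

coverage, accuracy = coverage_and_accuracy(y_true, y_hybrid)
baseline_on_abstained = accuracy_where_abstained(y_true, y_knn, y_hybrid)
```

A low baseline accuracy on the abstained subset would suggest that abstention targets samples that are genuinely hard to classify, which is the kind of reading a column such as ACC (D) invites.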
Table 2
Classification performance of k-NN and Hk-NN.
Target | Clf. | ACC | PR | REC | SP | Coverage | ACC (D) | ACC (B)
SC phase Preconditioning (4,293 samples)
DC | k-NN | 0.895 | 0.789 | 0.651 | 0.974 | 1.00 | 0.455 | 0.813
DC | Hk-NN | 0.917 | 0.869 | 0.702 | 0.980 | 0.974 | Abstain | 0.841
Recommended CM | k-NN | 0.797 | 0.777 | 0.763 | 0.933 | 1.00 | 0.584 | 0.848
Recommended CM | Hk-NN | 0.823 | 0.814 | 0.800 | 0.940 | 0.976 | Abstain | 0.870
SC phase Handling (5,265 samples)
DC | k-NN | 0.882 | 0.778 | 0.609 | 0.946 | 1.00 | 0.627 | 0.778
DC | Hk-NN | 0.910 | 0.844 | 0.673 | 0.958 | 0.955 | Abstain | 0.816
Recommended CM | k-NN | 0.751 | 0.769 | 0.713 | 0.917 | 1.00 | 0.632 | 0.815
Recommended CM | Hk-NN | 0.781 | 0.800 | 0.763 | 0.931 | 0.929 | Abstain | 0.847
SC phase Flight (3,256 samples)
DC | k-NN | 0.756 | 0.779 | 0.619 | 0.929 | 1.00 | 0.531 | 0.774
DC | Hk-NN | 0.818 | 0.827 | 0.730 | 0.946 | 0.916 | Abstain | 0.838
Recommended CM | k-NN | 0.753 | 0.735 | 0.677 | 0.927 | 1.00 | 0.524 | 0.802
Recommended CM | Hk-NN | 0.804 | 0.793 | 0.740 | 0.941 | 0.929 | Abstain | 0.841
SC phase Delivery (3,711 samples)
DC | k-NN | 0.795 | 0.783 | 0.655 | 0.955 | 1.00 | 0.534 | 0.805
DC | Hk-NN | 0.851 | 0.850 | 0.775 | 0.967 | 0.907 | Abstain | 0.871
Recommended CM | k-NN | 0.812 | 0.788 | 0.744 | 0.936 | 1.00 | 0.583 | 0.840
Recommended CM | Hk-NN | 0.844 | 0.826 | 0.796 | 0.948 | 0.957 | Abstain | 0.872

Table 3
Performance of k-NN and Hk-NN on publicly available data sets.
Clf. | ACC | PR | REC | SP | Coverage | ACC (D) | ACC (B)
UCI Data set “Spambase” (footnote 1)
k-NN | 0.909 | 0.904 | 0.909 | 0.906 | 1.00 | 0.740 | 0.908
Hk-NN | 0.938 | 0.935 | 0.935 | 0.935 | 0.967 | Abstain | 0.935
UCI Data set “Statlog (Vehicles Silhouettes)” (footnote 2)
k-NN | 0.717 | 0.706 | 0.719 | 0.905 | 1.00 | 0.554 | 0.812
Hk-NN | 0.747 | 0.734 | 0.743 | 0.916 | 0.934 | Abstain | 0.830
UCI Data set “EEG Eye State” (footnote 3)
k-NN | 0.804 | 0.818 | 0.793 | 0.793 | 1.00 | 0.723 | 0.793
Hk-NN | 0.841 | 0.845 | 0.836 | 0.836 | 0.977 | Abstain | 0.836
UCI Data set “Semeion Handwritten Digit” (footnote 4)
k-NN | 0.916 | 0.921 | 0.915 | 0.990 | 1.00 | 0.393 | 0.952
Hk-NN | 0.943 | 0.946 | 0.941 | 0.994 | 0.965 | Abstain | 0.968
UCI Data set “Glass Identification” (footnote 5)
k-NN | 0.976 | 0.954 | 0.951 | 0.995 | 1.00 | 0.000 | 0.973
Hk-NN | 1.00 | 1.00 | 1.00 | 1.00 | 0.986 | Abstain | 1.00
UCI Data set “MAGIC Gamma Telescope” (footnote 6)
k-NN | 0.841 | 0.847 | 0.792 | 0.797 | 1.00 | 0.698 | 0.795
Hk-NN | 0.853 | 0.856 | 0.811 | 0.811 | 0.967 | Abstain | 0.811
UCI Data set “Website Phishing” (footnote 7)
k-NN | 0.880 | 0.845 | 0.840 | 0.928 | 1.00 | 0.813 | 0.884
Hk-NN | 0.893 | 0.861 | 0.865 | 0.937 | 0.968 | Abstain | 0.901
1 Data set can be downloaded at https://archive.ics.uci.edu/ml/datasets/Spambase
2 Data set can be downloaded at https://archive.ics.uci.edu/ml/datasets/Statlog+%28Vehicle+Silhouettes%29
3 Data set can be downloaded at https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State.
4 Semeion+Handwritten+Digit.
5 Data set can be downloaded at https://archive.ics.uci.edu/ml/datasets/Glass+Identification.
6 Data set can be downloaded at https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope.
7 Website+Phishing.

4.1. General results
In Section 3.2.2.3, we proposed the division of data into four data sets. We also chose to separately predict DCs and CMs instead of predicting a fixed pair of categories. This way, the number of target classes is dramatically reduced, and a better classification performance can be
expected. In the end, we obtained eight evaluation setups, namely for across eight prediction setups, Hk-NN constantly outperformed the
each SC phase and two target features. former in terms of true positive rate by a much larger margin. For
Table 2 contains evaluation results for k-NN and Hk-NN in terms of example, performance difference for DC for flight and delivery equaled
accuracy (ACC); precision (PR); recall (REC); specificity (SP); classifi 0.111 and 0.121, respectively. A smaller performance difference could
cation coverage (Coverage); accuracy in disagreement, that is, absti be registered for a positive predictive value: Hk-NN outperformed k-NN
nence from prediction (ACC (D)); and balanced accuracy (ACC (B)). by up to 0.080 for DC predictions in preconditioning. In terms of balanced
Accuracy is widely used in literature pertaining to alarm systems (Anuar accuracy, Hk-NN also outperformed k-NN across all evaluation setups.
et al., 2008; Farid, Harbi, & Rahman, 2010; Gaikwad & Thool, 2015; Su, Performance difference ranged between 0.022 (CM in preconditioning)
2011a; 2011b), and shows what portion of all alarms was correctly and 0.066 (DC in delivery).
classified by the algorithm. Precision, or positive predictive value, shows In Sections 3.1.1-3.1.2, we focused on the limitations of k-NN and
the number of cases where the algorithm correctly predicted a certain proposed Hk-NN to abstain from problematic predictions, among other
DC or CM out of all predictions for such DCs and CMs (Derczynski, things. In Table 2, both strengths and weaknesses of such abstinence are
2016). Recall, true positive rate, or sensitivity is also used in alarm reflected in prediction coverage and accuracy in disagreement. We see,
classification research (Farnaaz & Jabbar, 2016; Y. Wang, 2005), and on the one hand, that k-NN accuracy is close to random guess for test
shows what share of predictions for a specific DC or CM was correct out samples, where Hk-NN decided to abstain from predictions. It is espe
of all such predictions. Specificity, or true negative rate, shows the cially crucial when we have to deal with higher costs of mis
proportion of correctly classified negatives out of all negatives (Altman classifications. On the other hand, we face more TA examinations with
& Bland, 1994). Finally, balanced accuracy considers data imbalances Hk-NN, since it does not provide predictions for all test samples. Basi
by normalizing true positives and negatives by the number of positive cally, prediction coverage serves as an indicator for problematic regions
and negative data points, which is also expressed as the sum of recall and in data and hints at a possible accuracy gain of Hk-NN. The lower the
specificity divided by two (Mower, 2005). prediction coverage, the larger the performance difference between k-
Generally, we observe that Hk-NN consistently yields a better per NN and Hk-NN because of the presence of more problematic regions in
formance across all evaluation metrics. Performance gain for accuracy data. We consider whether this peculiarity of Hk-NN affects its value in
ranged between 0.022 (DC for SC phase preconditioning) and 0.062 (DC terms of temperature exposure in a simulation in Section 4.3.
for flight). Although k-NN achieved a competitively high specificity Our final observation regards fairly consistent superior accuracy of
DCs vs. CMs. With the sole exception of k-NN in the SC phase delivery, both
classifiers yielded more accurate predictions for DCs, especially for
handling (difference of 0.131 and 0.129 for k-NN and Hk-NN, respectively).
We attribute this difference to the predominantly factual nature
of DCs and the relatively judgmental nature of CMs, which stems from the varying
proactiveness of alarm personnel. The resulting higher variance and
overlap of classes for CMs lead to somewhat lower accuracies. We
investigate this case in detail in Section 4.4 by extending Hk-NN to the
SBP.
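As a minimal sketch of how the metrics defined in this section relate to one another when a classifier may abstain, the Python snippet below computes coverage, accuracy on the covered samples, accuracy in disagreement, and a macro-averaged balanced accuracy. It is an illustration, not the evaluation code used in the study: the function names, the use of None as the abstention marker, the scoring of the hybrid classifier on covered samples only, and the one-vs-rest macro-averaging are all assumptions.

```python
def abstention_metrics(y_true, y_base, y_hybrid):
    """Coverage and accuracies for a hybrid classifier that may abstain.

    y_base holds the baseline (k-NN style) predictions, y_hybrid the
    hybrid predictions, with None marking an abstention (assumed convention).
    """
    covered = [i for i, p in enumerate(y_hybrid) if p is not None]
    abstained = [i for i, p in enumerate(y_hybrid) if p is None]

    def accuracy(pred, idx):
        # Returns None when there are no samples to score.
        if not idx:
            return None
        return sum(pred[i] == y_true[i] for i in idx) / len(idx)

    return {
        "coverage": len(covered) / len(y_true),
        "acc_hybrid": accuracy(y_hybrid, covered),
        # Baseline accuracy on exactly those samples the hybrid skipped.
        "acc_disagreement": accuracy(y_base, abstained),
    }


def macro_balanced_accuracy(y_true, y_pred):
    """Macro-averaged (recall + specificity) / 2, one-vs-rest per class.

    Assumes y_true contains at least two distinct classes.
    """
    classes = sorted(set(y_true))
    recalls, specificities = [], []
    for c in classes:
        pos = [p for t, p in zip(y_true, y_pred) if t == c]
        neg = [p for t, p in zip(y_true, y_pred) if t != c]
        recalls.append(sum(p == c for p in pos) / len(pos))
        specificities.append(sum(p != c for p in neg) / len(neg))
    recall = sum(recalls) / len(classes)
    specificity = sum(specificities) / len(classes)
    return (recall + specificity) / 2
```

The relation between recall, specificity, and balanced accuracy can be checked against Table 2: for k-NN and DC in preconditioning, (0.651 + 0.974) / 2 = 0.8125, which agrees with the reported ACC (B) of 0.813 up to rounding.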
4.2. Evaluation on publicly available data sets
In the previous section, we have seen a consistent performance advantage of Hk-NN over k-NN with the exception of the prediction coverage score. Admittedly, Hk-NN, by its design, is not expected to provide predictions for data points in problematic regions, which should preferably be delegated to manual processing. However, the convincing performance advantage across other evaluation metrics illustrates the findings in a specific business context with a characteristic data generating process. Despite our use of eight different evaluation setups for the business context of alarm processing in pharmaceutical SC, it would be helpful to observe the performance of Hk-NN vs. k-NN in other scenarios. For this purpose, we use seven publicly available ML data sets and compare the two algorithms across the seven evaluation metrics described in Section 4.1 in different classification tasks. Table 3 contains the results of our evaluations.
Similar to the findings in Section 4.1, Hk-NN consistently outperforms k-NN. Although the performance gap varies across the seven data sets, accuracy is improved by up to 0.037 in the eye state classification task, whereas specificity and balanced accuracy show an improvement of 0.043 for the same classification task. In the task of glass identification, Hk-NN achieves an ideal true positive rate (recall) of 1.00 in comparison with 0.951 by k-NN. Similarly, positive predictive value (precision) yielded by Hk-NN (1.00) was noticeably higher than the value achieved by k-NN (0.954).
As observed in Section 4.1, Hk-NN always achieved a prediction coverage of over 0.9 (the lowest value, 0.907, was obtained for DC in the SC phase delivery). New evaluations have also shown that Hk-NN normally provides classifications for over 90% of test samples. For example, prediction coverage of Hk-NN ranged between 0.934 (task of vehicle identification based on its silhouette) and 0.986 (glass identification task). Whereas this value depends on the data noisiness and reflects the share of reliable predictions, the remaining unclassified samples are delegated to manual processing (labelling) and are usually characterized by a much poorer performance of k-NN. While classification accuracy of k-NN for such problematic samples in the alarm processing context revolved around random guessing, evaluations on the seven new data sets show somewhat higher variance of its classification accuracy. For example, it yielded a relatively good accuracy of 0.813 in the task of website phishing identification; however, it failed to correctly identify any of the glass classes (zero accuracy).

4.3. Temperature exposure simulations

Elimination of k-NN limitations in Section 3.1.2 has the flip side of incomplete prediction coverage by Hk-NN (see Section 4.1 and Table 2 and/or Section 4.2 and Table 3). On the one hand, such abstention reduces the number of misclassifications typical of k-NN. On the other hand, it increases the number of TAs that require manual examination, thus diluting the advantage of automated processing.
To identify whether the lower prediction coverage of Hk-NN can affect its performance in automated TA processing, we investigate how the algorithm contributes to the reduction in exposures to higher temperatures that a shipment undergoes after TA triggering and processing. How product quality is affected by exposures to extreme temperatures has been explored in stability studies (European Medicines Agency, 2003). The main indicator of such exposure over time is mean kinetic temperature (MKT) that reflects the continuous thermal challenge to a pharmaceutical over a studied period of time, even with temperature exposure not being stable during this period (Cristianini & Shawe-Taylor, 2000). MKT is calculated according to the Arrhenius relationship as follows (Haynes, 1971):

\[
T_K = \frac{\Delta H / R}{-\ln\left(\dfrac{e^{-\Delta H/(R t_1)} + e^{-\Delta H/(R t_2)} + \cdots + e^{-\Delta H/(R t_n)}}{n}\right)} \tag{1}
\]

where ΔH is a heat of activation (calories mole⁻¹), R is a universal gas constant (calories mole⁻¹ degree⁻¹), t_i is the absolute temperature at the i-th point in time (Kelvin), and n is the number of temperature measurements.
If MKT exceeds a threshold of a permissible range within a period of time found in stability studies, a pharmaceutical product’s quality is irreversibly affected, and it may have lost its pharmacological effect. The stability of a pharmaceutical product is determined by MKT for periods up to 24 h (European Medicines Agency, 2003). Our simulation setup, however, allows for modeling exposures that last about as long as TAs are processed. Nevertheless, given multiple points along the SC where TAs are possible, it is still helpful to investigate whether MKT can move outside a permissible range.
We set up a straightforward simulation model with sequential TA examination that reflects the current manual TA processing procedure at LSP. A list of model parameters and the ways in which they were derived from available data are provided in Table A.1.
In the manual TA processing scenario, each TA is examined sequentially until a decision on a CM or its necessity is made, and the TA can be documented. Depending on the final CM, its enactment may also take some time before temperature returns to a normal range. TA processing time burden is accumulated until a final CM has been taken.
Automated TA processing with the help of k-NN initially allows for a reduction in manual examinations and contributes to the TA processing time burden with the time required for the enactment of some predicted CMs. However, if an incorrect CM is predicted for a TA (i.e., no action instead of action), the TA remains active and is again triggered with the next temperature measurement, thereby requiring more processing time. Time burden is accumulated until all TAs have been processed and no falsely classified TAs remain active.
The processing procedure of Hk-NN is similar to that of k-NN; however, TA processing time burden is increased by the number of TAs for which Hk-NN did not provide any prediction. On the one hand, it automatically increases the processing time burden, as in the case with manual examinations. On the other hand, it avoids the cases of TAs that are still active because of misclassifications for rare or problematic TAs. Time burden is accumulated until all TAs have been processed (manually or automatically) and no falsely classified TAs remain active.
The model simulates a scenario of one alarm employee facing a certain number of TAs triggered at the same time. We run the model for different numbers of simultaneous TAs (1 through 24) to assess the possible effect of increasing workload. We consider the exposure to temperatures above the upper threshold and calculate MKT for a period of one hour prior to a TA, during which temperature was at a setpoint level, and afterwards until the TA was processed and temperature returned to its normal value. For simplicity, we focus on the cases with temperature increases in the +2–+8 °C range, but ignore the cases of temperature going below a lower threshold. Simulation results are illustrated in Fig. 8, and show the share of TAs for which MKT was exceeded in 1000 simulation runs.

Fig. 8. Share of TAs with exceeded MKT.

Table 4
Extension to the SBP.
SC phase         Constrained accuracy calculation   Inclusive accuracy calculation
                 DC         CM                      DC         CM
Preconditioning  0.941      0.963                   0.977      0.991
                 (+0.002)   (+0.003)                (+0.038)   (+0.031)
Handling         0.944      0.956                   0.985      0.988
                 (±0.000)   (−0.016)                (+0.041)   (+0.016)
Flight           0.896      0.919                   0.952      0.970
                 (+0.010)   (+0.006)                (+0.066)   (+0.056)
Delivery         0.890      0.963                   0.961      0.992
                 (+0.001)   (−0.006)                (+0.072)   (+0.023)

We notice that for up to five simultaneous TAs per alarm employee, all approaches to TA processing yield good product safety; however, the situation changes with an increasing number of TAs.
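To make the MKT relationship of Eq. (1) concrete, the following Python sketch evaluates it for a series of temperature readings and compares the result with an upper threshold. It is a hedged illustration and not the simulation code of the study: the function names are hypothetical, and the default ratio ΔH/R = 10000 K is an assumed, commonly used value that is not stated in the text.

```python
import math


def mean_kinetic_temperature(temps_kelvin, delta_h_over_r=10000.0):
    """MKT in Kelvin from absolute temperature readings, following Eq. (1).

    delta_h_over_r is the ratio of activation heat to gas constant (Kelvin);
    the default of 10000 K is an assumption, not a value from the paper.
    """
    if not temps_kelvin:
        raise ValueError("at least one temperature reading is required")
    # Average of the Arrhenius terms exp(-ΔH / (R * t_i)).
    mean_term = sum(math.exp(-delta_h_over_r / t) for t in temps_kelvin)
    mean_term /= len(temps_kelvin)
    return -delta_h_over_r / math.log(mean_term)


def mkt_exceeds(temps_celsius, upper_celsius, delta_h_over_r=10000.0):
    """True if the MKT of the Celsius readings lies above the upper threshold."""
    temps_kelvin = [t + 273.15 for t in temps_celsius]
    mkt_celsius = mean_kinetic_temperature(temps_kelvin, delta_h_over_r) - 273.15
    return mkt_celsius > upper_celsius


# Hypothetical readings: setpoint phase followed by a deviation.
readings = [5.0, 5.0, 12.0, 14.0, 15.0, 15.0]
print(mkt_exceeds(readings, upper_celsius=8.0))
```

Because the exponential terms weight warmer readings more heavily, MKT is never below the arithmetic mean of the readings, which is why short but pronounced deviations can push it past the threshold.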
Manual processing suffers the most from the increasing workload because of the sequential TA examination. Hk-NN and k-NN substantially reduce the frequency of TAs with excess MKT; however, the increasing number of TAs still leads to a slight increase in the cases of spoiled products. Again, Hk-NN demonstrates the leading performance despite its disadvantage in prediction coverage. The performance gap increases with the number of simultaneous TAs. The reasons for this performance gap are the generally better classification accuracy of Hk-NN and its lower false negative rate for the CMs action and investigation, whereas the higher false negative rate of k-NN leaves more TAs active after misclassifications.
Given that the current maximum number of TAs per LSP employee approaches nine and a growing trend can be observed in recent years, the simulation provides a projection of how the situation can develop in the future. Either an expansion of alarm personnel capacity or TA processing automation would be necessary to avoid the negative effect of increasing workloads.

4.4. Extension to SBP

In addition to comparing the performance of Hk-NN with k-NN, we extend the evaluation to a scenario in which the SBP is also provided. This extension is motivated by the following practical considerations.
First, it is possible to have competing classes in a k-NN neighborhood in training data, and consequently, in recommendations. This is not traceable in standard libraries, and predictions in such situations are not deterministic and subject to the embedded logic of algorithm packages (e.g., a class with lower encoded numeric value is predicted). Second, the LSP might have an intrinsic motivation to avoid an imprudent reliance on single-class predictions. If alarm personnel are presented with recommended CMs, the incentive to closely study TAs and ensure their accuracy may be negligible. To combat the negative effect of the abovementioned scenarios, we look into the change in accuracy associated with the introduction of SBP.
A repeated application of Hk-NN would be problematic in an SBP setting for a number of reasons. In the first scenario, consideration of a second-major class within the existing boundaries would be conceivable. However, the radius neighbors classifier faces multiple cases of an empty radius neighborhood already in the first iteration. Removal of data points of the first predicted class would obviously increase the number of such cases and delegate the voting task to the nearest neighbor classifier. Such a voting outcome would not be reliable if the first predicted class coincides with the nearest neighbor prediction. The new nearest neighbor is not guaranteed to be close to the new observation, and may not reliably indicate the probable class of the latter. Moreover, DTs produce terminal nodes with minimized entropy or Gini impurity, in which the new observation may be farther away from some points than the most distant neighbor in a k-NN neighborhood. Therefore, the second-major class data points may either be distant, or, especially in the case of a small k, represent noise. Finally, the voting policy in Hk-NN would fail in cases where only one class represents a k-NN neighborhood or all data points in a DT leaf.
In the second scenario, the removal of data points of the first predicted class by Hk-NN and repetition of the entire procedure, that is, selection of k, estimation of the radius size, etc., is possible. However, it would pose an additional computational burden. In the case of homogeneous neighborhoods, possibly with a single representative class, it would also result in considering the votes of very remote neighbors of a different class in k-NN. It would also either increase the volume of terminal nodes of DTs containing distant data points of a different class or build decision boundaries in sparse regions where the majority class was removed, thus reducing the reliability of predictions for new observations close to decision boundaries.
In light of these considerations, we adhere to a simple and straightforward procedure for selecting the SBP based only on the data points within the k-NN neighborhood. First, we check whether the class predicted by Hk-NN coincides with the initial prediction by k-NN. If they are not the same, it means that even though the class predicted by k-NN prevailed in the neighborhood, another class was chosen based on the Hk-NN voting policy. In this case, we take this initial k-NN prediction as the SBP.
If these predictions coincide, however, we further explore the remaining data points within the neighborhood of k-NN. If the first predicted class is the only class in the hypersphere, then we abstain from the SBP. If the hypersphere initially contains two classes, then the minority class is the SBP. Finally, if the k-NN neighborhood initially contains data points of more than two classes, we identify whether the remaining classes compete with each other. In the case of no competition, the choice of the majority class among them is obvious; if two or more remaining classes compete, we choose them as the SBPs.
Table 4 shows the accuracy of predicted DCs and CMs. We calculated accuracy in two ways: a) constrained (abstention from either the first prediction or the SBP was not considered a correct prediction if the k-NN prediction was incorrect); b) inclusive (abstention from either the first prediction or the SBP was considered a correct prediction if the k-NN prediction was incorrect). Numbers in brackets show the performance advantage over k-NN.
Obviously, a more inclusive calculation strategy provides more optimistic results. The biggest difference in accuracy across calculation strategies reaches 0.071 for DC in delivery. The accuracy difference is at least 0.028 (CM for preconditioning), and reflects the share of abstention cases in a particular data set for which k-NN failed to provide a correct prediction. It is also worth noting that a constrained calculation strategy can be disadvantageous for Hk-NN because k-NN can provide a correct classification for test samples where Hk-NN chose to abstain. In the case of the inclusive calculation strategy, however, Hk-NN tends to consistently outperform k-NN.
Generally, we observe that presenting the alarm personnel with two (in the case of competing classes in the SBP, with three) DCs or CMs can achieve an accuracy of well over or almost 0.9, even for the constrained calculation. Inclusive accuracy calculation provides an accuracy of 0.99 for CMs of preconditioning and delivery. At the same time, accuracy does not fall below 0.95 for the inclusive calculation. Therefore, an extension to the SBP may be advantageous if concerns about the reliance on single-class predictions are justified.
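The SBP selection rule described in this section can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the authors' implementation: the function name and arguments are hypothetical, the first prediction is assumed to exist (no abstention), and "competing" classes are interpreted as classes tied for the highest vote count among the remaining neighbors.

```python
from collections import Counter


def select_sbp(neighbor_labels, knn_pred, hknn_pred):
    """Return the list of second-best predictions (empty list = abstain).

    neighbor_labels: class labels of the points in the k-NN neighborhood.
    knn_pred: initial majority prediction of plain k-NN.
    hknn_pred: class finally chosen by the hybrid voting policy.
    """
    # The hybrid vote overruled k-NN: the k-NN class becomes the SBP.
    if hknn_pred != knn_pred:
        return [knn_pred]

    # Otherwise inspect the neighbors that do not belong to the first class.
    remaining = Counter(lab for lab in neighbor_labels if lab != hknn_pred)
    if not remaining:
        # Homogeneous neighborhood: no SBP is offered.
        return []

    # One remaining class, a clear runner-up, or several tied classes
    # (assumed meaning of "competing") are all covered by taking the top count.
    top = max(remaining.values())
    return sorted(c for c, n in remaining.items() if n == top)


print(select_sbp(["a", "a", "a", "b", "c"], knn_pred="a", hknn_pred="a"))
```

With two tied runner-up classes, as in the call above, both are returned, which corresponds to presenting three candidate DCs or CMs to alarm personnel.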
One interesting finding worth noting is the consistently better performance for CMs, which is in contrast with the single-class predictions by k-NN and Hk-NN (see Table 2). In Section 4.1, we ascribed this contrary observation to the fact that human judgment can introduce additional variance to CMs (for instance, a more proactive employee decides to monitor, even though the temperature is still within a permissible range, in contrast to a less proactive employee deciding to take no action). On the other hand, DCs are factual rather than judgmental in nature. Consequently, we believe that the class boundaries for CMs tend to demonstrate overlaps for uncertain situations that are exacerbated by differing cautiousness of employees. For this reason, we see that the introduction of SBP improves the accuracy of DCs to a lesser extent.

Table A1
Variables in a simulation model.
Variable | Source
TA examination time | Derived from collected data on documented CMs and TA triggering; approximated as Γ(2.0, 0.15)
Extra time burden (CMs action and investigation) | Derived from collected data on documented CMs; approximated as Γ(2.0, 0.1)
Extra time burden (CM monitoring) | Derived from collected data on documented CMs and inquiries to the LSP; approximated as Γ(2.0, 0.05)
Deviation (range) for triggered TAs | Derived from sensor measurement and TA triggering data; approximated as U(0.1, 2.0)
Deviation (range) during enactment of a CM | Derived from sensor measurement and TA triggering data; approximated as U(0.1, 0.3)
Deviation (range) for brief deviations requiring no CM | Derived from sensor measurement and TA triggering data; approximated as U(0.0, 0.1)
TA triggering interval | Derived from sensor measurement and TA triggering data (one sampling rate of 0.167 h)
Probability of TA being triggered in SC phase preconditioning, handling, flight, and delivery | Estimated on the basis of the numbers of historically triggered TAs for different SC phases; see Section 5.1
Probability of a TA requiring a CM action, investigation, monitoring, and no action, or that no CM was documented for similar historical TAs | Derived from collected data on CMs on the basis of historical TAs requiring corresponding CMs
TA classification accuracy for k-NN and Hk-NN | Classifier evaluations; see column ACC in Table 2
Classification coverage for Hk-NN | Classifier evaluations; see column Coverage in Table 2
Probability of misclassifying a TA with a CM action, investigation, monitoring, and no action | Derived from confusion matrices for k-NN and Hk-NN

5. Conclusions

In this study, we focused on the problem of manual examinations of TAs in a pharmaceutical SC. Given the time-consuming nature of examinations, we searched for a way to provide alarm personnel with DS based on the data collected for previous shipments and documented TAs. We proposed Hk-NN as the adjustment to k-NN in the context of DS generation in TA processing. The new algorithm is represented by a two-step voting procedure based on k-NN with an entropy-minimized fixed-size radius, pruned DTs, and nearest neighbor predictions. The proposed modifications were targeted at the weaknesses of k-NN in a setting with higher costs of misclassifications and preferred manual examinations in the cases of unreliable predictions.
We used the SC and TA processing data from a large international LSP from mid-2013 through 2020 for the evaluation of Hk-NN. Based on 16,525 encoded TA comments in terms of DCs, CMs, and temperature and SC context data, we formed eight experimental setups. All setups demonstrated the leading performance of Hk-NN. For example, Hk-NN outperformed k-NN by up to 0.062 in accuracy and reached an accuracy of up to 0.917. It also yielded a balanced accuracy of up to 0.870 and outperformed k-NN in balanced accuracy by up to 0.066. Macro-average recall, precision, and specificity scores were also consistently higher for Hk-NN. At the same time, for the test samples on which Hk-NN abstained from predictions, accuracy of k-NN was close to random guessing and ranged between 0.455 and 0.627. The leading performance of Hk-NN was also backed up by additional evaluations on seven publicly available ML data sets.
Simulations for MKT stability demonstrated that, in comparison to k-NN and manual processing, Hk-NN was associated with a lower frequency of MKT exceeding an upper threshold within TA processing time, plus one hour prior to a deviation. The advantage increased linearly with an increasing number of simultaneous TAs. Given that the container value of cooled pharmaceutical products ranges from a couple hundred thousand to millions of euros, fewer cases of MKT exceeding a permissible threshold imply significant potential monetary savings.
Motivated by computational and psychological considerations, we also introduced the extension of Hk-NN to the SBP. Again, Hk-NN outperformed k-NN in terms of accuracy, and its extension to the SBP led to a significant accuracy gain for CMs (up to 0.207). We attributed this increase to a higher class overlap in light of a more judgmental nature of CMs by alarm personnel.
Consequently, in a narrower context of TA processing, Hk-NN is capable of providing still interpretable and more reliable DS (in terms of used evaluation metrics and exceeded MKT) in comparison with k-NN predictions. In this specific context, this DS potentially leads to reduced TA examination times, more informed TA examinations, and ranking of simultaneous TAs (depending on the recommended CM, TAs can be presented in the order of their importance, as determined by the necessity of human involvement). In a broader context, the proposed algorithm is beneficial for solving classification problems with higher misclassification costs. Thanks to its cautious voting mechanism that abstains from predictions in data regions with distant and/or heterogeneous samples, less reliable predictions can be identified, which is not possible with traditional k-NN implementations. The suitable applications of Hk-NN may range from situations with a preference for abstention from predictions over random guessing to the cases in which the cost of (additional) manual processing is insignificant when compared to the criticality of a misclassification (e.g., predictive diagnosis, loan risk prediction, predictive maintenance, etc.).
However, despite more accurate predictions on DCs and CMs, Hk-NN still cannot be applied in some marginal cases. For example, it becomes less practical as the values of optimal k become small. For such scenarios that are less affected by noise, the search for a fixed-size radius becomes impractical and DTs cannot be effectively controlled for overfitting. Similarly, if k-NN already offers a very good performance, which often speaks for a good data structure, the gains offered by Hk-NN may be insignificant. Apart from these limitations, we see the potential of extending our work in the following directions. First, further elaborations of voting policies are conceivable, namely, a study on the weighted approach based on the share of data points in a k-NN neighborhood or a DT leaf, and their distance to the test observation. One more direction is the use of a kernel function instead of a fixed-size radius that will allow for a more flexible weighting of differently distant points (e.g., Parzen windows). Although the currently used fixed radius can be regarded as a univariate kernel, we believe that other functions could offer better classification results (or at least for certain data generating processes).

CRediT authorship contribution statement

Iurii Konovalenko: Conceptualization, Methodology, Software, Data curation, Writing – original draft, Visualization, Supervision, Project administration. André Ludwig: Conceptualization, Methodology, Data curation, Writing – original draft, Writing – review & editing.
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.eswa.2021.116208.

References

Ahmim, A., Maglaras, L., Ferrag, M. A., Derdour, M., & Janicke, H. (2019). A Novel Hierarchical Intrusion Detection System Based on Decision Tree and Rules-Based Models. In Presented at the 2019 15th International Conference on Distributed Computing in Sensor Systems (DCOSS), IEEE (pp. 228–233). https://doi.org/10.1109/DCOSS.2019.00059
Aldayel, M. S. (2012). K-Nearest Neighbor classification for glass identification problem. In Presented at the 2012 International Conference on Computer Systems and Industrial Informatics (ICCSII), IEEE (pp. 1–5). https://doi.org/10.1109/ICCSII.2012.6454522
Allach, S., Ahmed, M. B., & Boudhir, A. A. (2019). Detection of driver’s alertness level based on the Viola and Jones method and logistic regression analysis. International Journal of Intelligent Enterprise, 6(2/3/4), 356. https://doi.org/10.1504/IJIE.2019.101135
Altman, D. G., & Bland, J. M. (1994). Statistics Notes: Diagnostic tests 1: Sensitivity and specificity. BMJ, 308(6943), 1552. https://doi.org/10.1136/bmj.308.6943.1552
Ammann, C. (2011). Stability Studies Needed to Define the Handling and Transport Conditions of Sensitive Pharmaceutical or Biotechnological Products. AAPS PharmSciTech, 12(4), 1264–1275. https://doi.org/10.1208/s12249-011-9684-0
Anuar, N. B., Sallehudin, H., Gani, A., & Zakaria, O. (2008). Identifying False Alarm for Network Intrusion Detection System Using Hybrid Data Mining and Decision Tree. Malaysian Journal of Computer Science, 21(2), 101–115. https://doi.org/10.22452/mjcs.vol21no2.3
Arar, Ö. F., & Ayan, K. (2017). A feature dependent Naive Bayes approach and its application to the software defect prediction problem. Applied Soft Computing, 59, 197–209. https://doi.org/10.1016/j.asoc.2017.05.043
Bentley, J. L., Stanat, D. F., & Williams, E. H., Jr. (1977). The complexity of finding fixed-radius near neighbors. Information Processing Letters, 6(6), 209–212. https://doi.org/10.1016/0020-0190(77)90070-9
Bergstra, J., & Bengio, Y. (2012). Random Search for Hyper-Parameter Optimization. The Journal of Machine Learning Research, 13(1), 281–305.
Bhatia, N., & Ashev, V. (2010). Survey of Nearest Neighbor Techniques. International Journal of Computer Science and Information Security, 8(2), 1–4.
Dreiseitl, S., & Ohno-Machado, L. (2002). Logistic regression and artificial neural network classification models: A methodology review. Journal of Biomedical Informatics, 35(5–6), 352–359. https://doi.org/10.1016/S1532-0464(03)00034-0
Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern Classification (2nd ed.). Wiley-Interscience.
Emenike, C. C., Van Eyk, N. P., & Hoffman, A. J. (2016). Improving Cold Chain Logistics through RFID temperature sensing and Predictive Modelling. In Presented at the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), IEEE (pp. 2331–2338). https://doi.org/10.1109/ITSC.2016.7795932
European Commission. Directive 2001/83/EC of the European Parliament and of the Council on the Community Code Relating to Medicinal Products for Human Use (2001).
European Medicines Agency. Note for Guidance on Stability Testing: Stability Testing of new Drug Substances and Products (2003). CPMP/ICH/2736/99.
Farid, D., Harbi, N., & Rahman, M. Z. (2010). Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection. International Journal of Network Security & Its Applications, 2(2), 12–25. https://doi.org/10.5121/ijnsa:2010.2202
Farnaaz, N., & Jabbar, M. A. (2016). Random Forest Modeling for Network Intrusion Detection System. Procedia Computer Science, 89, 213–217. https://doi.org/10.1016/j.procs.2016.06.047
Fayyad, U. M., & Irani, K. B. (1993). Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In Presented at the International Joint Conference on Artificial Intelligence (pp. 1022–1027).
Gaikwad, D. P., & Thool, R. C. (2015). Intrusion Detection System Using Bagging Ensemble Method of Machine Learning. In Presented at the 2015 International Conference on Computing Communication Control and Automation (ICCUBEA), IEEE (pp. 291–295). https://doi.org/10.1109/ICCUBEA.2015.61
Gómez, S. A., Goron, A., Groza, A., & Letia, I. A. (2016). Assuring safety in air traffic control systems with argumentation and model checking. Expert Systems with Applications, 44, 367–385. https://doi.org/10.1016/j.eswa.2015.09.027
Haan, G. H., Hillegersberg, J. V., de Jong, E., & Sikkel, K. (2013). Adoption of Wireless Sensors in Supply Chains: A Process View Analysis of a Pharmaceutical Cold Chain. Journal of Theoretical and Applied Electronic Commerce Research, 8(2), 138–154. https://doi.org/10.4067/S0718-18762013000200011
Hafliðason, T., Ólafsdóttir, G., Bogason, S., & Stefánsson, G. (2012). Criteria for temperature alerts in cod supply chains. International Journal of Physical Distribution & Logistics Management, 42(4), 355–371. https://doi.org/10.1108/09600031211231335
Harrou, F., Taghezouit, B., & Sun, Y. (2019). Improved kNN-Based Monitoring Schemes for Detecting Faults in PV Systems. IEEE Journal of Photovoltaics, 9(3), 811–821. https://doi.org/10.1109/JPHOTOV.2019.2896652
Haynes, J. D. (1971). Worldwide Virtual Temperatures for Product Stability Testing. Journal of Pharmaceutical Sciences, 60(6), 927–929. https://doi.org/10.1002/jps.2600600629
Hsieh, C.-Y., Huang, C.-N., Liu, K.-C., Chu, W.-C., & Chan, C.-T. (2016). A machine learning approach to fall detection algorithm using wearable sensor. In Presented at the 2016 International Conference on Advanced Materials for Science and Engineering (ICAMSE), IEEE (pp. 707–710). https://doi.org/10.1109/ICAMSE.2016.7840209
Ishimtsev, V., Bernstein, A., Burnaev, E., & Nazarov, I. (2017). Conformal k-NN Anomaly Detector for Univariate Data Streams. In Presented at the Sixth Workshop on Conformal and Probabilistic Prediction and Applications (pp. 213–227).
Jadhav, S. D., & Channe, H. P. (2016). Comparative Study of K-NN, Naive Bayes and
Decision Tree Classification Techniques. International Journal of Science and Research
Bhatti, U. A., Huang, M., Wu, D., Zhang, Y., Mehmood, A., & Han, H. (2019).
(IJSR), 5(1), 1842–1845. http://doi.org/10.21275/v5i1.NOV153131.
Recommendation system using feature extraction and pattern recognition in clinical
Jokanovic, B., & Amin, M. (2017). Fall Detection Using Deep Learning in Range-Doppler
care systems. Enterprise Information Systems, 13(3), 329–351. https://doi.org/
Radars. IEEE Transactions on Aerospace and Electronic Systems, 54(1), 180–189.
10.1080/17517575.2018.1557256
https://doi.org/10.1109/TAES.2017.2740098
Bouneffouf, D., Bouzeghoub, A., & Ganarski, A. L. (2013). Risk-Aware Recommender
Kim, P.-S., & Kutzner, A. (2008). In Lecture Notes in Computer ScienceTheory and
Systems. In Neural Networks: Tricks of the Trade (Vol. 8226, pp. 57–65). Berlin,
Applications of Models of Computation (pp. 246–257). Berlin, Heidelberg: Springer
Heidelberg: Springer Berlin Heidelberg. http://doi.org/10.1007/978-3-642-42054-
Berlin Heidelberg.
2_8.
Kohavi, R., & Sahami, M. (1996). Error-based and entropy-based discretization of
Bouzekri, E., Canny, A., Fayollas, C., Martinie, C., Palanque, P., Barboni, E., et al. (2019).
continuous features (pp. 114–119). Presented at the Second International
Engineering issues related to the development of a recommender system in a critical
Conference on Knowledge Discovery and Data Mining, Portland, Oregon.
context: Application to interactive cockpits. International Journal of Human-Computer
Kumar, B. J., Naveen, H., Kumar, B. P., Sharma, S. S., & Villegas, J. (2017). Logistic
Studies, 121, 122–141. https://doi.org/10.1016/j.ijhcs.2018.05.001
regression for polymorphic malware detection using ANOVA F-test (pp. 1–5).
Brown, R. A. (2015). Building a Balanced k-d Tree in O(knlogn) Time. Journal of Computer
Presented at the 2017 4th International Conference on Innovations in Information,
Graphics Techniques, 4(1), 50–68.
Embedded and Communication Systems (ICIIECS), IEEE. 10.1109/
Chen, R.-Y. (2017). An intelligent value stream-based approach to collaboration of food
ICIIECS.2017.8275880.
traceability cyber physical system by fog computing. Food Control, 71, 124–136.
Kumar, P. M., & Devi Gandhi, U. (2018). A novel three-tier Internet of Things
https://doi.org/10.1016/j.foodcont.2016.06.042
architecture with machine learning algorithm for early detection of heart diseases.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms
Computers & Electrical Engineering, 65, 222–235. https://doi.org/10.1016/j.
(Third). MIT Press.
compeleceng.2017.09.001
Cristianini, N., & Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and
Kültür, Y., & Çağlayan, M. U. (2016). Hybrid approaches for detecting credit card fraud.
Other Kernel-based Learning Methods ((1st ed.,, pp. 128–131). Cambridge University
Expert Systems, 34(2), Article e12191. https://doi.org/10.1111/exsy.12191
Press.
Lang, W., Jedermann, R., Mrugala, D., Jabbari, A., Krieg-Brückner, B., & Schill, K.
Dao, A.-Q., Koltai, K., Cals, S. D., Brandt, S. L., Lachter, J., Matessa, M., et al. (2015).
(2011). The “Intelligent Container”—A Cognitive Sensor Network for Transport
Evaluation of a Recommender System for Single Pilot Operations. Procedia
Management. IEEE Sensors Journal, 11(3), 688–698. https://doi.org/10.1109/
Manufacturing, 3, 3070–3077. https://doi.org/10.1016/j.promfg.2015.07.853
JSEN.2010.2060480
De Luca, A., & Termini, S. (1972). A definition of a nonprobabilistic entropy in the setting
Leng, K., Jin, L., Shi, W., & Van Nieuwenhuyse, I. (2018). Research on agricultural
of fuzzy sets theory. Information and Control, 20(4), 301–312. https://doi.org/
products supply chain inspection system based on internet of things. Cluster
10.1016/S0019-9958(72)90199-4
Computing, 22(S4), 8919–8927. https://doi.org/10.1007/s10586-018-2021-6
Derczynski, L. (2016). Complementarity, F-score, and NLP Evaluation. Presented at the
Li, L., Yu, Y., Bai, S., Hou, Y., & Chen, X. (2017). An Effective Two-Step Intrusion
International Conference on Language Resources and Evaluation.
Detection Approach Based on Binary Classification and k -NN. IEEE Access, 6,
Domeniconi, C., Peng, J., & Gunopulos, D. (2001). An Adaptive Metric Machine for
12060–12073. https://doi.org/10.1109/ACCESS.2017.2787719
Pattern Classification. In Presented at the Thirteenth International Conference on Neural
Information Processing Systems (pp. 437–443).
16
Liu, J., Higgins, A., & Tan, Y.-H. (2010). IT enabled redesign of export procedure for Antennas and Propagation, 67(10), 6612–6626. https://doi.org/10.1109/
high-value pharmaceutical product under temperature control: the case of drug TAP.810.1109/TAP.2019.2921150
living lab (pp. 1–18). Presented at the Annual International Conference on Digital Shang, J., & Chen, T. (2020). Early Classification of Alarm Floods via Exponentially
Government Research, Public Administration Online Challenges and Opportunities, Attenuated Component Analysis. IEEE Transactions on Industrial Electronics, 67(10),
Puebla, Mexico. 8702–8712. https://doi.org/10.1109/TIE.4110.1109/TIE.2019.2949542
Matthias, D. M., Robertson, J., Garrison, M. M., Newland, S., & Nelson, C. (2007). Su, M.-Y. (2011a). Real-time anomaly detection systems for Denial-of-Service attacks by
Freezing temperatures in the vaccine cold chain: A systematic literature review. weighted k-nearest-neighbor classifiers. Expert Systems with Applications, 38(4),
Vaccine, 25(20), 3980–3986. https://doi.org/10.1016/j.vaccine.2007.02.052 3492–3498. https://doi.org/10.1016/j.eswa.2010.08.137
Mondal, S., Wijewardena, K. P., Karuppuswami, S., Kriti, N., Kumar, D., & Chahal, P. Su, M.-Y. (2011b). Using clustering to improve the KNN-based classifiers for online
(2019). Blockchain Inspired RFID-Based Information Architecture for Food Supply anomaly network traffic identification. Journal of Network and Computer Applications,
Chain. IEEE Internet of Things Journal, 6(3), 5803–5813. https://doi.org/10.1109/ 34(2), 722–730. https://doi.org/10.1016/j.jnca.2010.10.009
JIoT.648890710.1109/JIOT.2019.2907658 Swarnkar, M., & Hubballi, N. (2016). OCPAD: One class Naive Bayes classifier for
Mousheimish, R., Taher, Y., & Zeitouni, K. (2016). Automatic learning of predictive rules payload based anomaly detection. Expert Systems with Applications, 64, 330–339.
for complex event processing (pp. 414–417). Presented at the The 10th ACM https://doi.org/10.1016/j.eswa.2016.07.036
International Conference on Distributed Event-Based Systems, New York, New York, Thakur, M., & Forås, E. (2015). EPCIS based online temperature monitoring and
USA: ACM Press. 10.1145/2933267.2933430. traceability in a cold meat chain. Computers and Electronics in Agriculture, 117, 22–30.
Mousheimish, R., Taher, Y., & Zeitouni, K. (2017). In Automatic Learning of Predictive CEP https://doi.org/10.1016/j.compag.2015.07.006
Rules (pp. 158–169). New York, New York, USA: ACM Press. https://doi.org/ Tsang, Y. P., Choy, K. L., Wu, C. H., Ho, G. T. S., Lam, C. H. Y., & Koo, P. S. (2018). An
10.1145/3093742.3093917. Internet of Things (IoT)-based risk monitoring system for managing cold supply
Mower, J. P. (2005). PREP-Mt: Predictive RNA editor for plant mitochondrial genes. BMC chain risks. Industrial Management & Data Systems, 118(7), 1432–1462. https://doi.
Bioinformatics, 6(1), 96–115. https://doi.org/10.1186/1471-2105-6-96 org/10.1108/IMDS-09-2017-0384
Naderpour, M., Lu, J., & Zhang, G. (2014). An intelligent situation awareness support Tu, M., Lim, M. K., & Yang, M.-F. (2018). IoT-based production logistics and supply chain
system for safety-critical environments. Decision Support Systems, 59, 325–340. system – Part 1. Industrial Management & Data Systems, 118(1), 65–95. https://doi.
https://doi.org/10.1016/j.dss.2014.01.004 org/10.1108/IMDS-11-2016-0503
Nawaz, F., Janjua, N. K., & Hussain, O. K. (2019). PERCEPTUS: Predictive complex event Ulrich, T. A., Lew, R., Poresky, C. M., Rice, B. C., Thomas, K. D., & Boring, R. L. (2017).
processing and reasoning for IoT-enabled supply chain. Knowledge-Based Systems, Operator-in-the-Loop Study for a Computerized Operator Support System (COSS). Cross-
180, 133–146. https://doi.org/10.1016/j.knosys.2019.05.024 System and System-Independent Evaluations.
Om, H., & Kundu, A. (2012). A hybrid system for reducing the false alarm rate of Wang, Y. (2005). A multinomial logistic regression modeling approach for anomaly
anomaly intrusion detection system. In Presented at the 2012 1st International intrusion detection. Computers & Security, 24(8), 662–674. https://doi.org/10.1016/
Conference on Recent Advances in Information Technology (RAIT), IEEE (pp. 131–136). j.cose.2005.05.003
https://doi.org/10.1109/RAIT.2012.6194493 Wattanakul, S., Henry, S., Bentaha, L., Reeveerakul, N., & Ouzrout, Y. (2017). Improving
Pal, A., & Kant, K. (2019). Internet of Perishable Logistics: Building Smart Fresh Food risk management by using smart containers for real-time traceability (pp. 1–8).
Supply Chain Networks. IEEE Access, 7, 17675–17695. https://doi.org/10.1109/ Presented at the International Conference on Logistics and Transport, Bangkok,
ACCESS.2019.2894126 Thailand.
Peng, K., Leung, V. C. M., Zheng, L., Wang, S., Huang, C., & Lin, T. (2018). Intrusion West, D., Mangiameli, P., Rampal, R., & West, V. (2005). Ensemble strategies for a
Detection System Based on Decision Tree over Big Data in Fog Environment. Wireless medical diagnostic decision support system: A breast cancer diagnosis application.
Communications and Mobile Computing, 2018(5), 1–10. https://doi.org/10.1155/ European Journal of Operational Research, 162(2), 532–551. https://doi.org/
2018/4680867 10.1016/j.ejor.2003.10.013
Pilarski, M. G. (2014). The Concept of Recommender System Supporting Command and World Health Organization. (2003). Guide to good storage practices for pharmaceuticals
Control System in Hierarchical Organization. In Presented at the 2014 European (No. 908). WHO Technical Report Series (p. 12). World Health Organization.
Network Intelligence Conference (ENIC), IEEE (pp. 138–141). https://doi.org/ Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., et al. (2008). Top
10.1109/ENIC.2014.9 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1–37.
Ricci, F., Rokach, L., & Shapira, B. (Eds.). (2015). Recommender Systems Handbook. https://doi.org/10.1007/s10115-007-0114-2
Boston, MA: Springer US. Yang, K., Botero, U., Shen, H., Woodard, D. L., Forte, D., & Tehranipoor, M. M. (2018).
Ross, T. J. (2010). Fuzzy Logic with Engineering Applications ((3rd ed.).). Wiley. UCR: An Unclonable Environmentally Sensitive Chipless RFID Tag For Protecting
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation Supply Chain. ACM Transactions on Design Automation of Electronic Systems, 23(6),
of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. 1–24. https://doi.org/10.1145/3264658
https://doi.org/10.1016/0377-0427(87)90125-7 Yeh, I.-C., & Lien, C.-H. (2009). The comparisons of data mining techniques for the
Ruiz, D., Berenguer, V., Soriano, A., & Sánchez, B. (2011). A decision support system for predictive accuracy of probability of default of credit card clients. Expert Systems with
the diagnosis of melanoma: A comparative approach. Expert Systems with Applications, 36(2), 2473–2480. https://doi.org/10.1016/j.eswa.2007.12.020
Applications, 38(12), 15217–15223. https://doi.org/10.1016/j.eswa.2011.05.079 Zakeri, A., Saberi, M., Hussain, O. K., & Chang, E. (2018). An Early Detection System for
Schaefer, R. (2019). How to become CEIV Pharma Certified (p. (p. 129).). Internatinal Air Proactive Management of Raw Milk Quality: An Australian Case Study. IEEE Access,
Transport Association. 6, 64333–64349. https://doi.org/10.1109/ACCESS.2018.2877970
Serdarasan, S., & Tanyas, M. (2012). Dealing with Complexity in the Supply Chain: The Zhang, Y., Wang, W., Yan, L., Glamuzina, B., & Zhang, X. (2019). Development and
Effect of Supply Chain Management Initiatives. SSRN Electronic Journal. https://doi. evaluation of an intelligent traceability system for waterless live fish transportation.
org/10.2139/ssrn.2056331 Food Control, 95, 283–297. https://doi.org/10.1016/j.foodcont.2018.08.018
Shafiq, Y., Gibson, J. S., Kim, H., Ambulo, C. P., Ware, T. H., & Georgakopoulos, S. V. Zhu, W., Sun, W., & Romagnoli, J. (2018). Adaptive k-Nearest-Neighbor Method for
(2019). A Reusable Battery-Free RFID Temperature Sensor. IEEE Transactions on Process Monitoring. Industrial & Engineering Chemistry Research, 57(7), 2574–2586.
https://doi.org/10.1021/acs.iecr.7b0377110.1021/acs.iecr.7b03771.s002
17
