ARTICLE INFO
Keywords: k-nearest neighbors; Fuzzy set; Recommendation; Decision Support; Pharmaceutical supply chain; Temperature deviation
ABSTRACT
Real-time temperature monitoring is necessary in cold pharmaceutical supply chains (SCs), where exposure to extreme temperatures can lead to product quality deterioration. Temperature alarms (TAs) triggered by current rule-based systems still require lengthy examinations before a suitable corrective measure (CM) can be chosen. Providing additional information relevant to TAs, however, can expedite the examination process.
In the related areas of recommender systems and false alarm/anomaly detection, the k-nearest neighbors (k-NN) algorithm has proven successful because of its interpretability and ease of use. However, in the context of TA processing, it may suffer from some inherent limitations (i.e., varying neighborhood radius, unreliable classifications in sparse and noisy regions, and blindness to natural class boundaries). To overcome these limitations, we propose a hybrid k-NN (Hk-NN) algorithm based on the principles of local similarity and neighborhood homogeneity. It incorporates a two-step voting procedure with an entropy-optimized k-NN radius, decision trees with k-constrained leaves, and nearest neighbor predictions.
We investigate 16,525 comments by alarm personnel for TAs in a pharmaceutical SC and encode them in terms of deviation causes and CMs (target features). We use SC data on cargo location, SC phase, sensor role, and temperature characteristics as predictor features for TA similarity estimation. In eight experimental setups, Hk-NN consistently outperforms k-NN with an optimized k in terms of accuracy, balanced accuracy, and macro-average precision, recall, and specificity. At the same time, Hk-NN refrains from predicting observations for which k-NN's accuracy is close to a random guess.
* Corresponding author.
E-mail addresses: iurii.konovalenko@the-klu.org (I. Konovalenko), andre.ludwig@the-klu.org (A. Ludwig).
https://doi.org/10.1016/j.eswa.2021.116208
Received 7 September 2020; Received in revised form 21 December 2020; Accepted 6 November 2021
Available online 17 November 2021
0957-4174/© 2021 Elsevier Ltd. All rights reserved.
I. Konovalenko and A. Ludwig Expert Systems With Applications 190 (2022) 116208
2.2. Alarm and anomaly detection
Machine learning (ML) algorithms have been widely used for detecting anomalies and classifying alarms (e.g., Shang & Chen, 2020; Su, 2011a; 2011b), and have assisted in the manual processing of overwhelming amounts of sensor data. This practice indicates their potential application in other areas with similar prediction settings.
In this context, it is especially important to be able to back-engineer the classification logic of an algorithm in order to interpret its classification output. In the extant research, however, black-box models have also been widely implemented, in particular when higher classification accuracy was an important goal of model building. Below, we provide an account of areas where various ML algorithms were successfully implemented for alarm and anomaly detection. We start our overview with interpretable models and move on to the application of black-box algorithms.
Interpretable models are represented in the literature by decision trees (DTs), k-NN, logistic regression, and naïve Bayes. For instance, DTs were applied in the intrusion detection context and were shown to achieve a reduction in false positives (Anuar, Sallehudin, Gani, & Zakaria, 2008). The algorithm was also used for solving the same problem over Big Data in a fog environment (K. Peng et al., 2018), and in combination with rule-based models (Ahmim, Maglaras, Ferrag, Derdour, & Janicke, 2019). k-NN was successfully applied to fall detection (Hsieh et al., 2016), identification of univariate time-series anomalies (Ishimtsev, Bernstein, Burnaev, & Nazarov, 2017), network intrusion detection (Li, Yu, Bai, Hou, & Chen, 2017), industrial fault detection (Zhu, Sun, & Romagnoli, 2018), identification of faults in photovoltaic systems (Harrou, Taghezouit, & Sun, 2019), and early classification of alarm floods (Shang & Chen, 2020).
Logistic regression was successfully used for polymorphic malware detection and achieved an accuracy of 0.977 (B. J. Kumar, Naveen, Kumar, Sharma, & Villegas, 2017). The algorithm also fared well in the early detection of health issues (P. M. Kumar & Devi Gandhi, 2018) and in monitoring driver alertness (Allach, Ahmed, & Boudhir, 2019). Naïve Bayes combined with k-means fared well in reducing the false alarm rate in intrusion detection (Om & Kundu, 2012), detection of payload-based anomalies (Swarnkar & Hubballi, 2016), and software defect prediction (Arar & Ayan, 2017).
Hardly interpretable models are represented by ensemble or hybrid algorithms and deep neural networks. For example, Farnaaz & Jabbar (2016) applied a random forest classifier for network intrusion detection. A system for the same purpose using bootstrap aggregating was built by Gaikwad & Thool (2015). A hybrid classifier for credit card fraud detection based on DTs, random forest, Bayesian network, naïve Bayes, and support vector machine in three voting scenarios was composed by Kültür & Çağlayan (2016). Finally, Jokanovic & Amin (2017) solved the problem of fall detection using deep learning in range-Doppler radars.
In light of the importance of prediction interpretability in our TA processing setting, and given the previous success of interpretable models in alarm and anomaly detection tasks, this study pursues the development of a solution that enables a user to understand the final prediction. For this reason, we consider deploying interpretable models (e.g., k-NN, nearest neighbor, and DT predictions) as components of our proposed solution.
2.3. Recommender systems in critical industries
Recommender systems in critical industries are a special type of interactive system that combines elements of critical systems engineering and interactive systems engineering to provide DS in settings with high safety or security requirements. Critical recommender systems originate from the widely popular e-commerce collaborative recommender systems, and their engineering is still in its infancy (Bouzekri et al., 2019). Extant research primarily suits the needs of the aviation or military industry and proposes solutions that are either quite specific to a certain problem or provide only general guidelines on the engineering of such systems. For example, Bouneffouf, Bouzeghoub, & Gançarski (2013) proposed an algorithm that considers the risk level of a user situation and adaptively balances the exploration–exploitation trade-off in context-aware recommender systems. Naderpour, Lu, & Zhang (2014) also worked with situation-aware systems and presented a cognition-driven DS system to help manage abnormal situations in safety-critical environments. Ulrich et al. (2017) provided evaluation results of a situation-aware computerized operator support system in nuclear plants that helps detect faults earlier than conventional control room technologies.
Dao et al. (2015) discussed the quality of a system developed to help the pilot of a distressed aircraft choose a diversionary airport. In a similar field, Gómez et al. (2016) illustrated a recommender framework for assisting flight controllers. The framework combines model checking and argumentation theory to evaluate trade-offs under incomplete and inconsistent information. Pilarski (2014) worked on a more general solution and elaborated on a recommender system concept that can support decision making in a hierarchical military organization.
2.4. Research gap
Although the research presented in the previous sections offers many helpful developments, we still see the need for further work that would enable DS generation in TA processing. First, the literature on temperature monitoring, in particular on sensor technologies or event-driven solutions, enables data collection for further decision making. Such literature, however, still fails to provide any solutions for informative DS in temperature monitoring. Although rule-based solutions are also potential sources of DS (e.g., the rules that trigger an action are a good basis for making better-informed decisions), such systems are still very static, inflexible, and labor-intensive in terms of rule base maintenance (Mousheimish et al., 2016; 2017). Moreover, any variation in the specified rules leads to completely different system behavior and alarm processing results (Hafliðason, Ólafsdóttir, Bogason, & Stefánsson, 2012).
Second, the alarm classification literature that relies on easily interpretable models has so far focused on two-class settings with highly imbalanced data (e.g., intrusion detection). In this case, false positives may not pose a big problem, and applying interpretable models without significant modifications is justifiable. A multi-class setting, in which the identification of each class is relevant, necessitates an investigation into the suitability of off-the-shelf algorithms for the task.
Finally, the literature on critical recommender systems has so far been relatively field- and problem-specific and does not allow for a smooth transfer to other fields. At the same time, in the literature on traditional collaborative recommender systems, where k-NN is widely used for item- or user-based recommendations, the desired serendipity and novelty of predictions hinder the development of analogous models for settings with higher misclassification costs (Ricci, Rokach, & Shapira, 2015).
The poor informative qualities of current temperature monitoring solutions, the focus on rare single-class predictions in the alarm detection literature, and the rather problem-specific nature of existing critical recommender systems urge us to reconsider the applicability of available research to the context of DS generation in TA processing. Convinced by the successful application of k-NN in alarm detection and in recommender systems, as well as by the interpretability of its prediction results, we propose certain adjustments to the algorithm that are necessary in our prediction scenario with higher misclassification costs.
3. Material and methods
In this section, we focus on the new hybrid classification algorithm
3.1.1. Limitations of k-NN
As mentioned in Section 2.4, k-NN was successfully used in alarm classification settings, which are similar to the context of DS generation for TA processing. Moreover, the algorithm was also applied to engineer critical recommender systems, specifically in healthcare (Bhatti et al., 2019; Ruiz et al., 2011; D. West et al., 2005). The popularity of this algorithm is largely attributed to its advantages. In particular, k-NN is easy to train, is effective for large training data sets, and is robust to noisy training data (Bhatia & Ashev, 2010; Jadhav & Channe, 2016). Its results are also easy to interpret on a case-by-case basis, in contrast to black-box models (Dreiseitl & Ohno-Machado, 2002), and its application does not require the establishment of a predictive model before classification (Yeh & Lien, 2009).
In the context of recommender systems, where novelty and serendipity of recommendations are welcome (Ricci et al., 2015), some limitations of k-NN that are critical in settings with higher costs of misclassification have not yet been addressed. Such limitations can also affect the quality of predictions in the context of DS generation for TA processing.
Our exploration of multiple special cases of k-NN classifications for various data structures allows us to derive three main challenges that k-NN may face: a) a search radius restricted only by the number of neighbors and not by any distance criterion, an issue that may become more evident for large k values (e.g., Aldayel, 2012; Domeniconi, Peng, & Gunopulos, 2001; Wu et al., 2008); b) potentially biased predictions close to natural class boundaries; and c) unreliable predictions in noisy and sparse data regions, in particular for smaller k (e.g., Aldayel, 2012; Wu et al., 2008). Fig. 1 illustrates these limitations for a scenario with 15 neighbors. The remainder of this section describes the identified k-NN limitations in detail and explains how they are addressed in our Hk-NN algorithm.
First, k-NN looks for the nearest k neighbors in the vicinity of a test observation. The radius of search (ball or hypersphere) is increased until it contains k neighbors. In locally dense regions, this should not pose any problems, since relatively similar observations would eventually participate in voting. In locally sparse regions, or regions with heterogeneous density, it leads to situations in which more remote and more dissimilar samples influence voting. At the same time, although the true class may equal the class of the closest neighbors, the voting outcome of k-NN may still differ from it because of a denser representation of a neighboring class (see Fig. 1a for an illustration).
To demonstrate how the changeable radius of k-NN can affect predictions, we show how accuracy changes with a growing size of the k-NN hypersphere, using one of our datasets as an example (see Fig. 2). For this purpose, we first find an optimal number of nearest neighbors (k) via grid search; then, we run a classification with leave-one-out cross-validation and associate each prediction with the size of the k-NN radius. Finally, we calculate a cumulative accuracy with respect to the increasing radius. We observe that the accuracy is higher for smaller radius sizes and drops as the radius grows, until it reaches its lowest level and remains almost unchanged for the largest radius sizes. This observation indicates that keeping the neighborhood boundary of k-NN changeable may affect classification reliability where only a few or no neighbors lie close to a new sample. In the context of DS generation for TA processing, this may turn out to be a critical limitation, especially for new TAs associated with only a few similar past deviations.
Fig. 2. Accuracy rate and radius size of neighborhood.
Second, k-NN is blind to class boundaries. If a new observation lies close to a class boundary and its true class is represented by fewer training observations than a competing class, then such an observation can be misclassified by k-NN (e.g., see Fig. 1b). Coupled with the limitation in Fig. 1a, this blindness to class boundaries can lead to
misclassifications even if a new observation lies well within the boundary of its true class. For this to happen, a competing class should be represented by a denser region in the k-NN hypersphere. Fig. 1b shows that applying a classifier to construct a decision boundary (in this particular case, DTs) can provide an additional criterion for making a classification decision in problematic regions.
Lastly, k-NN does not have any mechanism for handling problematic classifications, that is, in regions with much noise or where the decision boundaries of multiple classes meet and overlap. Such problematic regions can also be represented by heterogeneous neighborhoods with distant nearest neighbors, where classifications cannot be reliable. For an illustrative example, see Fig. 1c. In such cases, either abstaining from classification or relying on other insights from the local data structure would be more helpful. Relying on the labels of the 15 nearest neighbors, even if this setting works best globally on the training data, can fail in such problematic neighborhoods.
3.1.2. Description of Hk-NN algorithm
Other limitations of k-NN can be identified for specific data structures or in light of computational requirements (e.g., memory requirements or sensitivity to redundant attributes (Aldayel, 2012)). However, we focus on the ones described in Section 3.1.1, which are particularly important for predictions with high misclassification costs. To overcome these limitations, we propose a hybrid algorithm based on the principles of local similarity and neighborhood homogeneity. The first principle rests on the assumption that (most) training samples in the close vicinity of a new observation share its class label (as suggested by the relationship between the radius size of k-NN and the accuracy). The second principle serves as a (final) confirmation or dismissal of a prediction based on the first principle. It requires that the neighborhood of a new observation be homogeneous, that is, that the class in the closest vicinity of a new observation coincide with the prevailing class in the neighborhood. Here, the neighborhood is not necessarily defined as the k-NN hypersphere. We provide the procedure for our hybrid algorithm, hereafter referred to as Hk-NN, in Algorithm 1.
As input, we need training data (X_hist) and test data (X_new), the parameter k for the optimal number of nearest neighbors, and a fixed size of the neighbor search radius (radius_opt). For subsequent evaluation setups, we find the k parameter via random search with 10-fold cross-validation (Bergstra & Bengio, 2012). To estimate a fixed neighborhood radius, we run a classification with leave-one-out cross-validation on the training data and, just as for Fig. 2, calculate the search radius associated with each prediction. We then sort the data by radius size and use the radius together with a binary target for prediction success (zero for misclassifications and one for correct classifications) to derive intervals with maximum information gain based on the minimum description length principle (Fayyad & Irani, 1993). The cut point marking the end of the first interval is chosen as radius_opt, since it offers higher accuracy than the following cut points (if multiple cut points are found).
Following our first principle of local similarity, we first make a prediction for a test sample with a radius neighbors classifier that considers a fixed neighborhood radius (lines 3 through 6). This prediction should help deal with the limitation illustrated in Fig. 1a. Situations in which the fixed search radius contains no neighbors are unavoidable, irrespective of the data structure. When no prediction is possible (RN_pred is assigned the value “null”), we handle this case later in the second step. To fulfill the second principle of neighborhood homogeneity, we make two other predictions with k-NN and DTs (lines 7 and 8). Thus, to capture the possible limitation illustrated in Fig. 1b, we do not confine ourselves to the notion of a neighborhood extending equidistantly from a test observation. At the same time, we constrain DTs with the parameter k as the minimum size of a terminal node, to avoid deep DTs prone to overfitting and to ensure that k-NN contributes to the final prediction on an equal footing with DTs. Although the decision boundaries of DTs are axis-aligned and cannot guarantee complete class purity in a leaf because of the k constraint, they are still easily interpretable and, for a smaller k, can isolate very local concentrations of data points belonging to one class.
As mentioned in the previous paragraph, a fixed search radius can contain no neighbors, especially in areas with sparser data. In the context of TA processing, this is possible for deviations with extreme conditions. In such cases, we relax the first principle to the nearest neighbor prediction that lies outside the fixed search radius, but is still
the most similar data point to a test sample (line 9).
In the first step of voting, we consider the predictions of k-NN, DTs, and the radius neighbors classifier. Both conditions are fulfilled if the prediction of the radius neighbors classifier coincides with the prediction of k-NN, DTs, or both (lines 11 and 12). In this case, we do not proceed to the second step of voting and predict the class upon which at least two classifiers agree. If the fixed search radius is not empty and the prediction of the radius neighbors classifier coincides with neither that of k-NN nor that of DTs, then we have strong reasons to believe that the training samples within the fixed radius represent noise and/or hint at a rare deviation that should be addressed with caution. In this case, we do not proceed to the second step of the voting procedure; we prefer to abstain from prediction and delegate the final decision to an alarm employee (line 14).
If the fixed search radius is empty, we proceed to the second voting step with a relaxed restriction for the local similarity principle. We use the nearest neighbor prediction instead of a radius neighbors prediction and look for an agreement with k-NN or DTs (lines 15 through 19). This situation is common in sparser regions with larger distances between data points. In this case, we believe that if the most similar training observation coincides with the prevailing class in the neighborhood of a test observation, we can still rely on this prediction despite local sparsity. If such a nearest neighbor lies in or close to the DT leaf containing the new observation, or corresponds to the plurality vote within the k-NN hypersphere, we consider both conditions fulfilled as long as the predictions match (line 16). Otherwise, we abstain from prediction in the second step (line 19).
In Fig. 3, we show the possible voting outcomes of Hk-NN. For example, in situations where the fixed search radius is not empty (Fig. 3a, b, e), a new observation is classified in conformance with the prediction of k-NN and/or DTs; otherwise, a decision to abstain from prediction is made if a noisy heterogeneous region is identified (Fig. 3e). For cases with an empty fixed search radius (Fig. 3c, d, f), the nearest neighbor of a test observation can agree with the prediction of k-NN and/or DTs (Fig. 3c, d), or Hk-NN abstains from classification if a problematic region is signaled by the lack of prediction agreement (Fig. 3f).
3.1.3. Time complexity of Hk-NN
In this section, we elaborate on the time complexity of Hk-NN, considering the worst-case asymptotic performance of the component algorithms (k-NN, DT, radius neighbors classifier, and nearest neighbor classifier) and procedures (numerical sorting and entropy-based discretization for the fixed radius calculation). However, we do not include the search for an optimal k for k-NN in the overall time complexity, since it heavily depends on the procedure used for it (i.e., random or grid search), the number of folds in the cross-validation procedure, and the search domain of candidate values for k.
As input, Hk-NN requires, among other things, the parameter radius_opt (see Algorithm 1). The procedure for its calculation is described in paragraph 2 of Section 3.1.2. The time complexity of this procedure
amounts to $O(sn\log n + n^2 + kn)$, where $s$ is the number of splits for intervals identified in the course of the minimum description length procedure (Fayyad & Irani, 1993). Given an optimal value of $k$ and $n$ (the number of samples in a training set), running a classification with leave-one-out cross-validation would require $O(kn)$ computations. Sorting a list of $n$ samples in ascending order by the size of the k-NN radius would require $O(n^2)$ computations in the worst case. Finally, each split considered in the minimum description length procedure takes $O(n\log n)$ time. If the procedure finally outputs $s$ thresholds, at most $2s + 1$ computations are done, which results in a complexity of $O(sn\log n)$ (Kohavi & Sahami, 1996).
In the ensuing two-step voting procedure, predictions by k-NN, DT, the radius neighbors classifier, and the nearest neighbor classifier are used. Using these classifiers for predictions requires additional computations. For example, for $m$ features, k-NN has a time complexity of $O(mnk)$ (computing a distance requires $O(m)$ time, and looping through the training set for $k$ neighbors takes $O(nk)$ time). Growing a DT has a complexity of $O(mn^2)$ in the worst case (here, we consider critically unbalanced DTs that split the data into partitions of $1$ and $n - 1$ samples in each node). The time complexity of the nearest neighbor classifier is analogous to that of k-NN for $k = 1$, i.e., $O(mn)$. The radius neighbors classifier would require $O(mn^2)$ computations in the worst case, i.e., $O(n^2)$ for distance calculations (Bentley, Stanat, & Williams, 1977) multiplied by the number of features.
Correspondingly, the upper bound of the time complexity for training Hk-NN is $O(sn\log n + mn + kn + kmn + n^2 + mn^2)$. It should be noted, however, that these computations reflect rather pessimistic scenarios and do not consider more efficient implementations of some algorithms/procedures that have been proposed in the extant literature. For example, the numerical sorting preceding the entropy-based discretization procedure can be performed by a block merge sort algorithm within $O(n\log n)$ or, in the best case, within $O(n)$ time (Kim & Kutzner, 2008). Using a k-d tree space-partitioning data structure, the nearest neighbors' search can be reduced from $O(n)$ to $O(\log n)$ (Brown, 2015). Similarly, k-NN's $O(mnk)$ can be reduced to $O(mn + kn)$ if the implementation computes and stores distances, which, on the other hand, requires additional space complexity of $O(n)$. Finally, if we consider the less pessimistic case of DTs with balanced binary splits, their growing complexity can be reduced to $O(mn\log n)$ (Cormen, Leiserson, Rivest, & Stein, 2009). Higher values of $k$ for the minimum number of samples in a leaf could eventually constrain the depth of a DT even below $\log n$.
3.2. Used data and its preparation
In the following subsections, we first describe a real-world cold SC scenario in which TAs are currently processed manually and from which the data for the evaluation of our algorithm are generated and collected. Then, we focus on the preparation and representation of data in the feature generation subsection.
3.2.1. Cold SC scenario
An air pharmaceutical SC is a special case of a cold SC, where the maintenance of a permissible temperature regime is just as important as in other modes or for other perishables, but where speedy transportation and delivery are prioritized. We have temperature monitoring and TA processing data at our disposal from a large international LSP rendering temperature-controlled air transportation services to over 100 pharmaceutical manufacturers on four continents. As of the time of our research, the LSP has collected over 20 million temperature measurements from 70,874 sensor devices.
A simplified view of the LSP's cold air SC is illustrated in Fig. 4. A cold SC starts at the premises of a manufacturer, where pharmaceuticals are packed with proper insulation and stored until they are picked up. Upon pick-up by the freight forwarder, they are shipped by truck either to the airport facilities or to the LSP facilities, where consolidation, deconsolidation, repackaging, and temporary storage are performed prior to customs processing and flight. Following one more leg of road transportation to the airport facilities, customs clearance and subsequent security checks take place before a shipment can be forwarded to the temperature-controlled storage facilities at the airport of origin. After the receipt of a shipment booking list, cargo is collected at storage facilities and moved to the hold area for ramp transport, and then to the parking position, where it is loaded on a flight as per the load plan.
After the flight arrives at the destination airport, the cargo is unloaded and moved to appropriate storage facilities, and to appropriate locations for transfer to another carrier in the case of transit; if the cargo arrives at the final destination airport, the freight forwarder is notified and the cargo is stored for import. Following customs clearance, the cargo is prepared for handover to the freight forwarder, who collects and transports it to the premises of a consignee. In a real-world scenario, air SCs are usually more complex and may contain multiple layovers and transshipments, where the operations described above are repeated.
In the entire course of storage, transportation, and handling operations, correct temperatures should be maintained to guarantee the quality and safety of use of pharmaceuticals. For this reason, constant
temperature monitoring is ensured with the help of sensing devices that transmit measurement event data in real time. Sensors can already be attached to the shipment on the manufacturer's premises and assigned (activated) in the monitoring system of the LSP for ongoing measurement collection. Additional sensors are attached by the LSP to its own packaging and/or to the doors and walls of a refrigerated container. They build a sensor mesh and transmit the event data via a dedicated gateway. The data are constantly transmitted at a predefined sampling rate whenever network coverage is available (in our study, we work with measurements collected at 10-minute intervals). In the case of network unavailability, the data are stored locally with the corresponding timestamps and transmitted later. The event data are collected until the sensors are deactivated after delivery.
The shipment of cargo is associated with the initial assignment and constant collection of relevant temperature monitoring data. For example, the permissible temperature range is determined in the course of laboratory stability studies (Ammann, 2011; European Medicines Agency, 2003) and is associated with each shipment in the form of rules for an upper and a lower temperature threshold that must not be exceeded. Along with the two thresholds, a setpoint temperature is also defined, which usually corresponds to the average of the two thresholds and signifies the temperature that a cooling unit is set to maintain. Each shipment is associated with a number of regular (internal) sensors and ambient (external) sensors that register and transmit measurements typical of a pharmaceutical product or of container doors/walls. When the shipment passes certain milestones along the SC, such as cargo pick-up, arrival at the airport, or delivery to a consignee, shipment status update event data are generated with the time and location of the shipment's progress.
Temperature event data and rule specifications for a permissible temperature range form the basis of the TA triggering logic. Whenever a current measurement satisfies one of the conditions in a rule base, for example, an internal temperature higher than the upper threshold for at least 10 min, a TA is triggered. Alarm personnel are notified of the deviation and begin examining the case to decide on a CM. They study recent temperature data, look into which sensors registered the deviation, consider where the shipment is according to the shipment status data, contact a responsible party to obtain or confirm additional information, and decide on a suitable CM. At the same time, alarm personnel are expected to document the examination process and the final decision on a CM.
Given the involvement of multiple parties along the cold air SC and
Table 1
Sample TA comments and their corresponding encoding.
TA comment | DC | CM
“[…] received this alarm and issue resolution in progress. Shipment is now in SHA warehouse. We are confirming with airline to ensure they have put our cargo into correct temperature range.” | NA | action carrier
“[…] received this TM alarm. The shipment is preconditioning and the temperature has been back in range. No action is required.” | preconditioning | no action
“Alarm occurred due to handling procedure. We are in contact with […] office for this case.” | handling | investigation
“[…] received alarm. That temperature deviation was due to tarmac time transfer from aircraft. Please note that temperature is ok. We will keep on eye.” | handling | monitoring
features. Then, a way of encoding a cargo location at the time of alarm triggering is proposed. It is followed by the engineering of features representing temperature characteristics and a sensor role.
3.2.2.1. Semantic evaluation of alarm comments. A starting point of successful predictions in a classification or regression is the correct representation of available data through relevant features. The dividing line between a capable classifier and a good data representation is blurry. Ideal feature engineering would make the classification task trivial and, conversely, an omnipotent classifier would make feature extraction superfluous (Duda, Hart, & Stork, 2000). Given the lack of an omnipotent classifier, we recognize the importance of data preparation and investigate the body of knowledge on TAs to gain insights into why TAs are triggered. This knowledge is reflected in the documented TA comments.
We have 16,525 documented TAs with SC context data at our disposal. After receiving a notification of a deviation, alarm personnel are requested to document the context in which the TA was triggered and what action was undertaken to rectify the deviation. There is neither a template nor a list of required components that should comprise a full-fledged comment. Moreover, alarm personnel sometimes fail to state either a deviation cause (DC) or the CM taken. Examples of TA comments we worked with are listed in Table 1 (column TA comment).
We semantically evaluated documented TAs and identified whether
the constant possibility of exposure to temperatures outside a permis and which parts of the comments contained the DCs and CMs. We started
sible range, many triggered TAs may require an examination; however, with specific descriptions and groups of related descriptions (for
not all of them require particular CMs (human involvement) to combat instance, “low sensor battery level” or “shipment still on tarmac waiting
the deviations. For example, when sensors are attached to the packed for being loaded”) and made our way up to more general categories by
pharmaceutical products on the manufacturer’s facilities, they may not tagging and grouping separate DCs and CMs. After several rounds of
have cooled down (preconditioned) properly to deliver true measure generalizations, we ended up with the following seven DCs.
ments, which leads to TAs without necessary CMs. If a shipment is Preconditioning made up the first group and was often represented by
subject to a brief physical handling (loading, movement to a warehouse, “shipment building,” “shipment preparation,” “sensor assignment,” etc.
etc.) and the exposure to a higher temperature is registered by external For handling, the following descriptions were typical: “physical
sensors, TAs not requiring human involvement may be triggered again. handling,” “cargo movement,” “ground handling,” etc. Storage was
Finally, failing to deactivate sensors after delivery can lead to the mostly represented as “deviation during storage,” “wrong temperature
transmission of unrealistic measurements. Conversely, TAs usually room,” or “shipment misplaced”. The fourth category, transportation was
require immediate CMs because of deviations during transport or stor described by “flight,” “deviation during transport,” “deviation on truck,”
age, when cargo may have been erroneously stored within a wrong “cargo still on the way,” etc. The next group, malfunctioning was often
temperature range or when a cooling unit malfunctions. described by “sensors stopped working,” “low battery,” “defect sensor,”
In the period from August 2013 through April 2020, 37,517 TAs were etc. The category ambient temperature was described by “ambient
triggered for transported pharmaceuticals, of which 21,338 received sensor,” “internal readings still in range,” “high ambient temperature,”
comments from alarm personnel. Following rigorous text and context etc. Finally, we formed a category for DC not stated because of its
data quality checks, that is, timeliness of comments, chronological numerousness and also to allow for the identification of TAs whose DC
correspondence of the TA comments with the temperature and shipment was hard to identify.
status data, etc., 16,525 samples could be finally used for evaluations. After the same grouping procedure for CMs, we arrived at the
following categories: action (“check packaging,” “add dry ice,” “move
3.2.2. Feature engineering into proper storage room,” etc.), investigation (“still in contact with
In this subsection, we focus on the sematic evaluation of documented carrier,” “asked for storage conditions,” “confirming cargo location,”
alarm comments in terms of DCs and CMs and the resulting derivation of “investigate the case,” etc.), monitoring (“monitoring temperature,”
I. Konovalenko and A. Ludwig Expert Systems With Applications 190 (2022) 116208
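The rule-based TA triggering logic described above (a TA fires when, for example, the internal temperature stays above the upper threshold for at least 10 min) can be illustrated with a minimal Python sketch. It assumes one rule per shipment (lower and upper threshold in °C), equally spaced internal readings at the 10-minute interval used in this study, and that a breach has lasted long enough once the first and the current out-of-range readings of an uninterrupted run are at least 10 min apart. The names `TemperatureRule` and `first_alarm_index` are hypothetical and do not describe the LSP's actual rule engine.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TemperatureRule:
    """Hypothetical per-shipment rule: permissible range in deg C."""
    lower: float
    upper: float
    min_duration_min: float = 10.0  # breach must persist at least this long

    @property
    def setpoint(self) -> float:
        # The setpoint usually corresponds to the average of the two thresholds.
        return (self.lower + self.upper) / 2


def first_alarm_index(readings: Sequence[float], rule: TemperatureRule,
                      interval_min: float = 10.0) -> Optional[int]:
    """Index of the internal reading at which a TA would be triggered, else None.

    Assumption: readings are equally spaced by interval_min, and a breach has
    lasted long enough once the first and the current out-of-range readings
    of an uninterrupted run are at least rule.min_duration_min apart.
    """
    run_start: Optional[int] = None
    for i, temp in enumerate(readings):
        if temp > rule.upper or temp < rule.lower:
            if run_start is None:
                run_start = i  # first reading of a new out-of-range run
            if (i - run_start) * interval_min >= rule.min_duration_min:
                return i
        else:
            run_start = None  # back in range: the run is interrupted
    return None
```

With the 10-minute sampling assumed here, two consecutive out-of-range readings are needed before a TA fires; a rule engine that counts a single out-of-range reading as 10 min of exposure would fire one reading earlier.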
Table 2
Classification performance of k-NN and Hk-NN.
Target | Clf. | ACC | PR | REC | SP | Coverage | ACC (D) | ACC (B)
SC phase Preconditioning (4,293 samples)
DC | k-NN | 0.895 | 0.789 | 0.651 | 0.974 | 1.00 | 0.455 | 0.813
DC | Hk-NN | 0.917 | 0.869 | 0.702 | 0.980 | 0.974 | Abstain | 0.841
Recommended CM | k-NN | 0.797 | 0.777 | 0.763 | 0.933 | 1.00 | 0.584 | 0.848
Recommended CM | Hk-NN | 0.823 | 0.814 | 0.800 | 0.940 | 0.976 | Abstain | 0.870
SC phase Handling (5,265 samples)
DC | k-NN | 0.882 | 0.778 | 0.609 | 0.946 | 1.00 | 0.627 | 0.778
DC | Hk-NN | 0.910 | 0.844 | 0.673 | 0.958 | 0.955 | Abstain | 0.816
Recommended CM | k-NN | 0.751 | 0.769 | 0.713 | 0.917 | 1.00 | 0.632 | 0.815
Recommended CM | Hk-NN | 0.781 | 0.800 | 0.763 | 0.931 | 0.929 | Abstain | 0.847
SC phase Flight (3,256 samples)
DC | k-NN | 0.756 | 0.779 | 0.619 | 0.929 | 1.00 | 0.531 | 0.774
DC | Hk-NN | 0.818 | 0.827 | 0.730 | 0.946 | 0.916 | Abstain | 0.838
Recommended CM | k-NN | 0.753 | 0.735 | 0.677 | 0.927 | 1.00 | 0.524 | 0.802
Recommended CM | Hk-NN | 0.804 | 0.793 | 0.740 | 0.941 | 0.929 | Abstain | 0.841
SC phase Delivery (3,711 samples)
DC | k-NN | 0.795 | 0.783 | 0.655 | 0.955 | 1.00 | 0.534 | 0.805
DC | Hk-NN | 0.851 | 0.850 | 0.775 | 0.967 | 0.907 | Abstain | 0.871
Recommended CM | k-NN | 0.812 | 0.788 | 0.744 | 0.936 | 1.00 | 0.583 | 0.840
Recommended CM | Hk-NN | 0.844 | 0.826 | 0.796 | 0.948 | 0.957 | Abstain | 0.872

Table 3
Performance of k-NN and Hk-NN on publicly available data sets.
Clf. | ACC | PR | REC | SP | Coverage | ACC (D) | ACC (B)
UCI Data set “Spambase”¹
k-NN | 0.909 | 0.904 | 0.909 | 0.906 | 1.00 | 0.740 | 0.908
Hk-NN | 0.938 | 0.935 | 0.935 | 0.935 | 0.967 | Abstain | 0.935
UCI Data set “Statlog (Vehicle Silhouettes)”²
k-NN | 0.717 | 0.706 | 0.719 | 0.905 | 1.00 | 0.554 | 0.812
Hk-NN | 0.747 | 0.734 | 0.743 | 0.916 | 0.934 | Abstain | 0.830
UCI Data set “EEG Eye State”³
k-NN | 0.804 | 0.818 | 0.793 | 0.793 | 1.00 | 0.723 | 0.793
Hk-NN | 0.841 | 0.845 | 0.836 | 0.836 | 0.977 | Abstain | 0.836
UCI Data set “Semeion Handwritten Digit”⁴
k-NN | 0.916 | 0.921 | 0.915 | 0.990 | 1.00 | 0.393 | 0.952
Hk-NN | 0.943 | 0.946 | 0.941 | 0.994 | 0.965 | Abstain | 0.968
UCI Data set “Glass Identification”⁵
k-NN | 0.976 | 0.954 | 0.951 | 0.995 | 1.00 | 0.000 | 0.973
Hk-NN | 1.00 | 1.00 | 1.00 | 1.00 | 0.986 | Abstain | 1.00
UCI Data set “MAGIC Gamma Telescope”⁶
k-NN | 0.841 | 0.847 | 0.792 | 0.797 | 1.00 | 0.698 | 0.795
Hk-NN | 0.853 | 0.856 | 0.811 | 0.811 | 0.967 | Abstain | 0.811
UCI Data set “Website Phishing”⁷
k-NN | 0.880 | 0.845 | 0.840 | 0.928 | 1.00 | 0.813 | 0.884
Hk-NN | 0.893 | 0.861 | 0.865 | 0.937 | 0.968 | Abstain | 0.901

¹ Data set can be downloaded at https://archive.ics.uci.edu/ml/datasets/Spambase.
² Data set can be downloaded at https://archive.ics.uci.edu/ml/datasets/Statlog+%28Vehicle+Silhouettes%29.
³ Data set can be downloaded at https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State.
⁴ Data set can be downloaded at https://archive.ics.uci.edu/ml/datasets/Semeion+Handwritten+Digit.
⁵ Data set can be downloaded at https://archive.ics.uci.edu/ml/datasets/Glass+Identification.
⁶ Data set can be downloaded at https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope.
⁷ Data set can be downloaded at https://archive.ics.uci.edu/ml/datasets/Website+Phishing.

4.1. General results

In Section 3.2.2.3, we proposed dividing the data into four data sets. We also chose to predict DCs and CMs separately instead of predicting a fixed pair of categories. This way, the number of target classes is dramatically reduced, and a better classification performance can be
expected. In the end, we obtained eight evaluation setups, namely, one for each combination of the four SC phases and the two target features.

Table 2 contains evaluation results for k-NN and Hk-NN in terms of accuracy (ACC); precision (PR); recall (REC); specificity (SP); classification coverage (Coverage); accuracy in disagreement (ACC (D)), that is, the accuracy of k-NN on the test samples for which Hk-NN abstained from prediction; and balanced accuracy (ACC (B)). Precision, recall, and specificity are macro-averaged over the classes. Accuracy is widely used in the literature on alarm systems (Anuar et al., 2008; Farid, Harbi, & Rahman, 2010; Gaikwad & Thool, 2015; Su, 2011a; 2011b) and shows what portion of all alarms was correctly classified by the algorithm. Precision, or positive predictive value, shows the share of cases in which the algorithm correctly predicted a certain DC or CM out of all its predictions of that DC or CM (Derczynski, 2016). Recall, also called true positive rate or sensitivity, is likewise used in alarm classification research (Farnaaz & Jabbar, 2016; Y. Wang, 2005) and shows what share of all TAs that actually belong to a specific DC or CM the algorithm identified correctly. Specificity, or true negative rate, shows the proportion of correctly classified negatives out of all negatives (Altman & Bland, 1994). Finally, balanced accuracy accounts for class imbalance by normalizing true positives and true negatives by the numbers of positive and negative data points, respectively; it equals the sum of recall and specificity divided by two (Mower, 2005).

Generally, we observe that Hk-NN consistently yields better performance across all evaluation metrics. The accuracy gain ranged between 0.022 (DC for SC phase preconditioning) and 0.062 (DC for flight). Whereas k-NN achieved competitively high specificity across all eight prediction setups, the advantage of Hk-NN in true positive rate (recall) was consistently much larger. For example, the recall difference for DC in flight and delivery equaled 0.111 and 0.121, respectively. A smaller performance difference was registered for the positive predictive value (precision): Hk-NN outperformed k-NN by up to 0.080 for DC predictions in preconditioning. In terms of balanced accuracy, Hk-NN also outperformed k-NN across all evaluation setups, with differences ranging between 0.022 (CM in preconditioning) and 0.066 (DC in delivery).

In Sections 3.1.1–3.1.2, we focused on the limitations of k-NN and proposed, among other things, that Hk-NN abstain from problematic predictions. In Table 2, both the strengths and the weaknesses of such abstinence are reflected in prediction coverage and accuracy in disagreement. On the one hand, k-NN accuracy is close to a random guess for the test samples on which Hk-NN decided to abstain. This is especially important when misclassification costs are high. On the other hand, Hk-NN leaves more TAs for manual examination, since it does not provide predictions for all test samples. Prediction coverage thus serves as an indicator of problematic regions in the data and hints at a possible accuracy gain of Hk-NN: the lower the prediction coverage, the larger the performance difference between k-NN and Hk-NN tends to be, because the data contain more problematic regions. In Section 4.3, we use a simulation to consider whether this peculiarity of Hk-NN affects its value in terms of temperature exposure.

Our final observation concerns the fairly consistent superior accuracy of
DCs over CMs. With the sole exception of k-NN in the SC phase delivery, both classifiers yielded more accurate predictions for DCs, especially for handling (differences of 0.131 and 0.129 for k-NN and Hk-NN, respectively). We attribute this difference to the predominantly factual nature of DCs and the relatively judgmental nature of CMs, which stems from the varying proactiveness of alarm personnel. The resulting higher variance and overlap of classes for CMs lead to somewhat lower accuracies. We investigate this case in detail in Section 4.4 by extending Hk-NN to the SBP.
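The metrics reported in Table 2 can be computed from true and predicted labels as in the following minimal sketch. It treats each DC or CM class one-vs-rest, macro-averages precision, recall, and specificity, and computes balanced accuracy as (REC + SP)/2, following the definition given above. The function name `macro_metrics` is hypothetical, and scoring a ratio with an empty denominator as 0 is an assumption rather than the authors' stated convention.

```python
from typing import Dict, Hashable, Sequence


def macro_metrics(y_true: Sequence[Hashable],
                  y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, specificity and
    balanced accuracy, treating every class one-vs-rest."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred), key=str)
    prec, rec, spec = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = n - tp - fp - fn
        # Assumption: a ratio with an empty denominator is scored as 0.
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
        rec.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
    k = len(classes)
    pr, rc, sp = sum(prec) / k, sum(rec) / k, sum(spec) / k
    acc = sum(t == p for t, p in pairs) / n
    # Balanced accuracy as defined in the text: (recall + specificity) / 2.
    return {"ACC": acc, "PR": pr, "REC": rc, "SP": sp, "ACC (B)": (rc + sp) / 2}
```

As a consistency check against Table 2, the k-NN row for DC in preconditioning gives (0.651 + 0.974)/2 = 0.8125, reported as 0.813. Note that scikit-learn's `balanced_accuracy_score` averages per-class recall only, so its values would generally differ from ACC (B) as defined here.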
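Prediction coverage and accuracy in disagreement (ACC (D)) from Table 2 can be sketched as follows. The sketch assumes that an Hk-NN abstention is encoded as `None` and that k-NN predicts every test sample; `coverage_and_disagreement_accuracy` is a hypothetical helper, not part of the authors' implementation.

```python
from typing import Hashable, Optional, Sequence, Tuple


def coverage_and_disagreement_accuracy(
    y_true: Sequence[Hashable],
    abstaining_pred: Sequence[Optional[Hashable]],
    baseline_pred: Sequence[Hashable],
) -> Tuple[float, Optional[float]]:
    """Coverage of an abstaining classifier (e.g., Hk-NN, None = abstain) and
    the accuracy of an always-predicting baseline (e.g., k-NN) on exactly
    the samples where the first classifier abstained."""
    if not y_true or not (len(y_true) == len(abstaining_pred) == len(baseline_pred)):
        raise ValueError("inputs must be non-empty and equally long")
    abstained = [i for i, p in enumerate(abstaining_pred) if p is None]
    coverage = 1 - len(abstained) / len(y_true)
    if not abstained:
        return coverage, None  # full coverage: no disagreement region to score
    hits = sum(baseline_pred[i] == y_true[i] for i in abstained)
    return coverage, hits / len(abstained)
```

Returning `None` for ACC (D) under full coverage mirrors the fact that the metric is undefined when the abstaining classifier never abstains.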
certain number of TAs triggered at the same time. We run the model for different numbers of simultaneous TAs (1 through 24) to assess the possible effect of an increasing workload. We consider the exposure to temperatures above the upper threshold and calculate MKT for a period covering one hour prior to a TA, during which the temperature was at the setpoint level, and the time afterwards until the TA was processed and the temperature returned to its normal value. For simplicity, we focus on the cases with temperature increases in the +2 to +8 °C range and ignore the cases of temperature going below the lower threshold. Simulation results are illustrated in Fig. 8 and show the share of TAs for which MKT was exceeded in 1000 simulation runs.

We notice that for up to five simultaneous TAs per alarm employee, all approaches to TA processing yield good product safety; however, the situation changes with an increasing number of TAs. Manual processing suffers the most from the increasing workload because TAs are examined sequentially. Hk-NN and k-NN substantially reduce the frequency of TAs with excess MKT; however, the increasing number of TAs still leads to a slight increase in the cases of spoiled products. Again, Hk-NN demonstrates the leading performance despite its disadvantage in prediction coverage. The performance gap increases with the number of simultaneous TAs. The reasons for this gap are the generally better classification accuracy of Hk-NN and its lower false negative rate for the CMs action and investigation, whereas misclassifications of such TAs by k-NN lead to a higher number of active TAs. Given that the current maximum number of TAs per LSP employee approaches nine and that this number has been growing in recent years, the simulation provides a projection of how the situation can develop in the future. Either an expansion of alarm personnel's capacity or automation of TA processing would be necessary to avoid the negative effect of increasing workloads.

4.4. Extension to SBP

In addition to comparing the performance of Hk-NN with k-NN, we extend the evaluation to a scenario in which the SBP is also provided. This extension is motivated by the following practical considerations. First, a k-NN neighborhood in the training data can contain competing classes, and consequently so can the recommendations. This is not traceable in standard libraries, and predictions in such situations are not deterministic but subject to the embedded logic of algorithm packages (e.g., the class with the lower encoded numeric value is predicted). Second, the LSP might have an intrinsic motivation to avoid imprudent reliance on single-class predictions: if alarm personnel are presented with recommended CMs, their incentive to closely study TAs and verify the recommendations may become negligible. To counter the negative effects of these scenarios, we look into the change in accuracy associated with the introduction of the SBP.

A repeated application of Hk-NN would be problematic in an SBP setting for a number of reasons. In the first scenario, considering a second-major class within the existing boundaries would be conceivable. However, the radius neighbors classifier already faces multiple cases of empty radius neighborhoods in the first iteration. Removing the data points of the first predicted class would obviously increase the number of such cases and delegate the voting task to the nearest neighbor classifier. Such a voting outcome would not be reliable if the first predicted class coincides with the nearest neighbor prediction: the new nearest neighbor is not guaranteed to be close to the new observation and may not reliably indicate its probable class. Moreover, DTs produce terminal nodes with minimized entropy or Gini impurity, in which the new observation may be farther away from some points than the most distant neighbor in a k-NN neighborhood. Therefore, the second-major class data points may either be distant or, especially in the case of a small k, represent noise. Finally, the voting policy in Hk-NN would fail in cases where only one class represents a k-NN neighborhood or all data points in a DT leaf.

In the second scenario, removing the data points of the first predicted class by Hk-NN and repeating the entire procedure, that is, selection of k, estimation of the radius size, etc., is possible. However, it would pose an additional computational burden. In the case of homogeneous neighborhoods, possibly with a single representative class, it would also result in considering the votes of very remote neighbors of a different class in k-NN. It would also either increase the volume of terminal nodes of DTs containing distant data points of a different class or build decision boundaries in sparse regions where the majority class was removed, thus reducing the reliability of predictions for new observations close to decision boundaries.

In light of these considerations, we adhere to a simple and straightforward procedure for selecting the SBP based only on the data points within the k-NN neighborhood. First, we check whether the class predicted by Hk-NN coincides with the initial prediction by k-NN. If they differ, then even though the class predicted by k-NN prevailed in the neighborhood, another class was chosen based on the Hk-NN voting policy. In this case, we take the initial k-NN prediction as the SBP.

If these predictions coincide, however, we further explore the remaining data points within the k-NN neighborhood. If the first predicted class is the only class in the hypersphere, we abstain from the SBP. If the hypersphere initially contains two classes, the minority class is the SBP. Finally, if the k-NN neighborhood initially contains data points of more than two classes, we identify whether the remaining classes compete with each other. In the case of no competition, the choice of the majority class among them is obvious; if two or more remaining classes compete, we choose all of them as SBPs.

Table 4 shows the accuracy of predicted DCs and CMs. We calculated accuracy in two ways: a) constrained, where abstinence from either the first prediction or the SBP was not considered a correct prediction even if the k-NN prediction was incorrect; and b) inclusive, where such abstinence was considered a correct prediction if the k-NN prediction was incorrect. Numbers in brackets show the performance advantage over k-NN.

Table 4
Extension to the SBP.
SC phase | Constrained accuracy calculation: DC | Constrained: CM | Inclusive accuracy calculation: DC | Inclusive: CM
Preconditioning | 0.941 (+0.002) | 0.963 (+0.003) | 0.977 (+0.038) | 0.991 (+0.031)
Handling | 0.944 (±0.000) | 0.956 (−0.016) | 0.985 (+0.041) | 0.988 (+0.016)
Flight | 0.896 (+0.010) | 0.919 (+0.006) | 0.952 (+0.066) | 0.970 (+0.056)
Delivery | 0.890 (+0.001) | 0.963 (−0.006) | 0.961 (+0.072) | 0.992 (+0.023)

Naturally, the inclusive calculation strategy provides more optimistic results. The biggest difference in accuracy between the two calculation strategies reaches 0.071 for DC in delivery. The difference is at least 0.028 (CM in preconditioning) and reflects the share of abstinence cases in a particular data set for which k-NN failed to provide a correct prediction. It is also worth noting that the constrained calculation strategy can be disadvantageous for Hk-NN because k-NN can provide a correct classification for test samples where Hk-NN chose to abstain. With the inclusive calculation strategy, however, Hk-NN consistently outperforms k-NN.

Generally, we observe that presenting alarm personnel with two (in the case of competing classes in the SBP, three) DCs or CMs can achieve an accuracy of almost or well over 0.9, even with the constrained calculation. The inclusive calculation yields an accuracy of 0.99 for the CMs of preconditioning and delivery, and accuracy does not fall below 0.95 for any setup under the inclusive calculation. Therefore, an extension to the SBP may be advantageous if concerns about reliance on single-class predictions are justified.
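The SBP selection procedure described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it takes the class labels of the k-NN neighborhood, the k-NN majority prediction, and the Hk-NN prediction as given inputs; it assumes that "competing" classes are those tied for the highest vote count among the remaining classes; and it returns an empty tuple when Hk-NN itself abstained, a case the described procedure does not cover. The function name `second_best_prediction` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def second_best_prediction(
    neighbor_labels: Sequence[Hashable],
    knn_pred: Hashable,
    hknn_pred: Optional[Hashable],
) -> Tuple[Hashable, ...]:
    """SBP class(es) chosen only from the k-NN neighborhood; () = abstain."""
    if hknn_pred is None:
        # Assumption: the procedure presupposes a class predicted by Hk-NN.
        return ()
    if hknn_pred != knn_pred:
        # Hk-NN overruled the neighborhood majority, so k-NN's vote is the SBP.
        return (knn_pred,)
    # Both agree: count the neighbors of all other classes.
    remaining = Counter(lab for lab in neighbor_labels if lab != hknn_pred)
    if not remaining:
        return ()  # single-class hypersphere: abstain from the SBP
    top = max(remaining.values())
    # Two classes initially -> the single minority class is returned.
    # More classes -> a unique majority among the rest wins, while classes
    # tied for the highest count ("competing") are all returned.
    return tuple(sorted((c for c, n in remaining.items() if n == top), key=str))
```

Because the two-class case leaves exactly one remaining class, it is handled by the same branch as the case with more than two classes.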
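The two accuracy calculations behind Table 4 can be sketched as follows, under one reading of their definitions: a test sample counts as correct if its true class is among the displayed predictions (the first prediction plus any SBPs); otherwise, if either the first prediction or the SBP was an abstention, the inclusive strategy counts the sample as correct only when k-NN mispredicted it, while the constrained strategy never does. The encoding (`None` for an Hk-NN abstention, an empty tuple for an SBP abstention) and the name `sbp_accuracy` are assumptions made for illustration.

```python
from typing import Hashable, Optional, Sequence, Tuple


def sbp_accuracy(
    y_true: Sequence[Hashable],
    first_pred: Sequence[Optional[Hashable]],
    sbp: Sequence[Tuple[Hashable, ...]],
    knn_pred: Sequence[Hashable],
    inclusive: bool,
) -> float:
    """Accuracy when the first prediction and the SBP(s) are shown together.

    first_pred: None marks an Hk-NN abstention; sbp: () marks an SBP abstention.
    """
    correct = 0
    for truth, first, second, knn in zip(y_true, first_pred, sbp, knn_pred):
        shown = set(second)
        if first is not None:
            shown.add(first)
        if truth in shown:
            correct += 1  # true class is among the displayed candidates
        elif inclusive and (first is None or not second) and knn != truth:
            # Inclusive strategy: abstaining where k-NN would have erred counts as correct.
            correct += 1
    return correct / len(y_true)
```

Under this reading, the gap between the two strategies equals the share of test samples with an abstention on which k-NN was wrong, which matches the interpretation of the accuracy difference given in the text.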
One interesting finding worth noting is the consistently better performance for CMs than for DCs in Table 4, which contrasts with the single-class predictions by k-NN and Hk-NN (see Table 2). In Section 4.1, we ascribed the single-class result to the fact that human judgment can introduce additional variance to CMs (for instance, a more proactive employee decides to monitor even though the temperature is still within the permissible range, whereas a less proactive employee decides to take no action). DCs, on the other hand, are factual rather than judgmental in nature. Consequently, we believe that the class boundaries for CMs tend to overlap in uncertain situations, and these overlaps are exacerbated by the differing cautiousness of employees. For this reason, the introduction of the SBP improves the accuracy of DCs to a lesser extent.

5. Conclusions

In this study, we focused on the problem of manual examinations of TAs in a pharmaceutical SC. Given the time-consuming nature of these examinations, we searched for a way to provide alarm personnel with DS based on the data collected for previous shipments and documented TAs. We proposed Hk-NN as an adjustment to k-NN for DS generation in TA processing. The new algorithm consists of a two-step voting procedure based on k-NN with an entropy-minimized fixed-size radius, pruned DTs, and nearest neighbor predictions. The proposed modifications targeted the weaknesses of k-NN in a setting with higher misclassification costs, where manual examination is preferred in the cases of unreliable predictions.

We used SC and TA processing data from a large international LSP from mid-2013 through 2020 to evaluate Hk-NN. Based on 16,525 TA comments encoded in terms of DCs and CMs, together with temperature and SC context data, we formed eight experimental setups. All setups demonstrated the leading performance of Hk-NN. For example, Hk-NN outperformed k-NN in accuracy by up to 0.062 and reached an accuracy of up to 0.917. It also achieved a balanced accuracy of up to 0.872 and exceeded k-NN in balanced accuracy by up to 0.066. Macro-average recall, precision, and specificity scores were also consistently higher for Hk-NN. At the same time, for the test samples on which Hk-NN abstained from predictions, the accuracy of k-NN was close to random guessing and ranged between 0.455 and 0.632. The leading performance of Hk-NN was also backed up by additional evaluations on seven publicly available ML data sets.

Simulations of MKT stability demonstrated that, in comparison to k-NN and manual processing, Hk-NN was associated with a lower frequency of MKT exceeding an upper threshold within the TA processing time plus one hour prior to a deviation. The advantage increased linearly with the number of simultaneous TAs. Given that the value of a container of cooled pharmaceutical products ranges from a couple of hundred thousand to millions of euros, fewer cases of MKT exceeding a permissible threshold point to significant potential monetary savings.

Motivated by computational and psychological considerations, we also introduced the extension of Hk-NN to the SBP. Again, Hk-NN outperformed k-NN in terms of accuracy, and the extension to the SBP led to a significant accuracy gain for CMs (up to 0.207 over single-class Hk-NN predictions). We attributed this increase to a higher class overlap, given the more judgmental nature of the CMs chosen by alarm personnel.

Consequently, in the narrower context of TA processing, Hk-NN is capable of providing DS that remains interpretable and is more reliable (in terms of the evaluation metrics used and exceeded MKT) than k-NN predictions. In this specific context, such DS potentially leads to reduced TA examination times, more informed TA examinations, and ranking of simultaneous TAs (depending on the recommended CM, TAs can be presented in the order of their importance, as determined by the necessity of human involvement). In a broader context, the proposed algorithm is beneficial for solving classification problems with higher misclassification costs. Thanks to its cautious voting mechanism, which abstains from predictions in data regions with distant and/or heterogeneous samples, less reliable predictions can be identified, which is not possible with traditional k-NN implementations. Suitable applications of Hk-NN may range from situations in which abstinence from predictions is preferred over random guessing to cases in which the cost of (additional) manual processing is insignificant compared to the criticality of a misclassification (e.g., predictive diagnosis, loan risk prediction, predictive maintenance).

However, despite more accurate predictions of DCs and CMs, Hk-NN still cannot be applied in some marginal cases. For example, it becomes less practical as the optimal value of k becomes small. For such scenarios, which are less affected by noise, the search for a fixed-size radius becomes impractical and DTs cannot be effectively controlled for overfitting. Similarly, if k-NN already offers very good performance, which often indicates a favorable data structure, the gains offered by Hk-NN may be insignificant. Apart from these limitations, we see the potential of extending our work in the following directions. First, further elaborations of voting policies are conceivable, namely, a study of a weighted approach based on the share of data points in a k-NN neighborhood or a DT leaf and their distance to the test observation. Second, a kernel function could be used instead of a fixed-size radius to allow for a more flexible weighting of differently distant points (i.e., Parzen windows). Although the currently used fixed radius can be regarded as a uniform kernel, we believe that other functions could offer better classification results (at least for certain data-generating processes).

CRediT authorship contribution statement

Iurii Konovalenko: Conceptualization, Methodology, Software, Data curation, Writing – original draft, Visualization, Supervision, Project administration. André Ludwig: Conceptualization, Methodology, Data curation, Writing – original draft, Writing – review & editing.

Table A1
Variables in a simulation model.
Variable | Source
TA examination time | Derived from collected data on documented CMs and TA triggering; approximated as Γ(2.0, 0.15)
Extra time burden (CMs action and investigation) | Derived from collected data on documented CMs; approximated as Γ(2.0, 0.1)
Extra time burden (CM monitoring) | Derived from collected data on documented CMs and inquiries to the LSP; approximated as Γ(2.0, 0.05)
Deviation (range) for triggered TAs | Derived from sensor measurement and TA triggering data; approximated as U(0.1, 2.0)
Deviation (range) during enactment of a CM | Derived from sensor measurement and TA triggering data; approximated as U(0.1, 0.3)
Deviation (range) for brief deviations requiring no CM | Derived from sensor measurement and TA triggering data; approximated as U(0.0, 0.1)
TA triggering interval | Derived from sensor measurement and TA triggering data (one sampling interval of 0.167 h, i.e., 10 min)
Probability of a TA being triggered in SC phase preconditioning, handling, flight, and delivery | Estimated on the basis of the numbers of historically triggered TAs for different SC phases; see Section 5.1
Probability of a TA requiring a CM action, investigation, monitoring, or no action, or that no CM was documented for similar historical TAs | Derived from collected data on CMs on the basis of historical TAs requiring corresponding CMs
TA classification accuracy for k-NN and Hk-NN | Classifier evaluations; see column ACC in Table 2
Classification coverage for Hk-NN | Classifier evaluations; see column Coverage in Table 2
Probability of misclassifying a TA with a CM action, investigation, monitoring, or no action | Derived from confusion matrices for k-NN and Hk-NN
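The simulation in Section 4.3 tracks mean kinetic temperature (MKT). A minimal sketch of the standard MKT formula (Haynes, 1971) for equally spaced readings is shown below: T_MKT = (ΔH/R) / (−ln((1/n) Σ exp(−ΔH/(R·T_i)))), with temperatures in kelvin. The activation energy ΔH = 83.144 kJ/mol (so that ΔH/R = 10,000 K) is a commonly used default and an assumption here, since the value used in the paper's simulation is not stated in this section; the function name is hypothetical.

```python
import math
from typing import Sequence

GAS_CONSTANT = 8.3144  # J/(mol*K)
# Assumption: activation energy of 83.144 kJ/mol, a common default (dH/R = 10,000 K).
DEFAULT_DELTA_H = 83_144.0  # J/mol


def mean_kinetic_temperature(temps_c: Sequence[float],
                             delta_h: float = DEFAULT_DELTA_H) -> float:
    """MKT in deg C for equally spaced temperature readings given in deg C."""
    if not temps_c:
        raise ValueError("at least one reading is required")
    ratio = delta_h / GAS_CONSTANT  # dH/R in kelvin
    # Average of the Arrhenius terms exp(-dH / (R * T_i)), T_i in kelvin.
    mean_term = sum(math.exp(-ratio / (t + 273.15)) for t in temps_c) / len(temps_c)
    return ratio / -math.log(mean_term) - 273.15
```

Because the exponential weights warm excursions more heavily, MKT lies above the arithmetic mean of fluctuating readings (about 11.4 °C for two equal periods at 5 °C and 15 °C), and a constant temperature returns itself. Checking an MKT against a permissible threshold, as in the simulation, is then a simple comparison.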
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.eswa.2021.116208.

References

Ahmim, A., Maglaras, L., Ferrag, M. A., Derdour, M., & Janicke, H. (2019). A Novel Hierarchical Intrusion Detection System Based on Decision Tree and Rules-Based Models. In Presented at the 2019 15th International Conference on Distributed Computing in Sensor Systems (DCOSS), IEEE (pp. 228–233). https://doi.org/10.1109/DCOSS.2019.00059
Aldayel, M. S. (2012). K-Nearest Neighbor classification for glass identification problem. In Presented at the 2012 International Conference on Computer Systems and Industrial Informatics (ICCSII), IEEE (pp. 1–5). https://doi.org/10.1109/ICCSII.2012.6454522
Allach, S., Ahmed, M. B., & Boudhir, A. A. (2019). Detection of driver’s alertness level based on the Viola and Jones method and logistic regression analysis. International Journal of Intelligent Enterprise, 6(2/3/4), 356. https://doi.org/10.1504/IJIE.2019.101135
Altman, D. G., & Bland, J. M. (1994). Statistics Notes: Diagnostic tests 1: Sensitivity and specificity. BMJ, 308(6943), 1552. https://doi.org/10.1136/bmj.308.6943.1552
Ammann, C. (2011). Stability Studies Needed to Define the Handling and Transport Conditions of Sensitive Pharmaceutical or Biotechnological Products. AAPS PharmSciTech, 12(4), 1264–1275. https://doi.org/10.1208/s12249-011-9684-0
Anuar, N. B., Sallehudin, H., Gani, A., & Zakaria, O. (2008). Identifying False Alarm for Network Intrusion Detection System Using Hybrid Data Mining and Decision Tree. Malaysian Journal of Computer Science, 21(2), 101–115. https://doi.org/10.22452/mjcs.vol21no2.3
Arar, Ö. F., & Ayan, K. (2017). A feature dependent Naive Bayes approach and its
Dreiseitl, S., & Ohno-Machado, L. (2002). Logistic regression and artificial neural network classification models: A methodology review. Journal of Biomedical Informatics, 35(5–6), 352–359. https://doi.org/10.1016/S1532-0464(03)00034-0
Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern Classification (2nd ed.). Wiley-Interscience.
Emenike, C. C., Van Eyk, N. P., & Hoffman, A. J. (2016). Improving Cold Chain Logistics through RFID temperature sensing and Predictive Modelling. In Presented at the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), IEEE (pp. 2331–2338). https://doi.org/10.1109/ITSC.2016.7795932
European Commission. Directive 2001/83/EC of the European Parliament and of the Council on the Community Code Relating to Medicinal Products for Human Use (2001).
European Medicines Agency. Note for Guidance on Stability Testing: Stability Testing of new Drug Substances and Products (2003). CPMP/ICH/2736/99.
Farid, D., Harbi, N., & Rahman, M. Z. (2010). Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection. International Journal of Network Security & Its Applications, 2(2), 12–25. https://doi.org/10.5121/ijnsa:2010.2202
Farnaaz, N., & Jabbar, M. A. (2016). Random Forest Modeling for Network Intrusion Detection System. Procedia Computer Science, 89, 213–217. https://doi.org/10.1016/j.procs.2016.06.047
Fayyad, U. M., & Irani, K. B. (1993). Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In Presented at the International Joint Conference on Artificial Intelligence (pp. 1022–1027).
Gaikwad, D. P., & Thool, R. C. (2015). Intrusion Detection System Using Bagging Ensemble Method of Machine Learning. In Presented at the 2015 International Conference on Computing Communication Control and Automation (ICCUBEA), IEEE (pp. 291–295). https://doi.org/10.1109/ICCUBEA.2015.61
Gómez, S. A., Goron, A., Groza, A., & Letia, I. A. (2016). Assuring safety in air traffic control systems with argumentation and model checking. Expert Systems with Applications, 44, 367–385. https://doi.org/10.1016/j.eswa.2015.09.027
Haan, G. H., Hillegersberg, J. V., de Jong, E., & Sikkel, K. (2013). Adoption of Wireless Sensors in Supply Chains: A Process View Analysis of a Pharmaceutical Cold Chain. Journal of Theoretical and Applied Electronic Commerce Research, 8(2), 138–154. https://doi.org/10.4067/S0718-18762013000200011
Hafliðason, T., Ólafsdóttir, G., Bogason, S., & Stefánsson, G. (2012). Criteria for temperature alerts in cod supply chains. International Journal of Physical Distribution & Logistics Management, 42(4), 355–371. https://doi.org/10.1108/09600031211231335
Harrou, F., Taghezouit, B., & Sun, Y. (2019). Improved kNN-Based Monitoring Schemes for Detecting Faults in PV Systems. IEEE Journal of Photovoltaics, 9(3), 811–821. https://doi.org/10.1109/JPHOTOV.2019.2896652
Haynes, J. D. (1971). Worldwide Virtual Temperatures for Product Stability Testing. Journal of Pharmaceutical Sciences, 60(6), 927–929. https://doi.org/10.1002/
jps.2600600629
application to the software defect prediction problem. Applied Soft Computing, 59,
Hsieh, C.-Y., Huang, C.-N., Liu, K.-C., Chu, W.-C., & Chan, C.-T. (2016). A machine
197–209. https://doi.org/10.1016/j.asoc.2017.05.043
learning approach to fall detection algorithm using wearable sensor. In Presented at
Bentley, J. L., Stanat, D. F., & Williams, E. H., Jr. (1977). The complexity of finding fixed-
the 2016 International Conference on Advanced Materials for Science and Engineering
radius near neighbors. Information Processing Letters, 6(6), 209–212. https://doi.org/
(ICAMSE), IEEE (pp. 707–710). https://doi.org/10.1109/ICAMSE.2016.7840209
10.1016/0020-0190(77)90070-9
Ishimtsev, V., Bernstein, A., Burnaev, E., & Nazarov, I. (2017). Conformal k-NN Anomaly
Bergstra, J., & Bengio, Y. (2012). Random Search for Hyper-Parameter Optimization. The
Detector for Univariate Data Streams. In Presented at the Sixth Workshop on Conformal
Journal of Machine Learning Research, 13(1), 281–305.
and Probabilistic Prediction and Applications (pp. 213–227).
Bhatia, N., & Ashev, V. (2010). Survey of Nearest Neighbor Techniques. International
Jadhav, S. D., & Channe, H. P. (2016). Comparative Study of K-NN, Naive Bayes and
Journal of Computer Science and Information Security, 8(2), 1–4.
Decision Tree Classification Techniques. International Journal of Science and Research
Bhatti, U. A., Huang, M., Wu, D., Zhang, Y., Mehmood, A., & Han, H. (2019).
(IJSR), 5(1), 1842–1845. http://doi.org/10.21275/v5i1.NOV153131.
Recommendation system using feature extraction and pattern recognition in clinical
Jokanovic, B., & Amin, M. (2017). Fall Detection Using Deep Learning in Range-Doppler
care systems. Enterprise Information Systems, 13(3), 329–351. https://doi.org/
Radars. IEEE Transactions on Aerospace and Electronic Systems, 54(1), 180–189.
10.1080/17517575.2018.1557256
https://doi.org/10.1109/TAES.2017.2740098
Bouneffouf, D., Bouzeghoub, A., & Ganarski, A. L. (2013). Risk-Aware Recommender
Kim, P.-S., & Kutzner, A. (2008). In Lecture Notes in Computer ScienceTheory and
Systems. In Neural Networks: Tricks of the Trade (Vol. 8226, pp. 57–65). Berlin,
Applications of Models of Computation (pp. 246–257). Berlin, Heidelberg: Springer
Heidelberg: Springer Berlin Heidelberg. http://doi.org/10.1007/978-3-642-42054-
Berlin Heidelberg.
2_8.
Kohavi, R., & Sahami, M. (1996). Error-based and entropy-based discretization of
Bouzekri, E., Canny, A., Fayollas, C., Martinie, C., Palanque, P., Barboni, E., et al. (2019).
continuous features (pp. 114–119). Presented at the Second International
Engineering issues related to the development of a recommender system in a critical
Conference on Knowledge Discovery and Data Mining, Portland, Oregon.
context: Application to interactive cockpits. International Journal of Human-Computer
Kumar, B. J., Naveen, H., Kumar, B. P., Sharma, S. S., & Villegas, J. (2017). Logistic
Studies, 121, 122–141. https://doi.org/10.1016/j.ijhcs.2018.05.001
regression for polymorphic malware detection using ANOVA F-test (pp. 1–5).
Brown, R. A. (2015). Building a Balanced k-d Tree in O(knlogn) Time. Journal of Computer
Presented at the 2017 4th International Conference on Innovations in Information,
Graphics Techniques, 4(1), 50–68.
Embedded and Communication Systems (ICIIECS), IEEE. 10.1109/
Chen, R.-Y. (2017). An intelligent value stream-based approach to collaboration of food
ICIIECS.2017.8275880.
traceability cyber physical system by fog computing. Food Control, 71, 124–136.
Kumar, P. M., & Devi Gandhi, U. (2018). A novel three-tier Internet of Things
https://doi.org/10.1016/j.foodcont.2016.06.042
architecture with machine learning algorithm for early detection of heart diseases.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms
Computers & Electrical Engineering, 65, 222–235. https://doi.org/10.1016/j.
(Third). MIT Press.
compeleceng.2017.09.001
Cristianini, N., & Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and
Kültür, Y., & Çağlayan, M. U. (2016). Hybrid approaches for detecting credit card fraud.
Other Kernel-based Learning Methods ((1st ed.,, pp. 128–131). Cambridge University
Expert Systems, 34(2), Article e12191. https://doi.org/10.1111/exsy.12191
Press.
Lang, W., Jedermann, R., Mrugala, D., Jabbari, A., Krieg-Brückner, B., & Schill, K.
Dao, A.-Q., Koltai, K., Cals, S. D., Brandt, S. L., Lachter, J., Matessa, M., et al. (2015).
(2011). The “Intelligent Container”—A Cognitive Sensor Network for Transport
Evaluation of a Recommender System for Single Pilot Operations. Procedia
Management. IEEE Sensors Journal, 11(3), 688–698. https://doi.org/10.1109/
Manufacturing, 3, 3070–3077. https://doi.org/10.1016/j.promfg.2015.07.853
JSEN.2010.2060480
De Luca, A., & Termini, S. (1972). A definition of a nonprobabilistic entropy in the setting
Leng, K., Jin, L., Shi, W., & Van Nieuwenhuyse, I. (2018). Research on agricultural
of fuzzy sets theory. Information and Control, 20(4), 301–312. https://doi.org/
products supply chain inspection system based on internet of things. Cluster
10.1016/S0019-9958(72)90199-4
Computing, 22(S4), 8919–8927. https://doi.org/10.1007/s10586-018-2021-6
Derczynski, L. (2016). Complementarity, F-score, and NLP Evaluation. Presented at the
Li, L., Yu, Y., Bai, S., Hou, Y., & Chen, X. (2017). An Effective Two-Step Intrusion
International Conference on Language Resources and Evaluation.
Detection Approach Based on Binary Classification and k -NN. IEEE Access, 6,
Domeniconi, C., Peng, J., & Gunopulos, D. (2001). An Adaptive Metric Machine for
12060–12073. https://doi.org/10.1109/ACCESS.2017.2787719
Pattern Classification. In Presented at the Thirteenth International Conference on Neural
Information Processing Systems (pp. 437–443).
I. Konovalenko and A. Ludwig Expert Systems With Applications 190 (2022) 116208
Liu, J., Higgins, A., & Tan, Y.-H. (2010). IT enabled redesign of export procedure for high-value pharmaceutical product under temperature control: the case of drug living lab (pp. 1–18). Presented at the Annual International Conference on Digital Government Research, Public Administration Online Challenges and Opportunities, Puebla, Mexico.
Matthias, D. M., Robertson, J., Garrison, M. M., Newland, S., & Nelson, C. (2007). Freezing temperatures in the vaccine cold chain: A systematic literature review. Vaccine, 25(20), 3980–3986. https://doi.org/10.1016/j.vaccine.2007.02.052
Mondal, S., Wijewardena, K. P., Karuppuswami, S., Kriti, N., Kumar, D., & Chahal, P. (2019). Blockchain Inspired RFID-Based Information Architecture for Food Supply Chain. IEEE Internet of Things Journal, 6(3), 5803–5813. https://doi.org/10.1109/JIOT.2019.2907658
Mousheimish, R., Taher, Y., & Zeitouni, K. (2016). Automatic learning of predictive rules for complex event processing (pp. 414–417). Presented at the 10th ACM International Conference on Distributed Event-Based Systems, New York, New York, USA: ACM Press. https://doi.org/10.1145/2933267.2933430
Mousheimish, R., Taher, Y., & Zeitouni, K. (2017). In Automatic Learning of Predictive CEP Rules (pp. 158–169). New York, New York, USA: ACM Press. https://doi.org/10.1145/3093742.3093917
Mower, J. P. (2005). PREP-Mt: Predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics, 6(1), 96–115. https://doi.org/10.1186/1471-2105-6-96
Naderpour, M., Lu, J., & Zhang, G. (2014). An intelligent situation awareness support system for safety-critical environments. Decision Support Systems, 59, 325–340. https://doi.org/10.1016/j.dss.2014.01.004
Nawaz, F., Janjua, N. K., & Hussain, O. K. (2019). PERCEPTUS: Predictive complex event processing and reasoning for IoT-enabled supply chain. Knowledge-Based Systems, 180, 133–146. https://doi.org/10.1016/j.knosys.2019.05.024
Om, H., & Kundu, A. (2012). A hybrid system for reducing the false alarm rate of anomaly intrusion detection system. Presented at the 2012 1st International Conference on Recent Advances in Information Technology (RAIT), IEEE (pp. 131–136). https://doi.org/10.1109/RAIT.2012.6194493
Pal, A., & Kant, K. (2019). Internet of Perishable Logistics: Building Smart Fresh Food Supply Chain Networks. IEEE Access, 7, 17675–17695. https://doi.org/10.1109/ACCESS.2019.2894126
Peng, K., Leung, V. C. M., Zheng, L., Wang, S., Huang, C., & Lin, T. (2018). Intrusion Detection System Based on Decision Tree over Big Data in Fog Environment. Wireless Communications and Mobile Computing, 2018(5), 1–10. https://doi.org/10.1155/2018/4680867
Pilarski, M. G. (2014). The Concept of Recommender System Supporting Command and Control System in Hierarchical Organization. Presented at the 2014 European Network Intelligence Conference (ENIC), IEEE (pp. 138–141). https://doi.org/10.1109/ENIC.2014.9
Ricci, F., Rokach, L., & Shapira, B. (Eds.). (2015). Recommender Systems Handbook. Boston, MA: Springer US.
Ross, T. J. (2010). Fuzzy Logic with Engineering Applications (3rd ed.). Wiley.
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7
Ruiz, D., Berenguer, V., Soriano, A., & Sánchez, B. (2011). A decision support system for the diagnosis of melanoma: A comparative approach. Expert Systems with Applications, 38(12), 15217–15223. https://doi.org/10.1016/j.eswa.2011.05.079
Schaefer, R. (2019). How to become CEIV Pharma Certified (p. 129). International Air Transport Association.
Serdarasan, S., & Tanyas, M. (2012). Dealing with Complexity in the Supply Chain: The Effect of Supply Chain Management Initiatives. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2056331
Shafiq, Y., Gibson, J. S., Kim, H., Ambulo, C. P., Ware, T. H., & Georgakopoulos, S. V. (2019). A Reusable Battery-Free RFID Temperature Sensor. IEEE Transactions on Antennas and Propagation, 67(10), 6612–6626. https://doi.org/10.1109/TAP.2019.2921150
Shang, J., & Chen, T. (2020). Early Classification of Alarm Floods via Exponentially Attenuated Component Analysis. IEEE Transactions on Industrial Electronics, 67(10), 8702–8712. https://doi.org/10.1109/TIE.2019.2949542
Su, M.-Y. (2011a). Real-time anomaly detection systems for Denial-of-Service attacks by weighted k-nearest-neighbor classifiers. Expert Systems with Applications, 38(4), 3492–3498. https://doi.org/10.1016/j.eswa.2010.08.137
Su, M.-Y. (2011b). Using clustering to improve the KNN-based classifiers for online anomaly network traffic identification. Journal of Network and Computer Applications, 34(2), 722–730. https://doi.org/10.1016/j.jnca.2010.10.009
Swarnkar, M., & Hubballi, N. (2016). OCPAD: One class Naive Bayes classifier for payload based anomaly detection. Expert Systems with Applications, 64, 330–339. https://doi.org/10.1016/j.eswa.2016.07.036
Thakur, M., & Forås, E. (2015). EPCIS based online temperature monitoring and traceability in a cold meat chain. Computers and Electronics in Agriculture, 117, 22–30. https://doi.org/10.1016/j.compag.2015.07.006
Tsang, Y. P., Choy, K. L., Wu, C. H., Ho, G. T. S., Lam, C. H. Y., & Koo, P. S. (2018). An Internet of Things (IoT)-based risk monitoring system for managing cold supply chain risks. Industrial Management & Data Systems, 118(7), 1432–1462. https://doi.org/10.1108/IMDS-09-2017-0384
Tu, M., Lim, M. K., & Yang, M.-F. (2018). IoT-based production logistics and supply chain system – Part 1. Industrial Management & Data Systems, 118(1), 65–95. https://doi.org/10.1108/IMDS-11-2016-0503
Ulrich, T. A., Lew, R., Poresky, C. M., Rice, B. C., Thomas, K. D., & Boring, R. L. (2017). Operator-in-the-Loop Study for a Computerized Operator Support System (COSS). Cross-System and System-Independent Evaluations.
Wang, Y. (2005). A multinomial logistic regression modeling approach for anomaly intrusion detection. Computers & Security, 24(8), 662–674. https://doi.org/10.1016/j.cose.2005.05.003
Wattanakul, S., Henry, S., Bentaha, L., Reeveerakul, N., & Ouzrout, Y. (2017). Improving risk management by using smart containers for real-time traceability (pp. 1–8). Presented at the International Conference on Logistics and Transport, Bangkok, Thailand.
West, D., Mangiameli, P., Rampal, R., & West, V. (2005). Ensemble strategies for a medical diagnostic decision support system: A breast cancer diagnosis application. European Journal of Operational Research, 162(2), 532–551. https://doi.org/10.1016/j.ejor.2003.10.013
World Health Organization. (2003). Guide to good storage practices for pharmaceuticals (No. 908). WHO Technical Report Series (p. 12). World Health Organization.
Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1–37. https://doi.org/10.1007/s10115-007-0114-2
Yang, K., Botero, U., Shen, H., Woodard, D. L., Forte, D., & Tehranipoor, M. M. (2018). UCR: An Unclonable Environmentally Sensitive Chipless RFID Tag For Protecting Supply Chain. ACM Transactions on Design Automation of Electronic Systems, 23(6), 1–24. https://doi.org/10.1145/3264658
Yeh, I.-C., & Lien, C.-H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473–2480. https://doi.org/10.1016/j.eswa.2007.12.020
Zakeri, A., Saberi, M., Hussain, O. K., & Chang, E. (2018). An Early Detection System for Proactive Management of Raw Milk Quality: An Australian Case Study. IEEE Access, 6, 64333–64349. https://doi.org/10.1109/ACCESS.2018.2877970
Zhang, Y., Wang, W., Yan, L., Glamuzina, B., & Zhang, X. (2019). Development and evaluation of an intelligent traceability system for waterless live fish transportation. Food Control, 95, 283–297. https://doi.org/10.1016/j.foodcont.2018.08.018
Zhu, W., Sun, W., & Romagnoli, J. (2018). Adaptive k-Nearest-Neighbor Method for Process Monitoring. Industrial & Engineering Chemistry Research, 57(7), 2574–2586. https://doi.org/10.1021/acs.iecr.7b03771