
Neural Computing and Applications

https://doi.org/10.1007/s00521-020-05432-2

S.I. : BIO-INSPIRED COMPUTING FOR DLA

Fault detection of continuous glucose measurements based on modified k-medoids clustering algorithm
Xia Yu1 • Xiaoyu Sun1 • Yuhang Zhao1 • Jianchang Liu1 • Hongru Li1

Received: 10 August 2020 / Accepted: 7 October 2020


© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
Continuous glucose monitoring (CGM) systems provide critical feedback on blood glucose concentration to the artificial pancreas for patients with type 1 diabetes (T1D). Faults in CGM may therefore seriously affect the computation of insulin infusion rates, which can lead to fatal consequences accompanied by hypoglycemia or hyperglycemia. In the present work, the k-medoids clustering algorithm is modified by calculating the cluster number with a Bayesian Information Criterion (BIC)-based cost function and by adding the SAC (SSE-ASW Criterion) evaluation coefficient, which considers both the SSE (Sum of Squares due to Error) and ASW (Average Silhouette Width) criteria. The modified k-medoids clustering algorithm is then used to detect sensor failures online from CGM measurements. Unlike qualitative and quantitative model-based methods, the proposed method requires only sufficient clean data. During online monitoring, the new glycemic variability is tracked against confidence limits predefined during the training period to indicate abnormality. The feasibility of the proposed method is assessed using CGM data collected from the UVa/Padova metabolic simulator.

Keywords Type 1 diabetes · Continuous glucose monitoring · k-medoids clustering · Abnormality detection

✉ Xia Yu, yuxia@ise.neu.edu.cn
1 College of Information Science and Engineering, Northeastern University, Shenyang, China

1 Introduction

Continuous glucose monitoring (CGM) sensors are critical components in administering the amount of insulin injected into patients with type 1 diabetes (T1D), especially in closed-loop artificial pancreas (AP) [1–7] control systems, which are expected to regulate T1D patients' blood glucose concentration (BGC) automatically. However, CGM sensors may fail to provide accurate information about the actual BGC as a result of loss of sensitivity, dislodging, interruption in signal transmission and pressure on the sensor patch area [8]. Erroneous CGM readings may lead to a wrong insulin dose, which may frequently cause hypoglycemia or hyperglycemia. Hence, detection of faults and abnormalities in CGM measurements is significant for AP systems.

Fault detection algorithms can generally be divided into three groups: qualitative model-based methods [9], quantitative model-based methods [10–13] and process historical data-based methods [14]. Qualitative methods depend on expert systems which contain large numbers of if–then–else rules. The rules are designed to mimic the decision process of human experts. As knowledge expands, the number of if–then–else rules increases rapidly, and some rules may conflict with each other. Usually the rules are based entirely on phenomena and their corresponding solutions, without considering the underlying physical mechanism of the system, and may fail when a new condition is presented that is not defined in the knowledge database. On the other hand, a quantitative model-based fault detection method [11–13] can be developed based on first principles such as material and energy balances [15]. However, the cost of the model may be high: as more details are added, the model may become so complex and large that it is difficult for the computer to run the model and monitor the CGM measurements within an acceptable time. In contrast to both qualitative and quantitative model-based methods, in


process historical data-based methods, only the availability of a large amount of data is required.

Among historical data-based methods, support vector machine (SVM) [16, 17], wavelet [18], Kalman filter [19], kernel density-based stochastic model [20], rate-of-change threshold-based model [21] and principal component analysis (PCA) methods [17, 22] have been tested for fault detection in CGM readings. A discrete wavelet-transform-based online fault detection method [18] was developed to identify CGM errors by decomposing the CGM signal, but the sensitivity of the wavelet method might decrease rapidly when the magnitude of errors is small. The proposed Kalman filter [19] and rate-of-change threshold-based [21] methods require a comparatively accurate prediction of CGM readings; however, the prediction errors usually increase [23] with the prediction horizon when sensor errors last for a long time. A kernel density-based stochastic modeling technique [20] is used to detect CGM failures; its sensitivity might improve as CGM readings accumulate, but the iteration time increases in proportion to the size of the database. SVM [16, 17] and PCA-based techniques [17, 22] are used for detection of CGM sensor errors, but multi-sensor signals are required to classify the CGM signals into different groups.

Most traditional data-driven model-based methods use the trend of blood glucose to extract features and identify faults [24–26], but they often cannot distinguish glucose fluctuations from abnormal measurements. Generally, data-driven methods need to estimate the corresponding measurement and residual with one-step-ahead blood glucose prediction models, and then detect failures by thresholds or certain rules [15, 27, 28]. One limitation is that these models often rely heavily on a large amount of historical data, and their prediction accuracy is sensitive to the size and distribution of the data set. In contrast, as an unsupervised learning method, clustering does not need labels for the specific classes of faults, and the process of abnormality detection is independent of measurement estimation. The advantages and limitations of the clustering method and traditional data-driven methods are shown in Table 1.

Recently, clustering methods have been widely used in fault detection [29–31], except in CGM fault detection. K-means is a commonly used algorithm in data mining [32, 33] which has the advantages of simple principles, easy implementation and low computational complexity. K-medoids is a simple clustering method that is similar to the k-means algorithm but more robust to noise [34, 35]. In addition, the k-medoids algorithm converges quickly within a few iterative steps and is easy to implement [29, 31, 36].

On feature selection, multiple variables, such as meal intakes and exercise information, may not be easy to collect and may introduce further errors. Therefore, using only blood glucose data is effective in sensor fault detection [37]. Generally, a partition clustering algorithm such as k-means or k-medoids is combined with feature extraction methods such as PCA and SVM [38, 39] in cases where the features of the data are not easily observed. The glycemic variability, which is one of the features [18, 21, 40] used to detect CGM failures, can be generated easily from CGM measurements, so an advanced feature extraction method is not necessary for classifying BGC values and detecting CGM failures.

In this paper, a method with dual control limits that combines the k-medoids clustering algorithm and a Bayesian Information Criterion (BIC)-based cost function is developed for detection of CGM-related faults. The k-medoids clustering algorithm is used to split the historical CGM measurements into a number of clusters according to the glycemic variabilities. However, the number of clusters, which is usually decided according to prior information, may have a significant influence on the clustering results. Thus, a cost function based on BIC is designed to calculate the cluster number adaptively. Because k-medoids is initialized randomly, it easily converges to a local optimum [41]. Here we add the SAC (SSE-ASW Criterion) evaluation coefficient to eliminate unreasonable clustering results and avoid local optima.

The rest of this paper is structured as follows. Section 2 presents the modified k-medoids algorithm and its application in CGM fault detection. Section 3 shows the fault detection results and analyzes the performance of the proposed method. The results are discussed in Section 4 and conclusions are provided in Section 5.

2 Methodology

2.1 K-medoids clustering algorithm

The k-medoids clustering algorithm is one of the simplest and most typical unsupervised learning algorithms and is widely used in data clustering [42]. For a given data set $X = \{x_1, x_2, \ldots, x_N\}$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{id}]^T \in R^d$, $i = 1, 2, \ldots, N$, the goal of the k-medoids clustering algorithm is to separate the finite unlabeled data set $X$ into a finite number $K$ ($K < N$) of disjoint sets with hidden data structures, rather than to provide an accurate characterization of unobserved samples generated from the same probability distribution [43, 44], by minimizing the function $F$:

$$F(K) = \sum_{i=1}^{K} \sum_{j=1}^{k_i} \mathrm{dis}(x_{ij}, C_i) \tag{1}$$

where $C_i$ and $x_{ij}$ are the cluster center and the jth element of the ith


cluster, respectively, $k_i$ is the number of elements in the ith cluster, and $\mathrm{dis}(x_{ij}, C_i)$ stands for the distance between $x_{ij}$ and $C_i$; the Euclidean distance is chosen in this paper. As the purpose of the k-medoids clustering algorithm is to minimize the distance between samples and their cluster center in each set, the function $F$ can be simplified as:

$$F(K) = \sum_{i=1}^{K} d_i \tag{2}$$

where $d_i$ is the maximum distance between samples in the ith cluster and its cluster center. The cluster center $C_i$ is calculated by the following function:

$$C_i = \arg\min \sum_{m=1,\, m \neq n}^{k_i} \mathrm{dis}(x_{im}, x_{in}), \quad (m, n = 1, 2, \ldots, k_i) \tag{3}$$

where $x_{im}$ and $x_{in}$ are elements in the ith cluster.

The k-medoids clustering algorithm is summarized as follows:
Step 1. Initialize cluster centers randomly;
Step 2. Assign each sample to its closest cluster center;
Step 3. Update cluster centers according to Eq. (3);
Step 4. Repeat Step 2 and Step 3 until the cluster centers no longer change or the maximum number of iterations is reached.

Table 1 Advantages and limitations of abnormality detection based on clustering and traditional data-driven model-based methods

             Clustering                                   Traditional data-driven model
Advantages   No need for exact classification label       Many well-known methods could be used to
             of each CGM sensor failure;                  obtain measurements estimation;
             Independent of glucose dynamic model;
             Simple and easy for application;
Limitations  Sensitive to initialization parameters;      Requires a large amount of historical data;
             Possibility to converge to local optima.     Sensitive to the size and distribution of the data set;
                                                          Complexity and low efficiency.

2.2 The modified k-medoids clustering algorithm

In the k-medoids clustering method, the number of clusters has a considerable influence on clustering performance. A large cluster number may lead to over-clustering, where different groups of data share the same characteristics; on the other hand, data in the same group may have different characteristics when the cluster number is small. Uncertain factors such as carbohydrate intake and exercise strongly influence the glucose dynamics, which in turn makes it difficult to choose the optimal number of clusters. This is known as the cluster analysis problem [45].

The BIC is one of the most popular criteria for model selection [46] in the EDGMM family due to its theoretical consistency in choosing the number of components [47] and satisfactory performance in a number of applications. The BIC is defined as follows [48, 49]:

$$\mathrm{BIC} = -2\,\mathrm{loglik} + m \log(N) \tag{4}$$

where $m$ is the number of model parameters, $N$ is the number of samples, and loglik is the maximized log-likelihood:

$$\mathrm{loglik} = \sum_{i=1}^{N} \log \Pr\nolimits_{\hat{\theta}}(x_i) \tag{5}$$

where $\Pr_{\theta}(x_i)$ is a family of densities for $x_i$ (containing the "true" density) and $\hat{\theta}$ is the maximum-likelihood estimate of $\theta$.

In the Gaussian model [50], it is assumed that the variance $\sigma_\varepsilon^2$ is known, and $-2\,\mathrm{loglik}$ is equal to the squared error loss $\frac{N}{\sigma_\varepsilon^2}\,\mathrm{err}$. Therefore:

$$\mathrm{BIC} = \frac{N}{\sigma_\varepsilon^2} \left[ \mathrm{err} + \frac{m \log(N)}{N}\, \sigma_\varepsilon^2 \right] \tag{6}$$

As the total number of samples increases, the term $\frac{\log(N)}{N}$ decreases slowly, and the preceding coefficients can then be ignored. This is reasonable because a larger database may contain more classes of data.

The clustering problem described in Eq. (2) can be modified with the BIC factor and is defined as:

$$V(m) = \log\left( \sum_{i=1}^{m} (d_i)^2 \right) + a\, \frac{m \log(N)}{N} \tag{7}$$

where $m$ represents the number of clusters, and $a$ stands for the regularization factor, which is set to $m$ in this paper so that the maximum distance between samples and cluster centers can still make a significant contribution to the penalty. When the samples are sufficient, the value of $\frac{\log(N)}{N}$ changes little, so clustering is performed for each possible number of clusters. The BIC term is a penalty factor, and the larger $V(m)$ is, the farther the samples in a class deviate from their cluster center, and the worse the clustering effect is. Accordingly, the number of clusters is determined.


Besides the above modification, the k-medoids method still does not solve the problem of ending in local optima. As running the clustering multiple times can often avoid local optima, the clustering is repeated z times to find the optimal clustering mode according to certain criteria [41].

There are two principles for evaluating clustering and selecting the optimal clustering mode: compactness, which reflects the distribution of samples within the same cluster, and separation degree, which reveals the distances between different groups. When a data set is well-clustered, the members of a cluster should be as close to each other as possible, and the distance between two clusters should be as large as possible. We choose the Sum of Squares due to Error (SSE) criterion to express the compactness and the Average Silhouette Width (ASW) [51] criterion to represent the separation degree, so as to evaluate whether the clustering algorithm has reached the optimum. The SAC evaluation criterion is defined as:

$$\mathrm{SAC} = \frac{\mathrm{SSE}}{\mathrm{ASW}} \tag{8}$$

where SSE is a common error evaluation parameter. To express the compactness, the squared distances between the sample points and the cluster center in each class are summed. This sum is the value of SSE:

$$\mathrm{SSE} = \sum_{i=1}^{m} \sum_{p \in \mathrm{class}_i} |p - C_i|^2 \tag{9}$$

where $p$ represents a point $x_{ij}$ in class $i$ and $C_i$ represents the position of the center point of class $i$.

The value of ASW is calculated as follows:

$$\mathrm{ASW} = \sum_{i=1}^{N} SC_i \tag{10}$$

where the silhouette coefficient (SC), proposed by Rousseeuw [51], is applied to estimate the clustering effect by analyzing cohesion and separation. The specific steps are as follows: first, calculate the average distance between sample $i$ and the other samples in the same subset to get the cluster cohesion $a(i)$; second, calculate the average distance $b_{ij}$ between sample $i$ and the sample points in each other subset $C_j$, and take the minimum of these average distances as the cluster separation degree $b(i)$; then, calculate the SC of sample $i$ from the cohesion $a(i)$ and the separation $b(i)$ as follows:

$$SC_i = \frac{b(i) - a(i)}{\max(a(i), b(i))} \tag{11}$$

where the range of SC is [−1, 1]. The closer the value is to 1, the more suitable the sample is for its current subset rather than other adjacent subsets.

Through the comprehensive consideration of compactness and separation, the smallest SAC coefficient indicates the best clustering effect. The modified k-medoids algorithm is summarized as follows:
Step 1. Determine the maximum possible cluster number $M$;
Step 2. Set the maximum clustering times $Z$;
Step 3. For each potential cluster number $m$ ($1 < m \le M$):
Step 3.1 For each cluster time $z$ ($1 < z \le Z$) under the above conditions:
Step 3.1.1 Split the $N$ samples into $m$ clusters using the k-medoids clustering algorithm;
Step 3.1.2 Calculate the SAC evaluation coefficient of the current situation with Eqs. (8)–(11);
Step 3.2 Select the situation with the minimum SAC evaluation coefficient to continue;
Step 3.3 Calculate $V(m)$ as described in Eq. (7);
Step 4. Calculate the number of clusters $k = \arg\min_{m} V(m)$.

2.3 CGM fault detection using modified k-medoids clustering algorithm

The variability of BGC, which responds differently to carbohydrate intake, exercise and insulin injection, is a reasonable characteristic for splitting CGM measurements into different clusters [40, 52]. Further, the BGC can be considered as linear in a short interval because the effects of disturbances, including carbohydrate intake, exercise and so on, usually last for a while. Thus, the BGC variability $k(t)$ in the interval $n_a$ can be calculated as:

$$k(t) = \arg\min_{k} \sum_{i=1}^{n_a} (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = k x_i + b \tag{12}$$

where $i = 1, 2, \ldots, n_a$ and $y$ is the CGM measurement.

In order to better reflect the fluctuation of blood glucose, derivative characteristics are selected in this work. The distance $v$ of the BGC variability vector $x$, which is defined as the first-order derivative of the BGC measurements $y$ in the $n_a$ interval, from the cluster center $C_i$ is normally distributed, where $V \sim N(\mu, \sigma^2)$. The first derivative formula is defined as follows:

$$y' = \frac{y(x_0 + h) - y(x_0 - h)}{2h} \tag{13}$$

Thus, we can get the first control chart $S$ where only the upper limit is considered:

$$S = \mu + m\sigma \tag{14}$$

In this paper, a 95.2% confidence interval is considered as the training data are clean data without faults. For the


given $N$ samples that are split into $k$ clusters, the first control chart is constructed as:

$$S = [S_1, S_2, \ldots, S_K]^T \tag{15}$$

In a well-clustered database, the distance $v_{new}$ between a correctly classified new normal sample and its cluster center is under the control chart $S$, while if the correctly classified new sample is abnormal, the distance $v_{new}$ is above the control chart $S$.

However, the first-order derivative is a feature that measures linear changes, and blood glucose is a non-stationary signal. A new sample is easily misclassified when only the first-order derivative feature is used, so it is useful to calculate the second-order derivative feature as well; the calculation process is the same as for the first-order derivative. The distance of the second-derivative sample from its cluster center follows a Gaussian distribution, $D \sim N(\theta, \zeta^2)$. Like the first derivative, the second derivative formula is defined as:

$$y'' = \frac{y(x_0 + h) - 2\,y(x_0) + y(x_0 - h)}{h^2} \tag{16}$$

Therefore, we can calculate the second control variable $T$ as the upper limit:

$$T = [T_1, T_2, \ldots, T_K]^T, \qquad T_i = \theta_i + m\zeta_i \quad (i = 1, \ldots, k) \tag{17}$$

The distance $d_{new}$ from its cluster center of a new sample that contains faults goes far beyond $T$, especially in the case that the new sample is misclassified.

The proposed modified k-medoids clustering algorithm-based CGM fault detection method with dual control charts is shown in Fig. 1. Once the statistic limits $S$ and $T$ are defined, for a new observation vector $y_{new}$, a fault is detected if any of the following conditions is satisfied:

$$S_i < v_{new} \tag{18}$$

$$T_i < d_{new} \tag{19}$$

where $S_i$ and $T_i$ are the two statistic limits, respectively.

Fig. 1 The proposed CGM fault detection method based on modified k-medoids clustering algorithm

3 Results

In this work, the data used to evaluate the proposed fault detection method based on the modified k-medoids clustering algorithm are generated with a five-minute sampling time for 30 virtual patients (10 adolescents, 10 adults and 10 children) using the UVa/Padova metabolic simulator [53]. The duration is 6 days, and the meal plan of each day for both adults and adolescents is shown in Table 2, while the amount of each meal for children is readjusted to 75% of that for adults. The first two days' data (fault free) are used to train the clusters, and the last four days' data with different types of errors are used to evaluate the performance of the proposed fault detection method based on the modified k-medoids clustering algorithm.

3.1 CGM sensor error generation

Faults in CGM sensors are mainly caused by sensor-receiver connection problems and biomechanical issues of the sensor-tissue interface, such as motion of the patient, scar tissue growth or degradation of sensor materials, and pressure-induced sensor attenuation (PISA) [21, 54, 55]. In order to demonstrate the effectiveness of the proposed fault detection method, several types of sensor faults with different ratios of disturbances are considered: spike [54], drift changes, step changes [19] and PISA [55]. The following relations are used for error generation:

1) Spike:

$$g_e(k) = g(k) + D_{ie} M_e\, g(k) \tag{20}$$

2) Drift:

$$[g_e(k), g_e(k+1), \ldots, g_e(k + D_{ue} - 1)] = [g(k), g(k+1), \ldots, g(k + D_{ue} - 1)] + D_{ie} M_e\, g(k)\, [1, 2, \ldots, D_{ue}] / D_{ue}, \quad \text{s.t. } D_{ue} \in [2, 3, 4, 5] \tag{21}$$


3) Step:

$$[g_e(k), g_e(k+1), \ldots, g_e(k + D_{ue} - 1)] = [g(k), g(k+1), \ldots, g(k + D_{ue} - 1)] + D_{ie} M_e\, g(k), \quad \text{s.t. } D_{ue} \in [2, 3, 4, 5] \tag{22}$$

4) PISA:

$$g_e(k+t) = \begin{cases} g(k+t) - M_e \left( 1 - \exp\left( -\dfrac{5t}{s} \right) \right), & \text{if } t \le \dfrac{D}{5} \\[2ex] g(k+t) + M_e \left( 1 - \exp\left( \dfrac{-5t + D}{s} \right) \right) - M_e \left( 1 - \exp\left( -\dfrac{5t}{s} \right) \right), & \text{if } \dfrac{D}{5} < t < D_{ue} \end{cases} \tag{23}$$

$$\text{s.t. } D_{ue} = \frac{D + 3s}{5},\; t \in [1, D_{ue}],\; s \in [5, 10, 15, 20],\; D \in [15, 20, 25, 30]$$

where $g_e(k)$ is the faulty CGM measurement generated from the original CGM value $g(k)$. All types of sensor failures are added to the data randomly, and all $D_{ue}$, $D_{ie}$, $d$ and $M_e$ are randomly picked within their ranges.

Table 2 Meal plan for adults and adolescents

Meal (time/calorie)  Day1      Day2      Day3      Day4      Day5      Day6
Breakfast            09.45/48  09.10/55  09.00/40  09.25/45  09.00/50  09.30/48
Lunch                13.30/47  13.45/70  14.00/68  13.20/54  13.55/75  14.20/68
Dinner               17.45/75  18.00/65  18.20/75  17.40/70  18.20/60  18.20/70
Snack                21.30/31  22.00/20  22.30/25  21.10/31  –         22.00/25

3.2 CGM sensor fault detection

Samples from 30 patients in the UVa/Padova simulator are collected to evaluate the proposed method. In the training process, the number of iterations of the clustering algorithm is set to 25 to find the optimal clustering result. In the test process, each kind of CGM sensor error is tested separately. For each test data set, errors which are independent of each other are added randomly to the original CGM readings. Examples of fault detection results are shown in Fig. 2.

In the next test process, the online fault detection result of the continuous blood glucose monitoring sensor is shown in Fig. 3. As can be seen from the figure, the detection accuracy is satisfactory. However, because meals and other factors change the volatility of the patient's blood glucose, the blood glucose may suddenly rise or fall, which causes some false positives.

For online detection, a sliding window is used to contain the features, so the residual fault information of the previous iteration is still considered as a fault feature. As a consequence, the false positive rate increases under this so-called hysteresis effect.

The summary of results for different types of CGM faults is shown in Table 3, compared with the online fault detection method based on standard k-medoids without optimization, where the total number of faults and the successful detection ratio are computed as:

$$N_F = TP + FN \tag{24}$$

$$S = \frac{TP}{N_F} \tag{25}$$

where $N_F$ denotes the total number of faults in the CGM measurements, $TP$ represents the number of faults that are successfully detected, and $FN$ denotes the number of faults that are not detected.

As shown in Table 3, for these four typical CGM sensor errors, the detection results of the proposed method are acceptable. However, when the duration of errors increases, the detection sensitivity may decrease because the errors may coincide with normal signal patterns, as with PISA (Fig. 2d). Spike and step changes (Fig. 2a and c) have good sensitivity, as the abnormal signal values are always contained within the interval where the S statistic usually increases rapidly. For drift changes (Fig. 2b), the sensitivity is good as well, but is comparatively lower than the sensitivity for spike and step changes, because the error follows the same patterns as normal signal behavior when the BGC values increase or decrease.

4 Discussion

In this study, the modified k-medoids clustering method is developed for monitoring and detecting failures in CGM measurements. The proposed method does not need any physiological or physical information but only a sufficient data set which is known to have no faults. The proposed method has been validated using data collected from the UVa/Padova simulator, which is approved by the Food and Drug Administration (FDA) and considered an accepted method for in silico testing before clinical trials.


Fig. 2 Examples of online CGM sensor fault detection: (a) spike, (b) drift change, (c) step change, (d) PISA

The proposed method contains a BIC-based cost function and a dual control limit. The number of clusters is considered as model complexity, which can be decided by the BIC principle. Thus, a new cost function that contains the BIC factor is proposed. By calculating the proposed cost function, the number of clusters can be adjusted adaptively without knowing the exact number of glycemic dynamics, which is very difficult to decide from prior information. The derivative of glycemic dynamics in each sample is statistically analyzed, and its distribution is set up as the first control limit. To strengthen the fault detection ability of the proposed method, the second derivative of glycemic dynamics is additionally selected as the second control limit. For further studies, more signals such as carbohydrate intake and exercise statements can be easily added to the proposed method while classifying the glycemic dynamics and detecting faults in CGM readings. A reconstitution algorithm can also be easily added to the proposed method and is expected to improve its sensitivity, as most of the CGM sensor errors will turn into spikes, which are comparatively easy to detect at different magnitudes, and to decrease the false detection ratio as well.

The result indicates that the proposed method is able to detect most of the CGM-related failures with high accuracy. The effects of errors in insulin injection will be decreased with such a fault detection method, and the fatal consequences caused by hypoglycemia and hyperglycemia will be reduced by warning patients with T1D about CGM failures in manual operation, or by integrating the method in an AP control system, as soon as the faults are detected.

In the previous section, we mentioned the hysteresis effect. It is more obvious in fault diagnosis than in fault


detection. The hysteresis effect changes with the size of the sliding window. Generally speaking, the smaller the window, the weaker the hysteresis effect. However, for drift and PISA, whose duration is relatively long, a small window will not contain enough fault information, which causes missed detections. Therefore, this is a contradictory problem. Here, we propose some solutions as reference and as future study directions. First, some clinical rules can be added to deal with the hysteresis effect. For example, each fault can be reported up to a certain duration, after which the fault will not alarm again. Second, the time series information can be weighted, where new dynamics information has a higher weight than historical feature information. Third, integrating other algorithms or fusing other features seems to be a feasible solution to detect short-term faults, which would complement long-term fault detection based on clustering algorithms.

Fig. 3 Long-term online fault detection

Table 3 Performance of the proposed fault detection method

Methods  Modified k-medoids          Standard k-medoids
Types    Peak  Drift  Steps  PISA    Peak  Drift  Steps  PISA
NF       190   698    856    2430    190   712    865    2508
TP       178   595    806    2126    164   493    744    1766
FN       12    103    50     304     26    219    121    742
S(%)     93.6  85.2   94.2   87.5    86.3  69.2   86.0   70.4

The modified k-medoids method achieves the better S(%) performance for every fault type.

5 Conclusion

A k-medoids clustering algorithm with an automatically adjusted cluster number is developed for detection of failures in CGM readings using data generated from the UVa/Padova simulator. The proposed fault detection method can identify CGM-related faults with high accuracy and thus can increase the safety of patients with T1D who manage their BGC with an AP by warning the control system or the patient directly. One limitation of this study is that we were unable to verify the proposed method under real-world clinical conditions, which means the method requires a supplementary trial before final application. Further study will focus on completing clinical experiments and reducing the hysteresis effect, which would further improve the accuracy and reduce the false positives.

Acknowledgements This work was supported by the National Natural Science Foundation of China No.61903071 and No.61973067.

References

1. Eric R, Jerome P, Martin C, Hugues C, Palerm CC (2010) Closed-loop insulin delivery using a subcutaneous glucose sensor and intraperitoneal insulin delivery. Diabetes Care 33(1):121–127
2. Elkhatib FH, Russell SJ, Nathan DM, Sutherlin RG, Damiano ER (2010) A bihormonal closed-loop artificial pancreas for type 1 diabetes. Sci Transl Med 2(27):27


3. Hovorka R et al (2010) Manual closed-loop insulin delivery in children and adolescents with type 1 diabetes: a phase 2 randomised crossover trial. Lancet 375(9716):743–751
4. Elleri D et al (2013) Closed-loop basal insulin delivery over 36 hours in adolescents with type 1 diabetes: randomized clinical trial. Diabetes Care 36(4):838–844
5. Cinar A (2018) Artificial pancreas systems: an introduction to the special issue. IEEE Control Syst Mag 38(1):26–29
6. Cinar A (2017) Multivariable adaptive artificial pancreas system in type 1 diabetes. Curr Diab Rep 17(10):88
7. Turksoy K, Bayrak ES, Quinn L, Littlejohn E, Cinar A (2013) Multivariable adaptive closed-loop control of an artificial pancreas without meal and activity announcement. Diabetes Technol Ther 15(5):386–400
8. Bequette BW (2014) Fault detection and safety in closed-loop artificial pancreas systems. J Diabetes Sci Technol 8(6):1204–1214
9. Venkatasubramanian V (2003) A review of process fault detection and diagnosis, Part II: qualitative models and search strategies. Comput Chem Eng 27(3):313–326
10. Venkatasubramanian V, Rengaswamy R, Yin K, Kavuri SN (2003) A review of process fault detection and diagnosis: part I: Quantitative model-based methods. Comput Chem Eng 27(3):293–311
11. Zhao H, Zhao C (2016) An automatic denoising method with estimation of noise level and detection of noise variability in continuous glucose monitoring. IFAC-PapersOnLine 49(7):785–790
12. Zhao H, Zhao C, Gao F (2018) An automatic glucose monitoring signal denoising method with noise level estimation and responsive filter updating. Biomed Signal Process Control 41:172–185
13. Feng J, Turksoy K, Samadi S, Hajizadeh I, Littlejohn E, Cinar A (2017) Hybrid online sensor error detection and functional redundancy for systems with time-varying parameters. J Process Control 60:115–127
14. Venkatasubramanian V, Rengaswamy R, Kavuri SN, Yin K (2003) A review of process fault detection and diagnosis Part III: process history based methods. Comput Chem
22. Turksoy K, Quinn L, Littlejohn E, Cinar A (2015) Monitoring and fault detection of continuous glucose sensor measurements. In: American Control Conference
23. Freckmann G, Pleus S, Link M, Haug C (2016) Accuracy of BG meters and CGM systems: possible influence factors for the glucose prediction based on tissue glucose concentrations. In: Kirchsteiger H, Jørgensen JB, Renard E, del Re L (eds) Prediction methods for blood glucose concentration. Springer, Cham, Heidelberg, New York, Dordrecht, London
24. Song G, Zhao C, Sun Y (2016) A classification-based fault detection method for continuous glucose monitoring (CGM). In: 2016 12th World Congress on Intelligent Control and Automation (WCICA), pp 956–961
25. Zhao C, Fu Y (2015) Statistical analysis based online sensor failure detection for continuous glucose monitoring in type I diabetes. Chemometrics and Intelligent Laboratory Systems 144:128–137
26. Song G, Zhao C (2017) An effective fault detection method with FDA classifier and global model for continuous glucose monitor (CGM). In: 2017 36th Chinese Control Conference (CCC), pp 7448–7453
27. Mahmoudi Z, Nørgaard K, Poulsen NK, Madsen H, Jørgensen JB (2017) Fault and meal detection by redundant continuous glucose monitors and the unscented Kalman filter. Biomedical Signal Processing and Control 38:86–99
28. Turksoy K, Hajizadeh I, Littlejohn E, Cinar A (2017) Multivariate statistical monitoring of sensor faults of a multivariable artificial pancreas. IFAC-PapersOnLine 50(1):10998–11004
29. Ahmad B, Jian W, Ali ZA, Tanvir S, Khan MSA (2019) Hybrid anomaly detection by using clustering for wireless sensor network. Wirel Personal Commun 106(4):1841–1853
30. Jiang H, Wu Y, Lyu K, Wang H (2019) Ocean data anomaly detection algorithm based on improved k-medoids. In: 2019 Eleventh International Conference on Advanced Computational Intelligence (ICACI), pp 196–201
31. Rustam Z, Talita AS, Mart T, Triyono D, Sugeng KA (2017) Fuzzy Kernel k-Medoids algorithm for anomaly detection prob-
Eng 27(3):327–346 lems. AIP Conf Proc 1862(1):030154
15. Turksoy K, Roy A, Cinar A (2017) ‘‘Real-time model-based fault 32. Wu X et al (2008) Top 10 algorithms in data mining. Knowl Inf
detection of continuous glucose sensor measurements,’’ (in Syst 14(1):1–37
English). IEEE Trans Biomed Eng 64(7):1437–1445 33. Ji C, Zou X, Liu S, Pan L (2020) ADARC: An anomaly detection
16. Brown J (2008) Using support vector machines to detect thera- algorithm based on relative outlier distance and biseries corre-
peutically incorrect measurements by the MiniMed CGMS. lation. Softw Practice Exp 50(11):2065–2081
J Diabetes Sci Technol 2(4):622–629 34. Madhulatha TS (2012) An overview on clustering methods. IOSR
17. Leal Y, Ruiz M, Lorencio C, Bondia J, Mujica L, Vehi J (2013) J Eng 2(4):719–725
Principal component analysis in combination with case-based 35. Azar AT, El-Said SA, Hassanien AE (2013) Fuzzy and hard
reasoning for detecting therapeutically correct and incorrect clustering analysis for thyroid disease. Comput Methods Pro-
measurements in continuous glucose monitoring systems. grams Biomed 111(1):1–16
Biomed Signal Process Control 8(6):603–614 36. Park H-S, Jun C-H (2009) A simple and fast algorithm for
18. Quan S, Qin SJ, Doniger KJ (2010) Online dropout detection in K-medoids clustering. Expert Syst Appl 36(2):3336–3341
subcutaneously implanted continuous glucose monitoring. In: 37. Kölle K, Fougner AL, Frelsøy Unstad KA, Stavdahl Ø (2018)
American Control Conference Fault detection in glucose control: is it time to move beyond
19. Zhao CH, Fu YJ (2015) ‘‘Statistical analysis based online sensor CGM data? IFAC-Papers OnLine 51(27):180–185
failure detection for continuous glucose monitoring in type I 38. Manikandan RPS, Kalpana AM, Naveenapriya M (2016) Outlier
diabetes,’’ (in English). Chemometr Intell Lab Syst 144:128–137 analysis and Detection using K-medoids with support vector
20. Matthew S, Le CA, Harris DL, Weston PJ, Harding JE, Geoffrey machine. In: International Conference on Computer Communi-
CJ (2012) Using stochastic modelling to identify unusual con- cation & Informatics
tinuous glucose monitor measurements and behaviour, in new- 39. Chitrakar R,Huang C (2013) Anomaly detection using Support
born infants. Biomed Eng 11(1):45 Vector Machine classification with k-Medoids clustering. In:
21. Baysal N, Cameron F, Buckingham BA, Wilson DM, Bequette Asian Himalayas International Conference on Internet
BW (2013) Detecting sensor and insulin infusion set anomalies in 40. Feng J et al (2018) Hybrid online multi-sensor error detection and
an artificial pancreas. In: American Control Conference functional redundancy for artificial pancreas control systems.
IFAC-PapersOnLine. https://doi.org/10.1016/j.ifacol.2018.09.289


41. Ma R, Angryk R (2017) Distance and density clustering for time series data. In: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pp 25–32
42. Rokach L (2009) A survey of clustering algorithms. Data Min Knowl Disc Handbook 16(3):269–298
43. Baraldi A, Alpaydin E (2002) Constructive feedforward ART clustering networks. I. Neural Netw IEEE Trans 13(3):645–661
44. Lordo RA (2012) Learning from data: concepts, theory, and methods. Technometrics 43(1):105–106
45. Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588
46. Zhao J, Jin L, Shi L (2015) Mixture model selection via hierarchical BIC. Comput Stat Data Anal 88:139–153
47. Keribin C (2000) Consistent estimation of the order of mixture models. Sankhyā Indian J Stat Ser A (1961–2002) 62(1):49–66
48. Ljung L (2002) System identification: theory for the user. Tsinghua University Press, Beijing, pp 9–11
49. Anděl J, Perez MG, Negrao AI (1981) Estimating the dimension of a linear model. Kybernetika -Praha- 17(6):514–525
50. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. In: Bühlmann P, Diggle P, Gather U, Zeger S (eds) Springer series in statistics. Springer, New York. https://doi.org/10.1007/978-0-387-84858-7
51. Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65
52. Aghabozorgi S, Seyed Shirkhorshidi A, Ying Wah T (2015) Time-series clustering—A decade review. Inf Syst 53:16–38
53. Dassau E, Palerm CC, Zisser H, Buckingham BA, Jovanovic L, Doyle FJ (2009) In silico evaluation platform for artificial pancreatic beta-cell development–a dynamic simulator for closed-loop control with hardware-in-the-loop. Diabetes Technol Ther 11(3):187–194
54. Facchinetti A, Del FS, Sparacino G, Cobelli C (2013) An online failure detection method of the glucose sensor-insulin pump system: improved overnight safety of type-1 diabetic subjects. IEEE Trans Biomed Eng 60(2):406–416
55. Facchinetti A, Del FS, Sparacino G, Cobelli C (2016) Modeling transient disconnections and compression artifacts of continuous glucose sensors. Diabetes Technol Therapeut 18(4):264–272

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
