Energies 15 07978

energies
Article
Improved Semi-Supervised Data-Mining-Based Schemes for
Fault Detection in a Grid-Connected Photovoltaic System
Benamar Bouyeddou 1,2 , Fouzi Harrou 3 , Bilal Taghezouit 4,5 , Ying Sun 3 and Amar Hadj Arab 4, *
1 LESM Lab., Faculty of Technology, University of Saida-Dr Moulay Tahar, Saida 20000, Algeria
2 STIC Lab., Department of Telecommunications, Abou Bekr Belkaid University, Tlemcen 13000, Algeria
3 Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah
University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
4 Centre de Développement des Energies Renouvelables (CDER), B.P. 62, Route de l’Observatoire,
Algiers 16340, Algeria
5 Laboratoire de Dispositifs de Communication et de Conversion Photovoltaique, Ecole Nationale
Polytechnique Alger, Algiers 16200, Algeria
* Correspondence: a.hadjarab@cder.dz
Abstract: Fault detection is a necessary component to perform ongoing monitoring of photovoltaic

plants and helps in their safety, maintainability, and productivity with the desired performance.
In this study, an innovative technique is introduced by amalgamating Latent Variable Regression
(LVR) methods, namely Principal Component Regression (PCR) and Partial Least Square (PLS), and
the Triple Exponentially Weighted Moving Average (TEWMA) statistical monitoring scheme. The
TEWMA scheme is known for its sensitivity to uncovering changes of small magnitude. Nevertheless,
TEWMA can only be utilized for monitoring single variables and ignoring the correlation among
monitored variables. To alleviate this difficulty, the LVR methods (i.e., PCR and PLS) are used
as residual generators. Then, the TEWMA is applied to the obtained residuals for fault detection
purposes, where the detection threshold is computed via kernel density estimation to improve its
Citation: Bouyeddou, B.; Harrou, F.;
Taghezouit, B.; Sun, Y.; Hadj Arab, A.
performance and widen its applicability in practice. Real data with different fault scenarios from a
Improved Semi-Supervised 9.54 kW photovoltaic plant has been used to verify the efficiency of the proposed schemes. Results
Data-Mining-Based Schemes for revealed the superior performance of the PLS-TEWMA chart compared to the PLS-TEWMA chart,
Fault Detection in a Grid-Connected particularly in detecting anomalies with small changes. Moreover, they have almost comparable
Photovoltaic System. Energies 2022, performance for large anomalies.
15, 7978. https://doi.org/10.3390/
en15217978 Keywords: fault detection; photovoltaic systems; data-driven methods; TEWMA; sensor faults; PLS;
Academic Editor: Manolis Souliotis
PCR; dimensionality reduction
Received: 5 October 2022

Accepted: 24 October 2022
Published: 27 October 2022 1. Introduction
Publisher’s Note: MDPI stays neutral The energy supply of the future must be clean, efficient, resource-efficient, and sustain-
with regard to jurisdictional claims in able. Solar Photovoltaic (PV) energy is a strategic option to satisfy environmental concerns
published maps and institutional affil- and meet the growing electricity demand [1]. It offers a safe, affordable, and renewable
iations. solution for generating power on a large scale. The latest statistics show that the installed
PV systems reached a total installed capacity of 942 GWdc (Gigawatt direct current) in
2021 [2]. Although there are advanced PV technologies, PV systems are still confronted with
different types of anomalies in real working conditions, such as short-circuit, open-circuit,
Copyright: © 2022 by the authors.
line-line, and partial shading, that decrease their performance and cause power losses [3].
Licensee MDPI, Basel, Switzerland.
This article is an open access article
Undoubtedly, such unsatisfactory faulty conditions raise undesirable maintenance costs
distributed under the terms and
and inefficiency.
conditions of the Creative Commons
Anomalies and atypical conditions (e.g., dirt, shading) can make the PV system deviate
Attribution (CC BY) license (https:// from the desired performance. Broadly speaking, both internal and external anomalies in PV
creativecommons.org/licenses/by/ systems can degrade their operating performance, collapsing the system and even resulting
4.0/). in severe security issues [4]. Internal faults include line-to-line faults, short/open circuits,
Energies 2022, 15, 7978. https://doi.org/10.3390/en15217978 https://www.mdpi.com/journal/energies

Energies 2022, 15, 7978 2 of 22
and defective components (e.g., generators, cables, and inverters). In addition, external
anomalies are related to some environmental circumstances, like dusty conditions [5],
and subversive acts, such as cyber-attacks (e.g., Denial of services and data integrity),
particularly against on-grid plants [6–8]. In the absence of convenient action, the system
lifetime can be drastically reduced in the long term [9]. Thus, PV systems need to be
continuously monitored for proper functioning, preserving their optimal performances
and making them more efficient, reliable, and sustainable [10].
Detecting anomalies in PV systems is essential to avoid energy drops and keep their
expected performance. Towards this end, numerous anomaly detection techniques have
been developed in the literature last two decades. Such techniques can be grouped into
two principal classes, namely, model-based or data-driven methods [11–17]. Model-based
approaches involve analytical models to predict normal (i.e., fault-free) PV system outputs.
Abnormal measurements are flagged when their deviations from the predictions exceed
the set threshold [13,16,17]. Various model-based approaches have been introduced in
the literature, such as single-diode-based models [18], double-diode-based models [19],
and Kalman filter [20]. For instance, Skomedal et al. [21] compared six monitoring charts,
the conventional Shewhart, three modified versions of CUSUM (Cumulative Sum), and a
modified modification of EWMA (Exponentially Weighted Moving Average), based on an
autoregressive integrated moving average model to detect low-rate losses in sizable grand
serial PV farms. They conclude that the best performance is obtained when using CUSUM-
based charts in this application. In [18], an approach to detect and identify faults in the DC
side of a PV plant is presented. This approach applies the Multivariate EWMA to residuals
obtained from the one-diode-based model for fault detection and the univariate EWMA to
residuals to discriminate the types of detected anomalies. Of course, the efficiency of the
model-based anomaly detection methods depends on the analytical model, which is not
often easy to obtain and time-consuming, particularly for large-scale systems.
Data-driven approaches, on the other hand, exploit available measurements when the
monitored PV system is functioning properly to learn the underlying reference empirical
model. Then, new faulty measurements are uncovered accordingly [22–24]. Data-based
anomaly detection techniques enclose statistical and machine learning-based techniques,
e.g., Support Vector Machine (SVM) and k-Nearest Neighbors (k-NN) algorithm. Within
data-based techniques, in [25], an approach based on Principal Component Analysis (PCA)
is proposed to monitor the DC side of a PV system. More specifically, PCA is applied to
select the most relevant features and generate residuals. Here, residuals are evaluated via
the multivariate charts Hotelling T2 and Squared Predicted Error (SPE) charts for anomaly
detection purposes. This approach showed satisfactory detection performance in detecting
different real anomalies, including short-circuit, inverter connection, and partial shading.
In [26], the PCA approach showed performance in detecting and classifying PV shading
under real conditions. In another study [27], a PCA-based image processing method was
introduced to detect faults in PV tracking systems. Deceglie et al. [28] proposed the Stochas-
tic Rate and Recovery (SRR) approach to compute losses related to the soiling problem.
The SRR defines cleaning and soiling periods using short-circuit current measurements.
Then soiling ratios are used to discriminate soiling profiles, and the resulting losses are
estimated according to prefixed cleaning thresholds. This study revealed that profile con-
struction is very sensitive to noise, and the estimation threshold should be judiciously set.
In [29], Kilic et al. addressed open and short circuit-related failures with the Random Vector
Functional Link Networks (RVFLN) and Canonical Correlation Analysis (CCA) as data
reduction techniques to reduce the dimension of selected attributes.
In [30], Xiong et al. studied the utility of Discrete Wavelet Transform (DWT) to identify
and isolate parallel and series arc faults. Specifically, the DWT-based technique is applied
to inspect capacitor current features, and results show significant variations during arc
faults, which allow for the revealing of their type and location. Nevertheless, the analysis
is based on simplified alleged data and specific arc faults. Indeed, it should be generalized
for other fault types onboard real PV systems. The approach proposed in [31] employs
Energies 2022, 15, 7978 3 of 22
Wavelets Packet Transform (WPT). Here, three PV plant’s variables have been chosen to
perform de detection procedure: voltage, voltage/energy, and impedance/energy. For each
variable, preset thresholds obtained through simulation are used to define the intervals
of normal variations. In [32], Edun et al. presented a method to detect disconnections in
PV systems using the Spread Spectrum Time Domain Reflectometry (SSTDR). It consists
of correlating generated to reflected signals, and the obtained baseline peaks are utilized
to reveal and locate the related faults. However, the SSTDR is too sensitive to noise and
attenuation, making it relatively adapted to a limited number of cells. In addition, the use
of this approach gets more complicated in large-scale PV plants where a large number of
reflections can strongly degrade their detection performance.
Recently, several supervised and semi-unsupervised machine leaning-based tech-
niques have been explored for fault detection and classification in PV systems [33–36].
Supervised methods require the availability of labeled data during the training, which is
not often easy to obtain, particularly for large-scale systems with various faults. On the
other hand, semi-supervised methods need only fault-free data in training and without
fault labeling. In [37], individual modules’ voltages are estimated by means of multi-layer
neural networks to find out short circuit situations. Results showed that detection accuracy
increased proportionally with the number of used layers. However, the training of this
model requires faulty measurements, which is not always feasible, even if it does not cover
all expected faults. Chen et al. [38] presented a multi faults detection approach based on
enhanced Semi-Supervised Ladder Networks (SSLN) when using restricted online sam-
ples. In [39], an auto-encoder-driven deep neural network approach is designed to deal
with operational and cyber-attacks related faults in grid-connected PV systems. The auto-
encoder is applied to estimate the normal measurements with the lowest error based on
multivariate features. Then faults are reported when estimation error exceeds an empirical
detection threshold based on the well-known three-sigma rule. In [40], the SVM algorithm
is applied to quantify power quality and isolate eventual disturbances within client-side
measurements. First, the DWT selects and extracts the most relevant features. Next, the
One-Class SVM (OCSVM) is involved in revealing disturbances. Finally, a Multi-Class
SVM (MCSVM) procedure is performed to classify the detected disturbances.
Anomaly detection in PV plants is essential for improving their safety and profitability
and optimizing maintenance scheduling. Possible anomalies could be caused by malfunc-
tioning sensors (sensor anomalies) or changes affecting the inspected system (called process
anomalies). The need for methods to accurately and quickly uncover abnormal conditions
(sensor or process anomalies) is vital to keep the PV system operating with the desired
performance. This paper presents a data-driven approach for detecting sensor anomalies
and anomalies in the DC side of the PV system. Importantly, the aim is to develop a semi-
supervised data-driven detector for PV systems monitoring that does not require labeled
data. In this study, we propose a Triple Exponentially Weighted Moving Average (TEWMA)-
based strategy to detect anomalies encountered in the DC side of a PV plant. The TEWMA
scheme is an effective and sensitive approach, which is found to have better detection of
small changes, usually adopted for inspecting only single process variables. However, data
from PV systems are multivariate input-output variables and cross-correlated, making the
use of the TEWMA chart inappropriate. To bypass this shortcoming, the proposed method
integrates Latent Variable Regression (LVR) methods with the TEWMA chart. LVR models,
namely, Principal Component Regression (PCR) and Partial Least Square Regression (PLS),
demonstrated good capability in modeling multivariate input-output data in different
applications by eliminating variables’ collinearity and reducing noises as well. In addition,
PLS/PCR models have been widely employed for response variables prediction based
on predictor variables. They perform well when transforming high dimensional input
data to lower dimension outputs that can be effectively used to track abnormal situations.
Importantly, the proposed LVR-TEWMA approach employs two complementary steps in
detecting anomalies. The first involves the use of the LVR models (i.e., PLS and PCR) to
generate the residual sequence and the TEWMA decision threshold based on data from the
reducing noises as well. In addition, PLS/PCR models have been widely emplo
response variables prediction based on predictor variables. They perform we
transforming high dimensional input data to lower dimension outputs that can
Energies 2022, 15, 7978
tively used to track abnormal situations. Importantly, the proposed4 ofLVR-TEW 22
proach employs two complementary steps in detecting anomalies. The first invo
use of the LVR models (i.e., PLS and PCR) to generate the residual sequence
PV system under nominal conditions. In the second step, the TEWMA charting statistic
isTEWMA
computeddecision threshold
using residuals based
from the on data
reference LVRfrom
modelthe PV on
based system under
the newly nominal con
received
In the
data. second step,
Comparison of thethe
TEWMATEWMA charting
statistic statistic isthreshold
to the determined computed valueusing
yieldsresiduals
the f
anomaly decision (faulty or healthy conditions). Note that the threshold
reference LVR model based on the newly received data. Comparison of the TEW in the conventional
TEWMA chart is analytically determined based on the Gaussian assumption of the data
tistic to the determined threshold value yields the anomaly decision (faulty or
distribution. However, this assumption is not valid for the data from PV systems. Here,
weconditions). Note that
extend the TEWMA the
chart tothreshold in the conventional
monitor non-Gaussian data by using TEWMA
the Kernelchart is analyti
density
termined
estimator based onthe
to determine thedetection
Gaussian assumption of the data distribution. However
threshold.
This work
sumption is isnot
organized
valid for as follows.
the data Section
from2 isPVdevoted to theHere,
systems. description of the used
we extend the TEWM
grid-tied
to monitor non-Gaussian data by using the Kernel density estimator to for
PV system. Then, Section 3 presents the LVR models (i.e., PLS and PCR) determin
PV system modeling. Section 4 briefly describes the proposed LVR-TEWMA monitoring
tectionand
method threshold.
discusses the experimental results. The last section includes this study and
providesThis
somework
futureis directions.
organized as follows. Section 2 is devoted to the description of
grid-tied PV system. Then, Section 3 presents the LVR models (i.e., PLS and PCR
2. PV Installation Description
system modeling. Section 4 briefly describes the proposed LVR-TEWMA mo
This section is devoted to presenting shortly the Grid-Connected Photovoltaic (GCPV)
method and discusses the experimental results. The last section includes this st
installation used in this study. Indeed, the proposed fault detection algorithm in this work
provides
will some
be verified usingfuture directions. and electrical data measurement collected from
the meteorological
a PV installation at the Centre de Developpement des Energies Renouvelables (CDER)
2.Algeria,
in PV Installation
with a total Description
power of 9.54 kWdc in operation since 2004 (Figure 1), where the
total solar PV power produced is injected into the low-voltage electrical grid. As shown
This
in Figure section
2, the PV systemis devoted
is composed to of
presenting
three identical shortly the Grid-Connected
single-phase PV sub-systems. Phot
(GCPV)
Each installation
PV sub-system used
contains in sub-array
a PV this study. Indeed,
of 30 PV Modulesthe proposed fault detection
(PVM), a 2.5-kilowatt (kW) algo
this work
grid-tie willand
inverter, beelectrical
verifiedcabinets
using for
theprotection.
meteorological and electrical data measurem
lected from a PV installation at the CentreEach
The PV array is divided into three sub-arrays. PV sub-array contains
de Developpement destwo parallel Renou
Energies
strings of 15 PVM in a series. The main electrical specifications of the PV Modules/sub-
(CDER) in Algeria, with a total power of 9.54 kWdc in operation since 2004 (F
array and the grid-tie inverter are shown respectively in Tables 1 and 2.
where theSTC
Where total solar PV
symbolizes thepower
Standardproduced is injected
Test Conditions (G = 1000into
W/m the
2 , Tlow-voltage
◦
C = 25 C, and
electri
As =shown
AM 1.5) andinMPP
Figure 2, the PV
symbolizes system isPower
the Maximum composed ofisthree
Point. G identical
the received single-phase
Irradiance
by the PV module
systems. Each PV during the flash test,
sub-system TC is theatemperature
contains PV sub-array of theof
PV30 Cell,
PVand AM is (PVM
Modules
the Air Mass. ISC is the short circuit current, VOC is the open circuit voltage, IMPP is the
kilowatt (kW) grid-tie inverter, and electrical cabinets for protection.
current at MPP, VMPP is the voltage at MPP, and PM is the maximum power.
The
The PV array isand
meteorological divided into
electrical three sub-arrays.
measured data used in thisEach PVare
work sub-array
recovered contains
by t
allel
an strings
external of 15 PVM
monitoring systemin a series.
composed The main
essentially electrical
of Sensors, specifications
a Data Acquisition unit of the P
ules/sub-array and the grid-tie inverter are shown respectively in Tables 1 and 2.
(Agilent 34970A), and software under PC (Figure 3).
Figure
Figure 1. 1. Picture
Picture ofPV
of the theArray
PV Array of thesystem
of the GCPV GCPV on system onatthe
the rooftop rooftop
CDER. at CDER.
Energies 2022, 15,
Energies 2022, 15,7978
x FOR PEER REVIEW 55 of
of 22
Figure 2.
2. Illustrative diagram of the used grid-connected
grid-connected PV
PV system.
system.
Table1.
Table 1. Electrical
Electrical Characteristic
Characteristic of
ofPV
PVModule
Moduleand
andPV
PVsub-array
sub-arrayIsofoton
IsofotonI106-12.
I106-12.
Parameters
Parameters ISC (A) ISC (A) VOC (V)
VOC (V) IMPP I(A)
MPP (A) VMPP
VMPP (V)(V) PM(W)
PM (W)
PV Module
PV Module 6.54 6.54 21.6 21.6 6.1 6.1 17.417.4 106
106
PVPVSub-Array 13.08 324 12.2 261 3180
13.08 324 12.2 261 3180
Sub-Array
Table 2. Electrical Characteristic of PV Inverter Fronius IG 30 under nominal conditions.
Table 2. ElectricalNominal
Characteristic
AC of PV
DCInverter
Voltage FroniusInverter
IG 30 under nominal conditions.
AC Voltage Frequency
Parameters
Power (W) Range (V) Efficiency
Inverter
(%) range (V) Range (Hz)
Value Nominal
2500 AC DC Voltage
150–400 92.7–94.3 AC 195–253
Voltage Frequency
49.8–50.2
Parameters Efficiency
Power (W) Range (V) Range (V) Range (Hz)
(%)
Energies 2022, 15, x FOR PEER REVIEW 6 of 22
Where STC symbolizes
Value 2500 the Standard
150–400 Test Conditions
92.7–94.3 (G = 1000 W/m2, T
195–253 C= 25 °C, and
49.8–50.2
AM = 1.5) and MPP symbolizes the Maximum Power Point. G is the received Irradiance
by the PV module during the flash test, TC is the temperature of the PV Cell, and AM is
the Air Mass. ISC is the short circuit current, VOC is the open circuit voltage, IMPP is the
current at MPP, VMPP is the voltage at MPP, and PM is the maximum power.
The meteorological and electrical measured data used in this work are recovered by
an external monitoring system composed essentially of Sensors, a Data Acquisition unit
(Agilent 34970A), and software under PC (Figure 3).
For the measure of tilted irradiance at 27 °C, a pyranometer (kipp&zonen CM11) and
a reference cell are used, and a thermocouple measures the ambient temperature. Two
hall-effect sensors (FW BELL CLSM-50S) were used to measure the current on both the
Direct current (DC) and Alternating Current (AC) sides of the grid-tied inverter. The DC
voltage at the MPP of the PV sub-array is measured by a simple voltage divider circuit;
on the other hand, a voltage transformer was used to measure the grid AC voltage.
The data acquisition and switch unit (Agilent 34970A) provides conditioning and
measurement of sensor signals. While the monitoring interface is designed under Lab-
VIEW software, this interface can recover, display, record, and analyze the measured data.
According to IEC 61724 standard, the sampling time was chosen at 1 min.
Figure 3.
Figure 3. Synoptic
Synopticdiagram
diagramof the PV PV
of the monitoring system.
monitoring system.
3. Materials and Methods

Here, we briefly introduce the KDE-Triple EWMA approach, PCR, and PLS-based
anomaly detection schemes.
Energies 2022, 15, 7978 6 of 22
For the measure of tilted irradiance at 27 ◦ C, a pyranometer (kipp&zonen CM11) and

a reference cell are used, and a thermocouple measures the ambient temperature. Two
hall-effect sensors (FW BELL CLSM-50S) were used to measure the current on both the
Direct current (DC) and Alternating Current (AC) sides of the grid-tied inverter. The DC
voltage at the MPP of the PV sub-array is measured by a simple voltage divider circuit; on
the other hand, a voltage transformer was used to measure the grid AC voltage.
The data acquisition and switch unit (Agilent 34970A) provides conditioning and
measurement of sensor signals. While the monitoring interface is designed under Lab-
Figure 3. Synoptic diagram of the PV monitoring system.
VIEW software, this interface can recover, display, record, and analyze the measured data.
According to IEC 61724 standard, the sampling time was chosen at 1 min.
3. Materials and Methods
3. Materials
Here, weandbriefly
Methods introduce the KDE-Triple EWMA approach, PCR, and PLS-based
anomaly detection schemes. the KDE-Triple EWMA approach, PCR, and PLS-based
Here, we briefly introduce
anomaly detection schemes.
3.1. PLS (Partial Least Square)
3.1. PLS (Partial Least Square)
PLS,also
PLS, alsocalled
called Projection
Projection to Latent
to Latent Structures,
Structures, is a well-known
is a well-known data-driven
data-driven dimen- dimen
sionalityreduction
sionality reduction technique
technique thatthat
has has
beenbeen extensively
extensively appliedapplied to monitor
to monitor multivariate
multivariate
andhighly
and highlycorrelated
correlated variables
variables [41–44].
[41–44]. The The essence
essence of PLSofmodel
PLS model
consistsconsists of transforming
of transforming
inputand
input andoutput
output data
data into
into a lower
a lower dimension
dimension spacespace
so thatsotheir
thatcorresponding
their corresponding
latent laten
variables
variableshave
havethe maximum
the maximum variance [43,44].
variance Importantly,
[43,44]. PLS constructs
Importantly, a modelalink-
PLS constructs model link
ing the latent variables (LVs) associated with X and Y. The PLS model
ing the latent variables (LVs) associated with X and Y. The PLS model comprises two comprises two
complementary models: inner and outer models (Figure 4).
complementary models: inner and outer models (Figure 4).
Figure4.4.General
Figure General design
design of PLS
of PLS model.
model.
Specifically,
Specifically,forfor
data inputs
data X with
inputs X withn observations from m from
n observations variables, X ∈ Rn×m X∈R
m variables, and n×m and
n×p
data output Y with n observations from p variables, Y ∈ R , the PLS model (i.e., outer
n×p
data output Y with n observations from p variables, Y∈R , the PLS model (i.e., oute
model) can be expressed as follows [42,45]:
model) can be expressed as follows [42,45]:
(
l T = TPTt +G t
X = ∑X i=1=tp
∑i li=1 tpi = TP, + G (1)
Y =∑ { li=1 uqTil = UQ T
t
+F t , (1
Y = ∑i=1 uqi = UQ + F
×l ×q
where T ∈ Rnn×l ∈ Rnn×q
where T∈R andUU∈Rand represent
representthe latent variable matrices associated with input
the latent variable matrices associated with inpu
and output matrices, respectively. The loading matrices of input and output space are
and output matrices, respectively. The loading matrices of input and output space are
P ∈ Rm×l and Q ∈ Rp×q , respectively. G and F represent the PLS model residuals matrices.
Here, l is the number of retained principal components (PCs), which can be calculated
using the cross-validation method [46]. The retained latent variables u j and t j are estimated
iteratively [47], and the PLS model establishes the relationship (i.e., inner model) between
T and U as:
U = Tβ + E, (2)
Here, l is the number of retained principal components (PCs), which can be calculated
using the cross-validation method [46]. The retained latent variables 𝑢𝑗 and 𝑡𝑗 are esti-
mated iteratively [47], and the PLS model establishes the relationship (i.e., inner model)
between T and U as:
Energies 2022, 15, 7978 U = Tβ + E, 7 of 22 (2)
where β denotes the regression matrix and E is the residual matrix. Finally, the output
variable
where Y is obtained
β denotes via thematrix
the regression following expression:
and E is the residual matrix. Finally, the output
variable Y is obtained via the following expression:
Y = TβQT + F* (3)
T ∗
Various extensions to the linearY = PLS +F
TβQ model (3)
have been reported in the literature to
widen the applicability of PLS models in practice, including nonlinear PLS [41],
Various extensions to the linear PLS model have been reported in the literature to
wavelet-
driventhe
widen PLS [48], recursive
applicability of PLSPLS [49], in
models and dynamic
practice, PLS [50].
including nonlinear PLS [41], wavelet-
driven PLS [48], recursive PLS [49], and dynamic PLS [50].
3.2. PCR (Principal Component Regression)
3.2. PCR (Principal Component Regression)
We now describe the PCR model. The PCR method is frequently used in dimension-
alityWe now describe
reduction andthe PCR model.
regression to The PCR method
handle is frequently
the collinearity used in dimensionality
problem in multivariate data
reduction and regression to handle the collinearity problem in multivariate data. PCR
PCR is performed in two complementary phases. At first, the data reduction procedure is
is performed in two complementary phases. At first, the data reduction procedure is
applied
applied totoexplanatory
explanatory variables
variables X using
X using Principal
Principal component
component analysisanalysis
(PCA).(PCA). After that
After that,
the resultingretained
the resulting retained principal
principal components
components (CPs)
(CPs) are thenare thentolinked
linked to thevariables
the response response varia-
bles via the Ordinary Least Squares (OLS) regression technique [51,52]
via the Ordinary Least Squares (OLS) regression technique [51,52] (Figure 5). (Figure 5).
Figure
Figure 5.5.General
Generalrepresentation
representation of PCR
of PCR model.
model.
After
Afterapplying
applyingthe
thePCA
PCAalgorithm, thethe
algorithm, input data
input matrix
data X isXdecomposed
matrix as fol-
is decomposed as follows
lows [52]:
[52]: k m
X = TWT = ∑ ti wTi k+ ∑ T
i = X̂ + E,
ti wm (4)
i=1T i=K+1
X = TW = ∑ ti wTi + ∑ ti wTi ̂ +E,
=X (4)
where X̂, and E are the approximated and residual matrices, respectively. Here, T ∈ Rn×m
i=1 i=K+1
and W ∈ Rm×m represent the principal components (PCs) and the loading matrices,
where ̂ , and
X
respectively. E arethat
Notice thethe
approximated and residual
cumulative percentage matrices,
variance (CPV)respectively. Here, T∈Rn×m
technique is widely
m×m
and W∈Rto
employed represent
decide the principal
the number of PCs tocomponents
be retained in (PCs) and the
the model loading
[53]. matrices,
Therefore, PCR respec-
tively. Notice that the cumulative percentage variance (CPV) technique is widely em-
constructs the linear regression between the matrix T̂ of the k retained principal components
and the response
ployed to decidevariable y as the
the number ofsolution for retained
PCs to be the following optimization
in the model [53].problem:
Therefore, PCR con-
structs the linear regression between the matrix T ̂ of the k retained principal components
β̂ = arg min(k T̂β − y k22 ). (5)
and the response variable y as the solution
β for the following optimization problem:
β̂ = as:
The least squares solution is obtained ̂β-y∥22 ).
arg min(∥T (5)
β
T −1 T
The least squares solution isβ̂obtained
= (T̂ T̂) as:
T̂ y. (6)
-1
More details about PCR and PLS-driven
β ̂T T
̂= anomaly
(T ̂) T T
̂techniques
y. can be found in [45]. (6)
3.3. TEWMA (Triple about
More details Exponential
PCRWeighted Moving Average)
and PLS-driven anomaly techniques can be found in [45].
The TEWMA (i.e., Triple EWMA) is an enhanced variant of the conventional single
EWMA
3.3. monitoring
TEWMA (Tripleapproach, builtWeighted
Exponential essentially for conveniently
Moving Average) uncovering changes with
small and moderate levels. Precisely, the decision statistic is based on applying triple
EWMA TheofTEWMAthe actual(i.e.,
andTriple EWMA)
past data pointsis[54,55].
an enhanced variant
For a given of the conventional
monitored variable X single
EWMA monitoring approach, built essentially for conveniently uncovering
2 changes with
xi, i = 1, . . . n, samples normally distributed follow normal distribution N(µ, σ ), the single
small
EWMAand moderate
statistic levels.
Ei is defined as:Precisely, the decision statistic is based on applying triple
EWMA of the actual and past data points [54,55]. For a given monitored variable X
Ei = λ1 xi + (1 − λ1 )Ei−1 . (7)
Then the double EWMA statistic, DEi , is expressed as:
DEi = λ2 Ei + (1 − λ2 )DEi−1 . (8)

Energies 2022, 15, 7978 8 of 22
Finally, the Triple TEWMA statistic, TEi can be calculated as follow:
TEi = λ3 DEi + (1 − λ3 )TEi−1 . (9)
The smoothing parameters λ1 , λ2 , and λ3 can either be equal or different. It has been
shown that the Double EWMA (DEWMA) maintains mostly the same performance for the
same or different smooth parameters [56,57]. The TEWMA charting statistic with the same
smoothing parameter is rewritten as follows [54,58]:

Ei = λxi + (1 − λ)Ei−1 ,
DEi = λEi + (1 − λ)DEi−1 , (10)
TEi = λDEi + (1 − λ),

where E0 = DE0 = TE0 = µ0 initialize the three statistics and usually set to be the anomaly-
free mean. As a weighted average, statistic Ei can be written as [54,58]:
Ei = λxi + (1 − λ)Ei−1
= λxi + (1 − λ)[λxi−1 + (1 − λ)Ei−2 ] (11)
= λ[xi + (1 − λ)xi−1 ] + (1 − λ)2 Ei−2 .
Hereby, the Ei statistic of the present sample can be computed as pondered sum
function of all paste samples j [54,56]:
i
Ei = λ ∑ (1 − λ)i−j xj + (1 − λ)i E0 , (12)
j=1
Similarly, DEi and TEi can be written as [54,56]:
i
DEi = λ ∑ (1 − λ)i−j Ej + (1 − λ)i DE0 , (13)
j=1
i
TEi = λ ∑ (1 − λ)i−j DEj + (1 − λ)i TE0 . (14)
j=1
After replacing Ei in Equation (13), DEi can be expressed in terms of xi as:

" #
i j
DEi = λ ∑ (1 − λ)i−j λ ∑ (1 − λ)j−k xk + (1 − λ)j E0 + (1 − λ)i DE0
j=1 k=1
i j i (15)
= λ ∑ ∑ (1 − λ)i−k xk +λ ∑ (1 − λ)i E0 + (1 − λ)i DE0
2
j=1 k=1 j=1
i−k i
=λ 2 i i
∑k=1 ∑j=k (1 − λ) xk +λi(1 − λ) E0 + (1 − λ)i DE0 .
We now substitute E0 into DE0 to obtain the final pondered form [54]:
i
DEi = λ2 ∑ (1 − λ)i−k (i − k + 1)xk + (λi + 1)(1 − λ)i DE0 (16)
k=1
After that, by substituting (16) in (14), TEi is expressed as [54]:

Energies 2022, 15, 7978 9 of 22
" #
i j
i−j j−k
TEi = λ ∑ (1 − λ) 2
λ ∑ (1 − λ) j
(j − k + 1)xk + (λj + 1)(1 − λ) DE0 + (1 − λ)i TE0
j=1 k=1
i j i
= λ3 ∑ ∑ (1 − λ)i−k (j − k + 1)xk +λ ∑ (1 − λ)i (λj + 1)DE0 + (1 − λ)i TE0
j=1 k=1 j=1 (17)
i i i
3 i−k i i
= λ ∑ ∑ (1 − λ) (j − k + 1)xk +λ ∑ (1 − λ) (λj + 1)DE0 + (1 − λ) TE0
k=1 j=k j=1
λ3 i−k
= 2 ∑ik=1 (1 − λ) (i − k + 1)(i − k + 2)xk + λ2 (1 − λ)i i(λi + λ + 2)DE0 + (1 − λ)i TE0
Finally, we substitute DE0 into TE0 , to obtain the TEi statistic as:
!
λ3 i (1 − λ)i
2 k∑
i−k
TEi = (1 − λ) (i − k + 1)(i − k + 2)xk + [λi(λi + λ + 2)+2]TE0 . (18)
=1
2
The monitoring scheme utilizing TEWMA defines the upper and lower control limits
as [54]:
v
u 6(1 − λ)6 λ 12(1 − λ)4 λ2 7 ( 1 − λ ) 2 λ3 λ4
u
UCL, LCL = µ0 ± Lσt + + + , (19)
(2 − λ)5 (2 − λ)4 (2 − λ)3 (2 − λ)2
where TEi values below the limits indicate that the monitored system operates without
anomalies. On the other hand, anomalies can be identified when TEi values overpass the
defined decision limits.
Note that the decision limits in (19) are calculated under the normality distribution
assumption of data. Nevertheless, this is not often satisfied, which could result in in-
creased false alarms rate and missed detections. To overcome this problem, kernel density
estimation (KDE) is adopted in this study to compute the decision limit.
3.4. KDE-TEWMA (Kernel Density Estimation TEWMA)

In this section, we introduce KDE, which is commonly used to estimate the probability
distribution of a given data, and how it can be used to compute the TEWMA threshold.
Let us consider X = x1 ; · · · ; xn represents the residual vector using PLS or PCR model. The
KDE is applied for estimating the probability density function (PDF) of the residuals by the
following formula [59]:
1 n x − xi

nh i∑
p(x) = K , (20)
=1 h
where xi is the ith observation. K is the kernel function; the Gaussian kernel is usually
used [60]: !
1 −θ2
K(x) = exp (21)
2π 2
In (20), h represents the smoothing bandwidth factor that determines the probability
estimation quality. H is the smoothing bandwidth factor that has a central impact on the
quality of the probability estimation. Small values lead to under-smoothed PDFs.
On the other hand, large values over-smooth them. As shown in [61], for n observa-
tions with the standard deviation σ, the optimal choice can be obtained as:
h = 1.06 σn−0.2 . (22)
Importantly, in the nonparametric procedure, firstly, the distribution of the TEWMA

statistic in (18) is estimated via univariate KDE using anomalies faults-free data. Then,
the nonparametric threshold of the mechanism is defined as the (1 − α)-th quantile of the
estimated distribution. An anomaly is flagged when the TEWMA statistic exceeds the
decision threshold.
nonparametric threshold of the mechanism is defined as the (1 − α)-th quantile of the es-
timated distribution. An anomaly is flagged when the TEWMA statistic exceeds the deci-
sion threshold.
Energies 2022, 15, 7978 10 of 22
3.5. Dataset Analysis
The data used in this work has been gathered using the above-described CDER 9.54
3.5. Dataset Analysis
kWp PV system (Figure 1). We consider here one month of measurement recorded with a
The data used in this work has been gathered using the above-described CDER
cadence of 10 min during the normal operation of the monitored system. Precisely, the
9.54 kWp PV system (Figure 1). We consider here one month of measurement recorded with
dataset is divided ainto training
cadence of 10data that includes
min during theoperation
the normal free firstofweek and usedsystem.
the monitored principally
Precisely, the
during the learning phase; and testing data (week four) which will be applied inused
dataset is divided into training data that includes the free first week and the principally
as-
sessment of our proposed
during themodels.
learning The provided
phase; measurements
and testing data (week four)concern
which the
willfollowing
be applied in the
assessment of our proposed models. The provided measurements
variables: solar irradiance, ambient temperature, cell temperature, DC power, DC current, concern the following
DC voltage, AC power, AC current, and AC voltage. Figure 6 shows five days from DC
variables: solar irradiance, ambient temperature, cell temperature, DC power, thecurrent,
DC voltage, AC power, AC current, and AC voltage. Figure 6 shows five days from the
considered data.
considered data.
Figure 6. Five days of the studied

Figure PV of
6. Five days data.
the studied PV data.
Figure
Figure 7 depicts the 7 depicts (PDF)
estimated the estimated (PDF) of
of training training
data baseddata
onbased
the on
KDEthe approach.
KDE approach. We
observe that the considered variables have non-Gaussian distribution. Thus, applying the
We observe that the considered variables have non-Gaussian distribution. Thus, applying
conventional monitoring charts (e.g., TEWMA) is not appropriate as the assumption of
the conventional monitoring charts (e.g.,
normality is violated. TEWMA)
Hence, using an is not appropriate
appropriate model toas the assumption
capture the dynamic in this
of normality is violated. Hence, using an appropriate model to
data and generate uncorrelated residuals is needed. capture the dynamic in this
data and generate uncorrelated residuals is needed.
5, x FOR PEER REVIEW
Energies 2022, 15, 7978 11 of 22 11 of 22
Figure 7. Distribution of the

Figure
Figure 7. considered of
7. Distribution
Distribution variables.
of the considered
the consideredvariables.
variables.
Here,
Here, the correlation
Here, the correlation
between correlation
the nine between
variables
between thethenine
isnine variables
quantified
variables is quantified
using
is the Pearson
quantified usingusing the
cor-
the Pearson
Pearson cor-
correlation
relation heatmap,
heatmap, as as illustrated
illustrated in in Figure
Figure 8. We8. We
can
relation heatmap, as illustrated in Figure 8. We can conclude from Figure 8 that DC cur-can conclude
conclude from from
Figure Figure
8 8
that that
DC DC
cur-
current,
rent, DC
DC power, power, AC current, and AC power are highly correlated. Moreover, the cell
rent, DC power, AC current, and ACACcurrent,
power are and highly
AC power are highly
correlated. correlated.the
Moreover, Moreover,
cell tem- the cell tem-
temperature
perature is is strongly
strongly related
related to DC tocurrent,
DC current,
DC DC power,
power, AC AC current,
current, and AC and AC power.
power. Nega-
perature is strongly related to DC current, DC power, AC current, and AC power. Nega-
Negative correlation between DC voltage and cell temperature.
tive correlation between DC voltage and cell temperature. The AC voltage is totally The AC voltage is totally
un-
tive correlation between DC voltage
uncorrelated with and cell temperature. The AC voltage is totally un-
correlated with thethe
restrest of the
of the variables.
variables. TheThe irradiance,
irradiance, cellcell temperature,
temperature, current
current DC,DC,
AC
correlated with the
AC rest of thepower
current, variables.and Thepower
irradiance, cell temperature, current DC,them.
AC Finally, a
current, power DC,DC, and power AC AC present
present a high
a high correlation
correlation between
between them. Finally, a low
current, power DC,lowand power
correlation
correlation AC present
is reported
is reported a high
between
between thecorrelation
the
DC DC between
voltage
voltage andand them.
the the Finally,
ambient
ambient cell cell a low the the
temperature,
temperature, DC
correlation is reported
DC between
current,
current, ACAC the DCDCvoltage
current,
current, DC power,
power, and
and andtheAC
AC ambient
power. cell temperature, the DC
power.
current, AC current, DC power, and AC power.
Figure8.
Figure 8. Correlation
Correlation matrix
matrix of
oftraining
trainingPV
PVdata.
data.
Figure 8. Correlation matrix of training PV data.

Energies 2022,15,
Energies 2022,
2022, 15,7978
15, xx FOR
FOR PEER
PEER REVIEW
REVIEW 12of
12
12 of 22
of 22
In PCR
In PCR and
andPLS
and PLSmodels,
PLS models,the
models, thenumber
the numberofof
number ofselected
selectedPCs
selected PCs
PCs can
canstrongly
can strongly
stronglyaffect
affect
affectthethe
quality
the qualityof
quality
the
of constructed
the constructed prediction models.
prediction models.We have
We haveapplied the
applied cumulative
the cumulativepercentage
of the constructed prediction models. We have applied the cumulative percentage vari- percentage variance
vari-
(CPV)
ance method
ance (CPV)
(CPV) to determine
method
method to the the
to determine
determine suitable number
the suitable
suitable number
numberof of
CPs.
of CPs.
CPs. AsAsdepicted
As depictedinin
depicted inFigure
Figure9,
Figure 9, the
9, the
the
obtained
obtained results
obtained results show
results show that
show that five
that fiveand
five andfour
and fourCPs
four CPsare
CPs are needed to
are needed describe
needed to 99.87%
to describe
describe 99.87%of
99.87% ofthe variability
of the
the varia-
varia-
in X for
bility
bility inboth
in X forPLS
X for both(Figure
both PLS 9a) and
PLS (Figure
(Figure 99.98
9a)
9a) andfor
and PCR
99.98
99.98 for(Figure
for PCR 9b) models,
PCR (Figure
(Figure 9b) respectively.
9b) models,
models, respectively.
respectively.
Figure 9. Correlation
Figure 9. Correlation matrix
matrix of
of training
training data.
data.
Figure 10 presents
Figure 10 presents the
the scatter
scatter plot
plot of
of the
the measured
measured and
and predicted
predicted DC
DC power
power using
using
PLS and
PLS andPCR
and PCRmodels
PCR modelsbased
models based
basedonontesting
on data.
testing
testing Overall,
data.
data. results
Overall,
Overall, indicate
results
results that that
indicate
indicate both models
that both
both have
models
models
allowed
have good
allowed prediction,
good and
prediction, relatively
and PLS
relativelyprovided
PLS highest
provided prediction
highest quality.
prediction
have allowed good prediction, and relatively PLS provided highest prediction quality. quality.
Figure
Figure 10.
10. Scatter
Scatter plot
plot of
of the
the measured
measured and
and predicted
predicted DC
DC power
power using
using PLS
PLS and
and PCR models.
PCR models.
models.
Figure
Figure 11
11 shows
shows the
the boxplots
boxplots of
of residual
residual errors
errors of
of PLS
PLS and PCR models obtained
obtained with
with
training data.
training data. As illustrated, best prediction results are obtained with PLS
prediction results are obtained with PLS model.
PLS model.
model.
Energies 2022,15,
Energies2022, 15,7978
x FOR PEER REVIEW 13 13
In fact, the PLS model returned the smallest residual prediction errors, where of
2222
ofmost
values were close to zero, resulting in the compacter box compared to PCR.
In fact, the PLS model returned the smallest residual prediction errors, where most
values were close to zero, resulting in the compacter box compared to PCR.
Figure11.
Figure 11. Box
Box plot
plot of
of residual
residualerrors
errorsof
ofPLS
PLSand
andPCR
PCRmodels.
models.
4. TheIn LVR-TEWMA-Based
fact, the PLS model returned the smallest
Fault Detection in PVresidual
Systems prediction errors, where most
values
Figure were
11. Box
This close to
plot of
section zero, resulting
residual the
presents errors
key in the
of concept compacter
PLS and PCR of the box
models. compared to PCR. approach for
proposed fault detection
PV systems
4. monitoring. In this work, at first, inlatent variable
Systemsregression methods (i.e., PCR
4. The
The LVR-TEWMA-Based
LVR-TEWMA-Based Fault Fault Detection
Detection in PVPV Systems
and PLS) are calibrated using normal operating data from the inspected PV system. Then,
This
This section
section presents
presents the
thekey
keyconcept
concept ofofthetheproposed
proposed fault detection approach forfor
PV
the calibrated models are employed to sense any anomaliesfault in newdetection
data via approach
the TEWMA
systems
PV systems monitoring.
monitoring. In this work,
In thisThe
work, at first, latent
at first, (called variable
latent variable regression
regression methods (i.e., PCR
statistical monitoring charts. deviation residuals) betweenmethods (i.e., PCR
the measured data
and
and PLS)
PLS) are
are calibrated
calibrated using normal
using normal operating
operatingdata datafrom
fromthe theinspected
inspectedPV PVsystem.
system.Then,
Then,
and the prediction from the constructed LVR models (i.e., PCR and PLS) delivers pertinent
the
the calibrated
calibratedand models
models are employed
are employed to to sense
senseany anyanomalies
anomalies innew newdata
datavia
viathe
theTEWMA
TEWMA
information can be evaluated to uncover anomalies ininthe PV system. Reasonably,
statistical
statistical monitoring charts. The deviation (called residuals) between the measured data
when the monitoring
inspected PV charts.
systemTheisdeviation
operating (called residuals)
customarily, between
the the fluctuate
residuals measuredarounddata
and
and the prediction from the
the constructed
constructed LVR LVR models(i.e., (i.e.,PCR
PCRand andPLS)
PLS)delivers
deliverspertinent
pertinent
zerothe prediction
because from
of noise measurements. On the models
other hand, if an atypical event or an anomaly
information
information and
and can be
can be evaluated
evaluated to to uncover
uncover anomalies
anomaliesininthe thePVPVsystem.
system.Reasonably,
Reasonably,
occurs in the PV system, the residuals diverge from zero to reveal that the PV system
when
when the
theinspected
inspectedPV PVsystem
system isisoperating
operating customarily,
customarily, thethe
residuals fluctuate
residuals around
fluctuate aroundzero
requires attention. Figure 12 schematically recapitulates the proposed LVR-TEWMA
because of noise measurements. On the other hand, if an atypical
zero because of noise measurements. On the other hand, if an atypical event or an anomaly event or an anomaly
anomaly detection framework.
occurs
occurs in in the
the PV
PVsystem,
system,the theresiduals
residualsdiverge
divergefrom fromzerozerototoreveal
revealthat
thatthe
thePVPVsystem
systemre-
quires attention. Figure 12 schematically recapitulates the
requires attention. Figure 12 schematically recapitulates the proposed LVR-TEWMAproposed LVR-TEWMA anomaly
detection framework.
anomaly detection framework.
Figure 12. General fault detection procedure in PV system using LVR-based KDE triple exponen-
tially smoothing driven monitoring schemes.
Figure 12. General fault detection procedure in PV system using LVR-based KDE triple exponen-
12. Generalthe
FigureSuccinctly, faultPLS
detection procedure
residuals in PV system
are expressed using LVR-based KDE triple exponentially
as follows:
tially smoothing driven monitoring schemes.
smoothing driven monitoring schemes.
𝑅 = 𝑌 − 𝑌̂ (23)
Succinctly, the PLS residuals are expressed as follows:
Succinctly, the PLS residuals are expressed as follows:
𝑅 = 𝑌 − 𝑌̂ (23)
R = Y − Ŷ (23)
Energies 2022, 15, 7978 14 of 22
After that, we applied the TEWMA chart to evaluate the residuals for anomaly de-
After
tection that, weThe
purposes. applied the TEWMA
LVR-TEWMA chart todetection
anomaly evaluate strategy
the residuals
can beformainly
anomaly de-into
split
tection
two purposes.
phases. The LVR-TEWMA
(1) During anomaly detection
the model construction phase,strategy can bethe
we calibrate mainly
LVRsplit intoand
models
two phases.
compute the(1) Duringdecision
TEWMA the model construction
threshold basedphase, we calibrate
on normal the LVR
operating data.models andmon-
(2) In the
compute
itoring the TEWMA
phase, decision
we generate thethreshold
residualsbased on normal
via the operating
constructed LVRdata.
models(2) In theapply
and mon- the
itoring phase,
already computedwe generate
TEWMA thethreshold
residuals for
via the constructed
monitoring new LVR models
data. Here, and
weapply
appliedthe the
already computed TEWMA threshold for monitoring new data. Here, we applied
KDE approach to determine the reference threshold. To do so, we estimate the PDF of the the KDE
approachmeasurements
TEWMA to determine the reference
through KDE. threshold.
Next, theTo do so, we
detection estimateofthe
threshold thePDF of the
LVR-TEWMA
TEWMA measurements through KDE. Next, the detection threshold of the LVR-TEWMA
approach is obtained as the (1 − α)-th quantile of the estimated PDF in respect of targeted
approach
false alarmisprobability
obtained as α.
the (1 − α)-th quantile of the estimated PDF in respect of targeted
falseIn
alarm probability α.
summary, after building the LVR models and computing the TEWMA decision
In summary,
threshold, the newafter building
online data canthebe
LVR models to
evaluated and computing
sense theanomalies.
potential TEWMA decision
Specifically,
the TEWMA charting statistic is compared with the detection threshold, H, Specifically,
threshold, the new online data can be evaluated to sense potential anomalies. to decide about
the TEWMA charting statistic is compared with the detection threshold, H, to decide
the presence of an anomaly as follows:
about the presence of an anomaly as follows:

No
𝑁𝑜 anomaly
𝑎𝑛𝑜𝑚𝑎𝑙𝑦 i𝑖𝑓 f TEW
𝑇𝐸𝑊𝑀𝐴 MA<<𝐻,H,
δ= 𝛿={ (24) (24)
𝐴𝑛𝑜𝑚𝑎𝑙𝑦
Anomaly i𝑖𝑓f 𝑇𝐸𝑊𝑀𝐴
TEW MA>> 𝐻.H.
5.
5. Results
Results
This
This section verifiesthe
section verifies theperformance
performanceofof the
the proposed
proposed LVR-TWEMA
LVR-TWEMA approaches
approaches in in
detecting several types of anomalies in PV systems that are recorded
detecting several types of anomalies in PV systems that are recorded during real plants’ during real plants’
lifetime. Here, we
lifetime. Here, wefocus
focusour
ourstudy
studyononthe theDCDC side
side of of
PVPV systems,
systems, andand
we we consider
consider herehere
specifically
specifically the
the following
following scenarios: String faults
scenarios: String faults (SF),
(SF), Inverter
InverterDisconnections
Disconnections(ID),
(ID),Sce-
Scenar-
ios withwith
narios Circuit Breaker
Circuit Faults,
Breaker Short-Circuit
Faults, Short-CircuitFaults (SCF),
Faults andand
(SCF), Sensor Faults
Sensor in Pyranometer
Faults in Pyra-
(SFP).
nometer Different scenariosscenarios
(SFP). Different are carried areout usingout
carried realusing
measurements gathered gathered
real measurements using theus-CDER
9.54
ing the CDER 9.54 kWp PV system described in Section 2. The normal and faulty data in
kWp PV system described in Section 2. The normal and faulty data were selected
the
were summer
selectedwhen most days
in the summer whenaremost
characterized by clear skies
days are characterized and skies
by clear medium ambient air
and medium
temperature (which ranges from 20 to 35 ◦ C). We employed five statistical metrics for the
ambient air temperature (which ranges from 20 to 35 °C). We employed five statistical
evaluation: trueevaluation:
metrics for the positive rate
true(TPR), false
positive positive
rate (TPR),rate
false(FPR), accuracy,
positive the area
rate (FPR), underthe
accuracy, curve
(AUC), andcurve
area under EER (equal
(AUC),error
and EERrate)(equal
[62]. error rate) [62].
5.1.
5.1. Scenarios with
with String
StringFaults
Faults
The
The aim of of this
this experiment
experimentisistotoassessassessthethe capacity
capacity of of LVR-TWMA
LVR-TWMA approaches
approaches to to
uncover
uncover thethe string’s
string’sfault
fault(i.e.,
(i.e.,open
open circuits).
circuits). Broadly
Broadly speaking,
speaking, OpenOpen circuits
circuits are gen-
are gener-
erally related
ally related to to
thethe degradation
degradation of DC
of DC protection
protection or detachment
or detachment between
between PV modules
PV modules in
series. Here, we deactivate the circuit breaker to voluntarily create the desired
in series. Here, we deactivate the circuit breaker to voluntarily create the desired open open circuit
faults. faults.
circuit Figure Figure
13 presents the corresponding
13 presents detectiondetection
the corresponding results using theusing
results PLS- and PCR-and
the PLS-
based TEWMA
PCR-based TEWMA methods. Both Both
methods. methods detect
methods this fault
detect withwith
this fault largelarge
magnitude. TableTable
magnitude. 3 3
quantitatively confirms
quantitatively confirms the
the obtained
obtainedresults
resultsininterms
termsofofthe
thefive
five scores.
scores. Both
Both methods
methods reach
areach
highadetection
high detection rate, no
rate, with with no false
false alarms
alarms and and an accuracy
an accuracy of 0.9942.
of 0.9942.
Figure 13. Detection results of (a) PLS and (b) PCR-based TEWMA charts in the presence of string fault.
Figure 13. Detection results of (a) PLS and (b) PCR-based TEWMA charts in the presence of string
fault.
Energies 2022, 15, 7978 15 of 22

Table 3. Results of PLS-based and PCR-based TEWMA charts under string fault.
Method TPR FPR Accuracy AUC EER

PLS-TEWMATable 3. Results
0.98of PLS-based
0 and PCR-based TEWMA
0.9942 charts under
0.99 string fault.
0.0034
PCR-TEWMA 0.98
Method 0 TPR 0.9942
FPR 0.99
Accuracy AUC 0.0034 EER
PLS-TEWMA 0.98 0 0.9942 0.99 0.0034
5.2. Scenarios with Inverter Disconnections
PCR-TEWMA 0.98 0 0.9942 0.99 0.0034
We now investigate the monitoring efficiency of the LVR-TEWMA methods in the

5.2. Scenarios with Inverter Disconnections
presence of inverter disconnections in the inspected PV system. In this case, the PV system
will still shut down tillWe
thenow investigate the monitoring efficiency of the LVR-TEWMA methods in the
inverter is reconnected. Here, we consider inverter disconnec-
presence of inverter disconnections in the inspected PV system. In this case, the PV
tions resulted due system
to gridwill
instability.
still shut More specifically,
down till the inverterinverter disconnections
is reconnected. Here, werefer to ainverter
consider
situation where thedisconnections
voltage and resulted
frequency dueoftothe
gridgrid exceedMore
instability. the inverter operating
specifically, limits.
inverter disconnections
Inverter disconnections
refer toare short in
a situation timethe
where and haveand
voltage large amplitude.
frequency In this
of the grid case
exceed thestudy, we
inverter operating
used real data withlimits. Inverter
inverter disconnectionsto
disconnections arestudy
short inthetime and have large
performance of amplitude.
the proposed In this case
study, we used real data with inverter disconnections to study the performance of the
monitoring techniques.
proposed monitoring techniques.
Figure 14 depicts the results of the PLS-TWEMA and PCR-TEWMA schemes. We ob-
Figure 14 depicts the results of the PLS-TWEMA and PCR-TEWMA schemes. We
serve that the twoobserve
schemes that successfully
the two schemes flagged the occurred
successfully flagged the inverter
occurreddisconnections;
inverter disconnections;
these anomalies arethese anomalies are with large amplitude. We also noticeanomalies
with large amplitude. We also notice that these are within
that these anomalies are within
short periods, which canperiods,
short be verywhichhelpfulcanin
bedistinguishing
very helpful in them from other
distinguishing themabnormal
from othercon-abnormal
ditions, such as temporary shading and string faults. Table 4 summarizes the results of results
conditions, such as temporary shading and string faults. Table 4 summarizes the
of this
this comparison using comparison
the using the
five statistical five statistical
metrics. Resultsmetrics.
show thatResults
the show
PLS-TEWMAthat the PLS-TEWMA
ap-
approach outperformed the PCR-TEWMA approach in detecting all occurred faults (i.e.,
proach outperformed the PCR-TEWMA approach in detecting all occurred faults (i.e.,
TPR = 100%).
TPR = 100%).
Figure 14. Detection results of (a) PLS- and (b) PCR-based TEWMA charts in the presence of inverter
Figure 14. Detection results of (a) PLS- and (b) PCR-based TEWMA charts in the presence of inverter
disconnections.
disconnections.
Table 4. Results of PLS- and PCR-based TEWMA charts under inverter disconnections.
Table 4. Results of PLS- and PCR-based TEWMA charts under inverter disconnections.
Method TPR
PLS-TEWMA FPR 1 Accuracy
0.0418 AUC
0.9583 0.9791EER 0.0417
PLS-TEWMA PCR-TEWMA
1 0.04180.75 0.0399
0.9583 0.9593
0.9791 0.8550
0.0417 0.0407
PCR-TEWMA 0.75 0.0399 0.9593 0.8550 0.0407
5.3. Scenario with Circuit Breaker Faults
Now,
5.3. Scenario with Circuit we assess
Breaker the performances of the proposed methodologies under circuit breaker
Faults
faults. These components are principally designed to protect PV systems and users from
Now, we assess the shocks,
eventual performances of the
short circuits, proposed
and potential methodologies
overloads. under
Figure 15 and Tablecircuit
5 illustrate the
breaker faults. These components
detection results ofare
the principally
PLS-TEWMAdesigned to protect
and PCR-TEWMA PV systems
methods and us-of RCCB
in the presence
ers from eventual shocks, short circuits, and potential overloads. Figure 15 and Table 5
x FOR PEER REVIEW 16 of 22
illustrate the detection results of the PLS-TEWMA and PCR-TEWMA methods in the pres-
ence of RCCB faults. In overall, it is clear that both PLS and PCR maintain high perfor- 16 of 22
Energies 2022, 15, 7978
mance with this type of fault. They reach a high detection rate with a TPR of around 0.98.
illustrate the detection results of the PLS-TEWMA and PCR-TEWMA methods in the pres-
ence of RCCB faults. In In
faults. overall,
overall,ititis
is clear thatboth
clear that bothPLSPLS
andand
PCRPCR maintain
maintain high perfor-
high performance with this
mance with this typetypeof
offault. Theyreach
fault. They reacha high
a high detection
detection rate rate
with with
a TPRaofTPR of around
around 0.98. 0.98.
Figure 15. Detection results of (a) PLS- and (b) PCR-based TEWMA charts in the presence of RCCB
faults.
Table 5.15.
Figure Results of PLS-
Detection and
results
Figure PCR-based
15.of
Detection TEWMA
(a) PLS-results
and charts
(b)(a)PCR-based
of PLS- under
and (b) RCCB
TEWMA
PCR-based faults.
charts
TEWMA incharts
the presence of RCCB
in the presence of RCCB faults.
faults.
Method Table 5. Results TPRof PLS- andFPR Accuracy
PCR-based TEWMA AUCfaults.
charts under RCCB EER
TablePLS-TEWMA 0.9815
5. Results of PLS- andMethod
PCR-based 0.0220 charts under
TEWMA TPR
0.9782
FPRRCCB faults.
0.9797
Accuracy AUC
0.0218
EER
PCR-TEWMA 0.9815 0.0210 0.9791 0.9802 0.0209
Method PLS-TEWMA
TPR 0.9815
FPR 0.0220
Accuracy 0.9782
AUC 0.9797 EER 0.0218
PCR-TEWMA 0.9815 0.0210 0.9791 0.9802 0.0209
PLS-TEWMA
5.4. Short-Circuit Fault 0.9815 0.0220 0.9782 0.9797 0.0218
PCR-TEWMA
In this scenario, 0.9815 Faultmethods
the proposed
5.4. Short-Circuit 0.0210 are evaluated
0.9791 to detect 0.9802 0.0209
short-circuit faults.
Figure 16 displays theInresults of the the
this scenario, two approaches
proposed methodswhen
are two PV modules
evaluated are short- faults.
to detect short-circuit
5.4. Short-Circuit Fault
Figure 16 displays the results of the two approaches when two
circuited. We can see that the two approaches can uncover this fault. Table 6 indicates that PV modules are short-
the PLS-TEWMA circuited.
In this scenario,
hasthe We can
theproposed
best see methods
that the two
performance, approaches
are
with a TPR can
evaluated uncover
of to this
detectOn
0.8649. fault.contrary,
Table 6faults.
short-circuit
the indicates
the that
the PLS-TEWMA has the best performance, with a TPR of 0.8649. On the contrary, the
Figure
PCR-TEWMA16 displays the results
provides of the two
unsatisfactory approaches
results with high when
missedtwodetection
PV modules (TPRare short-
= 0.1351).
PCR-TEWMA provides unsatisfactory results with high missed detection (TPR = 0.1351).
circuited.
This couldWe be can see
Thisthat
attributed tothe
could betwo
the approaches
prediction
attributed can uncover
to accuracy of the
the prediction thismodel
PLS
accuracy fault.
of Table
andmodel
the PLS 6 indicates
the sensitivitythat
of
and the sensitivity of
the TEWMA
PLS-TEWMA has
chartthe thesmall
to TEWMA
the bestchart
performance,
changes. with a TPR of 0.8649. On the contrary, the
to the small changes.
PCR-TEWMA provides unsatisfactory results with high missed detection (TPR = 0.1351).
This could be attributed to the prediction accuracy of the PLS model and the sensitivity of
the TEWMA chart to the small changes.
Figure 16. DetectionFigure

results16.of (a) PLS-results
Detection and (b) PCR-based
of (a) TEWMA
PLS- and (b) charts
PCR-based in the
TEWMA presence
charts of two of two
in the presence
short-circuited modules.
Table 6.16.
Figure Results of PLS-
Detection and of
results PCR-based TEWMA
(a) PLS- and charts under
(b) PCR-based two modules
TEWMA short-circuited.
charts in the presence of two
TablePLS-TEWMA 0.8649 TEWMA
6. Results of PLS- and PCR-based 0 charts under
0.9823 0.9324
two modules 0.0095
short-circuited.
FOR PEER REVIEW 17 of 22
Energies 2022, 15, 7978 17 of 22
PCR-TEWMATable 6. Results
0.1351 0
of PLS- and PCR-based 0.8867
TEWMA charts under 0.5676 0.0606
two modules short-circuited.

5.5. Sensor Bias Faults in the Pyranometer
PLS-TEWMA 0.8649 0 0.9823 0.9324 0.0095
We now considerPCR-TEWMA
sensor faults in pyranometers
0.1351 that
0 could0.8867
be due to 0.5676
degradation, 0.0606
aging, or other factors. Importantly, the pyranometer adapts the PV modules’ direction to
optimally track the 5.5.sun’s
Sensormovements
Bias Faults in and be continuously highly exposed to sunlight.
the Pyranometer
Consequently, faults affecting this precious
We now consider sensor component can prevent
faults in pyranometers thatthe PVbe
could system
due to from
degradation,
being in the right orientation, which can significantly decrease the produced energy. Here,
aging, or other factors. Importantly, the pyranometer adapts the PV modules’ direction
to optimally
we simulate bias faults track the sun’s
with different movements
magnitudes and bethe
to assess continuously highly
sensitivity exposed
of the to sunlight.
two pro-
Consequently, faults affecting this precious component can prevent the PV system from
posed approaches. The simulated bias, 𝑏, begins at sample 200. We used the following
being in the right orientation, which can significantly decrease the produced energy. Here,
model to simulate we biassimulate
faults in the solar irradiance measurements.
bias faults with different magnitudes to assess the sensitivity of the two
proposed approaches. The simulated bias, b, begins at sample 200. We used the following
(25)
𝑦𝑖 = 𝑥𝑖 + 𝑏,
model to simulate bias faults in the solar irradiance measurements.
where 𝑦𝑖 and 𝑥𝑖 are the faulty and fault-free measurements, respectively. Here, as part
yi = xi + b,
of the total variation in solar irradiance measurements, we consider the following magni- (25)
tudes: 5%, 10%, 20%, 30%,
where 40%,
yi and xi and 50%.
are the faulty and fault-free measurements, respectively. Here, as part of
Detection results of the
the total PLS-TEWMA
variation and PCR-TEWMA
in solar irradiance measurements, we charts in the
consider case of bias
the following magnitudes:
5%, 10%, 20%, 30%, 40%, and 50%.
faults (with 10%) in the pyranometer measurements are depicted in Figure 17. Visually,
Figure 15 shows that both Detection
chartsresults of the PLS-TEWMA
can recognize and PCR-TEWMA
this bias fault. A comparison charts in the case
between the of bias
faults (with 10%) in the pyranometer measurements are depicted in Figure 17. Visually,
two charts under the presence of bias faults with different magnitudes is given in Table 7.
Figure 15 shows that both charts can recognize this bias fault. A comparison between the
Detection results of theunder
two charts PLS-TEWMA
the presenceand PCR-TEWMA
of bias chartsmagnitudes
faults with different in the case of bias
is given in Table 7.
faults (with 10%) in the pyranometer measurements are depicted in Figure 17.
Detection results of the PLS-TEWMA and PCR-TEWMA charts in the case of bias Visually,
Figure 17 shows that both
faults charts
(with 10%)can recognize
in the this measurements
pyranometer bias fault. A comparison
are depicted in between theVisually,
Figure 17.
Figure 17 shows that both charts can recognize this bias fault.
two charts under the presence of bias faults with different magnitudes is given in Table 8. A comparison between the
two charts under the presence of bias faults with different
Results in Table 8 confirm the superiority of the PLS-TEWMA scheme compared to the magnitudes is given in Table 8.
Results in Table 8 confirm the superiority of the PLS-TEWMA scheme compared to the
PCR-TEWMA, particularly for faults with small levels. For instance, for bias faults with
PCR-TEWMA, particularly for faults with small levels. For instance, for bias faults with
5%, the PLS-TEMWA 5%, scheme detects this
the PLS-TEMWA sensor
scheme faultthis
detects with a TPR
sensor of 0.9231,
fault while
with a TPR of the PCR-
0.9231, while the
TEWMA scheme obtains a TPR of 0.7802.
PCR-TEWMA scheme obtains a TPR of 0.7802.
Figure 17. Detection Figure

results17.
of (a) PLS- and
Detection (b) of
results PCR-based TEWMA
(a) PLS- and charts in
(b) PCR-based the presence
TEWMA charts of
in sensor
the presence of
bias fault in the pyranometer measurements (a bias of 10% of the total variation in solar irradiance
sensor bias fault in the pyranometer measurements (a bias of 10% of the total variation in solar
measurements). irradiance measurements).
Table 7. Results of PLS- and PCR-based TEWMA charts under bias sensors fault in pyranometer.
Bias TEWMA
TPR FPR Accuracy AUC EER
Sensor (B) Method
PLS 0.9780 0 0.9931 0.9890 0.0041
50%
PCR 0.9451 0 0.9828 0.9725 0.0102
Energies 2022, 15, 7978 18 of 22
Table 7. Results of PLS- and PCR-based TEWMA charts under bias sensors fault in pyranometer.
Bias TEWMA
Sensor (B) Method
PLS 0.9780 0 0.9931 0.9890 0.0041
50%
PCR 0.9451 0 0.9828 0.9725 0.0102
PLS 0.9670 0 0.9897 0.9835 0.0061
40%
PCR 0.9341 0 0.9794 0.9670 0.0122
PLS 0.9670 0 0.9897 0.9835 0.0061
30%
PCR 0.9231 0 0.9759 0.9615 0.0143
PLS 0.9560 0 0.9863 0.9780 0.0081
20%
PCR 0.9011 0 0.9691 0.9505 0.0183
PLS 0.9451 0 0.9828 0.9725 0.0102
10%
PCR 0.8571 0 0.9553 0.9286 0.0265
PLS 0.9231 0 0.9759 0.9615 0.0143
5%
PCR 0.7802 0 0.9313 0.8901 0.0407
Table 8. Results of PLS- and PCR-based DEWMA charts under bias sensors fault in pyranometer.
Bias DEWMA
Sensor (B) Method
PLS 0.9622 0 0.9776 0.9811 0.0224
50%
PCR 0.9553 0 0.9735 0.9777 0.0265
PLS 0.9588 0 0.9756 0.9794 0.0244
40%
PCR
PLS 0.9313 0
0.9670 0 0.9593
0.9897 0.9656
0.9835 0.0407
0.0061
30% PLS
30% PCR0.9313 0
0.9231 0 0.9593
0.9759 0.9656
0.9615 0.0407
0.0143
PCR 0.9038 0 0.9430 0.9519 0.0570
PLS 0.9560 0 0.9863 0.9780 0.0081
20% PLS 0.9003 0 0.9409 0.9502 0.0591
20% PCR0.8729 0.9011 0 0.9691 0.9505 0.0183
PCR 0 0.9246 0.9364 0.0754
PLS PLS 0.7388 0.9451
0 0 0.9828
0.8452 0.9725
0.8694 0.0102
0.1548
10%
10%
PCR PCR0.7457 0.8571
0 0 0.9553
0.8493 0.9286
0.8729 0.0265
0.1507
PLS PLS 0.7216 0.9231
0 0 0.9759
0.8350 0.9615
0.8608 0.0143
0.1650
5%5%
PCR
PCR0.6976 0
0.7802 0 0.8208
0.9313 0.8488
0.8901 0.1792
0.0407
Figure 1818 depicts

depictsthe
theAUCAUCvalues
valuesofofthethe
twotwo charts
charts forfor
the the considered
considered faults.
faults. The The
main
main conclusion
conclusion from from Figure
Figure 18 is 18 is that
that when when the amplitude
the amplitude of the
of the occurred
occurred fault
fault is small,
is small, the
the PLS-TEWMA
PLS-TEWMA scheme
scheme is better
is better thanthan
thethe PCR-TEWMAscheme.
PCR-TEWMA scheme.On Onthe
theother
otherhand,
hand, for
for a
a large
large fault,
fault, thetwo
the twoschemes
schemesare arecomparable.
comparable.
Figure 18.
Figure 18. AUC
AUCvalues
valuesbybyPLS-
PLS-and PCR-based
and TEWMA
PCR-based charts
TEWMA for different
charts bias magnitudes
for different in thein
bias magnitudes
pyranometer.
the pyranometer.
For comparison purposes, we applied PLS- and PCR-based double EWMA

(DEWMA) charts to detect bias sensors fault in the pyranometer (Table 8). Here, we also
used the KDE approach to estimate the decision thresholds for the two charts, which en-
ables more detection flexibility. From Table 8, we observe that for abnormal changes with
Energies 2022, 15, 7978 19 of 22
For comparison purposes, we applied PLS- and PCR-based double EWMA (DEWMA)
charts to detect bias sensors fault in the pyranometer (Table 8). Here, we also used the
KDE approach to estimate the decision thresholds for the two charts, which enables more
detection flexibility. From Table 8, we observe that for abnormal changes with large
magnitudes, the two approaches (i.e., PLS-DEWMA and PCR-DEWMA) achieved relatively
comparable performance, while for small changes, the PLS-DEWM chart outperformed the
PCR-DEMWA approach.
From Tables 7 and 8, we observe that the PLS- and PCR-based TEWMA charts showed
improved detection ability compared to the PLS- and PCR-based DEWMA chart, particu-
larly for small faults. At the same time, they are relatively comparable for moderate and
large faults. Overall, the proposed PLS-TEWMA approach is more effective in detecting
small faults compared to the other charts. This is mainly due to the capacity of the TEWMA
chart to sense small changes in data. However, the results revealed that the PLS-TEWMA
approach can sense small changes but with some missed detection (TPR = 0.9231) (Table 7).
This could be attributed to the used PLS model, which is a linear model, that is not suited
to capture process nonlinearity. Thus, we recommend adopting nonlinear models, such as
nonlinear PLS, to enhance the detection capability of the proposed approach.
6. Conclusions
This study is within a data-driven anomaly detection framework for PV systems
monitoring. An efficient strategy combining latent variable regression methods (i.e., PCR
and PLS) with the TEWMA monitoring chart has been developed to detect anomalies in the
DC sides of PV systems. Specifically, LVR methods have been built using training data and
then used to generate residuals, which are used as anomaly indicators. Here, the TEWMA
chart is applied to residuals for fault detection. The decision threshold in the TEWMA chart
is determined using a KDE-based approach to allow more flexibility and improve detection
accuracy. Several real anomalies and simulated sensor faults have been considered to study
the performance of the proposed approaches. Results revealed that the PLS-TEWMA chart
provided superior detection performance compared to that of the PCR-TEWMA chart and
PLS- and PCR-based DEWMA charts, particularly for small faults.
In future work, we plan to improve the proposed technique by incorporating machine
learning-based classification methods to discriminate between the detected anomalies.
Another direction of improvement consists of amalgamating the benefit of deep-learning
models [45,63] and the detection capacity of the TEWMA to sense small changes. In
addition, it will be interesting to investigate the detection capability of this approach
in uncovering other types of sensor anomalies, such as sensor degradation anomalies,
sensor-freezing anomalies, and multiple sensor anomalies that affect different sensors
(e.g., pyranometers and thermocouples) simultaneously. Data from PV plants is tainted
with noise, which can degrade the detection performance by increasing the number of
false alarms rate in the case of a low signal-to-noise ratio. We also plan in future work to
construct a robust detection approach that combines wavelet multiscale presentation with
the TEWMA chart.
Author Contributions: B.B.: Conceptualization, formal analysis, investigation, methodology, soft-

ware, writing—original draft, and writing—review and editing. F.H.: Conceptualization, formal
analysis, investigation, methodology, software, supervision, writing—original draft, and writing—
review and editing. B.T.: Formal analysis, software, funding acquisition, writing—original draft, and
writing—review and editing. Y.S.: Investigation, conceptualization, formal analysis, methodology,
writing—review and editing, and supervision. A.H.A.: project administration, funding acquisition,
and writing—review and editing. All authors have read and agreed to the published version of
the manuscript.
Funding: This work was supported by funding from the King Abdullah University of Science and
Technology (KAUST), Office of Sponsored Research (OSR), under Award No: OSR-2019-CRG7-3800,
and from the Centre de Développement des Energies Renouvelables (CDER), Direction Générale de
la Recherche Scientifique et du Développement Technologique (DGRSDT).
Energies 2022, 15, 7978 20 of 22
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Ahmed, W.; Sheikh, J.A.; Farjana, S.H.; Mahmud, M.A.P. Defects Impact on PV System GHG Mitigation Potential and Climate
Change. Sustainability 2021, 13, 7793. [CrossRef]
2. IEA PVPS. Trends in Photovoltaic Applications. 2021. Available online: https://tecsol.blogs.com/files/iea-pvps-trends-report-
2021-1.pdf (accessed on 23 October 2022).
3. Garoudja, E.; Harrou, F.; Sun, Y.; Kara, K.; Chouder, A.; Silvestre, S. Statistical fault detection in photovoltaic systems. Sol. Energy
2017, 150, 485–499. [CrossRef]
4. Alam, M.K.; Khan, F.; Johnson, J.; Flicker, J. A comprehensive review of catastrophic faults in PV arrays: Types, detection, and
mitigation techniques. IEEE J. Photovolt. 2015, 5, 982–997. [CrossRef]
5. Hare, J.; Shi, X.; Gupta, S.; Bazzi, A. Fault diagnostics in smart micro-grids: A survey. Renew. Sustain. Energy Rev. 2016, 60,
1114–1124. [CrossRef]
6. Ye, J.; Giani, A.; Elasser, A.; Mazumder, S.K.; Farnell, C.; Mantooth, H.A.; Kim, T.; Liu, J.; Chen, B.; Seo, G.S.; et al. A Review of
Cyber–Physical Security for Photovoltaic Systems. IEEE J. Emerg. Sel. Top. Power Electron. 2021, 10, 4879–4901. [CrossRef]
7. Wang, W.; Harrou, F.; Bouyeddou, B.; Senouci, S.M.; Sun, Y. A stacked deep learning approach to cyber-attacks detection in
industrial systems: Application to power system and gas pipeline systems. Clust. Comput. 2022, 25, 561–578. [CrossRef]
8. Wang, W.; Harrou, F.; Bouyeddou, B.; Senouci, S.M.; Sun, Y. Cyber-attacks detection in industrial systems using artificial
intelligence-driven methods. Int. J. Crit. Infrastruct. Prot. 2022, 38, 100542. [CrossRef]
9. Le, M.; Nguyen, D.K.; Dao, V.D.; Vu, N.H.; Vu, H.H.T. Remote anomaly detection and classification of solar photovoltaic modules
based on deep neural network. Sustain. Energy Technol. Assess. 2021, 48, 101545. [CrossRef]
10. Janarthanan, R.; Maheshwari, R.U.; Shukla, P.K.; Shukla, P.K.; Mirjalili, S.; Kumar, M. Intelligent detection of the PV faults based
on artificial neural network and type 2 fuzzy systems. Energies 2021, 14, 6584. [CrossRef]
11. Harrou, F.; Saidi, A.; Sun, Y.; Khadraoui, S. Monitoring of photovoltaic systems using improved kernel-based learning schemes.
IEEE J. Photovolt. 2021, 11, 806–818. [CrossRef]
12. Mellit, A.; Tina, G.M.; Kalogirou, S.A. Fault detection and diagnosis methods for photovoltaic systems: A review. Renew. Sustain.
Energy Rev. 2018, 91, 1–17. [CrossRef]
13. Madeti, S.R.; Singh, S.N. Modeling of PV system based on experimental data for fault detection using kNN method. Sol. Energy
2018, 173, 139–151. [CrossRef]
14. Badr, M.M.; Hamad, M.S.; Abdel-Khalik, A.S.; Hamdy, R.A.; Ahmed, S.; Hamdan, E. Fault Identification of Photovoltaic Array
Based on Machine Learning Classifiers. IEEE Access 2021, 9, 159113–159132. [CrossRef]
15. Benkercha, R.; Moulahoum, S. Fault detection and diagnosis based on C4. 5 decision tree algorithm for grid connected PV system.
Sol. Energy 2018, 173, 610–634. [CrossRef]
16. Harrou, F.; Taghezouit, B.; Sun, Y. Robust and flexible strategy for fault detection in grid-connected photovoltaic systems. Energy
Convers. Manag. 2019, 180, 1153–1166. [CrossRef]
17. Harrou, F.; Dairi, A.; Taghezouit, B.; Sun, Y. An unsupervised monitoring procedure for detecting anomalies in photovoltaic
systems using a one-class Support Vector Machine. Sol. Energy 2019, 179, 48–58. [CrossRef]
18. Harrou, F.; Sun, Y.; Taghezouit, B.; Saidi, A.; Hamlati, M.E. Reliable fault detection and diagnosis of photovoltaic systems based
on statistical monitoring approaches. Renew. Energy 2018, 116, 22–37. [CrossRef]
19. Huang, C.M.; Chen, S.J.; Yang, S.P. A Parameter Estimation Method for a Photovoltaic Power Generation System Based on a
Two-Diode Model. Energies 2022, 15, 1460. [CrossRef]
20. Kang, B.K.; Kim, S.T.; Bae, S.H.; Park, J.W. Diagnosis of output power lowering in a PV array by using the Kalman-filter algorithm.
IEEE Trans. Energy Convers. 2012, 27, 885–894. [CrossRef]
21. Skomedal, Å.F.; Øgaard, M.B.; Haug, H.; Marstein, E.S. Robust and fast detection of small power losses in large-scale PV systems.
IEEE J. Photovolt. 2021, 11, 819–826. [CrossRef]
22. Spataru, S.; Sera, D.; Kerekes, T.; Teodorescu, R. Diagnostic method for photovoltaic systems based on light I–V measurements.
Sol. Energy 2015, 119, 29–44. [CrossRef]
23. Chouder, A.; Silvestre, S. Automatic supervision and fault detection of PV systems based on power losses analysis. Energy
Convers. Manag. 2010, 51, 1929–1937. [CrossRef]
24. Hou, Z.S.; Wang, Z. From model-based control to data-driven control: Survey, classification and perspective. Inf. Sci. 2013, 235,
3–35. [CrossRef]
25. Taghezouit, B.; Harrou, F.; Sun, Y.; Arab, A.H.; Larbes, C. Multivariate statistical monitoring of photovoltaic plant operation.
Energy Convers. Manag. 2020, 205, 112317. [CrossRef]
Energies 2022, 15, 7978 21 of 22
26. Fadhel, S.; Delpha, C.; Diallo, D.; Bahri, I.; Migan, A.; Trabelsi, M.; Mimouni, M.F. PV shading fault detection and classification
based on IV curve using principal component analysis: Application to isolated PV system. Sol. Energy 2019, 179, 1–10. [CrossRef]
27. Amaral, T.G.; Pires, V.F.; Pires, A.J. Fault detection in PV tracking systems using an image processing algorithm based on PCA.
Energies 2021, 14, 7278. [CrossRef]
28. Deceglie, M.G.; Micheli, L.; Muller, M. Quantifying soiling loss directly from PV yield. IEEE J. Photovolt. 2018, 8, 547–551.
[CrossRef]
29. Kiliç, H.; Gumus, B.; Khaki, B.; Yilmaz, M.; Palensky, P.; Authority, P. A Robust Data-Driven Approach for Fault Detection in
Photovoltaic Arrays. In Proceedings of the 10th IEEE PES Innovative Smart Grid Technologies Europe, ISGT-Europe 2020, Virtual,
26–28 October 2020.
30. Xiong, Q.; Liu, X.; Feng, X.; Gattozzi, A.L.; Shi, Y.; Zhu, L.; Ji, S.; Hebner, R.E. Arc fault detection and localization in photovoltaic
systems using feature distribution maps of parallel capacitor currents. IEEE J. Photovolt. 2018, 8, 1090–1097. [CrossRef]
31. Kumar, B.P.; Ilango, G.S.; Reddy, M.J.B.; Chilakapati, N. Online fault detection and diagnosis in photovoltaic systems using
wavelet packets. IEEE J. Photovolt. 2017, 8, 257–265. [CrossRef]
32. Edun, A.S.; Kingston, S.; LaFlamme, C.; Benoit, E.; Scarpulla, M.A.; Furse, C.M.; Harley, J.B. Detection and localization of
disconnections in a large-scale string of photovoltaics using SSTDR. IEEE J. Photovolt. 2021, 11, 1097–1104. [CrossRef]
33. Wang, M.H.; Lin, Z.H.; Lu, S.D. A Fault Detection Method Based on CNN and Symmetrized Dot Pattern for PV Modules. Energies
2022, 15, 6449. [CrossRef]
34. Cui, F.; Tu, Y.; Gao, W. A Photovoltaic System Fault Identification Method Based on Improved Deep Residual Shrinkage Networks.
Energies 2022, 15, 3961. [CrossRef]
35. Harrou, F.; Taghezouit, B.; Sun, Y. Improved kNN-based monitoring schemes for detecting faults in PV systems. IEEE J. Photovolt.
2019, 9, 811–821. [CrossRef]
36. Harrou, F.; Taghezouit, B.; Khadraoui, S.; Dairi, A.; Sun, Y.; Hadj Arab, A. Ensemble Learning Techniques-Based Monitoring
Charts for Fault Detection in Photovoltaic Systems. Energies 2022, 15, 6716. [CrossRef]
37. Karatepe, E.; Hiyama, T. Controlling of artificial neural network for fault diagnosis of photovoltaic array. In Proceedings of the
2011 16th International Conference on Intelligent System Applications to Power Systems, Hersonissos, Greece, 25–28 September
2011; pp. 1–6.
38. Chen, S.Q.; Yang, G.J.; Gao, W.; Guo, M.F. Photovoltaic fault diagnosis via semisupervised ladder network with string voltage
and current measures. IEEE J. Photovolt. 2020, 11, 219–231. [CrossRef]
39. Gaggero, G.B.; Rossi, M.; Girdinio, P.; Marchese, M. Detecting System Fault/Cyberattack within a Photovoltaic System Connected
to the Grid: A Neural Network-Based Solution. J. Sens. Actuator Netw. 2020, 9, 20. [CrossRef]
40. Parvez, I.; Aghili, M.; Sarwat, A.I.; Rahman, S.; Alam, F. Online power quality disturbance detection by support vector machine
in smart meter. J. Mod. Power Syst. Clean Energy 2019, 7, 1328–1339. [CrossRef]
41. Madakyaru, M.; Harrou, F.; Sun, Y. Monitoring distillation column systems using improved nonlinear partial least squares-based
strategies. IEEE Sens. J. 2019, 19, 11697–11705. [CrossRef]
42. Kourti, T.; MacGregor, J.F. Process analysis, monitoring and diagnosis, using multivariate projection methods. Chemom. Intell. Lab.
Syst. 1995, 28, 3–21. [CrossRef]
43. Geladi, P.; Kowalski, B.R. Partial least-squares regression: A tutorial. Anal. Chim. Acta 1986, 185, 1–17. [CrossRef]
44. Harrou, F.; Sun, Y.; Madakyaru, M. Kullback-leibler distance-based enhanced detection of incipient anomalies. J. Loss Prev. Process
Ind. 2016, 44, 73–87. [CrossRef]
45. Harrou, F.; Sun, Y.; Hering, A.S.; Madakyaru, M.; Dairi, A. Linear Latent Variable Regression (LVR)-Based Process Monitoring; Elsevier
BV: Amsterdam, The Netherlands, 2021.
46. Li, B.; Morris, J.; Martin, E.B. Model selection for partial least squares regression. Chemom. Intell. Lab. Syst. 2002, 64, 79–89.
[CrossRef]
47. MacGregor, J.F.; Kourti, T. Statistical process control of multivariate processes. Control Eng. Pract. 1995, 3, 403–414. [CrossRef]
48. Madakyaru, M.; Harrou, F.; Sun, Y. Improved data-based fault detection strategy and application to distillation columns. Process
Saf. Environ. Prot. 2017, 107, 22–34. [CrossRef]
49. Wang, X.; Kruger, U.; Lennox, B. Recursive partial least squares algorithms for monitoring complex industrial processes. Control.
Eng. Pract. 2003, 11, 613–632. [CrossRef]
50. Ahn, S.J.; Lee, C.J.; Jung, Y.; Han, C.; Yoon, E.S.; Lee, G. Fault diagnosis of the multi-stage flash desalination process based on
signed digraph and dynamic partial least square. Desalination 2008, 228, 68–83. [CrossRef]
51. Bouyeddou, B.; Harrou, F.; Saidi, A.; Sun, Y. An Effective Wind Power Prediction using Latent Regression Models. In Proceedings
of the 2021 International Conference on ICT for Smart Society (ICISS), Bandung, Indonesia, 2–4 August 2021; pp. 1–6.
52. Frank, L.E.; Friedman, J.H. A statistical view of some chemometrics regression tools. Technometrics 1993, 35, 109–135. [CrossRef]
53. Harrou, F.; Nounou, M.N.; Nounou, H.N. Detecting abnormal ozone levels using PCA-based GLR hypothesis testing. In
Proceedings of the 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Singapore, 16–19 April 2013;
pp. 95–102.
54. Alevizakos, V.; Chatterjee, K.; Koukouvinos, C. The triple exponentially weighted moving average control chart. Qual. Technol.
Quant. Manag. 2021, 18, 326–354. [CrossRef]
Energies 2022, 15, 7978 22 of 22
55. Alevizakos, V.; Chatterjee, K.; Koukouvinos, C. A nonparametric triple exponentially weighted moving average sign control
chart. Qual. Reliab. Eng. Int. 2021, 37, 1504–1523. [CrossRef]
56. Mahmoud, M.A.; Woodall, W.H. An evaluation of the double exponentially weighted moving average control chart. Commun.
Stat.—Simul. Comput. 2010, 39, 933–949. [CrossRef]
57. Zhang, L.; Chen, G. An extended EWMA mean chart. Qual. Technol. Quant. Manag. 2005, 2, 39–52. [CrossRef]
58. Zhang, L.; Bebbington, M.S.; Govindaraju, K.; Lai, C.D. Composite EWMA control charts. Commun. Stat.-Simul. Comput. 2004, 33,
1133–1158. [CrossRef]
59. Martin, E.; Morris, A. Non-parametric confidence bounds for process performance monitoring charts. J. Proc. Control 1996, 6,
349–358. [CrossRef]
60. Chen, Y.C. A tutorial on kernel density estimation and recent advances. Biostat. Epidemiol. 2017, 1, 161–187. [CrossRef]
61. Mugdadi, A.R.; Ahmad, I.A. A bandwidth selection for kernel density estimation of functions of random variables. Comput. Stat.
Data Anal. 2004, 47, 49–62. [CrossRef]
62. Harrou, F.; Khaldi, B.; Sun, Y.; Cherif, F. An efficient statistical strategy to monitor a robot swarm. IEEE Sens. J. 2019, 20, 2214–2223.
[CrossRef]
63. Dairi, A.; Harrou, F.; Sun, Y.; Khadraoui, S. Short-term forecasting of photovoltaic solar power production using variational
auto-encoder driven deep learning approach. Appl. Sci. 2020, 10, 8400. [CrossRef]

Energies 15 07978

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Energies 15 07978

Uploaded by

Copyright:

Available Formats

energies

Abstract: Fault detection is a necessary component to perform ongoing monitoring of photovoltaic

Received: 5 October 2022

Energies 2022, 15, 7978. https://doi.org/10.3390/en15217978 https://www.mdpi.com/journal/energies

3. Materials and Methods

For the measure of tilted irradiance at 27 ◦ C, a pyranometer (kipp&zonen CM11) and

Then the double EWMA statistic, DEi , is expressed as:

DEi = λ2 Ei + (1 − λ2 )DEi−1 . (8)

Finally, the Triple TEWMA statistic, TEi can be calculated as follow:

TEi = λ3 DEi + (1 − λ3 )TEi−1 . (9)

Similarly, DEi and TEi can be written as [54,56]:

After replacing Ei in Equation (13), DEi can be expressed in terms of xi as:

After that, by substituting (16) in (14), TEi is expressed as [54]:

3.4. KDE-TEWMA (Kernel Density Estimation TEWMA)

h = 1.06 σn−0.2 . (22)

Importantly, in the nonparametric procedure, firstly, the distribution of the TEWMA

Figure 6. Five days of the studied

Energies 2022, 15, x FOR PEER REVIEW 11 of 22

Figure 7. Distribution of the

Figure 8. Correlation matrix of training PV data.

Energies 2022, 15, 7978 15 of 22

Method TPR FPR Accuracy AUC EER

We now investigate the monitoring efficiency of the LVR-TEWMA methods in the

Figure 16. DetectionFigure

Method TPR FPR Accuracy AUC EER

Figure 17. Detection Figure

Figure 1818 depicts

For comparison purposes, we applied PLS- and PCR-based double EWMA

Author Contributions: B.B.: Conceptualization, formal analysis, investigation, methodology, soft-

Institutional Review Board Statement: Not applicable.

You might also like