
Information Sciences 643 (2023) 119235


Semi-supervised federated learning on evolving data streams


Cobbinah B. Mawuli a,b, Jay Kumar a, Ebenezer Nanor a, Shangxuan Fu a, Liangxu Pan a, Qinli Yang a,b, Wei Zhang c, Junming Shao a,b,*

a Data Mining Lab, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
b Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou 313001, China
c Southwest China Research Institute of Electronic Equipment, Science and Technology on Electronic Information Control Laboratory, Chengdu, Sichuan, China

ARTICLE INFO

Keywords: Federated learning, Prototype-based learning, Data stream, Concept drift, Semi-supervised learning

ABSTRACT

Federated learning allows multiple clients to jointly train a model on their private data without revealing their local data to a centralized server. Thereby, federated learning has attracted increasing attention in recent years, and many algorithms have been proposed. However, existing federated learning algorithms often focus on static data and tend to fail in data stream scenarios. Due to the varying distributions/concepts within and among the clients over time, the joint learning model must learn these different emerging concepts dynamically and simultaneously. The task becomes more challenging when the continuously arriving data are only partially labeled at the participating clients. In this paper, we propose SFLEDS (Semi-supervised Federated Learning on Evolving Data Streams), a new prototype-based federated learning method that tackles the problems of label scarcity, concept drift, and privacy preservation in the federated semi-supervised evolving data stream environment. Extensive experiments show that SFLEDS outperforms both state-of-the-art semi-supervised and supervised algorithms. The source code for the proposed method is publicly available on GitHub (https://github.com/mvisionai/FedLimited).

* Corresponding author at: Data Mining Lab, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
E-mail addresses: cobbinahben@std.uestc.edu.cn (C.B. Mawuli), jay@std.uestc.edu.cn (J. Kumar), sireben21@ieee.org (E. Nanor), fushangxuan@std.uestc.edu.cn (S. Fu), liangxv.pan@gmail.com (L. Pan), qinli.yang@uestc.edu.cn (Q. Yang), rubydirt@sohu.com (W. Zhang), junmshao@uestc.edu.cn (J. Shao).

https://doi.org/10.1016/j.ins.2023.119235
Received 5 July 2022; Received in revised form 8 March 2023; Accepted 23 May 2023; Available online 29 May 2023
0020-0255/© 2023 Published by Elsevier Inc.

1. Introduction

With the advent of the Internet of Things (IoT), the number of intelligent devices has grown rapidly in recent years. Many of these devices collect data at an unprecedentedly large scale and in private environments. As a result, sharing this data with a centralized entity to train a learning model is usually time- and resource-consuming.

Federated learning (FL) solves this issue, as it allows multiple clients to collaboratively train a learning model on their private data without any participant leaking its data to a centralized server [17]. While the client data in FL is typically assumed to be static, in many real-world scenarios, such as data generation in IoT, data arrives continuously and is dynamic. This leaves us facing the following dilemma: how do we handle FL in the evolving data stream environment, where data arrives continuously and time and storage are serious concerns? Moreover, in real-world scenarios, the participating clients will not have all their data stream instances fully labeled (label scarcity). Furthermore, the data stream at each client side usually has different stream
distributions/concepts over time, known as concept drift [14]. A typical challenge in data streams, concept drift occurs when there are unpredictable changes in the underlying distribution of streaming data over time. Concept drift becomes even more challenging in the federated evolving data stream environment, since the underlying distribution of data changes both within a client and among all the participating clients. With the above premises, we summarize the challenges of current federated learning algorithms with regard to the data stream environment as follows:
Challenge 1: Non-stationary Environment. There are many FL algorithms in the literature, for example, FedAvg [4], FedProx [25], FedBN [26] and the like. While these algorithms have shown great potential in the static environment, they struggle in practical settings where data arrives continuously. Specifically, these algorithms need a considerable amount of time to train on all available data batches through several iterations for optimal convergence. This is permissible in the static environment, where all data is available before training begins. But in the data stream environment, new data instances are constantly arriving, and complete information on all data points is not available. Almost none of the existing FL algorithms are designed specifically for this scenario. A new FL algorithm would be beneficial when clients' data are constantly arriving and time is constrained.
Challenge 2: Label Scarcity. Although popular semi-supervised FL algorithms [19,29] have been designed to handle partially labeled data in the distributed setting, the task becomes increasingly challenging when the continuous nature of the arriving data must be considered in handling delayed/limited labels [41]. These FL algorithms [19] need enough information on the unlabeled data instances in the typical FL setting, but the delayed information on these unlabeled instances and the high velocity of arriving data streams make them impractical in this scenario.
Challenge 3: Varying Distributed Concept Drifts. Several algorithms have been proposed to solve the issue of statistical heterogeneity in FL [44], where clients' data are non-independent and identically distributed (non-IID). While this non-IID issue is usually assumed to be stationary (not recurring, not gradual, not abrupt, etc.), it remains a critical open problem. In the data stream environment, this statistical heterogeneity evolves with time and becomes even more complex to handle when drift occurs both among different distributed sites and within sites.
To address these challenges, we propose a new prototype-based FL framework, SFLEDS (Semi-supervised Federated Learning on Evolving Data Streams), which learns on evolving data streams and handles the problems of distributed concept drift, label scarcity and privacy preservation simultaneously. The contributions of this paper can be summarized as follows.

• Effective Prototype-based Data Representation: Unlike FL approaches that average clients' model parameters, we propose a new prototype-based approach for federated semi-supervised data stream learning that is both efficient and effective. Specifically, we leverage and build upon micro-clustering to summarize incoming streaming data instances by dynamically maintaining a set of distributed representative micro-clusters/prototypes, using an error-driven technique to capture intra- and inter-client concept drifts. Prototype-based data maintenance improves data abstraction by dynamically capturing the underlying local and global data distributions of all streams, which helps address the challenges of label scarcity and non-stationarity in the federated learning setting.
• Robust Inter-client Concept Consistency: We introduce a probabilistic inter-client server consistency matching method for federated data stream learning, which learns prototype consistency between multiple clients in order to select similar prototypes that complement each client's inference decisions. This method robustly addresses the challenge of varying distributed concept drifts in the distributed evolving data stream setting.
• Client Privacy Preservation: A secret-sharing prototype-based masking is introduced to preserve privacy among participating clients in the distributed data stream setting. This method is simple yet effective and efficient: we adapt the secret sharing technique to preserve the privacy of the reliable prototypes each client communicates to the central server in the context of distributed evolving data streams. It is computationally efficient and well suited to federated semi-supervised data streaming tasks, since conventional data streaming tasks are memory- and computation-constrained.
• High Performance: The proposed method has been empirically validated and outperforms many state-of-the-art methods on both synthetic and real-world data sets in the context of distributed evolving data streams. In our experimental setting, we consider several clients (i.e., K = 10, 30) performing a distributed evolving data stream semi-supervised task. Furthermore, we thoroughly compared our work to 15 other methods in the distributed client setting, demonstrating the effectiveness and robustness of the proposed method.

The rest of this paper is organized as follows: Section 2 discusses related works. Section 3 introduces the problem statement.
Section 4 presents the proposed algorithm. Section 5 details several experiments that were conducted. Finally, Section 6 presents the
conclusion and outlines future research.

2. Related work

In this section, we start with a brief review of federated learning, and then introduce semi-supervised learning and distributed
data stream mining, which are highly related to our work.


2.1. Federated learning

FL is an efficient and secure framework for multiple clients to collaboratively train a global model on their private data. The stochastic gradient descent (SGD) algorithm has been dominant in the training of FL algorithms [20]. Specifically, federated stochastic gradient descent (FedSGD) [27] is the direct application of SGD to the federated environment, in which the gradients are calculated on all local data of the clients. The server then averages the gradients and uses them to perform a gradient descent step. Federated averaging (FedAvg) [4] is a generalization of FedSGD in which the updated weights are exchanged instead of the gradients. FedAvg has shown tremendous benefits where clients' data are independent and identically distributed (IID); in the non-IID scenario, however, its performance degrades considerably. In the literature, FL algorithms such as FedMA [44], FedProx [25], and FedBN [26] were proposed to handle the challenge of non-IID data in the FL setting. Also, to avoid catastrophic forgetting in the FL setting in a form of continual learning, Fed-ADP [46] was proposed; it decomposes parameters additively. In fact, dealing with non-IID distributions is one of the most critical challenges in FL. The main shortcoming of these FL works in the context of evolving data streams is their inability to effectively handle gradual changes in the statistical heterogeneity of their local data over time. Also, sharing massive model data among all users requires both large storage space and stable communication between users and the server. Importantly, one challenge with using these deep learning-based federated learning techniques for data stream mining is the time-consuming optimization of the objective function, which hinders real-time processing. From the FL literature survey [48], it becomes obvious that almost all existing federated learning algorithms are designed to target (1) deep learning models and (2) the static environment, where data is not evolving.
Maintaining user privacy is another critical aspect of research in FL systems. Although the current FL framework offers some
level of privacy protection for the clients’ data, ensuring the security of clients’ model parameters requires additional measures. Two
main approaches have been proposed to address this issue. The first approach involves keeping the training data on the user device,
allowing users to update their individual models locally. The second approach involves securely aggregating local models at the
central server to update the global model. This can be achieved through a secure aggregation process, which involves users masking
their local model updates using random masks, as proposed by Bonawitz et al. [5]. In this protocol, each user conceals their local
update by applying additive secret sharing. They use private and pairwise random keys to mask their update before transmitting
it to the server. After aggregating the masked models, the server can learn the aggregate of all user models as the additional
randomness cancels out. At the end of the process, the server does not obtain any information about individual models except the
aggregated model since they are concealed using random keys which are unknown to the server. Several other FL algorithms for
secure aggregation with additive masking have been proposed [12,15]. Furthermore, various privacy-preserving techniques such
as differential privacy and homomorphic encryption schemes have become popular in FL [45]. However, these techniques have
limitations in terms of accuracy trade-offs and computational expenses, which make them less than ideal for distributed and evolving
data streams.
In this work, we formulate a new FL approach for evolving data streams using a prototype-based representation to solve the challenges traditional FL algorithms face, such as their inability to adapt to changing concepts and their computational inefficiency. As an added advantage, we leverage the additive secret sharing protocol to secure the representative prototypes of the participating clients in the FL framework, instead of the model weights and biases secured in previous works.

2.2. Semi-supervised learning

Semi-supervised learning (SSL) seeks to learn from both unlabeled and labeled data. Over the past decades, various approaches have been proposed to tackle the challenges of SSL [11,42]. Two approaches have been used extensively in semi-supervised classification: Expectation-Maximization (EM) and predicting the labels of unlabeled instances [34]. In EM, the expectation step infers the labels of unlabeled instances, and the maximization step updates the model [2]. In the second approach, the labels of unlabeled examples are estimated first and then used to update the model alongside the identified labeled instances. Clustering algorithms are one family of techniques in this area; they assign an instance a label based on the cluster to which it belongs [40,6]. Graph-based semi-supervised approaches, which follow the manifold assumption [6,33] and represent data instances as nodes with edges indicating the similarity between instances, are also becoming more popular. A semi-supervised learning approach based on harmonic functions and Gaussian random fields was proposed by Zhu et al. [50]. Self-training and co-training are two well-known semi-supervised algorithms [49]. Recently, sophisticated generative models have been proposed to handle SSL by estimating the class of unlabeled data via mixture models [31]. However, one potential issue with these learning algorithms is that they may not function if the required labeling cannot be obtained. Secondly, most of the techniques assume that the data follow the same distribution, which may not hold in the case of distributed evolving data streams, where the data distribution is constantly changing with massive concept drift. Specifically, these previous approaches have focused on static datasets or have not effectively addressed the issue of concept drift, which limits their ability to learn and adapt to changing concepts in the federated semi-supervised environment.
In this work, we propose a semi-supervised federated learning method for evolving data streams, which aims to address the
challenges of label deficiency and concept drift in the federated learning setting.

2.3. Distributed data stream mining

Data streams have infinite length and are generated at high speed. They are usually evolving, under the assumption that data distributions change dynamically over time and are non-stationary. Many prior works in distributed data stream mining employ a centralized model for mining multiple data streams without consideration of the federated environment [8,37,9]. Such a framework is limited in several respects. To begin with, centralized data stream mining can have a long response time. Secondly, the central collection of data can result in significant communication overhead. Liu et al. [28] presented a vertically distributed online semi-supervised classification approach that is distributed but only applicable to support vector machine classifiers. In a similar vein, Kholod et al. [21] offered an approach for parallelizing Naive Bayes classifiers on horizontally distributed data. Ensemble learning techniques that aggregate many local classifiers of any type are more popular in distributed data mining. Parker et al. [36] handled distributed data streams using a system of stacked classifiers with the potential to learn new classes as they occur over time. The most relevant approaches for comparison are ensembles of local classifiers with maximum-confidence aggregation, stacking, and voting [7]. However, not all ensemble aggregation algorithms are suited for online distributed stream learning, since they are time-consuming, often suffer from scalability issues, and are not robust enough to learn the entire distribution across all participating clients. Additionally, these distributed data stream mining works struggle to efficiently maintain the privacy and security of the data and the participating clients. It is also worth noting that these related works tend to assume that concept drift arises from a single source, rather than considering the possibility of multiple sources contributing to the drift.
In contrast, our proposed method addresses these shortcomings by utilizing a prototype-based approach and incorporating dynamic inter-client consistency learning and an error-driven approach to adapt to changing concepts in the data. Also, an efficient additive secret sharing mechanism is applied to the prototypes to maintain the privacy of participating clients. As such, our work represents a novel contribution to the field and offers a promising solution for semi-supervised federated learning on evolving data streams.

3. Problem formulation

Assume there are $K$ clients and let $D = \{D_k\}_{k=1}^{K}$ be the set of distributed streams of the participating clients. Each client has its own streaming data set $D_k = \{(x_t^k, y_t^k)\}_{t=1}^{\infty}$. Each instance $x_t^k$ is a $d$-dimensional feature vector with a class $y_t^k \in \{\emptyset, 1, ..., L\}$, where $\emptyset$ represents data instances with no corresponding labels and $t$ is the arrival time of the data instance. To reflect the composition of labeled and unlabeled data instances, $D_k$ can also be written as $D_k = \{x_1^{k,s}, x_2^{k,s}, ..., x_a^{k,s}, x_1^{k,u}, ..., x_b^{k,u}\}$, where $a, b \in t$, $a + b = \infty$, and $x^{k,s}$ and $x^{k,u}$ are labeled and unlabeled data instances, respectively. Besides, each client data stream $D_k$ has its own data distribution and evolving concepts $\mathcal{F}_t(x_t^k, y_t^k)$. The existence of concept drift in evolving data streams presents two practical scenarios in the federated domain at timestamp $t+1$:

Intra-client Concept Drift: $\mathcal{F}_{0,t}(x_t^k, y_t^k) \neq \mathcal{F}_{t+1,\infty}(x_t^k, y_t^k)$, denoted as $\exists t: P_t(x_t^k, y_t^k) \neq P_{t+1}(x_t^k, y_t^k)$ for the same client $k$, where $P_t(x_t^k, y_t^k)$ is the probability distribution of a data instance at time $t$ for client $k$.

Inter-client Concept Drift: $\mathcal{F}_{0,t}(x_t^i, y_t^i) \neq \mathcal{F}_{t+1,\infty}(x_t^j, y_t^j)$, denoted as $\exists t: P_t(x_t^i, y_t^i) \neq P_{t+1}(x_t^j, y_t^j)$ for two clients $i \in K$ and $j \in K$, where $i \neq j$.

Our objective is to address the issue of label scarcity in federated learning under varying distributed concept drifts in evolving data streams with the proposed Semi-supervised Federated Learning on Evolving Data Streams (SFLEDS). In this approach, data streams at the clients may or may not come with accompanying labels, and the privacy of the clients is preserved. Under the federated semi-supervised framework, we have a dynamic online global model $\mathcal{F}: \sum_{k=1}^{K} (x_t^{k,s} \cup x_{t+1}^{k,u}) \rightarrow Y$, for all $x_t^s, x_{t+1}^u \in D$, that adapts to both local and distributed changing concepts (both intra-client and inter-client) to map each data stream instance to a class label. To achieve this goal, we utilize both a global model $\mathcal{F}$ and a local model $f_k$ at each client. At timestamp $t_0$, each client $k$ trains its local model on a set of learned local prototypes $M_k = \{m_n\}_{n=1}^{N}$ obtained from an initial set of data at the client, $D_k$. From each client's local prototypes, a set of significant prototypes $M_p^k \subseteq M_k$ is incrementally learned over time. These representative prototypes are masked with a secret sharing technique and then uploaded to a global server $G$ to form the set of global prototypes $M_g = \{M_p^k\}_{k=1}^{K}$. The goal is to collaborate over the clients' uploaded prototypes and perform dynamic inter-client consistency learning to learn a global semi-supervised classifier $\mathcal{F}$, carrying out privacy-preserving collaborative learning with all clients over each stream instance while handling concept drifts.

4. Method

Here, we first give an overview of our proposed method SFLEDS, which is illustrated in Fig. 1 and Algorithm 4. A summary of all symbols and notations can be found in Table 1. We propose a prototype-based semi-supervised federated learning method that addresses the challenges of label scarcity, varying distributed concept drifts, and privacy in distributed, evolving, concept-drifting data streams. The proposed algorithm (i) utilizes k-means clustering to summarize stream instances as prototypes, (ii) introduces distributed prototype techniques and performs a collaborative semi-supervised prediction task on arriving stream instances by leveraging both a global model and a local model, (iii) performs a probabilistic inter-client server consistency matching technique to handle varying distributed concept drift, and (iv) preserves the privacy of participating clients by securing the communication of reliable prototypes between clients and the server using additive secret sharing.
The framework involves five main components: (1) Client Stream and Model Initialization, (2) Collaborative Semi-supervised
Classification, (3) Inter-client Prototype Consistency Check, (4) Federated Semi-supervised Online Concept Maintenance, and (5)
Prototype Masking with Additive Secret Sharing. The subsequent subsections will elaborate on each part of the proposed method,
respectively.


Fig. 1. An overview of the proposed SFLEDS framework. 1, Each participating client builds an initial learning model by summarizing an initial data stream into
micro-clusters/prototypes using K-means clustering. 2, When a new stream instance arrives, the label is inferred through a federated prediction using both the local
and global models. 3, An inter-client consistency learning is performed on the uploaded representative prototypes to build a global model that supports each client’s
local stream classification. 4, Based on the outcome of the collaborative prediction, an error-driven technique is used to update the reliability of both the local and
global prototypes. A federated concept maintenance technique periodically handles obsolete prototypes and the challenge of limited labels. 5, The learned local
prototypes of each client with high representativeness are privacy preserved with a secret sharing technique and sent to the global server. See details in Section 4.

Table 1
The summary of symbols, variables and notations.

Symbol  Definition

D  Data stream
D_k  Client k's data stream
D_k^i  The i-th data instance of client k at time t
D_k^init  An initial set of the client data stream
x_t^k  The data instance features at time t
y_t^k  The data instance label at time t
d  Number of attributes
Q  Number of Gaussian components at the server
p  Number of clusters per class
λ  The decay rate
m_k  A client prototype
M_k  A set of client prototypes
M_p^k  A set of representative client prototypes to be uploaded to the server
M_g  A set of global prototypes for collaborative learning
M_s  A set of consistent global prototypes after the inter-client consistency check
M_s^c  The closest prototype inferred by the global model
M_k^c  The closest prototype inferred by the client model
LS  Cluster feature storing the linear sum of the data points in a cluster
SS  Cluster feature storing the squared sum of the data points in a cluster
W  Cluster feature storing the prototype reliability
ID  Cluster feature representing a unique client ID
R  Cluster feature storing the micro-cluster radius
f_k  Client model; creates, maintains and preserves prototypes and makes predictions
F  Global model; maintains and preserves prototypes and makes predictions
maxP  Controls the maximum number of prototypes
K  Number of clients


4.1. Client stream and model initialization

The design of our proposed framework starts with the initialization of the clients' data streams and learning models. The client stream and model initialization component divides a large data stream among the participating clients using a data partitioning strategy, summarizes stream instances as prototypes using k-means clustering, and constructs the initial model for each participating client. For the clients' data stream initialization, we create shards of distributed data streams from a single voluminous data stream, since data stream instances arrive over time. Specifically, for the single data stream D, we partition the data by time intervals and distribute it to the different clients using a sequential, alternating time-interval partitioning approach, where each participating client k receives data instances one time interval at a time.
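As an illustration, the following minimal Python sketch shows one way to realize this sequential, alternating time-interval partitioning; the chunk size and the round-robin assignment are our assumptions, since the paper does not fix these implementation details.

```python
def partition_stream(stream, num_clients, interval_size):
    """Split one time-ordered stream into client shards, one interval at a time.

    `stream` is an iterable of (x, y) instances ordered by arrival time; each
    consecutive block of `interval_size` instances goes to the next client in turn.
    """
    shards = [[] for _ in range(num_clients)]
    for i, instance in enumerate(stream):
        interval = i // interval_size                    # index of the time interval
        shards[interval % num_clients].append(instance)  # alternate across clients
    return shards
```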
After the client stream partition, the model initialization algorithm is called on each client's stream data $D_k$. An initial number $N_{init}$ ($N_{init} < |D_k|$) of labeled data instances $D_k^{init}$ from $D_k$ is partitioned into a set of disjoint unique classes. For each disjoint class in $D_k^{init}$, p clusters are created. The k-means algorithm is used to create the p clusters for the class instances due to its efficiency, effectiveness, and simplicity. While we chose k-means for the initialization clustering process, we acknowledge that other clustering algorithms, such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), could also be used. As in [16], we then summarize each class cluster into a prototype or micro-cluster $m_k = (LS, SS, R, C, L, W, N, T, ID)$ with nine (9) cluster features. The cluster features LS and SS are the linear sum and squared sum of all data points in the cluster, respectively; R represents the radius of the cluster; C is the center of the cluster; L is the class the cluster belongs to; W is the representativeness (reliability) of the cluster; N denotes the number of data points in the cluster; T denotes the update time of the micro-cluster; and ID represents the client the micro-cluster belongs to. By default, the values of T and W are set to 0 and 1, respectively, before the start of the experiment. LS, SS, R and C are defined mathematically in Algorithm 1. These labeled prototypes are then used to build an initial model for each client stream. The model initialization procedure is presented in Algorithm 1.

Algorithm 1: Clients Model Initialization.

Input:
D: Stream data,
p: Number of clusters per class,
K: Number of participating clients
Output:
f_k: Client models {f_1, ..., f_K}
1 M_k ← empty;
2 foreach client k ∈ K in parallel do
3   D_k ← get client stream data;
4   f_k ← empty;
5   Get D_k^init from D_k;
    /* Partition into L unique (number of classes) disjoint sets */
6   L ← PARTITION(D_k^init);
7   foreach X_l ∈ L do
      /* Perform k-means clustering on X_l */
8     C_set ← k-means(X_l, p);
9     foreach C_i ∈ C_set do
        /* Create a prototype and initialize the model */
10      N ← |C_i|;
11      LS ← Σ_{j=1}^{N} x_j;
12      SS ← Σ_{j=1}^{N} (x_j)²;
13      W ← 1;
14      C ← LS / N;
15      R ← ((N × SS − LS²) / N²)^{1/2};
16      ID ← k;
17      T ← 0; L ← class label of X_l;
18      m_k ← (LS, SS, R, C, L, W, N, T, ID);
19      M_k ← M_k ∪ m_k;
20    end
21  end
    /* Build the client model */
22  f_k ← f_k ∪ {M_k};
23 end
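For concreteness, the following is a minimal Python sketch of the per-client initialization in Algorithm 1, assuming NumPy and scikit-learn's KMeans; the `Prototype` container and the helper names are illustrative, not the authors' implementation, and averaging the per-dimension radius is our choice.

```python
import numpy as np
from dataclasses import dataclass
from typing import Optional
from sklearn.cluster import KMeans

@dataclass
class Prototype:
    LS: np.ndarray            # linear sum of member points
    SS: np.ndarray            # squared sum of member points
    N: int                    # number of member points
    label: Optional[int]      # class label L (None for unlabeled)
    W: float = 1.0            # reliability, initialized to 1
    T: float = 0.0            # last update time, initialized to 0
    client_id: int = -1       # ID of the owning client

    @property
    def center(self) -> np.ndarray:   # C = LS / N
        return self.LS / self.N

    @property
    def radius(self) -> float:        # R = sqrt((N*SS - LS^2) / N^2), mean over dimensions
        var = self.SS / self.N - (self.LS / self.N) ** 2
        return float(np.sqrt(np.clip(var, 0.0, None)).mean())

def init_client_model(X_init, y_init, clusters_per_class, client_id):
    """Summarize the initial labeled chunk D_k^init into micro-clusters."""
    prototypes = []
    for label in np.unique(y_init):
        Xl = X_init[y_init == label]
        k = min(clusters_per_class, len(Xl))
        assign = KMeans(n_clusters=k, n_init=10).fit_predict(Xl)
        for c in range(k):
            members = Xl[assign == c]
            prototypes.append(Prototype(LS=members.sum(axis=0),
                                        SS=(members ** 2).sum(axis=0),
                                        N=len(members), label=int(label),
                                        client_id=client_id))
    return prototypes
```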


4.2. Collaborative semi-supervised classification

The collaborative semi-supervised prediction task uses both the global model and the local model to classify incoming data stream instances. Each client builds a kNN lazy-learning classification model $f_k$ with its prototypes $M_k$ after the client model initialization stage. The client model $f_k$ serves as a warm-up model for the classification task.

When a new data instance $x_t^k$ arrives at a client, the instance is first summarized in the form of a prototype $m_t^k$ and the raw stream instance is discarded. The classification model $f_k$ then finds the closest labeled pseudopoint $M_k^c \in M_k$, i.e., the prototype whose centroid is nearest to $x_t^k$ in Euclidean distance. When consistent global prototypes $M_s \subseteq M_g$, learned from the inter-client prototype consistency check on the server, are available, they are downloaded as helper prototypes to support the classification of the current stream instance. On these downloaded helper prototypes we also train a separate prediction model $\mathcal{F}$, which contributes to the final prediction on $x_t$. The closest global prototype inferred by $\mathcal{F}$ on the $M_s$ prototypes is denoted $M_s^c$. The distances returned by the local model $f_k$ and the global model $\mathcal{F}$ for their predicted closest prototypes are compared, and the prototype with the smaller distance is chosen as the predicted prototype for the current stream instance. The creation and upload of the representative prototypes $M_p^k$ of each client to form $M_g$ is discussed in Subsection 4.3.

Definition 1 (Representative prototypes). We define the representative prototypes $M_p^k$ of a client as the reliable prototypes in $M_k$ whose significance values (cluster feature W) are above a threshold $W_t$ at a user-specified time interval $t_{gap}$.


We take the class of the new stream instance to be the label (cluster feature L) of the winning prototype $m_c \in \{M_k^c, M_s^c\}$. Based on the prediction outcome, if the label is wrong and the stream instance falls outside the decision boundary, it is regarded as an outlier. According to the idea of cohesion and separation, such an outlier lies outside the decision boundary: it is distinct from existing examples and meets the separation criterion. Such drifting instances are treated as outliers and stored in a buffer. We periodically check whether enough outlier instances in the buffer lie close to each other, using the dynamic concept drift handling technique of Subsection 4.4.
For each client stream, we calculate a reliability score for the closest inferred prototype $m_c$ of each instance prediction, based on the outcome of the model prediction. There is also a dynamic update of all prototypes (both at the local and the server side) using the fading function in Equation (1). Each client uploads its highly representative prototypes to the global server for the inter-client prototype consistency check, which selects similar prototypes to support the client's local prototypes when a new stream instance arrives.

𝑚𝑖 = 𝑚𝑖 × 2−𝜆×(𝑡−𝑡0 ) (1)


where 𝑡 is the current stream time instance and 𝑡0 is the last known time the reliability of the prototype 𝑚𝑖 was updated, and 𝜆 is the
decay rate.
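A compressed sketch of the collaborative prediction and the fading function of Equation (1) follows, reusing the `Prototype` container from the earlier sketch; treating the kNN lazy learner as a 1-nearest-prototype search, and using the prototype's update time T as $t_0$, are simplifications on our part.

```python
import numpy as np

def nearest_prototype(x, prototypes):
    """Return the prototype whose center is closest to x and the Euclidean distance."""
    dists = [np.linalg.norm(x - p.center) for p in prototypes]
    i = int(np.argmin(dists))
    return prototypes[i], dists[i]

def collaborative_predict(x, local_protos, helper_protos):
    """Predict with the local model f_k and, if available, the downloaded helpers M_s;
    the prototype at the smaller distance wins."""
    m_local, d_local = nearest_prototype(x, local_protos)
    if helper_protos:
        m_global, d_global = nearest_prototype(x, helper_protos)
        if d_global < d_local:
            return m_global, d_global
    return m_local, d_local

def decay_reliability(proto, t, lam=2e-6):
    """Fading function of Eq. (1): W <- W * 2^(-lambda * (t - t0)).
    The prototype's last update time T stands in for t0 (an assumption)."""
    proto.W *= 2.0 ** (-lam * (t - proto.T))
```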

4.3. Inter-client prototype consistency check

We introduce a simple but intuitive inter-client prototype consistency check on the server to learn, for each client, a set of similar prototypes that serve as helper prototypes. As mentioned in Subsection 4.2, each client uploads its representative prototypes to the server, masked with an additive secret sharing technique. At each time instance t, the client model selects prototypes whose cluster feature W has a high value and uploads them to the server while preserving their privacy using the secret sharing technique discussed in Subsection 4.5.
The uploaded prototypes of each client are identified on the server by a unique masked ID (the ID cluster feature). The server performs a distribution consistency matching of each client's reliable prototypes against the reliable prototypes of every other client. The inter-client consistency check assumes that similar prototypes share the same feature distribution and have a higher weighted log-likelihood score than drifting prototypes. Specifically, if two prototypes are intrinsically close, their distributions over different Gaussian components are similar.

Definition 2 (Inter-client prototype consistency). The shared consistency of the representative prototypes $M_g^i \subseteq M_g$ of client $i \in K$ with the representative prototypes $M_g^j \subseteq M_g$ of client $j \in K$ on the server is the weighted Gaussian mixture log-probability estimated on client j's prototypes when the mixture is trained on the prototypes of client i.

The reason for estimating prototype similarity between clients with the maximum weighted likelihood is that many proposed methods assume that instances belonging to the same class have a smaller Euclidean distance to the centers of their cluster features and lie farther from the instances of the other classes. However, this assumption does not hold when the boundaries between classes are close. We therefore handle this more complex situation with a maximum weighted log-likelihood score, which is also supported in Euclidean space, using a Gaussian mixture model fitted with the expectation-maximization technique [38]. The mixture distribution learned from a client's prototypes represents the probability distribution of all observations in that client's prototype space. Though each client uploads its reliable prototypes to the server, not all of these prototypes are consistent with the current concepts of the other clients. The representative server prototypes are $M_g = \{M_g^1, ..., M_g^K\}$, where $M_g^k = \{M_{p,1}^k, ..., M_{p,n}^k\}$ are client k's representative prototypes on the server with cluster feature $W \geq W_t$. When a local client k requests global representative prototypes from the server, the server trains a Gaussian mixture model on the current client's prototypes and uses the trained model's feature space to infer the maximum weighted log-likelihood scores of the other K−1 clients' representative prototypes. Mathematically, the model is trained on the cluster feature centers $\mathcal{C} \in M_g^k$ of each client's representative prototypes on the server to optimize the log-likelihood function:
$$\mathcal{L}(\Theta) = \log P(\mathcal{C} \mid \Theta) = \sum_{i=1}^{N} \log P(\mathcal{C}_i \mid \Theta) = \sum_{i=1}^{N} \log \left( \sum_{q=1}^{Q} \pi_q \, p_q(\mathcal{C}_i \mid \theta_q) \right) \tag{2}$$

where $\pi_q$ represents the prior of Gaussian component $q$, $\Theta = (\pi_1, ..., \pi_Q, \theta_1, ..., \theta_Q)$ are the model parameters, $Q$ is the number of Gaussian components, and $\theta_q$ describes the Gaussian density function $p_q$ such that $p_q(\mathcal{C}_i \mid \theta_q) \sim \mathcal{N}(\mathcal{C}_i \mid \mu_q, \Sigma_q)$ and $\sum_{q=1}^{Q} \pi_q = 1$.
For an optimal solution of Equation (2), the Expectation-Maximization algorithm [32] introduces the latent variable $P(c_q \mid \mathcal{C}_i)$, the probability that observation $\mathcal{C}_i$ belongs to component $c_q$, to realize Equation (3):

$$\mathcal{L}(\Theta) = \sum_{i=1}^{N} \sum_{q=1}^{Q} P(c_q \mid \mathcal{C}_i) \left( \log \pi_q + \log \mathcal{N}(\mathcal{C}_i \mid \mu_q, \Sigma_q) \right) \tag{3}$$

After the global training for client k's representative prototypes, the optimized model is used to infer $P(\mathcal{C}_x \mid c_q)$, where $x \neq k$, on the prototypes of the other K−1 clients. A buffer stores the weighted likelihood scores $P(\mathcal{C}_x \mid c_q)$ along with the indexes of the scored prototypes of the other clients. These scores are then sorted in descending order, and the first n prototype instances from the buffer are selected as consistent prototypes to help client k's local prototypes predict the stream instance in its local environment. The procedure for the inter-client consistency check is presented in Algorithm 2. A running example of the inter-client consistency check with five clients is shown in Fig. 2.

Fig. 2. Illustrative running example of the inter-client consistency check operation. We outline the training operation between the local and global model under the inter-client consistency check with five clients, as described in Algorithm 2. Additional details can be found in Section 4.

Algorithm 2: Inter-client Consistency Check.

Input:
M_g: Global representative prototypes,
k: Current local client,
Q: Number of Gaussian components,
N_s: Number of similar representative prototypes
Output:
M_s: Client consistent prototypes
/* Train the server-side model for client k */
1 Model ← TrainDistributionSpace(M_g^k, Q) using Eqn. (3);
2 B ← buffer;
3 foreach reliable client prototype set M_g^j ∈ M_g, j ≠ k do
4   foreach m_c ∈ M_g^j do
5     if labelExists then
6       d ← getWeightedScore(Model, m_c);
7       B ← B ∪ d;
8     end
9   end
10 end
11 indexes ← SortBufferScores(B) // descending
12 M_s ← getSimilarPrototypes(M_g^j, indexes, N_s);
13 return M_s;
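The consistency check can be sketched with scikit-learn's GaussianMixture, whose `score_samples` returns the per-sample weighted log-likelihood; the function below is a simplified rendering of Algorithm 2, not the exact server code, and all helper names are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def consistency_check(server_protos, client_id, n_components, n_select):
    """Select the n_select prototypes of other clients that are most consistent
    with client `client_id`'s uploaded prototypes.

    server_protos: dict mapping client id -> list of Prototype objects (M_g).
    """
    own_centers = np.array([p.center for p in server_protos[client_id]])
    gmm = GaussianMixture(n_components=min(n_components, len(own_centers)))
    gmm.fit(own_centers)   # learn the distribution of this client's prototype space

    scored = []
    for cid, protos in server_protos.items():
        if cid == client_id:
            continue
        for p in protos:
            if p.label is None:        # only labeled prototypes can act as helpers
                continue
            # weighted log-likelihood of the foreign prototype under this client's mixture
            score = gmm.score_samples(p.center.reshape(1, -1))[0]
            scored.append((score, p))
    scored.sort(key=lambda sp: sp[0], reverse=True)   # descending likelihood
    return [p for _, p in scored[:n_select]]
```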

4.4. Federated semi-supervised online concept maintenance

After the collaborative prediction of each data stream instance at the client-side, the reliability of both the local and global
prototypes needs to be updated dynamically to learn the current stream concept. The principal objective is to utilize the prediction
performance of the current stream instance to infer the representativeness of neighboring prototypes. The following techniques
handle the maintenance of the prototypes in both the local and global models:
Technique 1. Error-Driven Prototype Reliability Learning: We utilize the collaborative prediction and the Euclidean space learned for the inter-client prototype consistency check to dynamically discard less representative prototypes. The key idea of the error-driven approach is to increase the importance of prototypes that are consistent with the class of the current stream instance. For each prediction, if the label of the inferred closest prototype is consistent with the correct prediction, its importance increases by 1; otherwise, it decreases by 1. Specifically, the cluster feature W of the inferred closest prototype is updated as $m_c(W \leftarrow W \pm 1)$, where the closest prototype $m_c$ can be either a global or a local prototype.
Technique 2. Local and Global Concept Drift Handling: We use the fading function defined in Equation (1) to update the concept reliability of all prototypes on both the client and the global side with respect to time. As time elapses in the streaming environment, some prototypes become unreliable and inconsistent with the current stream concept or distribution. Based on the concept drift reliability update, these unreliable prototypes are removed from the local and global learning models. Here, the cluster feature W of every prototype is updated as $m(W \leftarrow W \times 2^{-\lambda \Delta t})$, where $\Delta t$ is the elapsed time and $m$ ranges over all prototypes ($m_k$ and $M_g^i$) on both the client and the global side.
Technique 3. Prototype Deletion, Creation, and Merging: We eliminate both local and global prototypes with weak representativeness, i.e., when $m(W) < W_t$, where $W_t$ is a user-set reliability threshold. Also, global prototypes with low importance in the learned Euclidean space across the inter-client prototype consistency checks are automatically removed from the global server.
Based on the outcome of the collaborative prediction on the current stream instance $x_t^k$ at time $t$, $x_t^k$ is either added to the inferred closest prototype $m_c$ or a new prototype is created, depending on the criteria below (see line 1 in Algorithm 3). When the correct prediction is made and the criteria are satisfied, $m_c$ is updated with the current stream instance as $m_c(LS \leftarrow LS + x_t^k,\; SS \leftarrow SS + (x_t^k)^2,\; N \leftarrow N + 1,\; T \leftarrow t)$. When a new prototype needs to be created, a maximum threshold parameter maxP (the number of prototypes to be stored by each client) is checked to decide whether to create a new prototype or merge existing ones, ensuring efficient memory use (see line 39 in Algorithm 4). Furthermore, an arriving data instance is considered an outlier if its inferred distance is greater than the radius of the inferred closest prototype (see line 36 in Algorithm 4). The idea is that if a stream instance is farther away from its inferred closest prototype than the radius of that prototype's cluster, as measured by the Euclidean distance between the instance and the prototype, it is unlikely to be similar to the other data points in that cluster and may therefore be considered an outlier. Such an outlier stream instance is created as a prototype but is later deleted from the learning framework, as its reliability score decreases abruptly.
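The following condenses Techniques 1 to 3 into one maintenance step, reusing the `Prototype` and `decay_reliability` helpers from the earlier sketches; the function name and the exact ordering of the steps are our assumptions.

```python
import numpy as np

def maintain_after_prediction(m_c, x, y_true, t, prototypes, W_t=0.6, lam=2e-6):
    """Error-driven reliability update plus absorption/outlier handling.

    m_c: winning prototype of the collaborative prediction; x: instance features;
    y_true: label (None if unlabeled); t: current stream time.
    """
    correct = (y_true is None) or (m_c.label == y_true)
    m_c.W += 1 if correct else -1           # Technique 1: error-driven +/- 1

    dist = float(np.linalg.norm(x - m_c.center))
    if correct and dist <= m_c.radius:
        m_c.LS = m_c.LS + x                 # absorb the instance into the prototype
        m_c.SS = m_c.SS + x ** 2
        m_c.N += 1
        m_c.T = t
    else:
        # outlier: spawn a fresh prototype; it fades away if never reinforced
        prototypes.append(Prototype(LS=x.copy(), SS=x ** 2, N=1, label=y_true, T=t))

    for p in prototypes:                    # Technique 2: fade all reliabilities
        decay_reliability(p, t, lam)
    prototypes[:] = [p for p in prototypes if p.W >= W_t]   # Technique 3: prune
```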

Algorithm 3: Dynamic Prototype Creation.

Input:
M_k: Client prototypes,
f_k: Client model,
x_t^l: Current stream instance at time t with label l ∈ {∅, 1, ..., L},
maxP: Maximum number of prototypes
Output:
f_k: Updated client model
1 if size(M_k) ≥ maxP then
2   if unlabeledPrototypeExists then
3     [m_{k,i}^u, m_{k,j}^s] ← findNearestPair(f_k, M_k);
4     MergeProto(m_{k,i}^u, m_{k,j}^s);
5   else
6     [m_{k,i}^s, m_{k,j}^s] ← findNearestPair(f_k, M_k);
7     MergeProto(m_{k,i}^s, m_{k,j}^s);
8   end
9 end
10 m_{k,new} ← (LS ← x_t, SS ← x_t², C ← LS/N, L ← l, W ← 1, N ← 1, T ← t);
11 f_k ← f_k ∪ m_{k,new};
12 return f_k;

In the semi-supervised setting, we define two merging criteria: (1) when both labeled and unlabeled prototypes exist for client k, the closest unlabeled prototype $M_{k,i}^u$ and labeled prototype $M_{k,j}^s$ are searched for and merged, and the label of the merged prototype becomes the label of the labeled partner; (2) when there are no unlabeled prototypes, the most frequent class among all prototypes is identified, and the two nearest prototypes of that class are merged. The merging criteria are mathematically defined as:

$$[m_{k,i}, m_{k,j}] = \begin{cases} \arg\min \, ED(M_{k,i}^u, M_{k,j}^s) & \text{if } |M_{k,i}^u| \geq 1 \\ \arg\min \, ED(M_{k,i}^s, M_{k,j}^s) & \text{otherwise} \end{cases} \tag{4}$$

where $ED(M_{k,i}, M_{k,j})$ is the Euclidean distance between two prototypes. Based on the satisfied merging criterion, the two queried prototypes are merged as $m_k(LS \leftarrow LS_i + LS_j,\; SS \leftarrow SS_i + SS_j,\; N \leftarrow N_i + N_j,\; T \leftarrow \max(T_i, T_j),\; W \leftarrow \max(W_i, W_j))$.
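A small sketch of the merge operation implied by Equation (4), using the same illustrative `Prototype` container; inheriting the labeled partner's label when merging an unlabeled prototype follows criterion (1) above.

```python
def merge_prototypes(p_i, p_j):
    """Merge prototype p_j into p_i additively, as in Section 4.4."""
    p_i.LS = p_i.LS + p_j.LS
    p_i.SS = p_i.SS + p_j.SS
    p_i.N += p_j.N
    p_i.T = max(p_i.T, p_j.T)
    p_i.W = max(p_i.W, p_j.W)
    if p_i.label is None:          # merged cluster keeps the labeled partner's label
        p_i.label = p_j.label
    return p_i
```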
The procedure for creating and merging prototypes during the online prototype maintenance is given in Algorithm 3 above. It is worth noting that both the create and merge operations occur at the client side, but the global server can also perform a merge operation.
Algorithm 4: Proposed SFLEDS Algorithm.

Input:
D: Clients' data streams {D_1, ..., D_K}, maxP: maximum number of client prototypes, N_s: number of highly consistent prototypes on the server, W_t: reliability threshold, M_g: highly representative global prototypes
Output:
Acc: Prediction accuracy
1 f_K ← clientsModelInitialize() // Initialize client models with Algorithm 1;
2 M_g ← empty;
3 F ← empty;
4 foreach client stream in parallel do
5   foreach X_t = (x_t^k, y_t^k) ∈ stream instances from D_k do
6     [M_k^c, dist_c] ← f_k.predict(x_t^k) // local classifier prediction with k-NN;
7     if y_t^k ≠ ∅ then
8       if M_k^c.L == y_t^k then
9         W(M_k^c) ← W(M_k^c) + 1;
10        T(M_k^c) ← t;
11      else
12        W(M_k^c) ← W(M_k^c) − 1;
13        M_k^c.remove();
14      end
15    end
16    M_p^k ← getReliablePrototypes();
17    M_g ← M_g ∪ preserveAndUpload(M_p^k);
18    Server:
19    F ← serverModel(M_g);
20    M_s ← F.consistencyCheck(M_g, k, N_s) using Algorithm 2;
21    if size(M_s) > 1 then
        /* global prediction using k-NN */
22      [M_s^c, dist_g] ← F.predict(x_t, M_s);
23      if M_s^c.L == y_t then
24        W(M_s^c) ← W(M_s^c) + 1;
25        T(M_s^c) ← t;
26
27
28
29      else
30        W(M_s^c) ← W(M_s^c) − 1;
31    L ← M_k^c.L;
32    m_c ← M_k^c;
33    if dist_g < dist_c && M_s^c.L ≠ ∅ then
34      L ← M_s^c.L;
35      m_c ← M_s^c;
36    if (dist_c ≤ R(M_k^c) && y_t^k == M_k^c.L && y_t^k ≠ ∅) ∥ (dist_c ≤ R(M_k^c) && M_k^c.L == ∅) then
37      f_k ← M_k^c.updatePro(x_t^k, m_c);
38    else
39      f_k ← createPro(maxP, M_k, x_t^k, f_k) using Algorithm 3;
40    f_k ← f_k.reliabilityUpdate(M_k) with Equation (1);
41    Server:
42    F ← F.reliabilityUpdate(M_g);
43    Acc ← calculateAccuracy(L, y_t^k);


4.5. Prototype masking with additive secret sharing

Our proposed algorithm preserves the privacy of the client representative prototypes uploaded to the server by using additive secret sharing [39] masks. The intuition behind using additive secret sharing is to lower the computation and communication overhead. This is done in a two-step procedure. In the first step, the server sends a large random mask to each client, denoted by a random number R. In the second step, the clients generate additive secret shares of the individual prototypes to be uploaded to the server, protecting privacy against potential collusion between the server and the clients. To do so, client k sends a masked version of its local representative prototype to each of the K−1 other clients participating in the distributed data stream learning. Specifically, if client k wants to upload its prototypes, it preserves the privacy of the key cluster features of each prototype. Assuming client k wants to preserve cluster feature W, it generates K−1 random numbers $W_n$, $n \in \{1, ..., K\}$, $n \neq k$. Client k then sends these shares $s = (W_1, ..., W_n, W')$ to the corresponding clients without revealing the intrinsic information of the original prototype features, where $W' = \left(\left(R - \sum_{n \neq k}^{K} W_n\right) \bmod R\right) + W$ is client k's own share.
For a prototype to be recovered by an attacker, the attacker would need to know all the random shares as well as the random key sent to each client by the server for the masking process. This makes it difficult for an attacker to breach the privacy of the distributed learning setup.
At the server side, we recover $W = \left(\left(W' + \sum_{n \neq k}^{K} W_n\right) \bmod R\right)$ once all shares have been received. We utilize this simple but effective additive secret sharing scheme in this paper; it is valid not only for scalars but also for vectors and matrices. Since online data stream learning requires low computational overhead and low latency, the secret sharing technique is only applied to mask the prototypes during server uploads and to unmask them at the server side when all shares have been received from all clients.
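A minimal sketch of the additive masking follows, using the algebraically equivalent form (W − Σ W_n) mod R for the owner's share and assuming integer-quantized prototype features (floating-point features would need a fixed-point encoding, which the paper does not detail); the modulus value is an assumption.

```python
import random

R = 2 ** 61 - 1   # large public modulus distributed by the server (assumed value)

def make_shares(secret, num_clients):
    """Split an integer secret into additive shares modulo R.

    The owner keeps w_prime and distributes one random share to each other
    client; no subset smaller than all the shares reveals the secret.
    """
    shares = [random.randrange(R) for _ in range(num_clients - 1)]
    w_prime = (secret - sum(shares)) % R    # owner's own share
    return shares, w_prime

def recover(shares, w_prime):
    """Server-side recovery once all shares arrive: W = (W' + sum of shares) mod R."""
    return (w_prime + sum(shares)) % R

# quick self-check with a toy value
shares, w_prime = make_shares(123456, num_clients=5)
assert recover(shares, w_prime) == 123456
```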

4.6. Time and space complexity

We analyze the time complexity of our proposed method (Algorithm 4) by considering four major components of the framework. At the model and client initialization stage, each client k partitions its $D_k^{init}$ into C disjoint sets, each comprising $N_c$ class instances. The k-means clustering algorithm is then performed on each set of $N_c$ class instances to form p clusters. The time complexity of the initialization stage is $O(|N_c| K C)$, where $|N_c|$ is the total number of class instances in C and K is the number of clients. For the inter-client consistency check, the worst-case time complexity is $O(K |M_p^k| d^3 Q)$, since the server has to train K Gaussian mixture models, one on each client's representative prototypes $M_p^k$; here d is the dimensionality of the prototype cluster features and Q is the number of Gaussian components. For the online concept maintenance stage, both the local and global models are updated dynamically for each incoming data instance; the running time for prototype maintenance is $O(K \cdot maxP_k + maxP_g)$, where $maxP_k$ and $maxP_g$ are the maximum numbers of prototypes of the local and global models, respectively. When creating a new prototype during the online maintenance stage, the worst-case time complexity is $O(K \cdot maxP_k^2)$, because we have to traverse the local model's prototypes to find the closest pair for merging. The time complexity of the additive secret sharing masking of the reliable prototypes is $O(K \cdot maxP_k)$. The overall time complexity of the proposed algorithm thus reduces to $O(K(|N_c| C + |M_p^k| d^3 Q + maxP_k^2))$.
We also analyze the worst-case space complexity of the proposed algorithm. Given the definition of a single prototype $m_k = (LS, SS, R, C, L, W, N, T, ID)$, only LS and SS are d-dimensional vectors, while R, C, L, W, N, T and ID are variables holding a single value. For a single prototype, the space complexity is $O(2 \times V_d + 7)$, where $V_d$ is the dimension of LS and SS. In total, the space complexity of the proposed algorithm is $O((K \times maxP_k \times M_k) + |M_g|)$, where $M_g$ is the set of prototypes at the server side.

5. Experiments

5.1. Dataset

We demonstrate the performance of our proposed method by conducting several experiments on nine publicly available benchmark data sets (both real-world and synthetic): five real-world data sets and four synthetic data sets. The data sets are summarized in Table 2.

5.1.1. Synthetic dataset


The experiments were performed on four synthetic data sets, which simulate various types of concept drift (e.g., incremental, gradual, and sudden).

• CRE4V2¹: There are 183,000 samples in this data set. It consists of four classes that rotate in two-dimensional space as they expand.
• CR4¹: This data set comprises four distinct classes that rotate in two-dimensional space independently. There are 144,400 samples in total.
• GEAR2C2D¹: There are 200,000 samples in this data set. It is made up of two rotating gears, represented by two classes.
• FG2C2D¹: There are 200,000 samples in this data set. It comprises two classes formed by four Gaussians in two-dimensional space.

¹ https://sites.google.com/site/nonstationaryarchive/datasets.


Table 2
Data set description. Drift type [I: Incremental Drift, G: Gradual Drift, S: Sudden Drift, U: Unknown, V: Virtual Drift, R: Recurring Drift], #C: number of classes, #F: number of features.

Data sets  #Instances  #F  #C  Drift Type

Synthetic Dataset
CR4¹  144,400  2  4  I&G
CRE4V2¹  183,000  2  4  I&G
FG2C2D¹  200,000  2  2  I&G
GEAR2C2D¹  200,000  2  2  I&G

Real Dataset
Forest Covtype²  581,012  54  7  U
Electricity²  45,312  8  2  U
Shuttle³  43,500  9  7  U
GSD³  13,910  128  6  U
KDDcup99³  494,021  42  23  G&S

¹ https://sites.google.com/site/nonstationaryarchive/datasets.
² https://moa.cms.waikato.ac.nz/datasets/.
³ https://archive.ics.uci.edu/ml/datasets.php.


5.1.2. Real-world data


Five real-world data sets were selected to experimentally validate our method. The description of the data sets is as follows:

• KDDcup99²: This is one of the most commonly used data sets for testing data stream classification algorithms. There are 494,021 samples with 42 features spread across 23 distinct categories; each instance corresponds to either normal traffic or an attack. In our experiment, we removed the eight categorical features and used the 34 continuous features.
• Shuttle²: The data set has 43,500 data instances (the training split of the full 58,000) belonging to one of seven classes. Each sample contains 9 real-valued features.
• Forest Cover Type (FCT)³: This data set describes seven categories of forest cover from various geographical locations and is widely used for the evaluation of data stream classification methods. It has 581,012 samples, and each sample has 54 distinct features.
• Electricity³: This data set contains 45,312 samples, each with eight attributes. The samples are divided into two categories, reflecting a positive or negative change in the price of electricity.
• Gas Sensor Array Drift (GSD)²: This data set includes 13,910 samples from 16 chemical sensors exposed to different concentrations, used to identify 6 distinct gases (classes).

5.2. Comparison methods

To evaluate the performance of our proposed method, we selected six semi-supervised federated learning algorithms and eight supervised data stream algorithms, plus a client-split stream baseline. For all baseline methods, we fine-tuned the models and modified their input pipelines and output prediction layers. We present a summary of all the baseline methods in this section.
A brief description of the FL algorithms is provided below:

• FedSem [1]: A semi-supervised federated learning method that exploits unlabeled data via pseudo-labeling.
• FedFixMatch [19]: The FixMatch consistency-regularization approach to semi-supervised learning applied in the federated setting.
• FedSiam [30]: A siamese-network-based approach to federated semi-supervised learning.
• FedMatch [47]: A federated semi-supervised learning method based on inter-client consistency and parameter decomposition.
• FedAvg [24]: Federated Averaging, the standard algorithm for the typical federated learning setting.
• FedProx [25]: A learning framework that tackles statistical heterogeneity in federated networks.

A description of the typical data stream algorithms is provided below:

• OZAG [35]: Online bagging and boosting (Oza), well-known ensemble learning methods for data streams.

² https://archive.ics.uci.edu/ml/datasets.php.
³ https://moa.cms.waikato.ac.nz/datasets/.


• ADWIN [3]: A time- and memory-efficient algorithm that uses an adaptive sliding window to handle concept drift.
• DWM [23]: Dynamic weighted majority, an ensemble method that handles concept drift in response to changes in performance.
• LNSE [10]: An ensemble-based learning approach for tackling concept drift using incremental learning.
• ADDEXP [22]: An additive expert ensemble online algorithm for handling drifting concepts.
• AWEC [43]: A general framework for mining concept-drifting data streams using weighted ensemble classifiers.
• HTC [18]: An efficient algorithm for mining decision trees from continuously changing data streams.
• SRP [13]: Streaming Random Patches, an ensemble method for stream classification.
• ClientSplit Stream: A single data stream is partitioned to simulate a distributed setting, where each client trains its own local classifier.

5.3. Experimental results

We evaluated the performance of our proposed method and the competing methods by reporting the classification accuracy of each method in the distributed semi-supervised setting. The accuracy was calculated using the interleaved test-then-train evaluation method, in which each single stream instance is first used to test and then to train the classifier. For the semi-supervised methods, we experimented with label ratios Y = 0.10, 0.15 and 0.20. The numbers of clients used in this study are K = 10 and 30. We also compared our method with the supervised methods. The hyper-parameter settings used for the experimental set-up are λ = 2 × 10⁻⁶, $D_k^{init}$ = 1% of the total stream instances, p = 2 clusters per class, maxP = 200 for our approach (1000 for the client stream version without collaboration and 1000 for the global prototypes), $N_s$ = 50 consistent server prototypes, and reliability threshold $W_t$ = 0.6. The hyperparameters of the model were largely determined through an initial grid search to identify the optimal values. In particular, the values of $N_s$ and maxP were chosen to be large but optimal, in order to maintain a sufficient number of both local and global prototypes.
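For reference, interleaved test-then-train (prequential) accuracy can be computed as in the sketch below; this is the generic protocol, with an assumed `model` interface, not the authors' evaluation harness.

```python
def prequential_accuracy(stream, model):
    """Interleaved test-then-train: predict on each labeled instance first,
    then train on it; accuracy accumulates over the whole stream."""
    correct = total = 0
    for x, y in stream:
        if y is not None:                  # only labeled instances are scored
            correct += int(model.predict(x) == y)
            total += 1
        model.update(x, y)                 # learn from the instance afterwards
    return correct / max(total, 1)
```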
For each comparison method, we fine-tuned the model parameters and modified the data set input pipeline and the number of model output predictions to fit the data sets.
We conducted the experiments on a Linux machine with three Xeon E5-2678 v3 12-core CPUs (2.50 GHz), 125 GB main memory, and an Nvidia GeForce RTX 2080 Ti GPU with 11 GB memory. The source code for the proposed method is publicly available on GitHub.⁴
The evaluation of all methods is grouped into the semi-supervised and supervised settings. We define the null hypothesis as there being no statistical difference among the competing methods in each setting, and we reject it if the Friedman rank-sum test reports a statistical difference. If the null hypothesis is rejected, the Nemenyi post-hoc test is further used to locate the pairwise differences; a sketch of this testing pipeline is given below.
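The following Python sketch shows how such a Friedman test followed by a Nemenyi critical difference can be computed, assuming an accuracy matrix with one row per data set configuration and one column per method; the constant `q_alpha` is the Studentized-range-based critical value, which must be looked up for the chosen significance level and number of methods.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def friedman_nemenyi(acc, q_alpha):
    """acc: (n_datasets, k_methods) accuracy matrix.
    Returns the Friedman p-value, the per-method average ranks, and the
    Nemenyi critical difference CD = q_alpha * sqrt(k * (k + 1) / (6 * n))."""
    n, k = acc.shape
    _, p_value = friedmanchisquare(*[acc[:, j] for j in range(k)])
    ranks = rankdata(-acc, axis=1)     # rank 1 = highest accuracy; ties averaged
    avg_ranks = ranks.mean(axis=0)
    cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * n))
    return p_value, avg_ranks, cd
```

Two methods are then declared significantly different whenever their average ranks differ by more than the returned critical difference.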

5.3.1. Performance evaluation of semi-supervised federated learning methods


Tables 3 and 4 report the classification performance of each algorithm on the real-world and synthetic data sets, respectively. Our algorithm outperforms almost all the competing algorithms across the different label ratios and client numbers. Table 5 reports the average accuracy and ranks of all the semi-supervised methods. Our method outperforms the best typical FL algorithm by an average of 10.24% on the real-world data sets and 55.04% on the synthetic data sets, and it exceeds the overall second-best algorithm by an average of 5.15%. To analyze statistical significance, we use the average accuracy and average rank of all results as reported in Table 5. The Friedman test yields a p-value of 1.35𝑒−32, which confirms that our proposed method differs statistically from the competing methods. The Nemenyi post-hoc test is then applied to the average ranks to construct the critical difference diagram shown in Fig. 7. The test produced a critical difference of 3.71, revealing which algorithms are statistically significantly different from each other; SFLEDS is ranked first among all the algorithms.
We further present the performance of each algorithm for different client numbers in Fig. 6. As depicted in the figure, the performance of our proposed algorithm (SFLEDS) remains almost constant as the number of clients increases. While FedSem and FedProx also exhibit almost constant performance with an increasing number of clients, our proposed algorithm demonstrates superior performance. The competing algorithms perform poorly on the synthetic data sets because they cannot independently learn all the rapidly drifting concepts with the limited labels available. The performance of our method for different clients with different label ratios is depicted in Fig. 5. On the real-world data sets, the performance of the proposed method increases as the proportion of available labels increases. Even in the worst case, the proposed method remains robust: on the synthetic data sets, its performance stays constant as the label ratio increases.

5.3.2. Performance evaluation of supervised learning methods


The performance of the supervised learning methods for different client numbers is reported in Table 6, and the average accuracy and ranks are shown in Table 7. Our proposed approach again outperformed all the competing supervised methods, exceeding the second-best method by an average accuracy of 5.88%. The reported p-value of 6.55𝑒−07 also rejects the null hypothesis. The constructed critical difference diagram of the comparing algorithms is shown in Fig. 8.



Table 3
Prediction accuracy for different client numbers 𝐾 of different methods on five real-world data sets.

Label Ratio K Datasets Our ClientStream FedAvg FedProx FedSem FedMatch FedFixMatch FedSiam

Y=0.10 10 Electricity 78.83% 71.22% 69.46% 67.17% 68.05% 66.11% 71.08% 64.66%
Shuttle 99.02% 98.63% 96.14% 96.20% 96.03% 95.58% 95.95% 96.31%
GSD 94.97% 92.70% 92.37% 92.84% 92.52% 17.79% 95.15% 95.07%
Forest Covtype 89.99% 85.84% 60.75% 60.72% 60.57% 60.17% 58.67% 11.71%
KDDcup99 99.79% 99.72% 92.51% 95.62% 98.79% 25.96% 92.72% 62.26%

30 Electricity 79.53% 69.25% 75.43% 75.49% 74.97% 69.23% 69.36% 67.82%
Shuttle 99.36% 98.66% 95.36% 95.34% 95.31% 95.59% 96.33% 95.40%
GSD 93.60% 86.46% 89.64% 90.83% 89.86% 91.87% 10.46% 93.06%
Forest Covtype 89.72% 71.39% 60.11% 60.54% 58.45% 61.66% 61.75% 61.69%
KDDcup99 99.70% 99.47% 88.40% 91.92% 78.11% 25.96% 36.99% 71.03%

Y=0.15 10 Electricity 80.38% 71.54% 66.05% 66.36% 66.15% 66.45% 67.47% 63.78%
Shuttle 99.13% 98.62% 96.28% 96.41% 96.27% 96.47% 95.12% 95.05%
GSD 95.94% 92.64% 92.73% 93.17% 94.68% 17.39% 94.71% 94.78%
Forest Covtype 91.10% 85.88% 60.13% 60.31% 60.74% 63.19% 60.04% 15.99%
KDDcup99 99.81% 99.72% 98.19% 98.53% 95.93% 25.96% 66.85% 67.09%

30 Electricity 80.24% 69.35% 71.00% 70.24% 73.78% 68.18% 65.15% 65.90%
Shuttle 99.40% 98.65% 95.98% 95.43% 95.92% 95.50% 95.99% 96.0%
GSD 94.60% 86.46% 90.83% 91.91% 94.06% 92.63% 94.96% 93.70%
Forest Covtype 90.36% 80.02% 60.66% 60.46% 60.54% 61.37% 62.14% 15.35%
KDDcup99 99.80% 99.46% 92.40% 98.31% 98.68% 25.96% 33.74% 73.48%

Y=0.20 10 Electricity 81.23% 71.65% 66.37% 65.68% 65.30% 66.92% 66.51% 66.51%
Shuttle 99.27% 98.62% 95.89% 95.25% 95.29% 96.39% 95.84% 95.87%
GSD 95.94% 92.81% 94.50% 93.06% 94.78% 17.79% 13.91% 94.93%
Forest Covtype 91.84% 85.88% 59.94% 60.01% 60.98% 61.02% 61.54% 23.30%
KDDcup99 99.85% 99.72% 98.93% 98.96% 98.70% 25.96% 93.68% 35.59%

30 Electricity 81.01% 69.45% 70.87% 69.57% 71.22% 67.18% 67.11% 66.98%
Shuttle 98.59% 98.63% 96.07% 96.06% 96.12% 96.40% 95.77% 96.24%
GSD 94.50% 86.51% 91.80% 91.26% 94.06% 91.26% 91.94% 89.46%
Forest Covtype 89.62% 80.13% 60.55% 59.77% 60.16% 61.63% 61.32% 10.59%
KDDcup99 99.23% 99.45% 93.08% 91.92% 94.37% 25.96% 32.86% 43.52%

Table 4
Prediction accuracy for different client numbers 𝐾 with different methods on four synthetic data sets.

Label Ratio K Datasets Our ClientStream FedAvg FedProx FedSem FedMatch FedFixMatch FedSiam

Y=0.10 10 CR4 99.83% 99.66% 0.63% 0.64% 0.63% 0.66% 0.63% 0.64%
CRE4V2 87.35% 83.01% 1.19% 1.15% 1.56% 1.77% 1.77% 1.76%
FG2C2D 94.15% 93.61% 58.35% 59.50% 56.75% 61.46% 61.41% 58.96%
GEAR2C2D 99.64% 97.71% 95.70% 95.70% 95.72% 95.72% 95.72% 95.72%

30 CR4 97.50% 83.79% 0.54% 0.53% 0.60% 0.62% 0.61% 0.60%
CRE4V2 81.71% 58.08% 0.95% 1.09% 0.94% 1.49% 1.54% 1.21%
FG2C2D 94.26% 93.30% 58.87% 57.93% 57.86% 56.17% 56.17% 55.29%
GEAR2C2D 98.89% 95.24% 95.71% 95.71% 95.71% 95.71% 95.71% 95.71%

Y=0.15 10 CR4 99.82% 99.63% 0.59% 0.60% 0.61% 0.63% 0.62% 0.62%
CRE4V2 88.0% 82.25% 0.97% 0.90% 1.05% 0.97% 0.98% 1.20%
FG2C2D 94.15% 93.71% 57.97% 59.08% 58.57% 56.32% 60.91% 57.27%
GEAR2C2D 99.68% 97.73% 95.72% 95.71% 95.71% 95.71% 95.71% 95.71%

30 CR4 98.02% 93.81% 0.61% 0.62% 0.61% 0.62% 0.62% 0.61%
CRE4V2 83.01% 58.48% 1.06% 1.18% 0.89% 1.02% 0.99% 1.10%
FG2C2D 94.18% 93.34% 61.05% 57.99% 58.54% 59.84% 59.99% 58.70%
GEAR2C2D 98.88% 95.25% 95.71% 95.73% 95.70% 95.72% 95.73% 95.72%

Y=0.20 10 CR4 99.81% 99.62% 0.71% 0.69% 0.69% 0.72% 0.71% 0.73%
CRE4V2 88.40% 82.56% 1.14% 1.54% 2.13% 1.0% 1.02% 1.32%
FG2C2D 94.29% 93.75% 59.32% 59.37% 61.33% 61.78% 61.78% 58.86%
GEAR2C2D 99.70% 97.78% 95.71% 95.72% 95.71% 95.73% 95.74% 95.73%

30 CR4 99.13% 94.01% 0.63% 0.62% 0.63% 0.64% 0.65% 0.64%
CRE4V2 84.06% 58.05% 0.91% 1.00% 0.90% 0.95% 0.94% 1.08%
FG2C2D 94.21% 93.39% 57.87% 57.93% 59.07% 60.71% 60.71% 60.95%
GEAR2C2D 98.75% 95.29% 95.71% 95.72% 95.71% 95.73% 95.73% 95.72%


Table 5
Average prediction accuracy and rank with different methods on real-world and synthetic data sets.

Label Ratio Avg/Rank Our ClientStream FedAvg FedProx FedSem FedMatch FedFixMatch FedSiam

Y=0.10 Real Avg 92.42% 87.50% 81.76% 82.28% 82.10% 63.68% 71.49% 71.17%
Real Rank 1.27 3.33 5.07 4.53 5.13 5.87 5.4 5.4
Synthetic Avg 94.13% 88.41% 39.09% 39.07% 38.86% 39.32% 39.30% 39.00%
Synthetic Rank 1.08 2.92 6.33 6.21 5.83 3.92 4.46 5.25
Overall Avg 93.27% 87.96% 60.42% 60.68% 60.48% 51.5% 55.4% 55.08%
Overall Rank 1.19 3.15 5.63 5.28 5.44 5.0 4.98 5.33

Y=0.15 Real Avg 93.04% 88.09% 82.22% 82.57% 83.54% 58.63% 76.54% 72.96%
Real Rank 1.13 3.47 5.33 5.0 4.4 6.07 5.3 5.3
Synthetic Avg 94.49% 89.21% 39.12% 38.97% 38.95% 38.87% 39.27% 38.87%
Synthetic Rank 1.0 3.0 5.71 5.21 5.96 5.08 4.75 5.29
Overall Avg 93.77% 88.65% 60.67% 60.77% 61.25% 48.75% 57.90% 55.92%
Overall Rank 1.07 3.26 5.5 5.09 5.09 5.63 5.06 5.3

Y=0.20 Real Avg 93.24% 88.15% 83.00% 82.35% 83.31% 62.92% 66.93% 58.19%
Real Rank 1.13 3.13 4.33 5.3 4.6 5.43 5.83 6.23
Synthetic Avg 94.80% 89.15% 39.07% 39.13% 39.41% 39.60% 39.60% 39.29%
Synthetic Rank 1.0 3.0 6.83 5.58 5.96 4.54 4.21 4.88
Overall Avg 94.02% 88.65% 61.03% 60.74% 61.36% 51.26% 53.27% 48.74%
Overall Rank 1.07 3.07 5.44 5.43 5.2 5.04 5.11 5.63

Table 6
Prediction accuracy for different client numbers 𝐾 on real-world and synthetic data sets with supervised methods.

K Datasets Our OZAG ADDEXP DWM ADWIN LNSE HTC SRP AWEC

Real-world
10 Electricity 84.62% 73.70% 72.90% 75.50% 74.0% 67.90% 75.20% 78.20% 68.10%
Shuttle 99.44% 99.10% 92.20% 89.80% 99.0% 98.60% 92.90% 97.40% 93.40%
GSD 98.14% 87.90% 61.90% 63.90% 86.90% 52.50% 60.40% 67.40% 50.0%
Forest Covtype 88.36% 81.0% 57.70% 70.60% 81.10% 79.0% 66.10% 83.40% 68.30%
KDDcup99 99.71% 99.0% 97.77% 99.0% 99.0% 95.70% 99.0% 100.0% 38.80%

30 Electricity 81.46% 71.66% 72.73% 74.03% 71.26% 60.83% 73.73% 75.93% 60.76%
Shuttle 99.42% 98.96% 93.30% 91.33% 98.66% 92.30% 93.36% 96.30% 92.06%
GSD 96.50% 76.83% 60.30% 59.03% 74.90% 32.70% 60.43% 61.36% 36.33%
Forest Covtype 79.74% 74.40% 63.26% 65.56% 75.0% 68.16% 56.70% 77.60% 63.26%
KDDcup99 99.46% 99.0% 97.93% 98.20% 99.0% 91.46% 98.26% 99.33% 32.46%

Synthetic
10 CR4 99.68% 97.40% 25.0% 94.50% 97.90% 98.0% 34.40% 98.70% 96.50%
CRE4V2 88.66% 45.90% 24.10% 81.30% 55.50% 82.90% 37.60% 87.0% 81.0%
GEAR2C2D 99.79% 95.60% 95.90% 95.90% 95.50% 97.0% 96.0% 96.0% 95.20%
FG2C2D 94.82% 95.0% 85.0% 93.50% 95.0% 92.80% 90.60% 94.90% 93.0%

30 CR4 98.44% 35.80% 25.63% 86.73% 45.53% 90.53% 25.16% 96.26% 88.70%
CRE4V2 82.17% 29.66% 24.20% 69.73% 42.93% 51.36% 24.86% 79.63% 64.50%
GEAR2C2D 98.94% 95.30% 95.90% 95.90% 95.30% 91.20% 95.60% 95.43% 94.26%
FG2C2D 94.77% 93.0% 84.96% 92.40% 93.06% 92.06% 90.16% 93.50% 90.73%

Table 7
Average prediction accuracy and rank with supervised models.

Datasets Our OZAG ADDEXP DWM ADWIN LNSE HTC SRP AWEC

Electricity 82.73% 72.60% 72.89% 74.69% 72.52% 64.13% 74.58% 76.98% 63.59%
Shuttle 99.42% 99.04% 92.83% 90.58% 98.82% 95.57% 93.25% 96.75% 92.79%
GSD 97.20% 82.33% 61.28% 60.98% 81.10% 40.83% 60.36% 63.27% 40.99%
Forest Covtype 83.50% 77.47% 61.99% 68.04% 77.83% 73.04% 61.43% 80.25% 65.52%
KDDcup99 99.58% 99.00% 97.88% 98.52% 99.00% 94.09% 98.75% 99.78% 37.07%
CR4 99.03% 60.10% 25.53% 90.43% 66.38% 94.48% 28.22% 97.47% 91.85%
CRE4V2 85.59% 32.79% 24.13% 74.98% 43.51% 67.80% 29.12% 83.06% 73.07%
GEAR2C2D 99.42% 95.37% 95.88% 95.88% 95.37% 93.93% 95.80% 95.69% 94.80%
FG2C2D 94.81% 94.00% 84.95% 92.90% 94.09% 92.49% 90.62% 94.17% 91.81%
Avg 93.56% 80.51% 68.37% 83.16% 82.2% 79.72% 70.58% 87.69% 72.63%
Avg Rank 1.11 4.61 7.06 4.94 4.61 6.56 6.44 2.67 7.0


Fig. 3. Stream test accuracy on supervised algorithms on selected real-world data sets. “Our(0.15)” and “Our(0.2)” refer to the fact that our proposed algorithm
(SFLEDS) was run with label ratios of 0.15 and 0.20 on the respective datasets.

Fig. 4. Stream test accuracy on supervised algorithms on selected synthetic data sets. “Our(0.15)” and “Our(0.2)” refer to the fact that our proposed algorithm
(SFLEDS) was run with label ratios of 0.15 and 0.20 on the respective datasets.

The Nemenyi post-hoc test, performed on the algorithms over all data sets, produced a critical difference of 4.0, which reveals which algorithms are significantly different from each other. As depicted in Fig. 8, our proposed algorithm, SFLEDS, differs statistically significantly from the competing algorithms.
Figs. 3 and 4 plot the performance of the supervised data stream algorithms over time on the real-world and synthetic data sets, respectively. The performance of our proposed algorithm remains almost uniform throughout the entire stream task. It is worth noting from Fig. 3 that our algorithm surpassed the fully supervised competing methods even though it used only 0.15 and 0.20 label ratios.
The proposed algorithm outperforms all algorithms in both the semi-supervised and supervised learning environments. Fig. 7 and Fig. 8 further report which algorithms are statistically different from each other using the measured critical difference (CD). The robust performance of our method can be attributed to the dynamic creation and maintenance of both local and global prototypes in a federated semi-supervised fashion, which is also reflected in its handling of distributed concept drift among the participating clients. These results are convincing since the proposed algorithm preserves the privacy of all participating clients and learns inter-client prototype consistency from each client. It can also be observed that current federated learning methods are not robust enough for the distributed data stream environment, where highly heterogeneous data is likely to exist.


Fig. 5. Average test accuracy on all data sets (real-world + synthetic) on different client number with different label ratio.

Fig. 6. Average test accuracy on all data sets (real-world + synthetic) on different client number with semi-supervised methods.

Fig. 7. Nemenyi test on data sets for semi-supervised baseline methods.

5.4. Parameter sensitivity analysis

In this section, we show the effect of increasing the maximum number of global prototypes 𝑚𝑎𝑥𝑃 under different label ratios for the proposed federated semi-supervised data stream learning. This parameter bounds how many similar global prototypes are selected by the inter-client prototype consistency check while reliability maintenance is performed on all global prototypes. We used one of the synthetic data sets, the CR4 data set, because of its pronounced concept drift, to depict the parameter sensitivity analysis. We also conducted a parameter sensitivity test on the Forest Covtype real-world data set to confirm the robustness of the proposed method; this data set has the largest number of data instances, which makes it a good choice for this test. As shown in Fig. 9, the performance of the proposed method remains relatively constant as the value of 𝑚𝑎𝑥𝑃 increases for each label ratio on each data set. This result confirms the robustness and reliability of the inter-client prototype consistency of the proposed algorithm in selecting the best prototypes to complement each client's local prototypes, irrespective of increasing 𝑚𝑎𝑥𝑃. We therefore selected the smallest value of this parameter that remains effective, because increasing it increases the storage and computational cost, which is not suitable for distributed data stream settings.


Fig. 8. Nemenyi test on data sets for all supervised baseline methods.

Fig. 9. Hyperparameter 𝑚𝑎𝑥𝑃 (global prototypes) sensitivity analysis.

Fig. 10. Sensitivity analysis for different clustering methods.

We also compared the effect of using a different clustering algorithm, density-based spatial clustering of applications with noise (DBSCAN), instead of K-means, to evaluate the impact on the initialization stage of the proposed algorithm. Based on the experiments on the two data sets, Fig. 10 shows that using a different clustering algorithm (DBSCAN in this case) has a negligible impact on the predictive performance of the proposed model. However, DBSCAN increases the runtime of the initialization process, since its worst-case time complexity is 𝑂(𝑛²), compared to K-means with a time complexity of 𝑂(𝑛 ⋅ 𝑘 ⋅ 𝑑 ⋅ 𝑡), where 𝑛 is the number of data points, 𝑘 is the number of clusters, 𝑑 is the number of dimensions in the data set, and 𝑡 is the number of iterations performed by K-means, which is set to a small number. A minimal sketch of the two initialization options is given below.
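The following Python sketch illustrates the two initialization options; the clustering hyperparameters (`n_clusters`, `eps`, `min_samples`) are illustrative placeholders rather than the values used in our experiments.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN

def init_prototypes(X, method="kmeans", n_clusters=50, eps=0.5, min_samples=5):
    """Summarize a warm-up buffer X of shape (n_samples, n_features) into
    initial prototypes (cluster centers)."""
    if method == "kmeans":
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
        return km.cluster_centers_
    # DBSCAN: one prototype per density cluster; noise points (label -1) are skipped.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return np.array([X[labels == c].mean(axis=0) for c in set(labels) if c != -1])
```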
Furthermore, we performed an additional sensitivity test for the K-means cluster-number parameter  in our proposed method. The impact of the number of K-means clusters () on the predictive performance of the proposed method is shown in Fig. 12. The results indicate that higher  values lead to improved performance, suggesting that retaining a larger number of micro-clusters during the model initialization stage enhances the ability to capture trends more effectively. However, larger values of  can increase the space complexity of the proposed method at the initial learning stage, but this is handled by the prototype maintenance component of the proposed method.


Fig. 11. Hyperparameter 𝜆 (decay rate) sensitivity analysis.

Fig. 12. Hyperparameter  (number of clusters) sensitivity analysis.

In Fig. 11, we explore the effect of the decay rate parameter 𝜆 on prediction performance by varying 𝜆 from 2 × 10−6 to 1 × 10−7. This parameter implements a forgetting mechanism that helps the proposed model handle concept drift: a high value of 𝜆 causes outdated micro-clusters that have not been updated for some time to be removed quickly. The results indicate that as 𝜆 decreases, performance gradually declines, specifically on the synthetic data sets, which exhibit high concept drift. This is due to the slow forgetting rate, which can cause the model to retain outdated concepts and thus degrade performance. Since concept drift is minimal in the real-world data sets, decreasing 𝜆 has minimal impact on performance there. A minimal sketch of such a decay-based forgetting step is given below.
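The sketch below shows one way such a 𝜆-driven forgetting step could look; the exponential form 2^(−𝜆·Δ𝑡) is an assumption borrowed from standard micro-cluster stream methods, and only 𝜆 = 2 × 10−6 and the reliability threshold 𝑊𝑡 = 0.6 are taken from our experimental settings.

```python
import numpy as np

def decay_and_prune(reliability, last_update, now, lam=2e-6, w_t=0.6):
    """Decay prototype/micro-cluster reliabilities with elapsed time and prune
    those falling below the reliability threshold w_t. The 2**(-lam * dt)
    decay form is an assumption, not necessarily the paper's exact update."""
    dt = now - last_update                       # per-prototype elapsed time
    decayed = reliability * np.power(2.0, -lam * dt)
    keep = decayed >= w_t                        # survivors mask
    return decayed[keep], keep
```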

5.5. Security guarantee

In this section, we analyze the security of the proposed method. We define privacy as preventing the exposure of clients' sensitive information to third parties or unauthorized users while performing the collaborative distributed task. The privacy-sensitive information in our framework consists of the representative prototypes of the clients communicated to the central server; the confidentiality of this information must therefore be preserved by privacy-preserving techniques. Concerning the attack model, we define our threat model as an adversary who can observe some subset of the shares but cannot directly access the private data of any participant in order to obtain sensitive information from the representative prototypes communicated to the server.
Suppose we have 𝐾 clients, let 𝒞 ⊆ {1, … , 𝐾} be the subset of passively corrupted clients, and let ℋ = {1, … , 𝐾} ∖ 𝒞 be the set of honest clients. With ℋ being honest, the only information about the honest clients that can be learned by the corrupted clients is $\sum_{k \in \mathcal{H}} \mathcal{F}_k$, the sum of the honest clients' cluster-feature ℱ shares, but nothing more than that.
To show this, we use the fact that the secret sharing scheme is secure against an adversary who has access to fewer than 𝐾 shares [15]. We can formalize the above argument as follows: let 𝒞 be the set of clients that the adversary has access to and, without loss of generality, assume that 𝒞 = {1, 2, … , 𝑐} for some 𝑐 < 𝐾. Then, the adversary can compute:
$$\sum_{j \in \mathcal{C}} \sum_{k=1}^{K} \mathcal{F}_{i,j}^{(k)} \;=\; \sum_{j=1}^{c} \sum_{k=1}^{K} \mathcal{F}_{i,j}^{(k)} \;=\; \sum_{k=1}^{K} \sum_{j=1}^{c} \mathcal{F}_{i,j}^{(k)}$$



However, the adversary cannot compute $\sum_{k=1}^{K} \mathcal{F}_{i,k}$, which is equal to the sum of all shares of client 𝑖. Therefore, the adversary cannot learn anything about client 𝑖's representative prototype ℱ𝑖 from the shares held by the corrupted parties in 𝒞. Formally, we can re-write this as:

$$\forall\, \mathcal{S} \subseteq \{1, 2, \dots, K\},\; |\mathcal{S}| < K,\; \forall\, i \in \{1, 2, \dots, n\}: \quad \sum_{j \in \mathcal{S}} \sum_{k=1}^{K} \mathcal{F}_{i,j}^{(k)} \;\neq\; \sum_{k=1}^{K} \mathcal{F}_{i,k}$$

where $\mathcal{F}_{i,k}$ is the $k$th share of client $i$ and $\mathcal{F}_{i,j}^{(k)}$ is the $k$th share of client $i$ held by adversary $j$.
Since this argument holds for any client and any coalition of fewer than 𝐾 parties, we can conclude that the adversary cannot learn anything about the clients' representative prototypes from shares held by fewer than 𝐾 parties. A minimal sketch of such additive secret sharing is given below.
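The Python sketch below illustrates additive secret sharing of a prototype vector over the reals; it captures the principle used in the argument above, but a deployed scheme would typically work over a finite field with modular arithmetic, and the function and variable names here are our own.

```python
import numpy as np

def additive_shares(F_i, K, rng=None):
    """Split a prototype (cluster-feature) vector F_i into K additive shares
    such that F_i equals the sum of all K shares. Any subset of fewer than
    K shares is masked by the missing random shares."""
    rng = rng or np.random.default_rng()
    shares = [rng.standard_normal(F_i.shape) for _ in range(K - 1)]
    shares.append(F_i - np.sum(shares, axis=0))   # final share fixes the sum
    return shares

# Only the full set of K shares reconstructs the prototype:
# np.allclose(np.sum(additive_shares(F_i, K), axis=0), F_i)  -> True
```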
Furthermore, based on experimental results, we found that the price paid for protecting the clients' privacy is increased computational time compared to running without privacy protection. On the CR4 data set, the average runtimes with and without the proposed privacy approach are 78 and 59 minutes, respectively. Hence, there is a trade-off between privacy protection and computational time; the predictive performance of the proposed method, however, remains the same with or without the privacy-preservation approach.

5.6. Discussion

In this section, we provide a brief overview of the characteristics and applicability of the proposed method. The proposed prototype-based semi-supervised federated learning on evolving data streams (SFLEDS) method is a framework for collaboratively training a global model in a federated setting while handling distributed concept drifts and the lack of labels in evolving data streams. Each client trains a local model over a set of learned local prototypes, together with a global model that adapts to both local and distributed changing concepts. The performance of the global model is enhanced through the following techniques: (1) comparing the distances returned by the local and global models for their predicted closest prototypes and selecting the prototype with the smaller distance as the prediction for the current stream instance (a minimal sketch of this rule is given after this paragraph); (2) robust prototype maintenance techniques, such as merging and reliability updates, to counter label scarcity; (3) an error-driven prototype update for handling both client-level and global concept drift; (4) a global inter-client probabilistic consistency check to select complementary global prototypes; and (5) an additive-secret privacy-preserving technique on the shared global prototypes. These characteristics ensure that SFLEDS performs robustly compared to conventional semi-supervised and supervised competing algorithms.
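A minimal Python sketch of rule (1) follows; it assumes each prototype is stored as a (center, label) pair and uses Euclidean distance, both of which are simplifying assumptions for illustration.

```python
import numpy as np

def dual_prototype_predict(x, local_protos, global_protos):
    """Predict the label of stream instance x by finding the nearest local
    and nearest global prototype and keeping whichever lies closer to x.
    Prototypes are assumed to be (center, label) pairs."""
    def nearest(protos):
        dists = [np.linalg.norm(x - center) for center, _ in protos]
        j = int(np.argmin(dists))
        return dists[j], protos[j][1]
    d_local, y_local = nearest(local_protos)
    d_global, y_global = nearest(global_protos)
    return y_local if d_local <= d_global else y_global
```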
The proposed method may be applicable in scenarios where: (1) a distributed data stream mining task is required and the underlying data distribution is constantly changing, such as in real-time monitoring or data analytics; (2) semi-supervised learning is used to allow the model to learn from both labeled and unlabeled data, making it more robust and efficient when labeled data is scarce; and (3) a federated learning task needs to be performed by multiple resource-constrained devices. Our prototype-based approach allows for more efficient and effective learning from data streams, as it reduces the amount of data that needs to be processed and stored.
Despite its effectiveness, the proposed algorithm has the limitation of not being fully asynchronous in semi-supervised federated learning on evolving data streams. This can be inefficient when some clients delay uploading their representative prototypes to the server for the global model update. To stabilize the training process in the future, we aim to design a more efficient semi-supervised federated learning framework by exploring synchronization mechanisms and fast training convergence. Furthermore, while our results demonstrate promising performance, performance could deteriorate in scenarios with uneven data flow. In real-world scenarios, the assumption of fair distribution among client streams may not hold. In our approach, we used a sequential, alternating time-interval data partitioning strategy to distribute data instances among the participating clients; this strategy may not be optimal for handling uneven data flow. Future research could explore more sophisticated data partitioning strategies that handle uneven data flow and improve the robustness of federated stream learning algorithms.

6. Conclusion and future works

In this study, we proposed a new federated learning framework, called SFLEDS, which aims to handle label scarcity, varying concept drifts, and client privacy breaches in the evolving data stream environment simultaneously. To this end, the incoming data instances are summarized as prototypes, which are dynamically maintained to capture the varying concepts of each client based on error-driven representativeness learning. An inter-client prototype consistency check is proposed to select similar prototypes from other clients to augment the prediction performance of each local client while preserving privacy. Extensive experiments on real-world and synthetic data sets show that our proposed method outperforms state-of-the-art semi-supervised and supervised learning algorithms, and they further demonstrate that our algorithm is reliable, robust, and preserves the data privacy of the participating clients. In future research, we aim to design a more efficient semi-supervised federated learning framework by exploring synchronization mechanisms and fast training convergence.


CRediT authorship contribution statement

Cobbinah B. Mawuli: Conceptualization, Methodology, Software, Writing – original draft. Jay Kumar: Data curation, Formal
analysis. Ebenezer Nanor: Software, Validation. Shangxuan Fu: Software, Validation. Liangxu Pan: Software, Validation. Qinli
Yang: Writing – review & editing. Wei Zhang: Software, Validation. Junming Shao: Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have influenced
the work reported in this manuscript. They warrant that this manuscript is their original work, has not received prior publication
and is not under consideration for publication elsewhere.

Data availability

Data will be made available on request.

Acknowledgement

This work is supported by Sichuan Science and Technology Program (2022YFG0260), The Fundamental Research Funds for
the Central Universities (ZYGX2019Z014), National Natural Science Foundation of China (61976044, 52079026), Fok Ying-Tong
Education Foundation (161062), and Sichuan Science and Technology Program (2022YFG0260, 2020YFH0037).

References

[1] A. Albaseer, B.S. Ciftler, M. Abdallah, A. Al-Fuqaha, Exploiting unlabeled data in smart cities using federated edge learning, in: 2020 International Wireless
Communications and Mobile Computing (IWCMC), IEEE, 2020, pp. 1666–1671.
[2] D. Berthelot, N. Carlini, I. Goodfellow, N. Papernot, A. Oliver, C.A. Raffel, Mixmatch: a holistic approach to semi-supervised learning, in: Advances in Neural
Information Processing Systems, vol. 32, 2019.
[3] A. Bifet, R. Gavalda, Learning from time-changing data with adaptive windowing, in: Proceedings of the 2007 SIAM International Conference on Data Mining,
SIAM, 2007, pp. 443–448.
[4] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konečnỳ, S. Mazzocchi, B. McMahan, et al., Towards federated learning
at scale: system design, in: Proceedings of Machine Learning and Systems, vol. 1, 2019, pp. 374–388.
[5] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H.B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth, Practical secure aggregation for federated learning on
user-held data, arXiv preprint, arXiv:1611.04482, 2016.
[6] D. Caldarola, M. Mancini, F. Galasso, M. Ciccone, E. Rodolà, B. Caputo, Cluster-driven graph federated learning over multiple domains, in: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2749–2758.
[7] G. Cassales, H. Gomes, A. Bifet, B. Pfahringer, H. Senger, Improving the performance of bagging ensembles for data streams through mini-batching, Inf. Sci. 580
(2021) 260–282.
[8] B. Denham, R. Pears, M.A. Naeem, Hdsm: a distributed data mining approach to classifying vertically distributed data streams, Knowl.-Based Syst. 189 (2020)
105114.
[9] S.U. Din, J. Shao, J. Kumar, C.B. Mawuli, S. Mahmud, W. Zhang, Q. Yang, Data stream classification with novel class detection: a review, comparison and
challenges, Knowl. Inf. Syst. 63 (2021) 2231–2276.
[10] R. Elwell, R. Polikar, Incremental learning of concept drift in nonstationary environments, IEEE Trans. Neural Netw. 22 (2011) 1517–1531.
[11] C. Fahy, S. Yang, M. Gongora, Scarcity of labels in non-stationary data streams: a survey, ACM Comput. Surv. 55 (2022) 1–39.
[12] H. Fereidooni, S. Marchal, M. Miettinen, A. Mirhoseini, H. Möllering, T.D. Nguyen, P. Rieger, A.R. Sadeghi, T. Schneider, H. Yalame, et al., Safelearn: secure
aggregation for private federated learning, in: 2021 IEEE Security and Privacy Workshops (SPW), IEEE, 2021, pp. 56–62.
[13] H.M. Gomes, J. Read, A. Bifet, Streaming random patches for evolving data stream classification, in: 2019 IEEE International Conference on Data Mining (ICDM),
IEEE, 2019, pp. 240–249.
[14] H. Guo, H. Li, Q. Ren, W. Wang, Concept drift type identification based on multi-sliding windows, Inf. Sci. 585 (2022) 1–23.
[15] X. Guo, Z. Liu, J. Li, J. Gao, B. Hou, C. Dong, T. Baker, VeriFL: communication-efficient and fast verifiable aggregation for federated learning, IEEE Trans. Inf. Forensics Secur. 16 (2020) 1736–1751.
[16] M. Hahsler, M. Bolaños, Clustering data streams based on shared density between micro-clusters, IEEE Trans. Knowl. Data Eng. 28 (2016) 1449–1461.
[17] W. Huang, T. Li, D. Wang, S. Du, J. Zhang, T. Huang, Fairness and accuracy in horizontal federated learning, Inf. Sci. 589 (2022) 170–185.
[18] G. Hulten, L. Spencer, P. Domingos, Mining time-changing data streams, in: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, 2001, pp. 97–106.
[19] W. Jeong, J. Yoon, E. Yang, S.J. Hwang, Federated semi-supervised learning with inter-client consistency & disjoint learning, arXiv preprint, arXiv:2006.12097,
2020.
[20] P. Kairouz, H.B. McMahan, B. Avent, A. Bellet, M. Bennis, A.N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al., Advances and open problems
in federated learning, Found. Trends Mach. Learn. 14 (2021) 1–210.
[21] I. Kholod, M. Kuprianov, E. Titkov, A. Shorov, E. Postnikov, I. Mironenko, S. Sokolov, Training normal Bayes classifier on distributed data, Proc. Comput. Sci.
150 (2019) 389–396.
[22] J.Z. Kolter, M.A. Maloof, Using additive expert ensembles to cope with concept drift, in: Proceedings of the 22nd International Conference on Machine Learning,
2005, pp. 449–456.
[23] J.Z. Kolter, M.A. Maloof, Dynamic weighted majority: an ensemble method for drifting concepts, J. Mach. Learn. Res. 8 (2007) 2755–2790.
[24] J. Konečnỳ, H.B. McMahan, F.X. Yu, P. Richtárik, A.T. Suresh, D. Bacon, Federated learning: strategies for improving communication efficiency, arXiv preprint,
arXiv:1610.05492, 2016.
[25] T. Li, A.K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, V. Smith, Federated optimization in heterogeneous networks, in: Proceedings of Machine Learning and
Systems, vol. 2, 2020, pp. 429–450.
[26] X. Li, M. Jiang, X. Zhang, M. Kamp, Q. Dou, FedBN: federated learning on non-IID features via local batch normalization, in: International Conference on
Learning Representations, 2021, https://openreview.net/forum?id=6YEQUn0QICG.


[27] Z. Li, V. Sharma, S.P. Mohanty, Preserving data privacy via federated learning: challenges and solutions, IEEE Consum. Electron. Mag. 9 (2020) 8–16.
[28] Y. Liu, Z. Xu, C. Li, Distributed online semi-supervised support vector machine, Inf. Sci. 466 (2018) 236–257.
[29] Z. Long, L. Che, Y. Wang, M. Ye, J. Luo, J. Wu, H. Xiao, F. Ma, Fedsemi: an adaptive federated semi-supervised learning framework, arXiv e-prints, 2020.
[30] Z. Long, L. Che, Y. Wang, M. Ye, J. Luo, J. Wu, H. Xiao, F. Ma, Fedsiam: Towards adaptive federated semi-supervised learning, 2020.
[31] X. Miao, Y. Wu, J. Wang, Y. Gao, X. Mao, J. Yin, Generative semi-supervised learning for multivariate time series imputation, in: Proceedings of the AAAI
Conference on Artificial Intelligence, 2021, pp. 8983–8991.
[32] T.K. Moon, The expectation-maximization algorithm, IEEE Signal Process. Mag. 13 (1996) 47–60.
[33] T.T. Nguyen, T.C. Phan, H.T. Pham, T.T. Nguyen, J. Jo, Q.V.H. Nguyen, Example-based explanations for streaming fraud detection on graphs, Inf. Sci. 621
(2023) 319–340.
[34] X. Nie, M. Fan, X. Huang, W. Yang, B. Zhang, X. Ma, Online semisupervised active classification for multiview polsar data, IEEE Trans. Cybern. (2020).
[35] N.C. Oza, S.J. Russell, Online bagging and boosting, in: International Workshop on Artificial Intelligence and Statistics, PMLR, 2001, pp. 229–236.
[36] B. Parker, A.M. Mustafa, L. Khan, Novel class detection and feature via a tiered ensemble approach for stream mining, in: 2012 IEEE 24th International
Conference on Tools with Artificial Intelligence, IEEE, 2012, pp. 1171–1178.
[37] B. Pishgoo, A.A. Azirani, B. Raahemi, A hybrid distributed batch-stream processing approach for anomaly detection, Inf. Sci. 543 (2021) 309–327.
[38] C.E. Rasmussen, et al., The infinite gaussian mixture model, in: NIPS, Citeseer, 1999, pp. 554–560.
[39] A. Shamir, How to share a secret, Commun. ACM 22 (1979) 612–613.
[40] D. Soemers, T. Brys, K. Driessens, M. Winands, A. Nowé, Adapting to concept drift in credit card transaction data streams using contextual bandits and decision
trees, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
[41] J. Sui, Z. Liu, L. Liu, A. Jung, X. Li, Dynamic sparse subspace clustering for evolving high-dimensional data streams, IEEE Trans. Cybern. (2020).
[42] J. Tanha, N. Samadi, Y. Abdi, N. Razzaghi-Asl, Cpssds: conformal prediction for semi-supervised classification on data streams, Inf. Sci. 584 (2022) 212–234.
[43] H. Wang, W. Fan, P.S. Yu, J. Han, Mining concept-drifting data streams using ensemble classifiers, in: Proceedings of the Ninth ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 2003, pp. 226–235.
[44] H. Wang, M. Yurochkin, Y. Sun, D. Papailiopoulos, Y. Khazaeni, Federated learning with matched averaging, arXiv preprint, arXiv:2002.06440, 2020.
[45] K. Wei, J. Li, M. Ding, C. Ma, H.H. Yang, F. Farokhi, S. Jin, T.Q. Quek, H.V. Poor, Federated learning with differential privacy: algorithms and performance
analysis, IEEE Trans. Inf. Forensics Secur. 15 (2020) 3454–3469.
[46] J. Yoon, W. Jeong, G. Lee, E. Yang, S.J. Hwang, Federated continual learning with adaptive parameter communication, 2020.
[47] M. Yurochkin, M. Agarwal, S. Ghosh, K. Greenewald, N. Hoang, Y. Khazaeni, Bayesian nonparametric federated learning of neural networks, in: International
Conference on Machine Learning, PMLR, 2019, pp. 7252–7261.
[48] C. Zhang, Y. Xie, H. Bai, B. Yu, W. Li, Y. Gao, A survey on federated learning, Knowl.-Based Syst. 216 (2021) 106775.
[49] Z. Zhang, Y. Yang, Z. Yao, Y. Yan, J.E. Gonzalez, K. Ramchandran, M.W. Mahoney, Improving semi-supervised federated learning by reducing the gradient
diversity of models, in: 2021 IEEE International Conference on Big Data (Big Data), IEEE, 2021, pp. 1214–1225.
[50] X. Zhu, Z. Ghahramani, J.D. Lafferty, Semi-supervised learning using gaussian fields and harmonic functions, in: Proceedings of the 20th International Conference
on Machine Learning (ICML-03), 2003, pp. 912–919.

