Qian 2019
Abstract—The Fog platform brings computing power from the remote Cloud side closer to the edge devices to reduce latency, as the unprecedented generation of data causes non-negligible latency when the data is processed in a centralized fashion at the Cloud. In this new setting, edge devices with distributed computing capability, such as sensors and surveillance cameras, can communicate with fog nodes with less latency. Furthermore, local computing (at the edge side) may improve privacy and trust. In this paper, we present a new method in which we decompose the data processing by dividing it between edge devices and fog nodes intelligently. We apply active learning on the edge devices and federated learning on the fog node, which significantly reduces the number of data samples needed to train the model as well as the communication cost. To show the effectiveness of the proposed method, we implemented it and evaluated its performance on a benchmark image data set.

Index Terms—Fog Computing, Edge Computing, Active Learning, Federated Learning, Bayesian Neural Network

I. INTRODUCTION

The Internet of Things (IoT) is growing in adoption to become an essential element for the evolution of connected products and related services. To enlarge IoT adoption and its applicability in many different contexts, there is a need for a shift from the cloud-based paradigm towards a fog computing paradigm. In the cloud-based paradigm, the devices send all the information to a centralized authority, which processes the data. However, in Fog Computing (FC, [1]) or Edge Computing (EC, [37]), two terms that we use interchangeably, the data processing computation is distributed among edge devices, fog devices, and the cloud server. The emergence of EC is mainly a response to the heavy workload at the cloud side and the significant latency at the user side. To reduce the delay in fog computing, the concept of the fog node was introduced. The Fog Node (FN) [2] is essentially a platform placed between the Cloud and the Edge Devices (ED) as middleware, and it further facilitates the 'things' to realize their potential [3]. This change of paradigm will help application domains such as industrial automation, robotics, or autonomous vehicles, where real-time decision making using machine learning approaches is crucial.

The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 764785, FORA (Fog Computing for Robotics and Industrial Automation). This work was further supported by the Danish Innovation Foundation through the Danish Center for Big Data Analytics and Innovation (DABAI).

The Convolutional Neural Network (CNN) is a sub-type of the artificial neural network that models the system by observing all the data during training at a single place. Notably, the emergence of the CNN [27] made neural networks common in practice, as a CNN is more efficient in terms of computation cost than a (fully connected) neural network. However, to train the model we need access to all the data, which may create communication bottlenecks and privacy issues [28]. Generally, to train a CNN model, users send all the data to a single machine or a cluster of machines in a data center. However, this sharing operation with the central authority in the data center costs network usage and breaches user privacy, since users do not want to share personal, sensitive information, e.g., legally protected medical data.

Federated Learning (FL) [6] allows a centralized server to train models without moving the data to a central location. In particular, FL works in a distributed way, in which the model is built without direct access to the training data; indeed, the data remains in its original location, which provides both data locality and privacy. In the beginning, a server coordinates a set of nodes, each holding training data that the server cannot access directly. These nodes train local models and share the individual models with the server. The server uses these individual models to create a federated model and sends that model back to the nodes. Then another round of local training takes place, and the process continues. Nevertheless, this extra work on the edge devices has to be minimized by selecting only the most important data samples needed to build the local model. In this context, we want to use Active Learning (AL) as a more effective learning framework. AL chooses training data efficiently when labeling data is drastically expensive.

Motivated by the research issues and possible directions mentioned above, we propose a new scheme. In the literature, some papers discuss applying machine-learning algorithms directly on the Fog node platform [30], [31]. As we have already discussed, to use the cloud and fog infrastructure efficiently, we need to delegate the work among them. Hence, in this paper, we propose a new, efficient, privacy-preserving data analytics scheme for the Fog environment. We propose using federated learning at the centralized fog devices to create the initial training model. To improve the perfor-
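The round-based protocol described above (the server distributes the current model, nodes train locally on private data, the server averages the returned models) can be sketched as follows. This is a minimal illustration on a toy logistic-regression model with plain NumPy; the function names, the unweighted averaging, and the synthetic data are assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def local_update(weights, data, labels, lr=0.1, epochs=1):
    """One local training pass on an edge node (toy logistic-regression model)."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-(data @ w)))        # sigmoid predictions
        grad = data.T @ (preds - labels) / len(labels)   # gradient of the log-loss
        w -= lr * grad
    return w

def federated_round(global_w, node_datasets):
    """Server sends the global model out, collects local models, averages them."""
    local_models = [local_update(global_w, X, y) for X, y in node_datasets]
    # Plain (unweighted) federated averaging; weighting by local data size is common.
    return np.mean(local_models, axis=0)

# Toy usage: three nodes, each holding private data the server never sees.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
nodes = []
for _ in range(3):
    X = rng.normal(size=(50, 3))
    nodes.append((X, (X @ true_w > 0).astype(float)))

w = np.zeros(3)
for _ in range(20):                 # 20 communication rounds
    w = federated_round(w, nodes)
```

Note that only model parameters cross the network in each round; the raw samples stay on the nodes, which is exactly the data-locality property the text attributes to FL.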
CNNs commonly have a huge number of parameters, and sending updates for that many values to a server leads to huge communication costs. Thus, naively sharing weight updates is not feasible for larger models. Since uploads are typically much slower than downloads, it is acceptable that users have to download the current model,

C. Active Machine Learning

Active learning (AL) is a particular case of machine learning in which a learning algorithm interactively queries the user to obtain the desired outputs at new data points. Typically, AL achieves higher performance with the same number of data samples, using the same method, for example, support vector machines.
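As a concrete illustration of such a query strategy, the sketch below scores a pool of unlabeled points by the Shannon entropy [10] of the model's predicted class probabilities and selects the most uncertain ones for labeling. This is generic uncertainty sampling with assumed toy names, not necessarily the acquisition function used in the paper.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of each row of predicted class probabilities."""
    eps = 1e-12                      # avoids log(0) for confident predictions
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_queries(probs, k):
    """Indices of the k most uncertain pool points (highest entropy first)."""
    scores = predictive_entropy(probs)
    return np.argsort(scores)[-k:][::-1]

# Toy pool: 4 unlabeled points, 3 classes; the uniform row is maximally uncertain.
pool_probs = np.array([
    [0.98, 0.01, 0.01],   # confident  -> low entropy, not queried
    [1/3,  1/3,  1/3],    # uniform    -> highest entropy, queried first
    [0.60, 0.30, 0.10],
    [0.50, 0.50, 0.00],
])
queries = select_queries(pool_probs, k=2)   # -> indices [1, 2]
```

Only the selected indices are sent to the oracle for labels, which is how AL cuts the labeling cost while keeping the training method itself unchanged.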
III. PROPOSED SCHEME

In this section, we discuss the functioning of the different components of the proposed scheme. The primary objective of the proposed scheme is to generate the model in an iterative and distributed way using the fog nodes and edge devices. The overall scheme is shown in Fig. 2, where the fog nodes work as a middleware between the edge devices and the cloud server. Here, a fog node is connected to the edge devices which have a similar task to implement the specific application. For instance,

Fig. 2. Overall scheme: fog nodes (1, ..., n) act as middleware between the cloud server and the edge devices (1, ..., m).
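The per-round interplay between edge-side active learning and fog-side aggregation can be sketched end to end as below. The EdgeDevice class, the margin-based uncertainty rule, the toy logistic model, and the plain averaging at the fog node are all illustrative assumptions, not the paper's exact implementation; they only show the division of labor the scheme describes.

```python
import numpy as np

class EdgeDevice:
    """Holds a private unlabeled pool; labels only its most uncertain points."""
    def __init__(self, pool_X, pool_y, dim):
        self.pool_X, self.pool_y = pool_X, pool_y
        self.X = np.empty((0, dim))      # labeled training set, initially empty
        self.y = np.empty(0)

    def acquire(self, w, k=10):
        """Move the k pool points the current model is least sure about into training."""
        p = 1.0 / (1.0 + np.exp(-(self.pool_X @ w)))
        idx = np.argsort(np.abs(p - 0.5))[:k]        # closest to 0.5 = most uncertain
        self.X = np.vstack([self.X, self.pool_X[idx]])
        self.y = np.concatenate([self.y, self.pool_y[idx]])
        keep = np.setdiff1d(np.arange(len(self.pool_y)), idx)
        self.pool_X, self.pool_y = self.pool_X[keep], self.pool_y[keep]

    def train(self, w, lr=0.1, epochs=5):
        """Local logistic-regression training starting from the federated model."""
        w = w.copy()
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-(self.X @ w)))
            w -= lr * self.X.T @ (p - self.y) / len(self.y)
        return w

def fog_round(global_w, devices, k=10):
    """Fog node: trigger acquisition and local training, then average local models."""
    for d in devices:
        d.acquire(global_w, k)
    return np.mean([d.train(global_w) for d in devices], axis=0)

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(3):
    X = rng.normal(size=(200, 2))
    devices.append(EdgeDevice(X, (X @ true_w > 0).astype(float), dim=2))

w = np.zeros(2)
for _ in range(4):              # four acquisition/aggregation rounds
    w = fog_round(w, devices)   # each device labels only 10 new points per round
```

After four rounds each device has labeled only 40 of its 200 points, mirroring the reduction in training data that Table I reports for the FN-with-FL configuration.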
Fig. 4. Active Learning vs. Random Sampling (20 acquisitions).
Fig. 6. Learning curve of edge devices for 20 acquisitions of data.
training process, namely, after one device completes training, it shares the weights with the close device.

V. CONCLUSION

In this paper, we discussed for the first time Active Learning in a distributed setting tailored to the so-called Fog platform, consisting of distributed edge devices and a centralized fog node. We implemented active learning on the edge devices to down-scale the necessary training set and reduce the labeling cost. We presented evidence that it performs similarly to centralized computing, with reduced communication overhead and latency, while harvesting the potential privacy benefits. In the future, we will run more experiments with different settings (a large number of edge devices) and offer the corresponding solutions. In addition, we will study additional acquisition functions and also address privacy issues in more detail.

TABLE I
FOG NODE PERFORMANCE WITH/WITHOUT FEDERATED LEARNING

                                      Acq 10   Acq 20   Acq 30   Acq 40
FN without FL (no. of train data)        400      800     1200     1600
FN with FL (no. of train data)           100      200      300      400
Accuracy (FN without FL)                0.73    0.882    0.811    0.901
Accuracy (FN with FL (ave))              0.8    0.909    0.881    0.918
Accuracy (FN with FL (opt))            0.854    0.915    0.944    0.963

REFERENCES

[1] Bonomi, Flavio, et al. "Fog computing and its role in the internet of things." Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing. ACM, 2012.
[2] Bonomi, Flavio, et al. "Fog computing: A platform for internet of things and analytics." Big Data and Internet of Things: A Roadmap for Smart Environments. Springer, Cham, 2014. 169-186.
[3] Dastjerdi, Amir Vahid, and Rajkumar Buyya. "Fog computing: Helping the Internet of Things realize its potential." Computer 49.8 (2016): 112-116.
[4] Settles, Burr. "Active learning literature survey." Computer Sciences Technical Report 1648 (2010).
[5] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." International Conference on Machine Learning. 2016.
[6] Konečný, Jakub, et al. "Federated learning: Strategies for improving communication efficiency." arXiv preprint arXiv:1610.05492 (2016).
[7] Rasmussen, Carl Edward. "Gaussian processes in machine learning." Advanced Lectures on Machine Learning. Springer, Berlin, Heidelberg, 2004. 63-71.
[8] LeCun, Yann, Corinna Cortes, and C. J. Burges. "MNIST handwritten digit database." AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist (2010).
[9] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
[10] Shannon, Claude Elwood. "A mathematical theory of communication." Bell System Technical Journal 27.3 (1948): 379-423.
[11] Wang, Keze, et al. "Cost-effective active learning for deep image classification." IEEE Transactions on Circuits and Systems for Video Technology 27.12 (2017): 2591-2600.
[12] Tong, Simon, and Daphne Koller. "Support vector machine active learning with applications to text classification." Journal of Machine Learning Research 2.Nov (2001): 45-66.
[13] Tong, Simon, and Edward Chang. "Support vector machine active learning for image retrieval." Proceedings of the Ninth ACM International Conference on Multimedia. ACM, 2001.
[14] Houlsby, Neil, et al. "Bayesian active learning for classification and preference learning." arXiv preprint arXiv:1112.5745 (2011).
[15] Freeman, Linton C. Elementary Applied Statistics: For Students in Behavioral Science. John Wiley and Sons, 1965.
[16] Gal, Yarin, Riashat Islam, and Zoubin Ghahramani. "Deep Bayesian active learning with image data." arXiv preprint arXiv:1703.02910 (2017).
[17] McMahan, H. Brendan, et al. "Communication-efficient learning of deep networks from decentralized data." arXiv preprint arXiv:1602.05629 (2016).
[18] Konečný, Jakub, Brendan McMahan, and Daniel Ramage. "Federated optimization: Distributed optimization beyond the datacenter." arXiv preprint arXiv:1511.03575 (2015).
[19] Konečný, Jakub, et al. "Federated optimization: Distributed machine learning for on-device intelligence." arXiv preprint arXiv:1610.02527 (2016).
[20] Hoi, Steven C. H., et al. "Batch mode active learning and its application to medical image classification." Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.
[21] Liu, Ying. "Active learning with support vector machine applied to gene expression data for cancer classification." Journal of Chemical Information and Computer Sciences 44.6 (2004): 1936-1941.
[22] Thompson, Cynthia A., Mary Elaine Califf, and Raymond J. Mooney. "Active learning for natural language parsing and information extraction." ICML. 1999.
[23] Osugi, Thomas, Deng Kim, and Stephen Scott. "Balancing exploration and exploitation: A new algorithm for active machine learning." Fifth IEEE International Conference on Data Mining. IEEE, 2005.
[24] Lakshminarayanan, B., A. Pritzel, and C. Blundell. "Simple and scalable predictive uncertainty estimation using deep ensembles." Advances in Neural Information Processing Systems. 2017. 6402-6413.
[25] Osband, I., C. Blundell, A. Pritzel, and B. Van Roy. "Deep exploration via bootstrapped DQN." Advances in Neural Information Processing Systems. 2016. 4026-4034.
[26] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." International Conference on Machine Learning. 2016. 1050-1059.
[27] LeCun, Y., P. Haffner, L. Bottou, and Y. Bengio. "Object recognition with gradient-based learning." Shape, Contour and Grouping in Computer Vision. Springer, Berlin, Heidelberg, 1999. 319-345.
[28] Dwork, Cynthia. "Differential privacy: A survey of results." International Conference on Theory and Applications of Models of Computation. Springer, Berlin, Heidelberg, 2008. 1-19.
[29] Settles, Burr. "Active learning." Synthesis Lectures on Artificial Intelligence and Machine Learning 6.1 (2012): 1-114.
[30] Tang, B., Z. Chen, G. Hefferman, T. Wei, H. He, and Q. Yang. "A hierarchical distributed fog computing architecture for big data analysis in smart cities." Proceedings of the ASE BigData and SocialInformatics 2015. ACM, 2015. 28.
[31] Diro, A. A., and N. Chilamkurti. "Distributed attack detection scheme using deep learning approach for Internet of Things." Future Generation Computer Systems 82 (2018): 761-768.
[32] Gal, Yarin, and Zoubin Ghahramani. "Bayesian convolutional neural networks with Bernoulli approximate variational inference." arXiv preprint arXiv:1506.02158 (2015).
[33] Konečný, J., H. B. McMahan, D. Ramage, and P. Richtárik. "Federated optimization: Distributed machine learning for on-device intelligence." arXiv preprint arXiv:1610.02527 (2016).
[34] Hong, D., and L. Si. "Mixture model with multiple centralized retrieval algorithms for result merging in federated search." Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2012. 821-830.
[35] Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. "Variational inference: A review for statisticians." Journal of the American Statistical Association 112.518 (2017): 859-877.
[36] Paszke, A., et al. "Automatic differentiation in PyTorch." 2017.
[37] Shi, W., J. Cao, Q. Zhang, Y. Li, and L. Xu. "Edge computing: Vision and challenges." IEEE Internet of Things Journal 3.5 (2016): 637-646.