Residual-Network-Leveraged Vehicle-Thrown-Waste Identification in Real-Time Traffic Surveillance Videos
Pengjiang Qian, Senior Member, IEEE, Kai Yuan, Jian Yao, Chao Fan, Hua Zhang, Yuan Liu, and Xianling Lu
Abstract: We attempt to intelligently identify violations of throwing waste from vehicles (TWV) in real-time traffic surveillance videos. In addition to polluting the environment, TWV puts the sanitation workers who clean the roads at risk of injury from passing vehicles. However, manual inspection, which is very time- and labor-consuming, is still the most common way to recognize such uncivilized behavior in videos. In answer to these challenges, we design a novel 20-layer residual network (Nov-ResNet-20) for training the vehicle-thrown-waste identification model (VTWIM). Then, incorporating Nov-ResNet-20, Selective Search, and Non-Maximum Suppression (NMS), we propose the deep-residual-network-leveraged vehicle-thrown-waste identification method (DRN-VTWI). Our method first splits one video frame into several regions matching suspected objects, marked with location boxes via Selective Search. Then, using the VTWIM trained by Nov-ResNet-20, our method identifies the regions containing TWV. Last, our method removes the redundant location boxes for each recognized vehicle-thrown waste and keeps only the best one. The significance of our work is three-fold: 1) Nov-ResNet-20 has a moderate depth: 6 convolutional layers, 7 residual layers, and in total 20 weight layers. Owing to the joint contribution of residual learning, batch normalization, dropout, and cross-entropy loss, it is able to identify TWV using a small quantity of manually-annotated training samples. 2) Selective Search diversely marks all possible suspected objects in video frames, whereas NMS keeps the best location box for each recognized vehicle-thrown waste, removing all redundancies. In this way, DRN-VTWI finds as many potential TWV violations as possible and also optimally annotates vehicle-thrown wastes in frames. 3) Combining the power of Nov-ResNet-20, Selective Search, and NMS, DRN-VTWI addresses the challenging, intelligent identification of vehicle-thrown wastes for real-time traffic surveillance. Experimental studies conducted on real-time traffic surveillance videos demonstrate the effectiveness as well as the superiority of our efforts.
Index Terms: Throwing waste from vehicles (TWV), deep learning, smart city, ResNet, waste inspection, intelligent traffic.
I. INTRODUCTION
Nowadays, throwing waste from vehicles (TWV) has become one of the intractable challenges for smart traffic management in cities. In addition to polluting the environment, TWV easily causes traffic accidents, particularly for sanitation workers who are responsible for cleaning roads, as TWV increases their workload and work difficulty and thereby raises their probability of being injured by passing vehicles [1]. Governments have issued many relevant laws and traffic regulations to prohibit and punish such uncivilized behavior. Accordingly, many traffic video surveillance systems [2], [3], including both the hardware facilities (e.g., cameras and transmission and storage devices) and software systems (e.g., video transmission and management programs), have been deployed to monitor vehicle movement in real time. Most of the traffic video surveillance systems currently in use, however, can only record and save traffic videos or pictures, without the desirable, intelligent analysis and explanation of vehicle behavior. Besides, manual inspection is still the most common way to confirm traffic violations, despite the fact that it is quite time- and labor-consuming and even inefficient [4], [5]. Therefore, methods capable of effectively and intelligently identifying TWV behavior are of great significance in real-time traffic surveillance.
Compared with other topics of intelligent traffic, such as vehicle license-plate recognition [6], [7], vehicle route optimization [8], [9], and traffic flow forecasting [10], [11], studies on waste identification in videos are relatively few. We briefly review some relevant studies as follows. Aziz et al. [12] jointly used the Support Vector Machine (SVM) [13], [51]–[55] and Hidden Markov Model (HMM) [14] to design a collection schedule for multiple waste bins, in which SVM is used to classify the waste level in bins and HMM to determine the number of days remaining before the waste is collected. Liu et al. [15] proposed an automatic decoration garbage detection system based on the improved YOLOv2 network and narrowband Internet of Things (NBIoT) [16]. Niu et al. [5] designed an automated river trash monitoring system based on the YOLOv3 model that aims to run faster than conventional Convolutional Neural Networks (CNN) [17], [18].
Manuscript received February 18, 2020; revised July 9, 2020; accepted July 30, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61772241 and Grant 61702225, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20160187, and in part by the Science and Technology Demonstration Project of Social Development of Wuxi under Grant WX18IVJN002. The Associate Editor for this article was H. Gao. (Corresponding author: Xianling Lu.)
Pengjiang Qian, Kai Yuan, Chao Fan, Hua Zhang, and Yuan Liu are with the School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China, and also with the Jiangsu Key Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi 214122, China (e-mail: qianpjiang@jiangnan.edu.cn; 1070523578@qq.com; fanchao@jiangnan.edu.cn; a_go@jiangnan.edu.cn; lyuan1800@sina.com).
Jian Yao is with the Blockchain Sub-Center, Wuxi IoT Innovation Center Company Ltd., Wuxi, China (e-mail: 1786779000@qq.com).
Xianling Lu is with the School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China (e-mail: jnluxl@jiangnan.edu.cn).
Digital Object Identifier 10.1109/TITS.2020.3015530
1524-9050 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Carleton University. Downloaded on November 29,2020 at 07:09:09 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Rad et al. [19] presented a fully automated computer vision application for littering quantification based on images taken from the streets and sidewalks. They enlisted the GoogLeNet [18] framework to localize and classify different types of wastes. In addition, Wang and Zhang [20] modified the Faster R-CNN [21], [22] object detection framework by incorporating the residual network (ResNet) [23]–[25] to automatically detect garbage in urban images for intelligent urban management. Surveying the small amount of existing literature, we have two observations. One is that a few investigators have been studying waste inspection in videos, but direct research on TWV identification is rarely reported. The other relates to deep learning [23], [26], [27]. That is, some well-established deep learning models, e.g., VGGNet [37], [38], YOLO [5], [15], ResNet, etc., have been used because deep learning has shown well-accepted superiority in image and video processing. Nevertheless, for TWV identification, we realize that a novel deep network is needed, one that has an appropriate network depth as well as good discrimination and generalization abilities with a small number of training examples, because manually annotating TWV examples for training the network is fairly time- and labor-consuming.
Therefore, to address the challenging identification of TWV in real-time traffic videos, we first design a novel, dedicated 20-layer ResNet model derived from VGG-16 [37], referred to as Nov-ResNet-20. Then, incorporating Selective Search [28], [29] and Non-Maximum Suppression (NMS) [30], [31] into Nov-ResNet-20, we propose the deep-residual-network-leveraged vehicle-thrown-waste identification method (DRN-VTWI for short). Our DRN-VTWI method trains the deep Nov-ResNet-20 using given traffic waste samples, such as bottle, can, paper, and fruit peel, and thus obtains the vehicle-thrown-waste identification model (VTWIM). Then, our method inspects the real-time traffic video stream frame by frame. Specifically, we first split one video frame into multiple small images corresponding to the location boxes of all suspected objects obtained using the Selective Search algorithm. Second, we input all of these small images into the obtained VTWIM and acquire those recognized as TWV. Third, for each identified thrown waste in the video frame, we use the NMS algorithm to keep only the one location box that surrounds it and has the highest confidence value, i.e., the highest probability of belonging to TWV as predicted by VTWIM. In this way, our DRN-VTWI method is able to identify the TWV scenes in the whole video stream.
In summary, our contributions in this article lie primarily in the following three points:
1) Nov-ResNet-20 is composed of 6 convolutional layers as well as 7 residual layers, and in total 20 weight layers. With such an appropriate depth, and benefiting from the organic incorporation of ResNet, batch normalization [39], dropout [43], and cross-entropy loss measurement [42], the dedicated Nov-ResNet-20 is able to identify TWV well using a small quantity of manually-annotated training samples.
2) The greedy Selective Search diversely marks all possible suspected objects in video frames, whereas NMS keeps the best location box for each recognized vehicle-thrown waste object and removes all redundancies. As such, our DRN-VTWI is able not only to inspect as many potential TWV violations as possible but also to annotate the vehicle-thrown wastes ideally in frames.
3) Combining the strength of Nov-ResNet-20, Selective Search, and NMS, our proposed DRN-VTWI method is competent in the challenging identification of vehicle-thrown wastes in real-time traffic surveillance videos.
The rest of this manuscript is organized as follows. Related work, such as Selective Search, NMS, and CNN & ResNet, is briefly introduced in Section 2. The proposed Nov-ResNet-20 model as well as the DRN-VTWI method are introduced in detail in Section 3. Our experimental studies and result analyses are presented in Section 4. The conclusion is given in Section 5.
II. RELATED WORK
A. Selective Search
Uijlings et al. [28], [29] proposed the Selective Search algorithm to address the problem of generating possible object locations for use in image object recognition. Selective Search combines the strengths of both graph segmentation and exhaustive search. Namely, it utilizes the graph-based segmentation algorithm [32] to create initial regions and then uses hierarchical clustering to iteratively group regions together. Specifically, first the similarities between all neighboring regions are calculated. Then the two most similar regions are merged, and new similarities are calculated between the newly-merged region and its neighbors. The merging of the most similar regions is repeated until the whole image becomes a single region. The overall procedure is detailed in Algorithm 1.
Selective Search attempts to capture all possible object locations. To this end, instead of a single pathway to generate
Algorithm 1 Object Location Extraction in Selective Search
Input: Image data
Output: Object location boxes
Obtain initial regions R = {r_1, …, r_n} using the graph-based segmentation algorithm;
Initialize similarity set S = Ø;
foreach neighboring region pair (r_k, r_l) do
    Calculate similarity s(r_k, r_l);
    S = S ∪ s(r_k, r_l);
while S ≠ Ø do
    Get the highest similarity s(r_i, r_j) = max(S);
    Merge corresponding regions r_t = r_i ∪ r_j;
    Remove similarities regarding r_i: S = S \ s(r_i, r_*);
    Remove similarities regarding r_j: S = S \ s(r_*, r_j);
    Remove r_i and r_j from R: R = R \ {r_i, r_j};
    Calculate similarity set S_t between r_t and its neighbors;
    S = S ∪ S_t;
    R = R ∪ r_t;
Extract object location boxes from all regions in R
Fig. 1. Structure of VGG-19.
Fig. 2. Structure of VGG-16.
Algorithm 2 NMS
Input: Candidate object location boxes, threshold ε
Output: The best location box for each object
Suppose there are N boxes, each of which is measured and assigned a score S_i (1 ≤ i ≤ N).
Step 1: Construct the set H of candidate boxes to be processed, and initialize it to contain all N boxes; build the set M to store the optimal boxes and initialize it as an empty set;
Step 2: Sort all boxes in H, select the box m having the highest score, and move it from H to M;
Step 3: Calculate the value of Intersection over Union (IoU) [33] between box m and every box in H. If the value is higher than the threshold ε, the box is considered to overlap with box m, and this box is removed from set H;
Step 4: Go back to Step 2 and iterate until set H is empty. Finally, the boxes in set M are what we want.
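The four steps of Algorithm 2 translate almost directly into code. The following is a minimal sketch, assuming boxes are given as corner coordinates `(x1, y1, x2, y2)` and that each box carries one confidence score; the function names `iou` and `nms` and the box format are choices made for this illustration, not taken from the paper.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, eps):
    """Return the indices of the boxes kept by non-maximum suppression.

    eps is the IoU threshold: a candidate whose IoU with a kept box
    exceeds eps is treated as a redundant detection and discarded.
    """
    # Step 1: H holds all candidates (sorted by score), M starts empty.
    H = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    M = []
    while H:  # Step 4: iterate until H is empty
        # Step 2: move the highest-scoring box m from H to M.
        m = H.pop(0)
        M.append(m)
        # Step 3: drop every remaining box that overlaps m too much.
        H = [i for i in H if iou(boxes[m], boxes[i]) <= eps]
    return M
```

For example, with boxes `(0, 0, 10, 10)` and `(1, 1, 11, 11)` the IoU is 81/119, so a threshold of 0.5 keeps only the higher-scoring of the two, while a distant third box is kept as a separate object.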
TABLE I: STRUCTURE OF RESNET-50
TABLE II: STRUCTURE COMPARISONS AMONG VGG-16, VGG-19, AND NOV-RESNET-20
2) Module II: Run Selective Search on Video Frames and Obtain Identification-Needed Image Regions That Contain Suspected Objects: Differing from Module I, both Modules II and III are online. In Module II, first we extract the video frames from the surveillance video according to the sampling sequence (denoted as vfss). Second, on each frame, we run the Selective Search algorithm to locate all suspected objects and accordingly split the video frame into several small identification-needed images according to the obtained location boxes. Third, we resize these identification-needed images to the same size (denoted as ini_size). Last, we input them into the VTWIM that was already generated by Nov-ResNet-20 in Module I.
3) Module III: Identify the Images Containing TWV via VTWIM and Remove all Redundant Boxes Using NMS: In Module III, first we determine via the VTWIM which identification-needed images contain waste objects. Each recognized image has a confidence value, i.e., the probability that the waste object it contains belongs to a certain waste type. Then all waste objects in this frame, together with their location boxes, are filtered through the NMS algorithm. Consequently, for each waste object, only the location box having the highest confidence value is kept and all of the other redundant boxes are removed.
As such, by repeating Modules II and III, our proposed DRN-VTWI method is capable of inspecting the TWV violations in the surveillance video, and all recognized vehicle-thrown wastes are optimally marked with surrounding location boxes in the video stream.
IV. THE EXPERIMENTAL STUDIES
A. Setup
Because no open repository of traffic videos containing TWV is currently available, we had to prepare the traffic waste samples as well as the testing videos ourselves for the experimental studies. To this end, two types of training samples were prepared by manually photographing, capturing from traffic videos, downloading from the Internet (e.g., the VOC data set [44]), etc. The first type includes four categories of common traffic waste: bottle, can, paper, and fruit peel. The other consists of non-waste images commonly appearing in traffic videos, such as walker, car, and scenery. We gathered 200 samples for each category, both waste and non-waste, and thus have 1000 labeled samples of traffic objects in total. All of these traffic images were pre-processed to the same size, 224 × 224, and were labeled from 1 to 5, respectively, as shown in Table III. In addition, we downloaded four traffic videos from the Internet as the testing videos, showing the scenes of throwing a bottle, can, fruit peel, and paper from vehicles, respectively, for verifying the actual TWV identification capability. As an example, the video of throwing a plastic bottle from a car is shown in Fig. 7.
In addition to our designed Nov-ResNet-20, four well-established classification techniques, including two CNNs (i.e., VGG-16 and VGG-19), one ResNet (i.e., ResNet-50), and one state-of-the-art non-deep classification method (i.e., Extreme Learning Machine (ELM) [45]–[49]), were enlisted to train
the VTWIM (see Fig. 6, Module I), respectively, for the performance comparisons.
TABLE III: TRAINING SAMPLE CATEGORIES INVOLVED IN OUR EXPERIMENTS
The core system parameters involved in the proposed DRN-VTWI method include: parameters scale, sigma, and min_size used in Selective Search; parameters vfss for sampling videos and ini_size for resizing images in Module II; parameter IoU in NMS; and parameters dropout_ratio, batch_size, and training_step used for training the Nov-ResNet-20 model. The trial intervals or recommended values of these parameters are listed in Table VI. The primary parameters used in the other four methods are also given therein.
Our experimental studies were carried out on a workstation with an Intel i7-6850K 3.60 GHz CPU, 128 GB of RAM, an NVIDIA TITAN XP (12 GB) GPU, Ubuntu 16.04 (64 bit), Python 2.7, and TensorFlow 1.12.0 (GPU).
B. Experiments and Analyses
First, we would like to validate the TWV identification performance of our designed Nov-ResNet-20 model. For this purpose, we randomly chose 800 labeled traffic samples from the total of 1000 to constitute the training set, and the training set was fed to ELM, VGG-16, VGG-19, ResNet-50, and Nov-ResNet-20 to train their VTWIM models, respectively. The remaining 200 samples were used to test their detection accuracies regarding TWV. We repeated this procedure 10 times with different, randomly-selected training samples. The average prediction accuracies of these classification techniques are listed in Table V.
We have also recorded the running time of the proposed DRN-VTWI method in terms of its three modules, as listed in Table IV. Specifically, on average we spent 1284 seconds in training Nov-ResNet-20 with the suggested system parameters, 42 seconds in annotating the location boxes on every video frame, and 26 seconds in identifying TWV and optimizing the location annotation via NMS.
TABLE IV: TIME CONSUMPTION OF DIFFERENT MODULES OF DRN-VTWI (ON AVERAGE)
Table V reveals that, for TWV detection, 800 training samples are generally insufficient for ELM, a traditional classification method featuring a low computational burden, to produce satisfactory TWV classifiers. In this context, deep
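The evaluation protocol of Section IV-B (a random 800/200 split of the 1000 labeled samples, repeated 10 times, with accuracies averaged) can be sketched as below. This is only an illustration under stated assumptions: `fit` is a hypothetical callable that trains any classifier and returns a prediction function, and the fixed `seed` is added here for reproducibility; neither comes from the paper.

```python
import random


def repeated_holdout(samples, labels, fit, n_train=800, repeats=10, seed=0):
    """Average test accuracy over repeated random train/test splits.

    fit(train_samples, train_labels) must return a function that maps
    one sample to a predicted label (hypothetical interface).
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(indices)                      # new random split each round
        train, test = indices[:n_train], indices[n_train:]
        predict = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(samples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies), accuracies
```

Any of the compared models could be plugged in through `fit`; reporting the per-round accuracies alongside the mean also makes the spread across the 10 repetitions visible.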
Fig. 10. Illustration of complete TWV identification via DRN-VTWI on a surveillance video.
the real-time TWV detection of DRN-VTWI. Generally, Module II spends 42 seconds in marking all of the suspected location boxes by means of Selective Search, and Module III costs 26 seconds in recognizing the TWV regions using the obtained VTWIM as well as optimally annotating their location boxes via NMS. Such a time burden for Modules II and III is acceptable from the perspective of academic study.
These results demonstrate that our proposed Nov-ResNet-20 deep learning model as well as the overall framework of the DRN-VTWI method are feasible for TWV detection in real-time traffic surveillance videos.
V. CONCLUSION
For the purpose of effectively detecting the uncivilized behavior of TWV for intelligent traffic management, we first propose the dedicated deep network Nov-ResNet-20. Then, combining Nov-ResNet-20, Selective Search, and NMS, we put forward the DRN-VTWI method for identifying TWV in real-time traffic videos. The framework of our DRN-VTWI method includes three modules. Module I generates the waste identification model, VTWIM, via Nov-ResNet-20. Module II runs the Selective Search algorithm on one video frame and obtains several small, identification-needed image regions that contain all suspected objects, both waste and non-waste. Module III identifies all of the waste-containing regions via VTWIM and then, using NMS, keeps the best location box for each recognized vehicle-thrown waste in the video frame, removing all redundancies. Our experimental studies verified the effectiveness as well as the superiority of our proposed method.
Last, we would like to mention the limitation of our proposed DRN-VTWI method. That is, the embedded Selective Search is somewhat time-consuming, and its running time depends on the specific image size of the video frames. This affects the practicability of our DRN-VTWI to a certain extent. Therefore, reducing the time consumption of Selective Search, or proposing a more efficient alternative method, is the focus of our next study.
REFERENCES
[1] V. F. Carvalho, M. D. Silva, L. M. S. Silva, C. J. Borges, L. A. Silva, and M. L. C. C. Robazzi, “Occupational risks and work accidents: Perceptions of garbage collectors,” J. Nursing UFPE Online, vol. 10, no. 4, pp. 1185–1193, 2016.
[2] C. Kim and J.-N. Hwang, “Object-based video abstraction for video surveillance systems,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 12, pp. 1128–1138, Dec. 2002.
[3] Y. Meng and H. Wu, “Highway visibility detection method based on surveillance video,” in Proc. IEEE 4th Int. Conf. Image, Vis. Comput. (ICIVC), Jul. 2019, pp. 197–202.
[4] G. P. Arya, D. P. Chauhan, and V. Garg, “Design & implementation of traffic violation monitoring system,” Int. J. Comput. Sci. Inf. Technol., vol. 6, no. 3, pp. 2384–2386, 2015.
[5] G. Niu, J. Li, S. Guo, M.-O. Pun, L. Hou, and L. Yang, “SuperDock: A deep learning-based automated floating trash monitoring system,” in Proc. IEEE Int. Conf. Robot. Biomimetics (ROBIO), Dec. 2019, pp. 1035–1040.
[6] N. Wang, X. Zhu, and J. Zhang, “License plate segmentation and recognition of Chinese vehicle based on BPNN,” in Proc. 12th Int. Conf. Comput. Intell. Secur. (CIS), Dec. 2016, pp. 403–406.
[7] A. H. Ashtari, M. J. Nordin, and M. Fathy, “An Iranian license plate recognition system based on color features,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 4, pp. 1690–1705, Aug. 2014.
[8] D. Kosmanos et al., “Route optimization of electric vehicles based on dynamic wireless charging,” IEEE Access, vol. 6, pp. 42551–42565, 2018.
[9] J. J. Q. Yu, W. Yu, and J. Gu, “Online vehicle routing with neural combinatorial optimization and deep reinforcement learning,” IEEE Trans. Intell. Transp. Syst., vol. 20, no. 10, pp. 3806–3817, Oct. 2019.
[10] H.-F. Yang, T. S. Dillon, and Y.-P.-P. Chen, “Optimized structure of the traffic flow forecasting model with a deep learning approach,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 10, pp. 2371–2381, Oct. 2017.
[11] J. S. Zhu, “An ensemble learning short-term traffic flow forecasting with transient traffic regimes,” Appl. Mech. Mater., vols. 97–98, pp. 849–853, Sep. 2011.
[12] F. Aziz et al., “Waste level detection and HMM based collection scheduling of multiple bins,” PLoS ONE, vol. 13, no. 8, Aug. 2018, Art. no. e0202092.
[13] P. Qian et al., “SSC-EKE: Semi-supervised classification with extensive knowledge exploitation,” Inf. Sci., vol. 422, pp. 51–76, Jan. 2018.
[14] D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2009, pp. 123–249.
[15] Y. Liu, Z. Ge, G. Lv, and S. Wang, “Research on automatic garbage detection system based on deep learning and narrowband Internet of Things,” J. Phys., Conf. Ser., vol. 1069, Aug. 2018, Art. no. 012032.
[16] S. Popli, R. K. Jha, and S. Jain, “A survey on energy efficient narrowband Internet of Things (NBIoT): Architecture, application and challenges,” IEEE Access, vol. 7, pp. 16739–16776, 2019.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., vol. 60, no. 6, Dec. 2012, pp. 84–90.
[18] C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.
[19] M. Saeed Rad et al., “A computer vision system to localize and classify wastes on the streets,” 2017, arXiv:1710.11374. [Online]. Available: http://arxiv.org/abs/1710.11374
[20] Y. Wang and X. Zhang, “Autonomous garbage detection for intelligent urban management,” in Proc. MATEC Web Conf., vol. 232, 2018, Art. no. 01056.
[21] R. Girshick, “Fast R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
[22] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[23] K. Xu et al., “Multichannel residual conditional GAN-leveraged abdominal pseudo-CT generation via dixon MR images,” IEEE Access, vol. 7, pp. 163823–163830, 2019.
[24] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[25] E. Rezende, G. Ruppert, T. Carvalho, F. Ramos, and P. de Geus, “Malicious software classification using transfer learning of ResNet-50 deep neural network,” in Proc. 16th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Dec. 2017, pp. 1011–1014.
[26] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[27] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT, 2016, pp. 327–399.
[28] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, “Selective search for object recognition,” Int. J. Comput. Vis., vol. 104, no. 2, pp. 154–171, Apr. 2013.
[29] K. E. A. van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders, “Segmentation as selective search for object recognition,” in Proc. Int. Conf. Comput. Vis., Nov. 2011, pp. 1879–1886.
[30] S. Qiu, G. Wen, Z. Deng, J. Liu, and Y. Fan, “Accurate non-maximum suppression for object detection in high-resolution remote sensing images,” Remote Sens. Lett., vol. 9, no. 3, pp. 237–246, Mar. 2018.
[31] A. Neubeck and L. Van Gool, “Efficient non-maximum suppression,” in Proc. 18th Int. Conf. Pattern Recognit. (ICPR), Aug. 2006, pp. 850–855.
[32] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient graph-based image segmentation,” Int. J. Comput. Vis., vol. 59, no. 2, pp. 167–181, Sep. 2004.
[33] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 580–587.
[34] J. Gu et al., “Recent advances in convolutional neural networks,” 2015, arXiv:1512.07108. [Online]. Available: http://arxiv.org/abs/1512.07108
[35] N. Passalis and A. Tefas, “Training lightweight deep convolutional neural networks using Bag-of-Features pooling,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 6, pp. 1705–1715, Jun. 2019.
[36] Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional networks and applications in vision,” in Proc. IEEE Int. Symp. Circuits Syst., Jun. 2010, pp. 253–256.
[37] Y. Tang and X. Wu, “Scene text detection and segmentation based on cascaded convolution neural networks,” IEEE Trans. Image Process., vol. 26, no. 3, pp. 1509–1520, Mar. 2017.
[38] T. Sun, Y. Wang, J. Yang, and X. Hu, “Convolution neural networks with two pathways for image style recognition,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4102–4113, Sep. 2017.
[39] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. 32nd Int. Conf. Mach. Learn., Ithaca, NY, USA, Jul. 2015, pp. 448–456.
[40] B. Xu, N. Wang, T. Chen, and M. Li, “Empirical evaluation of rectified activations in convolutional network,” 2015, arXiv:1505.00853. [Online]. Available: http://arxiv.org/abs/1505.00853
[41] D. Yu, H. Wang, P. Chen, and Z. Wei, “Mixed pooling for convolutional neural networks,” in Proc. Int. Conf. Rough Sets Knowl. Technol., Shanghai, China, Oct. 2014, pp. 364–375.
[42] P. Golik, P. Doetsch, and H. Ney, “Cross-entropy vs. squared error training: A theoretical and experimental comparison,” in Proc. 14th Annu. Conf. Int. Speech Commun. Assoc., Lyon, France, Aug. 2013, pp. 1756–1760.
[43] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[44] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The Pascal visual object classes (VOC) challenge,” Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, Jun. 2010.
[45] G. Huang, G.-B. Huang, S. Song, and K. You, “Trends in extreme learning machines: A review,” Neural Netw., vol. 61, pp. 32–48, Jan. 2015.
[46] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: Theory and applications,” Neurocomputing, vol. 70, nos. 1–3, pp. 489–501, Dec. 2006.
[47] J. Tang, C. Deng, and G.-B. Huang, “Extreme learning machine for multilayer perceptron,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 4, pp. 809–821, Apr. 2016.
[48] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IEEE Int. Joint Conf. Neural Netw., Jul. 2004, pp. 985–990.
[49] G. Huang, S. Song, J. N. D. Gupta, and C. Wu, “Semi-supervised and unsupervised extreme learning machines,” IEEE Trans. Cybern., vol. 44, no. 12, pp. 2405–2417, Dec. 2014.
[50] T. Zia and S. Razzaq, “Residual recurrent highway networks for learning deep sequence prediction models,” J. Grid Comput., vol. 18, no. 1, pp. 169–176, Mar. 2020, doi: 10.1007/s10723-018-9444-4.
[51] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines. Cambridge, U.K.: Cambridge Univ. Press, 2002.
[52] K. I. Kim, K. Jung, S. H. Park, and H. J. Kim, “Support vector machines for texture classification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 11, pp. 1542–1550, Nov. 2002.
[53] C.-F. Lin and S.-D. Wang, “Fuzzy support vector machines,” IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 464–471, Mar. 2002.
[54] H. Qu, Y. Oussar, G. Dreyfus, and W. Xu, “Regularized recurrent least squares support vector machines,” in Proc. Int. Joint Conf. Bioinf., Syst. Biol. Intell. Comput., Aug. 2009, pp. 508–511.
[55] X. Wang, F.-L. Chung, and S. Wang, “On minimum class locality preserving variance support vector machine,” Pattern Recognit., vol. 43, no. 8, pp. 2753–2762, Aug. 2010.
Pengjiang Qian (Senior Member, IEEE) received the Ph.D. degree from Jiangnan University, Wuxi, Jiangsu, China, in March 2011. He is currently a Full Professor with the School of Artificial Intelligence and Computer Science, Jiangnan University. He has authored or coauthored more than 80 papers published in international/national journals and conferences, e.g., the IEEE Transactions on Neural Networks and Learning Systems (TNNLS), the IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics (TSMC-B), the IEEE Transactions on Cybernetics, the IEEE Transactions on Fuzzy Systems (TFS), PR, InS, and KBS. His research interests include data mining, pattern recognition, bioinformatics and their applications, such as analysis and processing for medical imaging, intelligent traffic dispatching, and advanced business intelligence in logistics.
Kai Yuan is currently pursuing the M.S. degree with the School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, China. His research interests include intelligent algorithms and their applications.
Jian Yao is a Senior Expert and the Deputy Director of the Blockchain Sub-Center, Wuxi IoT Innovation Center Company Ltd. He previously worked as a Research and Development Expert with Tencent Cloud and China Mobile, and has served as the Technical Director of TIPS Intelligent Resident System in Dongfang Credit Union, a NASDAQ-listed company. He is currently a Special Consultant of Wuxi Telecom and Wuxi Mobile. He has participated in the design of the overall plan of the Jiangsu provincial government external network, the trusted data model of blockchain, and the trusted data exchange scheme of Wuxi new district. He holds national certifications as a CDCS data center expert, senior information analyst, communication engineer, and senior network planner.
Chao Fan received the Ph.D. degree from the University of Tokyo in 2018. He is currently a Lecturer with the School of Artificial Intelligence and Computer Science, Jiangnan University. His research interests include artificial intelligence and complex network science.
Hua Zhang received the M.S. degree in public administration from Tongji University in July 2008. His research interests include informatization construction, computer science, and artificial intelligence. He is currently the Supreme Leader with the School of Artificial Intelligence and Computer Science, Jiangnan University. He also has the honor to serve as the Deputy Secretary-General of the Educational Informatization Branch of China Association of Higher Education.
Yuan Liu received the M.S. degree from the Wuxi University of Light Industry in 1998. He is currently the Dean as well as a Full Professor of the School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, China. His main research focuses on the software development of network information systems, network security, and digital media applications. His current research interests include network traffic measurement, social networks, and digital media. He has published more than 100 academic articles in authoritative and core journals. He is also a member of the 863 Expert panel in the information security technology domain of the Ministry of Science and Technology, a Senior Member of the China Computer Federation (CCF), and a member of the CyberSecurity Association of China (CSAC).
Xianling Lu received the B.S. and M.S. degrees in computer science and applications from the Nanjing University of Aeronautics and Astronautics (NUAA) in 1999 and 2004, respectively, and the Ph.D. degree from the Nanjing University of Science and Technology (NUST) in 2009. He is currently a Professor with the Department of Computer Science and Technology, Jiangnan University, China. His research interests focus on deep learning, data mining, and wireless sensor network systems.